Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Kirill Tkhai
On 07.02.2018 08:02, Paul E. McKenney wrote:
> On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
>> On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
>>> So it is OK to kvmalloc() something and pass it to either kfree() or
>>> kvfree(), and it had better be OK to kvmalloc() something and pass it
>>> to kvfree().
>>>
>>> Is it OK to kmalloc() something and pass it to kvfree()?
>>
>> Yes, it absolutely is.
>>
>> void kvfree(const void *addr)
>> {
>> 	if (is_vmalloc_addr(addr))
>> 		vfree(addr);
>> 	else
>> 		kfree(addr);
>> }
>>
>>> If so, is it really useful to have two different names here, that is,
>>> both kfree_rcu() and kvfree_rcu()?
>>
>> I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
>> vfree_rcu() available in the API for the symmetry of calling kmalloc()
>> / kfree_rcu().
>>
>> Personally, I would like us to rename kvfree() to just free(), and have
>> malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
>> fight yet.
> 
> But why not just have the existing kfree_rcu() API cover both kmalloc()
> and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
> anyone arguing that the RCU API has too few members.  ;-)

People who are far from RCU internals think of kfree_rcu() as an extension
of kfree(). It is not obvious that someone looking for a primitive to free
vmalloc'ed memory needs to dig into the kfree_rcu() comments.

Also, a construction like

obj = kvmalloc();
kfree_rcu(obj);

makes me think it's legitimate to use plain kfree() as the counterpart of
kvmalloc().

So a significant change in kfree_rcu() behavior will complicate life for
stable backporters, because they will need to keep such differences between
kernel versions in mind.

It seems that if we are going to use a single primitive for both kmalloc()
and kvmalloc() memory, it has to have another name. But I don't see a problem
with having both kfree_rcu() and kvfree_rcu().

Kirill
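
For readers outside mm, the dispatch that kvfree() performs can be modelled in
user space. This is only a sketch under stated assumptions: every toy_* name is
hypothetical, a static array stands in for the vmalloc address range, and the
4096-byte threshold is an arbitrary stand-in for the real kvmalloc() policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's vmalloc address range. */
static unsigned char toy_vmalloc_area[1 << 16];
static size_t toy_vmalloc_used;

/* Counters so a caller can see which free path was taken. */
static int toy_kfree_calls;
static int toy_vfree_calls;

static bool toy_is_vmalloc_addr(const void *addr)
{
	uintptr_t p = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)toy_vmalloc_area;

	return p >= start && p < start + sizeof(toy_vmalloc_area);
}

/* Assumed policy: small requests take the "kmalloc" path, large ones "vmalloc". */
static void *toy_kvmalloc(size_t size)
{
	void *p;

	assert(size > 0);
	if (size <= 4096)
		return malloc(size);

	size = (size + 15) & ~(size_t)15;	/* round the bump pointer step */
	if (size > sizeof(toy_vmalloc_area) - toy_vmalloc_used)
		return NULL;
	p = toy_vmalloc_area + toy_vmalloc_used;
	toy_vmalloc_used += size;
	return p;
}

static void toy_kfree(void *addr)
{
	toy_kfree_calls++;
	free(addr);
}

static void toy_vfree(void *addr)
{
	(void)addr;	/* the toy arena is never reused */
	toy_vfree_calls++;
}

/* Same shape as the kvfree() quoted above: dispatch on the address. */
static void toy_kvfree(void *addr)
{
	if (toy_is_vmalloc_addr(addr))
		toy_vfree(addr);
	else
		toy_kfree(addr);
}
```

In this model, handing the large allocation straight to toy_kfree() would pass
an arena pointer to free(), which is the kind of mismatch a reader could be led
into by seeing kvmalloc() paired with kfree_rcu().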


Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Josh Triplett
On Tue, Feb 06, 2018 at 09:02:00PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> > > So it is OK to kvmalloc() something and pass it to either kfree() or
> > > kvfree(), and it had better be OK to kvmalloc() something and pass it
> > > to kvfree().
> > > 
> > > Is it OK to kmalloc() something and pass it to kvfree()?
> > 
> > Yes, it absolutely is.
> > 
> > void kvfree(const void *addr)
> > {
> > 	if (is_vmalloc_addr(addr))
> > 		vfree(addr);
> > 	else
> > 		kfree(addr);
> > }
> > 
> > > If so, is it really useful to have two different names here, that is,
> > > both kfree_rcu() and kvfree_rcu()?
> > 
> > I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
> > vfree_rcu() available in the API for the symmetry of calling kmalloc()
> > / kfree_rcu().
> > 
> > Personally, I would like us to rename kvfree() to just free(), and have
> > malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
> > fight yet.
> 
> But why not just have the existing kfree_rcu() API cover both kmalloc()
> and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
> anyone arguing that the RCU API has too few members.  ;-)

I don't have any problem with having just `kvfree_rcu`, but having just
`kfree_rcu` seems confusingly asymmetric.

(Also, count me in favor of having just one "free" function, too.)
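
As background for the naming question, here is a rough user-space sketch of
what any *free_rcu() primitive does: it queues the pointer and frees it only
once a grace period has ended. The toy_* names are hypothetical, the grace
period is simulated by an explicit call, and unlike the kernel (which embeds an
rcu_head in the object) this sketch allocates a separate list node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One queued pointer waiting for the simulated grace period to end. */
struct toy_deferred {
	void *ptr;
	struct toy_deferred *next;
};

static struct toy_deferred *toy_pending;
static int toy_freed;

/* Queue ptr for a later free; returns 0 on success, -1 if out of memory. */
static int toy_free_deferred(void *ptr)
{
	struct toy_deferred *d;

	assert(ptr != NULL);
	d = malloc(sizeof(*d));
	if (!d)
		return -1;
	d->ptr = ptr;
	d->next = toy_pending;
	toy_pending = d;
	return 0;
}

/* Simulated end of a grace period: now the queued objects may be freed. */
static void toy_grace_period_end(void)
{
	while (toy_pending) {
		struct toy_deferred *d = toy_pending;

		toy_pending = d->next;
		free(d->ptr);	/* a kvfree()-style dispatch would go here */
		free(d);
		toy_freed++;
	}
}
```

Whether the final free is kfree() or kvfree() is confined to that one marked
line, which is why a single deferred-free entry point can plausibly serve both
allocators.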


Re: [PATCH v3 1/2] drm/virtio: Add window server support

2018-02-06 Thread Tomeu Vizoso

On 02/07/2018 02:09 AM, Michael S. Tsirkin wrote:

On Tue, Feb 06, 2018 at 03:23:02PM +0100, Gerd Hoffmann wrote:

Creation of shareable buffer by guest
-------------------------------------

1. Client requests virtio driver to create a buffer suitable for sharing
with host (DRM_VIRTGPU_RESOURCE_CREATE)


client or guest proxy?


4. QEMU maps that buffer to the guest's address space
(KVM_SET_USER_MEMORY_REGION), passes the guest PFN to the virtio driver


That part is problematic.  The host can't simply allocate something in
the physical address space, because most physical address space
management is done by the guest.  All pci bars are mapped by the guest
firmware for example (or by the guest OS in case of hotplug).


4. QEMU pops data+buffers from the virtqueue, looks up shmem FD for each
resource, sends data + FDs to the compositor with SCM_RIGHTS


If you squint hard, this sounds a bit like a use-case for vhost-user-gpu,
does it not?


Can you expand on what makes you think that?

As an aside, crosvm runs the virtio-gpu device in a separate, jailed
process, among other virtual devices.

https://chromium.googlesource.com/chromiumos/platform/crosvm/

Regards,

Tomeu


Re: [PATCH v11 00/10] Application Data Integrity feature introduced by SPARC M7

2018-02-06 Thread Eric W. Biederman
Khalid Aziz  writes:

> On 02/01/2018 07:29 PM, ebied...@xmission.com wrote:
>> Khalid Aziz  writes:
>>
>>> V11 changes:
>>> This series is same as v10 and was simply rebased on 4.15 kernel. Can
>>> mm maintainers please review patches 2, 7, 8 and 9 which are arch
>>> independent, and include/linux/mm.h and mm/ksm.c changes in patch 10
>>> and ack these if everything looks good?
>>
>> I am a bit puzzled how this differs from the pkey's that other
>> architectures are implementing to achieve a similar result.
>>
>> I am a bit mystified why you don't store the tag in a vma
>> instead of inventing a new way to store data on page out.
>
> Hello Eric,
>
> As Steven pointed out, sparc sets tags per cacheline unlike pkey. This results
> in much finer granularity for tags than pkey and hence requires larger tag
> storage than what we can do in a vma.

*Nod*  I am a bit mystified where you keep the information in memory.
I would think the tags would need to be stored per cacheline or per
tlb entry, in some kind of cache that could overflow.  So I would be
surprised if swapping is the only time this information needs to be
stored in memory.  Which makes me wonder if you have the proper data
structures.

I would think an array per vma or something in the page tables would
tend to make sense.

But perhaps I am missing something.
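
To put a rough size on the per-cacheline storage being discussed, here is a
back-of-the-envelope helper. The figures used in the note below (64-byte cache
lines, 4-bit tags, 8 KiB pages) are assumptions for illustration and are not
taken from this thread; toy_tag_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold one tag per cache line for a region, rounded up. */
static size_t toy_tag_bytes(size_t region_bytes, size_t line_bytes,
			    unsigned int tag_bits)
{
	size_t lines;

	assert(line_bytes != 0);
	lines = (region_bytes + line_bytes - 1) / line_bytes;
	return (lines * tag_bits + 7) / 8;
}
```

Under those assumptions an 8 KiB page needs 64 bytes of tag storage and a
1 GiB mapping needs 8 MiB, which gives a feel for why a single per-vma field
is not enough, whatever structure ends up holding the tags.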

>> Can you please use force_sig_fault to send these signals instead
>> of force_sig_info.  Empirically I have found that it is very
>> error prone to generate siginfo's by hand, especially on code
>> paths where several different si_codes may apply.  So it helps
>> to go through a helper function to ensure the fiddly bits are
>> all correct.  AKA the unused bits all need to be set to zero before
>> struct siginfo is copied to userspace.
>>
>
> What you say makes sense. I followed the same code as other fault handlers for
> sparc. I could change just the fault handlers for ADI related faults. Would it
> make more sense to change all the fault handlers in a separate patch and keep
> the code in arch/sparc/kernel/traps_64.c consistent? Dave M, do you have a
> preference?

It is my intention post -rc1 to start sending out patches to get the
rest of not just sparc but all of the architectures using the new
helpers.  I have the code; I just ran out of time before the merge
window opened to ensure everything had a good thorough review.

So if you can handle your new changes I expect I will handle the
rest.

Eric
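
The "zero everything, then fill in the fields" pattern behind the helper
argument can be sketched in user space. struct toy_siginfo and both functions
are hypothetical simplifications; this is not the kernel's struct siginfo
layout or the force_sig_fault() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct siginfo. */
struct toy_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		struct {
			void *addr;
		} fault;
		unsigned char pad[48];
	} u;
};

/* Helper in the spirit of a fault-specific wrapper: zero first, then fill. */
static void toy_fill_fault(struct toy_siginfo *info, int signo, int code,
			   void *addr)
{
	assert(info != NULL);
	memset(info, 0, sizeof(*info));
	info->si_signo = signo;
	info->si_code = code;
	info->u.fault.addr = addr;
}

/* Returns 1 if every union byte past the fault address is zero. */
static int toy_unused_bytes_clear(const struct toy_siginfo *info)
{
	size_t i;

	for (i = sizeof(info->u.fault); i < sizeof(info->u.pad); i++)
		if (info->u.pad[i] != 0)
			return 0;
	return 1;
}
```

Filling the struct by hand without the memset() would leave whatever was on the
stack in the unused union bytes, which is the leak to userspace that a helper
rules out by construction.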


Re: [PATCH 8/8] thermal/drivers/cpu_cooling: Add the combo cpu cooling device

2018-02-06 Thread Viresh Kumar
On 06-02-18, 11:48, Daniel Lezcano wrote:
> On 06/02/2018 05:28, Viresh Kumar wrote:

> > Surely we can do one thing at a time if that's the way we choose to do it.
> 
> Easy to say :)
> 
> The current code is to introduce the feature without impacting the DT
> bindings in order to keep focused on the thermal mitigation aspect.
> 
> There are still a lot of improvements to do after that. You are
> basically asking me to implement the copy-on-write before the memory
> management is complete.

Perhaps I wasn't clear. What I was trying to say is that we can do "one thing at
a time" if we choose to create a "combo device" (the way you proposed). I am not
trying to force you to solve all the problems in one go :)

> Can you give an example? Either your understanding is incorrect or I missed
> the point.

So I tried to write it down and realized I was assuming that different
cooling-maps can be provided for different cooling strategies
(cpufreq/cpuidle) and obviously that's not the case as it's per device.
Not sure if it would be correct to explore the possibility of doing
that.

-- 
viresh


Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang

On 02/07/2018 12:34 PM, Michael S. Tsirkin wrote:

On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
  drivers/virtio/virtio_balloon.c     | 255 +++-
  include/uapi/linux/virtio_balloon.h |   7 +
  mm/page_poison.c                    |   6 +
  3 files changed, 232 insertions(+), 36 deletions(-)

Resend Change:
- Expose page_poisoning_enabled to kernel modules

RESEND tag is for reposting unchanged patches.
You want to post a v27, and you want the mm patch
as a separate one, so you can get an ack on it from
someone on linux-mm.

In fact, I would probably add reporting the poison value as
a separate feature/couple of patches.



OK. I have made them separate patches in v27. Thanks a lot for reviewing so
many versions; I learned a lot from the comments and discussion.


Best,
Wei



[PATCH v27 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.
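
The ordering described above (cmd id first, then the hints, then the stop id)
can be sketched with a toy queue. Everything here is a simplification: struct
toy_vq is a plain array rather than a virtqueue, all toy_* names are
hypothetical, and TOY_STOP_ID = 0 is an assumed placeholder rather than the
value defined by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STOP_ID	0u	/* assumed placeholder for the stop cmd id */
#define TOY_VQ_SIZE	16

/* A plain array standing in for the free page vq. */
struct toy_vq {
	uint32_t elem[TOY_VQ_SIZE];
	size_t len;
};

static int toy_vq_add(struct toy_vq *vq, uint32_t v)
{
	if (vq->len >= TOY_VQ_SIZE)
		return -1;
	vq->elem[vq->len++] = v;
	return 0;
}

/* Guest side: cmd id first, then the free page hints, then the stop id. */
static int toy_report_free_pages(struct toy_vq *vq, uint32_t cmd_id,
				 const uint32_t *pfns, size_t n)
{
	size_t i;

	assert(vq != NULL && cmd_id != TOY_STOP_ID);
	if (toy_vq_add(vq, cmd_id))
		return -1;
	for (i = 0; i < n; i++)
		if (toy_vq_add(vq, pfns[i]))
			return -1;
	return toy_vq_add(vq, TOY_STOP_ID);
}
```

Because the host polls the queue once it has sent the starting cmd id, nothing
in this sequence corresponds to a kick. The sketch also puts ids and hints in
one flat array, so it cannot tell a hint apart from an id the way separate
buffers could.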

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c     | 245 ++--
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 213 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..39ecce3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(&vb->stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  &vb->update_balloon_size_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, &vb->cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  &vb->report_free_page_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct *work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+   vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];

[PATCH v27 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-02-06 Thread Wei Wang
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value in use.
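
A small user-space illustration of how a one-byte poison pattern becomes the
32-bit config value when replicated with memset(), as the probe code in this
patch does. TOY_PAGE_POISON = 0xaa is an assumption for the example; the
kernel's PAGE_POISON value depends on configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_POISON	0xaa	/* assumed byte pattern for illustration */

/* Replicate the poison byte across all four bytes of a 32-bit value. */
static uint32_t toy_poison_val(void)
{
	uint32_t v;

	memset(&v, TOY_PAGE_POISON, sizeof(v));
	return v;
}
```

Since every byte is identical, the result does not depend on endianness.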

Signed-off-by: Wei Wang 
Suggested-by: Michael S. Tsirkin 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c     | 10 ++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 39ecce3..76b4853 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -685,6 +685,7 @@ static struct file_system_type balloon_fs = {
 static int virtballoon_probe(struct virtio_device *vdev)
 {
struct virtio_balloon *vb;
+   __u32 poison_val;
int err;
 
if (!vdev->config->get) {
@@ -728,6 +729,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
goto out_del_vqs;
}
INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+   memset(&poison_val, PAGE_POISON, sizeof(poison_val));
+   virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, &poison_val);
+   }
}
 
vb->nb.notifier_call = virtballoon_oom_notify;
@@ -846,6 +852,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+   if (!page_poisoning_enabled())
+   __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
 }
@@ -855,6 +864,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+   VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 0c654db..3f97067 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+   /* Stores PAGE_POISON if page poisoning is in use */
+   __u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4



[PATCH v27 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules

2018-02-06 Thread Wei Wang
In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.

Signed-off-by: Wei Wang 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
---
 mm/page_poison.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index e83fd44..c08d02a 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -30,6 +30,11 @@ bool page_poisoning_enabled(void)
debug_pagealloc_enabled()));
 }
 
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
 static void poison_page(struct page *page)
 {
void *addr = kmap_atomic(page);
@@ -37,6 +42,7 @@ static void poison_page(struct page *page)
memset(addr, PAGE_POISON, PAGE_SIZE);
kunmap_atomic(addr);
 }
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
 
 static void poison_pages(struct page *page, int n)
 {
-- 
2.7.4



[PATCH v27 0/4] Virtio-balloon: support free page reporting

2018-02-06 Thread Wei Wang
This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization of the 1st round memory transfer -
the hypervisor can skip the transfer of guest free pages in the 1st round.
It is not concerned that the memory pages are used after they are given
to the hypervisor as a hint of the free pages, because they will be
tracked by the hypervisor and transferred in the next round if they are
used and written.
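
The round structure described above can be modelled in a few lines. This is a
toy sketch only: struct toy_vm, the 8-page guest and all toy_* names are
hypothetical, and dirty tracking is a plain boolean array instead of
write-protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

struct toy_vm {
	bool free_hint[TOY_NPAGES];	/* reported free by the guest */
	bool dirty[TOY_NPAGES];		/* written since tracking started */
};

/* Mark pages first..last (inclusive) as hinted free. */
static void toy_hint_free(struct toy_vm *vm, size_t first, size_t last)
{
	size_t i;

	assert(vm != NULL && first <= last && last < TOY_NPAGES);
	for (i = first; i <= last; i++)
		vm->free_hint[i] = true;
}

/* Round 1: send every page except those hinted as free. */
static size_t toy_round1(const struct toy_vm *vm)
{
	size_t i, sent = 0;

	for (i = 0; i < TOY_NPAGES; i++)
		if (!vm->free_hint[i])
			sent++;
	return sent;
}

/* Later rounds: send whatever was dirtied, hinted or not, and reset tracking. */
static size_t toy_next_round(struct toy_vm *vm)
{
	size_t i, sent = 0;

	for (i = 0; i < TOY_NPAGES; i++) {
		if (vm->dirty[i]) {
			vm->dirty[i] = false;
			sent++;
		}
	}
	return sent;
}
```

A page that was hinted free and then written shows up in dirty[] and is sent
by toy_next_round(), which mirrors why a stale hint is harmless in the scheme
described here.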

* Tests
- Migration time improvement
Result:
Live migration time is reduced to about 14% of the legacy time with this
optimization.
Details:
Local live migration of an 8GB idle guest: the legacy live migration takes
~1817ms. With this optimization, it takes ~254ms, which reduces the time
to about 14% of the original.
- Workload tests
Results:
Running this feature has no impact on the linux compilation workload
running inside the guest.
Details:
Set up a Ping-Pong local live migration, where the guest ceaselessly
migrates between the source and destination. Linux compilation,
i.e. make bzImage -j4, is performed during the Ping-Pong migration. The
legacy case takes 5min14s to finish the compilation. With this
optimization patched, it takes 5min12s.
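
For anyone checking the figures, the percentages follow directly from the
numbers quoted above; toy_percent_of is a hypothetical helper name.

```c
#include <assert.h>

/* New duration expressed as a percentage of the old one. */
static double toy_percent_of(double new_time, double old_time)
{
	assert(old_time > 0.0);
	return 100.0 * new_time / old_time;
}
```

toy_percent_of(254, 1817) is just under 14, that is, roughly a 7x speedup, and
toy_percent_of(312, 314) for the compilation run (5min12s against 5min14s) is
a little over 99, so the two compile times are effectively the same.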

ChangeLog:
v26->v27:
- add a new patch to expose page_poisoning_enabled to kernel modules
- virtio-balloon: set poison_val to 0x, instead of 0xaa
v25->v26: virtio-balloon changes only
- remove kicking free page vq since the host now polls the vq after
  initiating the reporting
- report_free_page_func: detach all the used buffers after sending
  the stop cmd id. This avoids leaving the detaching burden (i.e.
  overhead) to the next cmd id. Detaching here isn't considered
  overhead since the stop cmd id has been sent, and host has already
  moved forward.
v24->v25:
- mm: change walk_free_mem_block to return 0 (instead of true) on
  completing the report, and return a non-zero value from the
  callback, which stops the reporting.
- virtio-balloon:
- use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
- avoid __virtio_clear_bit when bailing out;
- a new method to avoid reporting the same cmd id to host twice
- destroy_workqueue can cancel free page work when the feature is
  negotiated;
- fail probe when the free page vq size is less than 2.
v23->v24:
- change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
  VIRTIO_BALLOON_F_FREE_PAGE_HINT
- kick when vq->num_free < half full, instead of "= half full"
- replace BUG_ON with bailing out
- check vb->balloon_wq in probe(), if null, bail out
- add a new feature bit for page poisoning
- solve the corner case of one cmd id being sent to host twice
v22->v23:
- change to kick the device when the vq is half-way full;
- open-code batch_free_page_sg into add_one_sg;
- change cmd_id from "uint32_t" to "__virtio32";
- reserve one entry in the vq for the driver to send cmd_id, instead
  of busywaiting for an available entry;
- add "stop_update" check before queue_work for prudence purpose for
  now, will have a separate patch to discuss this flag check later;
- init_vqs: change to put some variables on stack to have simpler
  implementation;
- add destroy_workqueue(vb->balloon_wq);

v21->v22:
- add_one_sg: some code and comment re-arrangement
- send_cmd_id: handle a corner case

For previous ChangeLog, please reference
https://lwn.net/Articles/743660/

Wei Wang (4):
  mm: support reporting free page blocks
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c     | 255 +++-
 include/linux/mm.h                  |   6 +
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_alloc.c                     |  96 ++
 mm/page_poison.c                    |   6 +
 5 files changed, 334 insertions(+), 36 deletions(-)

-- 
2.7.4



[PATCH v27 1/4] mm: support reporting free page blocks

2018-02-06 Thread Wei Wang
This patch adds support to walk through the free page blocks in the
system and report them via a callback function. Some page blocks may
leave the free list after zone->lock is released, so it is the caller's
responsibility to either detect or prevent the use of such pages.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are reported as free pages but are written after the report function
returns will be captured by the hypervisor, and they will be added to the
next round of memory transfer.
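
The callback contract (report each free block, stop as soon as the callback
returns non-zero) can be sketched in user space as follows. The free lists are
a hypothetical static table, all toy_* names are made up, locking is left out
entirely, and walking from the highest order down is an assumption made for
the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ORDER	4
#define TOY_LIST_LEN	2

/* Hypothetical free lists: first pfn of each free block, per order. */
static const unsigned long toy_free_list[TOY_MAX_ORDER][TOY_LIST_LEN] = {
	{ 1, 3 }, { 8, 20 }, { 32, 64 }, { 128, 256 },
};

typedef int (*toy_report_fn)(void *opaque, unsigned long pfn,
			     unsigned long num);

/* Report free blocks of order >= min_order; stop on a non-zero return. */
static int toy_walk_free_mem_block(void *opaque, int min_order,
				   toy_report_fn report)
{
	int order, ret;
	size_t i;

	assert(min_order >= 0 && min_order < TOY_MAX_ORDER);
	for (order = TOY_MAX_ORDER - 1; order >= min_order; order--) {
		for (i = 0; i < TOY_LIST_LEN; i++) {
			ret = report(opaque, toy_free_list[order][i],
				     1UL << order);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Example callback state: accept pages until a limit would be exceeded. */
struct toy_budget {
	unsigned long pages;
	unsigned long limit;
};

static int toy_count_pages(void *opaque, unsigned long pfn, unsigned long num)
{
	struct toy_budget *b = opaque;

	(void)pfn;
	if (b->pages + num > b->limit)
		return 1;	/* non-zero return stops the walk */
	b->pages += num;
	return 0;
}
```

The callback here only updates counters, in line with the advice to keep it
simple and defer any heavy lifting to another context.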

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Acked-by: Michal Hocko 
---
 include/linux/mm.h |  6 
 mm/page_alloc.c    | 96 ++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 173d248..1c77d88 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1944,6 +1944,12 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+extern int walk_free_mem_block(void *opaque,
+  int min_order,
+  int (*report_pfn_range)(void *opaque,
+  unsigned long pfn,
+  unsigned long num));
+
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c7dd9c8..995ff01 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4906,6 +4906,102 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
show_swap_cache_info();
 }
 
+/*
+ * Walk through a free page list and report the found pfn range via the
+ * callback.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+static int walk_free_page_list(void *opaque,
+  struct zone *zone,
+  int order,
+  enum migratetype mt,
+  int (*report_pfn_range)(void *,
+  unsigned long,
+  unsigned long))
+{
+   struct page *page;
+   struct list_head *list;
+   unsigned long pfn, flags;
+   int ret = 0;
+
+   spin_lock_irqsave(&zone->lock, flags);
+   list = &zone->free_area[order].free_list[mt];
+   list_for_each_entry(page, list, lru) {
+   pfn = page_to_pfn(page);
+   ret = report_pfn_range(opaque, pfn, 1 << order);
+   if (ret)
+   break;
+   }
+   spin_unlock_irqrestore(&zone->lock, flags);
+
+   return ret;
+}
+
+/**
+ * walk_free_mem_block - Walk through the free page blocks in the system
+ * @opaque: the context passed from the caller
+ * @min_order: the minimum order of free lists to check
+ * @report_pfn_range: the callback to report the pfn range of the free pages
+ *
+ * If the callback returns a non-zero value, stop iterating the list of free
+ * page blocks. Otherwise, continue to report.
+ *
+ * Please note that there are no locking guarantees for the callback and
+ * that the reported pfn range might be freed or disappear after the
+ * callback returns so the caller has to be very careful how it is used.
+ *
+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.
+ *
+ * There is no guarantee that each free range will be reported only once
+ * during one walk_free_mem_block invocation.
+ *
+ * pfn_to_page on the given range is strongly discouraged and if there is
+ * an absolute need for that make sure to contact MM people to discuss
+ * potential problems.
+ *
+ * The function itself might sleep so it cannot be called from atomic
+ * contexts.
+ *
+ * In general low orders tend to be very volatile and so it makes more
+ * sense to query larger ones first for various optimizations like
+ * ballooning etc. This will reduce the overhead as well.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+int walk_free_mem_block(void *opaque,
+   int min_order,
+   int (*report_pfn_range)(void *opaque,
+

Re: staging: ion: ION allocation fall back order depends on heap linkage order

2018-02-06 Thread Alexey Skidanov


> Yup, you've hit upon a key problem. Having fallbacks be stable
> was always a problem and the recommendation these days is to
> not rely on them. You can specify a heap at a time and fallback
> manually if you want that behavior.
> 
> If you have a proposal to make fallbacks work reliably without
> overly complicating the ABI I'm happy to review it.
> 
> Thanks,
> Laura
> 
I think it's possible to "automate" the "manual fallback" behavior. But
the real issue is using a heap id to specify the particular heap object.

The current API (allocation IOCTL) requires specifying the particular heap
object by its heap id. On the other hand, user space doesn't control the
heap creation order or the heap id assignment. So it may be tricky,
especially when more than one object of the same heap type is created
automatically.

Thanks,
Alexey
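
For reference, the "manual fallback" mentioned above amounts to a loop like the
following on the caller's side. This is a user-space sketch with hypothetical
toy_* names and a made-up table of heaps; it does not use the real ION ioctl
interface.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NHEAPS 3

/* Hypothetical heaps, indexed by heap id, each with some bytes left. */
static size_t toy_heap_free_bytes[TOY_NHEAPS] = { 4096, 0, 65536 };

/* Allocate from exactly one heap; returns 0 on success, -1 on failure. */
static int toy_alloc_from_heap(unsigned int heap_id, size_t len)
{
	if (heap_id >= TOY_NHEAPS || toy_heap_free_bytes[heap_id] < len)
		return -1;
	toy_heap_free_bytes[heap_id] -= len;
	return 0;
}

/* Try heaps in the caller's order; returns the heap id used, or -1. */
static int toy_alloc_with_fallback(const unsigned int *order, size_t n,
				   size_t len)
{
	size_t i;

	assert(order != NULL);
	for (i = 0; i < n; i++)
		if (toy_alloc_from_heap(order[i], len) == 0)
			return (int)order[i];
	return -1;
}
```

The loop is only as stable as the ids in order[]: if the ids are assigned in
creation order and that order changes, the same array silently names different
heaps, which is the difficulty raised in this message.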




[PATCH 6/6] s390: introduce execute-trampolines for branches

2018-02-06 Thread Martin Schwidefsky
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction-return= compiler options to create a kernel fortified against
the spectre v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

	basr	%r14,%r1

is replaced with a PC relative call to a new thunk

	brasl	%r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
	exrl	0,0f
	j	.
0:	br	%r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameters "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.
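
A user-space sketch of how such parameters could be mapped onto two disable
flags, named after the nospec_call_disable / nospec_return_disable variables
that appear later in this patch. The mapping itself is an assumption made for
illustration, in particular what "auto" selects, and toy_parse_param is a
hypothetical function, not the kernel's early-param code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int toy_call_disable;	/* 1: revert indirect-call thunks */
static int toy_return_disable;	/* 1: revert function-return thunks */

/* Returns 0 if the argument was recognized, -1 otherwise. */
static int toy_parse_param(const char *arg)
{
	assert(arg != NULL);
	if (!strcmp(arg, "nospectre_v2") || !strcmp(arg, "spectre_v2=off")) {
		toy_call_disable = 1;
		toy_return_disable = 1;
	} else if (!strcmp(arg, "spectre_v2=on")) {
		toy_call_disable = 0;
		toy_return_disable = 0;
	} else if (!strcmp(arg, "spectre_v2=auto")) {
		/* assumed: keep call thunks, revert return thunks */
		toy_call_disable = 0;
		toy_return_disable = 1;
	} else {
		return -1;
	}
	return 0;
}
```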

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig                     |  28 +
 arch/s390/Makefile                    |  12 
 arch/s390/include/asm/lowcore.h       |   6 +-
 arch/s390/include/asm/nospec-branch.h |  18 ++
 arch/s390/kernel/Makefile             |   4 ++
 arch/s390/kernel/entry.S              | 113 ++
 arch/s390/kernel/module.c             |  62 ---
 arch/s390/kernel/nospec-branch.c      | 100 ++
 arch/s390/kernel/setup.c              |   4 ++
 arch/s390/kernel/smp.c                |   1 +
 arch/s390/kernel/vmlinux.lds.S        |  14 +
 drivers/s390/char/Makefile            |   2 +
 12 files changed, 329 insertions(+), 35 deletions(-)
 create mode 100644 arch/s390/include/asm/nospec-branch.h
 create mode 100644 arch/s390/kernel/nospec-branch.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index d514e25..d4a65bf 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -557,6 +557,34 @@ config KERNEL_NOBP
 
  If unsure, say N.
 
+config EXPOLINE
+   def_bool n
+   prompt "Avoid speculative indirect branches in the kernel"
+   help
+ Compile the kernel with the expoline compiler options to guard
+ against kernel-to-user data leaks by avoiding speculative indirect
+ branches.
+ Requires a compiler with -mindirect-branch=thunk support for full
+ protection. The kernel may run slower.
+
+ If unsure, say N.
+
+choice
+   prompt "Expoline default"
+   depends on EXPOLINE
+   default EXPOLINE_FULL
+
+config EXPOLINE_OFF
+   bool "spectre_v2=off"
+
+config EXPOLINE_MEDIUM
+   bool "spectre_v2=auto"
+
+config EXPOLINE_FULL
+   bool "spectre_v2=on"
+
+endchoice
+
 endmenu
 
 menu "Memory setup"
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index fd691c4..2f925ef 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -78,6 +78,18 @@ ifeq ($(call cc-option-yn,-mwarn-dynamicstack),y)
 cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack
 endif
 
+ifdef CONFIG_EXPOLINE
+  ifeq ($(call cc-option-yn,$(CC_FLAGS_MARCH) -mindirect-branch=thunk),y)
+CC_FLAGS_EXPOLINE := -mindirect-branch=thunk
+CC_FLAGS_EXPOLINE += -mfunction-return=thunk
+CC_FLAGS_EXPOLINE += -mindirect-branch-table
+export CC_FLAGS_EXPOLINE
+cflags-y += $(CC_FLAGS_EXPOLINE)
+  else
+$(warning "Your gcc lacks the -mindirect-branch= option")
+  endif
+endif
+
 ifdef CONFIG_FUNCTION_TRACER
 # make use of hotpatch feature if the compiler supports it
 cc_hotpatch:= -mhotpatch=0,3
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index c63986a..5bc 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -136,7 +136,11 @@ struct lowcore {
__u64   vdso_per_cpu_data;  /* 0x03b8 */
__u64   machine_flags;  /* 0x03c0 */
__u64   gmap;   /* 0x03c8 */
-   __u8    pad_0x03d0[0x0e00-0x03d0];  /* 0x03d0 */
+   __u8    pad_0x03d0[0x0400-0x03d0];  /* 0x03d0 */
+
+   /* br %r1 trampoline */
+   __u16   br_r1_trampoline;   /* 0x0400 */
+   __u8    pad_0x0402[0x0e00-0x0402];  /* 0x0402 */
 
/*
 * 0xe00 contains the address of the IPL Parameter Information
diff --git a/arch/s390/include/asm/nospec-branch.h b/arch/s390/include/asm/nospec-branch.h
new file mode 100644
index 000..7df48e5
--- /dev/null
+++ b/arch/s390/include/asm/nospec-branch.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_EXPOLINE_H
+#define _ASM_S390_EXPOLINE_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+extern int nospec_call_disable;
+extern int nospec_return_disable;
+
+void nospec_init_branches(void);
+void nospec_call_revert(s32 *start, s32 *end);
+void nospec_return_revert(s32 *start, s32 *end);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_S390_EXPOLINE_H */
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefi

[PATCH 1/6] s390: scrub registers on kernel entry and KVM exit

2018-02-06 Thread Martin Schwidefsky
Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

Reviewed-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/kernel/entry.S | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 6cd444d..5d87eda 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -248,6 +248,12 @@ ENTRY(sie64a)
 sie_exit:
lg  %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg %r0,%r13,0(%r14) # save guest gprs 0-13
+   xgr %r0,%r0 # clear guest registers to
+   xgr %r1,%r1 # prevent speculative use
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
lmg %r6,%r14,__SF_GPRS(%r15)# restore kernel registers
lg  %r2,__SF_EMPTY+16(%r15) # return exit reason code
br  %r14
@@ -282,6 +288,8 @@ ENTRY(system_call)
 .Lsysc_vtime:
UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
stmg %r0,%r7,__PT_R0(%r11)
+   # clear user controlled register to prevent speculative use
+   xgr %r0,%r0
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
@@ -561,6 +569,15 @@ ENTRY(pgm_check_handler)
 4: lgr %r13,%r11
la  %r11,STACK_FRAME_OVERHEAD(%r15)
stmg %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
stmg %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC
@@ -626,6 +643,16 @@ ENTRY(io_int_handler)
lmg %r8,%r9,__LC_IO_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
@@ -839,6 +866,16 @@ ENTRY(ext_int_handler)
lmg %r8,%r9,__LC_EXT_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg %r8,%r9,__PT_PSW(%r11)
lghi %r1,__LC_EXT_PARAMS2
@@ -1046,6 +1083,16 @@ ENTRY(mcck_int_handler)
 .Lmcck_skip:
lghi %r14,__LC_GPREGS_SAVE_AREA+64
stmg %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),0(%r14)
stmg %r8,%r9,__PT_PSW(%r11)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
-- 
2.7.4



[PATCH 3/6] s390/alternative: use a copy of the facility bit mask

2018-02-06 Thread Martin Schwidefsky
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.

Reviewed-by: David Hildenbrand 
Reviewed-by: Cornelia Huck 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/facility.h | 18 ++
 arch/s390/include/asm/lowcore.h  |  3 ++-
 arch/s390/kernel/alternative.c   |  3 ++-
 arch/s390/kernel/early.c |  3 +++
 arch/s390/kernel/setup.c |  4 +++-
 arch/s390/kernel/smp.c   |  4 +++-
 6 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h
index fbe0c4b..99c8ce3 100644
--- a/arch/s390/include/asm/facility.h
+++ b/arch/s390/include/asm/facility.h
@@ -15,6 +15,24 @@
 
 #define MAX_FACILITY_BIT (sizeof(((struct lowcore *)0)->stfle_fac_list) * 8)
 
+static inline void __set_facility(unsigned long nr, void *facilities)
+{
+   unsigned char *ptr = (unsigned char *) facilities;
+
+   if (nr >= MAX_FACILITY_BIT)
+   return;
+   ptr[nr >> 3] |= 0x80 >> (nr & 7);
+}
+
+static inline void __clear_facility(unsigned long nr, void *facilities)
+{
+   unsigned char *ptr = (unsigned char *) facilities;
+
+   if (nr >= MAX_FACILITY_BIT)
+   return;
+   ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
+}
+
 static inline int __test_facility(unsigned long nr, void *facilities)
 {
unsigned char *ptr;
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index ec6592e..c63986a 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -151,7 +151,8 @@ struct lowcore {
__u8    pad_0x0e20[0x0f00-0x0e20];  /* 0x0e20 */
 
/* Extended facility list */
-   __u64   stfle_fac_list[32]; /* 0x0f00 */
+   __u64   stfle_fac_list[16]; /* 0x0f00 */
+   __u64   alt_stfle_fac_list[16]; /* 0x0f80 */
__u8    pad_0x1000[0x11b0-0x1000];  /* 0x1000 */
 
/* Pointer to the machine check extended save area */
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 574e776..1abf4f3 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -75,7 +75,8 @@ static void __init_or_module __apply_alternatives(struct alt_instr *start,
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
 
-   if (!test_facility(a->facility))
+   if (!__test_facility(a->facility,
+S390_lowcore.alt_stfle_fac_list))
continue;
 
if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) {
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 497a920..510f218 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -193,6 +193,9 @@ static noinline __init void setup_facility_list(void)
 {
stfle(S390_lowcore.stfle_fac_list,
  ARRAY_SIZE(S390_lowcore.stfle_fac_list));
+   memcpy(S390_lowcore.alt_stfle_fac_list,
+  S390_lowcore.stfle_fac_list,
+  sizeof(S390_lowcore.alt_stfle_fac_list));
 }
 
 static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 793da97..bcd2a4a 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -340,7 +340,9 @@ static void __init setup_lowcore(void)
lc->preempt_count = S390_lowcore.preempt_count;
lc->stfl_fac_list = S390_lowcore.stfl_fac_list;
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
-  MAX_FACILITY_BIT/8);
+  sizeof(lc->stfle_fac_list));
+   memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+  sizeof(lc->alt_stfle_fac_list));
nmi_alloc_boot_cpu(lc);
vdso_alloc_boot_cpu(lc);
lc->sync_enter_timer = S390_lowcore.sync_enter_timer;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index a919b2f..fc28c95 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -266,7 +266,9 @@ static void pcpu_prepare_secondary(struct pcpu *pcpu, int cpu)
__ctl_store(lc->cregs_save_area, 0, 15);
save_access_regs((unsigned int *) lc->access_regs_save_area);
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
-  MAX_FACILITY_BIT/8);
+  sizeof(lc->stfle_fac_list));
+   memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+  sizeof(lc->alt_stfle_fac_list));
arch_spin_lock_setup(cpu);
 }
 
-- 
2.7.4
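
For readers following the facility-list change, the bit numbering used by the
helpers in this patch can be tried out in user space. The following is only a
sketch under stated assumptions: the names set_facility(), clear_facility(),
test_facility() and facility_copy_demo() are stand-ins invented here, and the
two local arrays merely imitate stfle_fac_list and alt_stfle_fac_list. It
shows that facility bits are numbered MSB-first within each byte, and that
clearing a bit in the copy leaves the original list untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAC_WORDS        16                 /* same length as the split lists in the patch */
#define MAX_FACILITY_BIT (FAC_WORDS * 64)   /* number of bits in one list */

/* Facility bits are numbered MSB-first: bit 0 is mask 0x80 of byte 0. */
static void set_facility(unsigned long nr, void *facilities)
{
	unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return;                     /* out of range: silently ignored */
	ptr[nr >> 3] |= 0x80 >> (nr & 7);
}

static void clear_facility(unsigned long nr, void *facilities)
{
	unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return;
	ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
}

static int test_facility(unsigned long nr, const void *facilities)
{
	const unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return 0;
	return (ptr[nr >> 3] & (0x80 >> (nr & 7))) != 0;
}

/* Hypothetical demo: copy the "hardware" list, then switch one bit off in the copy. */
static void facility_copy_demo(void)
{
	uint64_t hw[FAC_WORDS] = { 0 };     /* stands in for stfle_fac_list */
	uint64_t alt[FAC_WORDS];            /* stands in for alt_stfle_fac_list */

	set_facility(82, hw);
	memcpy(alt, hw, sizeof(alt));
	clear_facility(82, alt);

	assert(test_facility(82, hw) == 1); /* original list is unchanged */
	assert(test_facility(82, alt) == 0);/* alternatives see the bit as off */
}
```

Facility 82 lands in byte 10 with mask 0x20, which is why a decision based on
the copy can differ from what the machine actually reports.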



[PATCH 0/6] s390: improve speculative execution handling v3

2018-02-06 Thread Martin Schwidefsky
Version 3 of the speculative execution improvements for s390.

Changes to v2:

* Dropped the prctl to introduce the PR_ISOLATE_BP control and simply
  added two exported functions s390_isolate_bp and s390_isolate_bp_guest.
  There is currently no caller for these functions; for now an out-of-tree
  module can be used until an acceptable upstream solution for the user
  space interface is found.

* Added an optimized version for the array_index_mask_nospec
  function based on subtract with borrow for the spectre v1 defense.

* Introduce "expoline", the s390 version of a retpoline. As s390 does
  not have a return instruction and the associated return stack, we use
  an execute-type instruction on an indirect branch to get unpredictable
  branches. This requires gcc support for -mindirect-branch=thunk /
  -mfunction-return=thunk.  To be able to disable expolines there is
  another gcc option -mindirect-branch-table to keep a list of PC relative
  locations of calls to the execute thunks. With spectre_v2=off the call
  will be replaced with the original indirect branch and a nop.

Martin Schwidefsky (6):
  s390: scrub registers on kernel entry and KVM exit
  s390: add optimized array_index_mask_nospec
  s390/alternative: use a copy of the facility bit mask
  s390: add options to change branch prediction behaviour for the kernel
  s390: run user space and KVM guests with modified branch prediction
  s390: introduce execute-trampolines for branches

 arch/s390/Kconfig |  45 ++
 arch/s390/Makefile|  12 ++
 arch/s390/include/asm/barrier.h   |  24 
 arch/s390/include/asm/facility.h  |  18 +++
 arch/s390/include/asm/lowcore.h   |   9 +-
 arch/s390/include/asm/nospec-branch.h |  18 +++
 arch/s390/include/asm/processor.h |   4 +
 arch/s390/include/asm/thread_info.h   |   4 +
 arch/s390/kernel/Makefile |   4 +
 arch/s390/kernel/alternative.c|  26 +++-
 arch/s390/kernel/early.c  |   5 +
 arch/s390/kernel/entry.S  | 249 ++
 arch/s390/kernel/ipl.c|   1 +
 arch/s390/kernel/module.c |  62 +++--
 arch/s390/kernel/nospec-branch.c  | 100 ++
 arch/s390/kernel/processor.c  |  18 +++
 arch/s390/kernel/setup.c  |   8 +-
 arch/s390/kernel/smp.c|   7 +-
 arch/s390/kernel/vmlinux.lds.S|  14 ++
 drivers/s390/char/Makefile|   2 +
 20 files changed, 591 insertions(+), 39 deletions(-)
 create mode 100644 arch/s390/include/asm/nospec-branch.h
 create mode 100644 arch/s390/kernel/nospec-branch.c

-- 
2.7.4



[PATCH 5/6] s390: run user space and KVM guests with modified branch prediction

2018-02-06 Thread Martin Schwidefsky
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.

To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be called, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/processor.h   |  3 +++
 arch/s390/include/asm/thread_info.h |  4 +++
 arch/s390/kernel/entry.S| 51 +
 arch/s390/kernel/processor.c| 18 +
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5f37f9c..7f2953c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -378,6 +378,9 @@ extern void memcpy_absolute(void *, void *, size_t);
memcpy_absolute(&(dest), &__tmp, sizeof(__tmp));\
 } while (0)
 
+extern int s390_isolate_bp(void);
+extern int s390_isolate_bp_guest(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_S390_PROCESSOR_H */
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 25d6ec3..83ba575 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -58,6 +58,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 #define TIF_GUARDED_STORAGE 4   /* load guarded storage control block */
 #define TIF_PATCH_PENDING  5   /* pending live patching update */
 #define TIF_PGSTE  6   /* New mm's will use 4K page tables */
+#define TIF_ISOLATE_BP 8   /* Run process with isolated BP */
+#define TIF_ISOLATE_BP_GUEST   9   /* Run KVM guests with isolated BP */
 
 #define TIF_31BIT  16  /* 32bit process */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
@@ -78,6 +80,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 #define _TIF_UPROBE _BITUL(TIF_UPROBE)
 #define _TIF_GUARDED_STORAGE   _BITUL(TIF_GUARDED_STORAGE)
 #define _TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING)
+#define _TIF_ISOLATE_BP _BITUL(TIF_ISOLATE_BP)
+#define _TIF_ISOLATE_BP_GUEST  _BITUL(TIF_ISOLATE_BP_GUEST)
 
 #define _TIF_31BIT _BITUL(TIF_31BIT)
 #define _TIF_SINGLE_STEP   _BITUL(TIF_SINGLE_STEP)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index e6d7550..53145b5 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -107,6 +107,7 @@ _PIF_WORK   = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
aghi %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE)
j   3f
 1: UPDATE_VTIME %r14,%r15,\timer
+   BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
 2: lg  %r15,__LC_ASYNC_STACK   # load async stack
 3: la  %r11,STACK_FRAME_OVERHEAD(%r15)
.endm
@@ -187,6 +188,40 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
.popsection
.endm
 
+   .macro BPENTER tif_ptr,tif_mask
+   .pushsection .altinstr_replacement, "ax"
+662:   .word   0xc004, 0x0000, 0x0000  # 6 byte nop
+   .word   0xc004, 0x0000, 0x0000  # 6 byte nop
+   .popsection
+664:   TSTMSK  \tif_ptr,\tif_mask
+   jz  . + 8
+   .long   0xb2e8d000
+   .pushsection .altinstructions, "a"
+   .long 664b - .
+   .long 662b - .
+   .word 82
+   .byte 12
+   .byte 12
+   .popsection
+   .endm
+
+   .macro BPEXIT tif_ptr,tif_mask
+   TSTMSK  \tif_ptr,\tif_mask
+   .pushsection .altinstr_replacement, "ax"
+662:   jnz . + 8
+   .long   0xb2e8d000
+   .popsection
+664:   jz  . + 8
+   .long   0xb2e8c000
+   .pushsection .altinstructions, "a"
+   .long 664b - .
+   .long 662b - .
+   .word 82
+   .byte 8
+   .byte 8
+   .popsection
+   .endm
+
.section .kprobes.text, "ax"
 .Ldummy:
/*
@@ -240,9 +275,11 @@ ENTRY(__switch_to)
  */
 ENTRY(sie64a)
stmg %r6,%r14,__SF_GPRS(%r15) # save kernel registers
+   lg  %r12,__LC_CURRENT
stg %r2,__SF_EMPTY(%r15)# save control block pointer
stg %r3,__SF_EMPTY+8(%r15)  # save guest register save area
xc  __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # reason code = 0
+   mvc __SF_EMPTY+24(8,%r15),__TI_flags(%r12) # copy thread flags
TSTMSK  __LC_CPU_FLAGS,_CIF_FPU # load guest fp/vx registers ?
jno .Lsie_load_guest_gprs
brasl   %r14,load_fpu_regs  # load guest fp/vx regs
@@ -259,11 +296,12 @@ ENTRY(sie64a)
jnz .Lsie_skip
TSTMSK  __LC_CPU_FLAGS,_CIF_FPU
jo  .Lsie_skip  # exit if fp/vx regs changed
-   BPON
+   BPEXIT  __SF_EMPTY+24(%r15),(_

[PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled

2018-02-06 Thread Huang, Ying
From: Huang Ying 

It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur
in random user space applications, as follows:

kernel: urxvt[338]: segfault at 20 ip 7fc08889ae0d sp 7ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x7fc08889ae0d _int_malloc (libc.so.6)
 #1  0x7fc08889c2f3 malloc (libc.so.6)
 #2  0x560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x560e6005e75c n/a (urxvt)
 #4  0x560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x560e6005cb55 ev_run (urxvt)
 #9  0x560e6003b9b9 main (urxvt)
 #10 0x7fc08883af4a __libc_start_main (libc.so.6)
 #11 0x560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is
bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
out").

The root cause is as follows.

When pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) is tried first, to compress the pages
and improve performance.  But zswap (frontswap) treats a THP as a
normal page, so only the head page is saved.  After swap-in, the tail
pages are not restored to their original contents, which causes
memory corruption in the applications.

This is fixed via splitting THP before writing the page to swap device
if frontswap is enabled.  To deal with the situation where frontswap
is enabled at runtime, whether the page is THP is checked before using
frontswap during swapping out too.

Reported-and-tested-by: Sergey Senozhatsky 
Signed-off-by: "Huang, Ying" 
Cc: Konrad Rzeszutek Wilk 
Cc: Dan Streetman 
Cc: Seth Jennings 
Cc: Minchan Kim 
Cc: Tetsuo Handa 
Cc: Shaohua Li 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Shakeel Butt 
Cc: sta...@vger.kernel.org # 4.14
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")

Changelog:

v2:

- Move the frontswap check into swapfile.c to avoid making vmscan.c
  depend on frontswap.
---
 mm/page_io.c  | 2 +-
 mm/swapfile.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index b41cf9644585..6dca817ae7a0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
unlock_page(page);
goto out;
}
-   if (frontswap_store(page) == 0) {
+   if (!PageTransHuge(page) && frontswap_store(page) == 0) {
set_page_writeback(page);
unlock_page(page);
end_page_writeback(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 006047b16814..0b7c7883ce64 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
 
/* Only single cluster request supported */
WARN_ON_ONCE(n_goal > 1 && cluster);
+   /* Frontswap doesn't support THP */
+   if (frontswap_enabled() && cluster)
+   goto noswap;
 
avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
if (avail_pgs <= 0)
-- 
2.15.1
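
The two checks added by this patch can be restated as a small user-space
model. This is a sketch only: struct toy_page, may_use_frontswap() and
swap_slots_available() are hypothetical names made up for illustration and
are not kernel interfaces. The first function mirrors the swap-out side
(frontswap is tried only for non-THP pages), the second mirrors the
allocation side (a cluster request is refused while frontswap is enabled, so
the caller falls back to splitting the THP).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the THP property matters here. */
struct toy_page {
	bool trans_huge;
};

/* Swap-out side: hand a page to frontswap only if it is not a THP. */
static bool may_use_frontswap(const struct toy_page *page, bool frontswap_on)
{
	return frontswap_on && !page->trans_huge;
}

/* Allocation side: a cluster (THP-sized) request fails while frontswap is on. */
static bool swap_slots_available(bool frontswap_on, bool cluster)
{
	return !(frontswap_on && cluster);
}

/* Self-check of the model's truth table. */
static void frontswap_thp_demo(void)
{
	struct toy_page small = { .trans_huge = false };
	struct toy_page huge = { .trans_huge = true };

	assert(may_use_frontswap(&small, true));
	assert(!may_use_frontswap(&huge, true));   /* THP goes to the swap device */
	assert(!swap_slots_available(true, true)); /* forces the THP to be split */
	assert(swap_slots_available(false, true));
}
```

Checking on both sides covers the case described above, where frontswap is
enabled at runtime after a THP already got its swap cluster.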



[PATCH 2/6] s390: add optimized array_index_mask_nospec

2018-02-06 Thread Martin Schwidefsky
Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/barrier.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 1043260..f9eddbc 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -49,6 +49,30 @@ do { \
 #define __smp_mb__before_atomic()  barrier()
 #define __smp_mb__after_atomic()   barrier()
 
+/**
+ * array_index_mask_nospec - generate a mask for array_idx() that is
+ * ~0UL when the bounds check succeeds and 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ */
+#define array_index_mask_nospec array_index_mask_nospec
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+   unsigned long size)
+{
+   unsigned long mask;
+
+   if (__builtin_constant_p(size) && size > 0) {
+   asm("   clgr    %2,%1\n"
+   "   slbgr   %0,%0\n"
+   :"=d" (mask) : "d" (size-1), "d" (index) :"cc");
+   return mask;
+   }
+   asm("   clgr    %1,%2\n"
+   "   slbgr   %0,%0\n"
+   :"=d" (mask) : "d" (size), "d" (index) :"cc");
+   return ~mask;
+}
+
 #include 
 
 #endif /* __ASM_BARRIER_H */
-- 
2.7.4
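
The contract of the mask, as documented in the kernel-doc comment of this
patch (~0UL when the bounds check succeeds, 0 otherwise), can be modelled in
portable C. This is a sketch of the semantics only, with made-up names
index_mask() and clamp_index(): it mimics "subtract with borrow" by turning
the borrow into an all-ones or all-zero word, but a C compiler is free to
emit a branch for the comparison, which is one reason the patch uses inline
assembly instead.

```c
#include <assert.h>
#include <limits.h>

/*
 * Model of the mask: the borrow is 1 exactly when index - size would wrap,
 * i.e. when index < size. Subtracting it from 0 yields ~0UL or 0.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	unsigned long borrow = index < size;

	return 0UL - borrow;
}

/* Typical use: AND the index with the mask so out-of-range becomes 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}

/* Self-check of the documented contract. */
static void index_mask_demo(void)
{
	assert(index_mask(3, 4) == ULONG_MAX); /* in bounds: all ones */
	assert(index_mask(4, 4) == 0);         /* first out-of-bounds index */
	assert(clamp_index(2, 4) == 2);        /* valid index passes through */
	assert(clamp_index(7, 4) == 0);        /* invalid index is forced to 0 */
}
```

With the mask applied, even a mispredicted bounds check can only ever
dereference element 0 of the array.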



[PATCH 4/6] s390: add options to change branch prediction behaviour for the kernel

2018-02-06 Thread Martin Schwidefsky
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Cornelia Huck 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig | 17 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kernel/alternative.c| 23 +++
 arch/s390/kernel/early.c  |  2 ++
 arch/s390/kernel/entry.S  | 48 +++
 arch/s390/kernel/ipl.c|  1 +
 arch/s390/kernel/smp.c|  2 ++
 7 files changed, 94 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 0105ce2..d514e25 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -540,6 +540,23 @@ config ARCH_RANDOM
 
  If unsure, say Y.
 
+config KERNEL_NOBP
+   def_bool n
+   prompt "Enable modified branch prediction for the kernel by default"
+   help
+ If this option is selected the kernel will switch to a modified
+ branch prediction mode if the firmware interface is available.
+ The modified branch prediction mode improves the behaviour in
+ regard to speculative execution.
+
+ With the option enabled the kernel parameter "nobp=0" or "nospec"
+ can be used to run the kernel in the normal branch prediction mode.
+
+ With the option disabled the modified branch prediction mode is
+ enabled with the "nobp=1" kernel parameter.
+
+ If unsure, say N.
+
 endmenu
 
 menu "Memory setup"
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index bfbfad4..5f37f9c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -91,6 +91,7 @@ void cpu_detect_mhz_feature(void);
 extern const struct seq_operations cpuinfo_op;
 extern int sysctl_ieee_emulation_warnings;
 extern void execve_tail(void);
+extern void __bpon(void);
 
 /*
  * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 1abf4f3..2247613 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -15,6 +15,29 @@ static int __init disable_alternative_instructions(char *str)
 
 early_param("noaltinstr", disable_alternative_instructions);
 
+static int __init nobp_setup_early(char *str)
+{
+   bool enabled;
+   int rc;
+
+   rc = kstrtobool(str, &enabled);
+   if (rc)
+   return rc;
+   if (enabled && test_facility(82))
+   __set_facility(82, S390_lowcore.alt_stfle_fac_list);
+   else
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+   return 0;
+}
+early_param("nobp", nobp_setup_early);
+
+static int __init nospec_setup_early(char *str)
+{
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+   return 0;
+}
+early_param("nospec", nospec_setup_early);
+
 struct brcl_insn {
u16 opc;
s32 disp;
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 510f218..ac707a9 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -196,6 +196,8 @@ static noinline __init void setup_facility_list(void)
memcpy(S390_lowcore.alt_stfle_fac_list,
   S390_lowcore.stfle_fac_list,
   sizeof(S390_lowcore.alt_stfle_fac_list));
+   if (!IS_ENABLED(CONFIG_KERNEL_NOBP))
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
 }
 
 static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 5d87eda..e6d7550 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -159,6 +159,34 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
tm  off+\addr, \mask
.endm
 
+   .macro BPOFF
+   .pushsection .altinstr_replacement, "ax"
+660:   .long   0xb2e8c000
+   .popsection
+661:   .long   0x47000000
+   .pushsection .altinstructions, "a"
+   .long 661b - .
+   .long 660b - .
+   .word 82
+   .byte 4
+   .byte 4
+   .popsection
+   .endm
+
+   .macro BPON
+   .pushsection .altinstr_replacement, "ax"
+662:   .long   0xb2e8d000
+   .popsection
+663:   .long   0x47000000
+   .pushsection .altinstructions, "a"
+   .long 663b - .
+   .long 662b - .
+   .word 82
+   .byte 4
+   .byte 4
+   .popsection
+   .endm
+
.section .kprobes.text, "ax"
 .Ldummy:
/*
@@ -171,6 +199,11 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
 */
nop 0
 
+ENTRY(__bpon)
+   .globl __bpon
+   BPON
+   br  %r14
+
 /*
  * Scheduler res

linux/drivers/cpuidle: cpuidle_enter_state() issue

2018-02-06 Thread Li Wang
Hi Kernel-developers,

The following call trace was caught on kernel v4.15. Could anyone help
to analyze this cpuidle problem?
If you need any more detailed information, please let me know.

Test Env:
IBM KVM Guest on ibm-p8-kvm-03
POWER8E (raw), altivec supported
9216 MB memory, 107 GB disk space

8<
[15002.722413] swapper/15: page allocation failure: order:0,
mode:0x1080020(GFP_ATOMIC), nodemask=(null)
[15002.853793] swapper/15 cpuset=/ mems_allowed=0
[15002.853932] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.15.0 #1
[15002.854019] Call Trace:
[15002.854129] [c0023ff77650] [c0940b50]
.dump_stack+0xac/0xfc (unreliable)
[15002.854285] [c0023ff776e0] [c026c678] .warn_alloc+0xe8/0x180
[15002.854376] [c0023ff777a0] [c026d50c]
.__alloc_pages_nodemask+0xd6c/0xf90
[15002.854490] [c0023ff77980] [c02e9cc0]
.alloc_pages_current+0x90/0x120
[15002.854624] [c0023ff77a10] [c07990cc]
.skb_page_frag_refill+0x8c/0x120
[15002.854746] [c0023ff77aa0] [d3a561a8]
.try_fill_recv+0x368/0x620 [virtio_net]
[15003.422855] [c0023ff77ba0] [d3a568ec]
.virtnet_poll+0x25c/0x380 [virtio_net]
[15003.423864] [c0023ff77c70] [c07c18d0] .net_rx_action+0x330/0x4a0
[15003.424024] [c0023ff77d90] [c0960d50] .__do_softirq+0x150/0x3a8
[15003.424197] [c0023ff77e90] [c00ff608] .irq_exit+0x198/0x1b0
[15003.424342] [c0023ff77f10] [c0015504] .__do_irq+0x94/0x1f0
[15003.424485] [c0023ff77f90] [c0026d5c] .call_do_irq+0x14/0x24
[15003.424627] [c0023bc63820] [c00156ec] .do_IRQ+0x8c/0x100
[15003.424776] [c0023bc638c0] [c0008b34]
hardware_interrupt_common+0x114/0x120
[15003.424963] --- interrupt: 501 at .snooze_loop+0xa4/0x1c0
LR = .snooze_loop+0x60/0x1c0
[15003.425164] [c0023bc63bb0] [c0023bc63c50]
0xc0023bc63c50 (unreliable)
[15003.425346] [c0023bc63c30] [c075104c]
.cpuidle_enter_state+0xac/0x390
[15003.425534] [c0023bc63ce0] [c0157adc] .call_cpuidle+0x3c/0x70
[15003.425669] [c0023bc63d50] [c0157e90] .do_idle+0x2a0/0x300
[15003.425815] [c0023bc63e20] [c01580ac]
.cpu_startup_entry+0x2c/0x40
[15003.425995] [c0023bc63ea0] [c0045790]
.start_secondary+0x4d0/0x520
[15003.426170] [c0023bc63f90] [c000aa70]
start_secondary_prolog+0x10/0x14
-8<---

Any response will be appreciated!

-- 
Regards,
Li Wang
Email: wangli.a...@gmail.com


Re: WARNING: kmalloc bug in tun_device_event

2018-02-06 Thread Jason Wang



On 2018-02-07 06:58, syzbot wrote:

Hello,

syzbot hit the following crash on net-next commit
617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
Merge tag 'usercopy-v4.16-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux


So far this crash happened 5 times on net-next, upstream.
C reproducer is attached.
syzkaller reproducer is attached.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.

IMPORTANT: if you fix the bug, please add the following tag to the 
commit:

Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for 
details.

If you forward the report, please keep this part and the footer.

WARNING: CPU: 1 PID: 4134 at mm/slab_common.c:1012 
kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012

Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4134 Comm: syzkaller993072 Not tainted 4.15.0+ #221
Hardware name: Google Google Compute Engine/Google Compute Engine, 
BIOS Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x211/0x2d0 lib/bug.c:184
 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
RSP: 0018:8801ba7ceb20 EFLAGS: 00010246
RAX:  RBX:  RCX: 83b88bed
RDX:  RSI:  RDI: 00040008
RBP: 8801ba7ceb20 R08: 1100374f9cd7 R09: 
R10:  R11:  R12: 00040008
R13: dc00 R14: 014080c0 R15: 8801b5d52080
 __do_kmalloc mm/slab.c:3700 [inline]
 __kmalloc+0x25/0x760 mm/slab.c:3714
 kmalloc_array include/linux/slab.h:631 [inline]
 kcalloc include/linux/slab.h:642 [inline]
 __ptr_ring_init_queue_alloc include/linux/ptr_ring.h:469 [inline]
 ptr_ring_resize_multiple include/linux/ptr_ring.h:629 [inline]
 tun_queue_resize drivers/net/tun.c:3319 [inline]
 tun_device_event+0x471/0xec0 drivers/net/tun.c:3338
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
 call_netdevice_notifiers net/core/dev.c:1725 [inline]
 dev_change_tx_queue_len+0x117/0x220 net/core/dev.c:7065
 do_setlink+0xba7/0x3bb0 net/core/rtnetlink.c:2341
 rtnl_newlink+0xf1c/0x1a20 net/core/rtnetlink.c:2915
 rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4587
 netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2442
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4605
 netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
 netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
 sock_sendmsg_nosec net/socket.c:630 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:640
 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
 __sys_sendmsg+0xe5/0x210 net/socket.c:2080
 SYSC_sendmsg net/socket.c:2091 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2087
 entry_SYSCALL_64_fastpath+0x29/0xa0
RIP: 0033:0x4463c9
RSP: 002b:7ffe63916e68 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 004a7af2 RCX: 004463c9
RDX:  RSI: 20504000 RDI: 0004
RBP: 7ffe63916f08 R08:  R09: 004a7af2
R10:  R11: 0246 R12: 7ffe63916f08
R13: 00403890 R14:  R15: 
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug 
report.
Note: all commands must start from beginning of the line in the email 
body.


Looks like we need to cap the maximum size that ptr_ring could allocate.

Will post a patch soon.

Thanks
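
For readers following along, here is a minimal user-space sketch of what
such a cap could look like. This is not the posted patch: ring_alloc_queue()
and RING_MAX_BYTES are hypothetical names, and calloc() stands in for the
kernel allocator. The only point is that the element count is checked against
a byte limit before allocating, using a division so that the multiplication
cannot overflow.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap, standing in for the kernel allocator's size limit. */
#define RING_MAX_BYTES ((size_t)4 * 1024 * 1024)

/*
 * Allocate a zeroed array of `size` pointers.
 * Returns NULL for an empty or oversized request (assumption of this sketch).
 */
static void **ring_alloc_queue(size_t size)
{
	/* Divide the cap instead of multiplying size, so nothing can overflow. */
	if (size == 0 || size > RING_MAX_BYTES / sizeof(void *))
		return NULL;

	return calloc(size, sizeof(void *));
}
```

A caller would treat a NULL return as an allocation failure and reject the
requested queue length before any resize is attempted.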


RE: [Patch v13 0/4] This patchset is to remove PPCisms for QEIC

2018-02-06 Thread Qiang Zhao
Hi all,

Are there any comments on this patchset?

Best Regards
Qiang Zhao

-----Original Message-----
From: Zhao Qiang [mailto:qiang.z...@nxp.com] 
Sent: November 10, 2017 11:31
To: t...@linutronix.de; marc.zyng...@arm.com; ja...@lakedaemon.net
Cc: linux-kernel@vger.kernel.org; Qiang Zhao 
Subject: [Patch v13 0/4] This patchset is to remove PPCisms for QEIC

QEIC is the interrupt controller for QE. It was placed under
drivers/soc/fsl/qe and is now moved to drivers/irqchip.
QEIC is supported on more than just powerpc boards, so remove the PPCisms.

changelog:
Changes for v8:
- use IRQCHIP_DECLARE() instead of subsys_initcall in qeic driver
- remove include/soc/fsl/qe/qe_ic.h
Changes for v9:
- rebase 
- fix the compile issue seen when applying only the second patch; there was
  no compile issue when applying all the patches of this patchset
Changes for v10:
- simplify code, remove duplicated code
Changes for v11:
- rebase
Changes for v13:
- rewrite single-bit constants to BIT(x) to make the code more readable

Zhao Qiang (4):
  irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
Changes for v2:
- modify the subject and commit msg
Changes for v3:
- merge .h file to .c, rename it with irq-qeic.c
Changes for v4:
- modify comments
Changes for v5:
- disable rename detection
Changes for v6:
- rebase
Changes for v7:
- na

  irqchip/qeic: merge qeic init code from platforms to a common function
Changes for v2:
- modify subject and commit msg
- add check for qeic by type
Changes for v3:
- na
Changes for v4:
- na
Changes for v5:
- na
Changes for v6:
- rebase
Changes for v7:
- na
Changes for v8:
- use IRQCHIP_DECLARE() instead of subsys_initcall

  irqchip/qeic: merge qeic_of_init into qe_ic_init
Changes for v2:
- modify subject and commit msg
- return 0 and add put node when return in qe_ic_init
Changes for v3:
- na
Changes for v4:
- na
Changes for v5:
- na
Changes for v6:
- rebase
Changes for v7:
- na
Changes for v12:
- remove unused code

  irqchip/qeic: remove PPCisms for QEIC
Changes for v6:
- new added
Changes for v7:
- fix warning
Changes for v8:
- remove include/soc/fsl/qe/qe_ic.h

Zhao Qiang (4):
  irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
  irqchip/qeic: merge qeic init code from platforms to a common function
  irqchip/qeic: merge qeic_of_init into qe_ic_init
  irqchip/qeic: remove PPCisms for QEIC

 MAINTAINERS|   6 +
 arch/powerpc/platforms/83xx/km83xx.c   |   1 -
 arch/powerpc/platforms/83xx/misc.c |  16 -
 arch/powerpc/platforms/83xx/mpc832x_mds.c  |   1 -
 arch/powerpc/platforms/83xx/mpc832x_rdb.c  |   1 -
 arch/powerpc/platforms/83xx/mpc836x_mds.c  |   1 -
 arch/powerpc/platforms/83xx/mpc836x_rdk.c  |   1 -
 arch/powerpc/platforms/85xx/corenet_generic.c  |  10 -
 arch/powerpc/platforms/85xx/mpc85xx_mds.c  |  15 -
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c  |  17 -
 arch/powerpc/platforms/85xx/twr_p102x.c|  15 -
 drivers/irqchip/Makefile   |   1 +
 drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} | 423 +++--
 drivers/soc/fsl/qe/Makefile|   2 +-
 drivers/soc/fsl/qe/qe_ic.h | 103 -
 include/soc/fsl/qe/qe_ic.h | 139 ---
 16 files changed, 231 insertions(+), 521 deletions(-)
 rename drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} (53%)
 delete mode 100644 drivers/soc/fsl/qe/qe_ic.h
 delete mode 100644 include/soc/fsl/qe/qe_ic.h

--
2.14.1



Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-06 Thread Cong Wang
On Tue, Feb 6, 2018 at 6:27 AM, syzbot wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
> Merge tag 'usercopy-v4.16-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
>
> So far this crash happened 5 times on net-next, upstream.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+03218bcdba6aa7644...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> x_tables: ip_tables: osf match: only valid for protocol 6
> x_tables: ip_tables: osf match: only valid for protocol 6
> x_tables: ip_tables: osf match: only valid for protocol 6
> [ cut here ]
> proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370
> fs/proc/generic.c:329
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 4152 Comm: syzkaller851476 Not tainted 4.15.0+ #221
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
> RIP: 0010:proc_register+0x2a4/0x370 fs/proc/generic.c:329
> RSP: 0018:8801cbd6ee20 EFLAGS: 00010286
> RAX: dc08 RBX: 8801d2181038 RCX: 815a57ae
> RDX:  RSI: 1100397add74 RDI: 1100397add49
> RBP: 8801cbd6ee70 R08: 1100397add0b R09: 
> R10: 8801cbd6ecd8 R11:  R12: 8801b2bb1cc0
> R13: dc00 R14: 8801b0d8dbc8 R15: 8801b2bb1d81
>  proc_create_data+0xf8/0x180 fs/proc/generic.c:494
>  clusterip_config_init net/ipv4/netfilter/ipt_CLUSTERIP.c:250 [inline]

I think there is probably a race condition between clusterip_config_entry_put()
and clusterip_config_init(): after we release the spinlock, a new proc entry
with the same IP could be created, which triggers this warning.

I am not sure if it is enough to just move the proc_remove() under the
spinlock...


diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3a84a60f6b39..1ff72b87a066 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -107,12 +107,6 @@ clusterip_config_entry_put(struct net *net,
struct clusterip_config *c)

local_bh_disable();
if (refcount_dec_and_lock(&c->entries, &cn->lock)) {
-   list_del_rcu(&c->list);
-   spin_unlock(&cn->lock);
-   local_bh_enable();
-
-   unregister_netdevice_notifier(&c->notifier);
-
/* In case anyone still accesses the file, the open/close
 * functions are also incrementing the refcount on their own,
 * so it's safe to remove the entry even if it's in use. */
@@ -120,6 +114,12 @@ clusterip_config_entry_put(struct net *net,
struct clusterip_config *c)
if (cn->procdir)
proc_remove(c->pde);
 #endif
+   list_del_rcu(&c->list);
+   spin_unlock(&cn->lock);
+   local_bh_enable();
+
+   unregister_netdevice_notifier(&c->notifier);
+
return;
}
local_bh_enable();


>  clusterip_tg_check+0xf9c/0x16d0 net/ipv4/netfilter/ipt_CLUSTERIP.c:488
>  xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:850
>  check_target net/ipv4/netfilter/ip_tables.c:513 [inline]
>  find_check_entry.isra.8+0x8c8/0xcb0 net/ipv4/netfilter/ip_tables.c:554
>  translate_table+0xed1/0x1610 net/ipv4/netfilter/ip_tables.c:725
>  do_replace net/ipv4/netfilter/ip_tables.c:1141 [inline]
>  do_ipt_set_ctl+0x370/0x5f0 net/ipv4/netfilter/ip_tables.c:1675
>  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>  nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>  ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1259
>  sctp_setsockopt+0x2b6/0x61d0 net/sctp/socket.c:4104
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>  SYSC_setsockopt net/socket.c:1849 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1828
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x446839
> RSP: 002b:7f0309d0fdb8 EFLAGS: 0246 ORIG_RAX: 0036
> RAX: ffda RBX: 00

Re: [PATCH] KVM: X86: Fix SMRAM accessing even if VM is shutdown

2018-02-06 Thread Dmitry Vyukov
On Wed, Feb 7, 2018 at 7:25 AM, Wanpeng Li  wrote:
> From: Wanpeng Li 
>
> Reported by syzkaller:
>
>WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 
> handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
>CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
>RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
>Call Trace:
> vmx_handle_exit+0xbd/0xe20 [kvm_intel]
> kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
> kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
> do_vfs_ioctl+0xa4/0x6a0
> SyS_ioctl+0x79/0x90
> entry_SYSCALL_64_fastpath+0x25/0x9c
>
> The syzkaller creates a former thread to issue KVM_SMI ioctl, and then creates
> a latter thread to mmap and operate on the same vCPU, rsm emulation will not 
> be
> executed since there is no something like seabios which implements smi handler
> when running syzkaller directly. This triggers a race condition when running
> the testcase with multiple threads. Sometimes one thread exit w/ SHUTDOWN
> reason, another thread mmaps and operates on the same vCPU, it continues to
> use CS=0x3, IP=0x8000 to access the address of SMI handler which results
> in the above ept misconfig. This patch fixes it by bailing out immediately if
> the vCPU is marked EXIT_SHUTDOWN reason.
>
> Reported-by: Dmitry Vyukov 

This was reported by syzbot:
https://groups.google.com/d/msg/syzkaller-bugs/6GrlY0UcDEk/aMShRKq3AwAJ

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: 
syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4d...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.


> Cc: Dmitry Vyukov 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/kvm/x86.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 786cd00..445e702 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7458,6 +7458,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> struct kvm_run *kvm_run)
> goto out;
> }
>
> +   if (unlikely(vcpu->run->exit_reason == KVM_EXIT_SHUTDOWN)) {
> +   r = -EINVAL;
> +   goto out;
> +   }
> +
> if (vcpu->run->kvm_dirty_regs) {
> r = sync_regs(vcpu);
> if (r != 0)
> --
> 2.7.4
>


[PATCH] KVM: X86: Fix SMRAM accessing even if VM is shutdown

2018-02-06 Thread Wanpeng Li
From: Wanpeng Li 

Reported by syzkaller:

   WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 
handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
   RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   Call Trace:
vmx_handle_exit+0xbd/0xe20 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
do_vfs_ioctl+0xa4/0x6a0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x25/0x9c

syzkaller creates one thread to issue the KVM_SMI ioctl and then a second
thread to mmap and operate on the same vCPU. RSM emulation is never executed,
because nothing like SeaBIOS is present to implement an SMI handler when
syzkaller runs directly. This leads to a race when the testcase runs with
multiple threads: sometimes one thread exits with the SHUTDOWN reason while
another thread mmaps and operates on the same vCPU, and the vCPU continues to
use CS=0x3, IP=0x8000 to access the address of the SMI handler, which results
in the above EPT misconfig. This patch fixes it by bailing out immediately if
the vCPU is marked with the EXIT_SHUTDOWN reason.

Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/x86.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 786cd00..445e702 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7458,6 +7458,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
goto out;
}
 
+   if (unlikely(vcpu->run->exit_reason == KVM_EXIT_SHUTDOWN)) {
+   r = -EINVAL;
+   goto out;
+   }
+
if (vcpu->run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
-- 
2.7.4



Re: [PATCH 2/2] usb: chipidea: imx: Fix ULPI on imx53

2018-02-06 Thread Peter Chen
On Tue, Feb 06, 2018 at 04:50:41PM +0100, Sebastian Reichel wrote:
> Hi Peter,
> 
> On Mon, Jan 29, 2018 at 11:33:15AM +0800, Peter Chen wrote:
> > On Wed, Jan 24, 2018 at 06:14:39PM +0100, Sebastian Reichel wrote:
> > > Traditionally, PORTSC should be set before initializing ULPI phys. But
> > > setting PORTSC before powering on the phy results in a kernel freeze
> > > on imx53 based GE PPD. As a workaround this initializes the phy early
> > > in the imx platform code and disables phy power management from the
> > > core.
> > > 
> > > Signed-off-by: Fabien Lahoudere 
> > > Signed-off-by: Sebastian Reichel 
> > > ---
> > >  drivers/usb/chipidea/ci_hdrc_imx.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c 
> > > b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > index de155c80eb70..e431c5aafe35 100644
> > > --- a/drivers/usb/chipidea/ci_hdrc_imx.c
> > > +++ b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > @@ -83,6 +83,7 @@ struct ci_hdrc_imx_data {
> > >   struct clk *clk;
> > >   struct imx_usbmisc_data *usbmisc_data;
> > >   bool supports_runtime_pm;
> > > + bool override_phy_control;
> > >   bool in_lpm;
> > >   /* SoC before i.mx6 (except imx23/imx28) needs three clks */
> > >   bool need_three_clks;
> > > @@ -254,6 +255,7 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   int ret;
> > >   const struct of_device_id *of_id;
> > >   const struct ci_hdrc_imx_platform_flag *imx_platform_flag;
> > > + struct device_node *np = pdev->dev.of_node;
> > >  
> > >   of_id = of_match_device(ci_hdrc_imx_dt_ids, &pdev->dev);
> > >   if (!of_id)
> > > @@ -288,6 +290,14 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   }
> > >  
> > >   pdata.usb_phy = data->phy;
> > > +
> > > + if (of_device_is_compatible(np, "fsl,imx53-usb") && pdata.usb_phy &&
> > > + of_usb_get_phy_mode(np) == USBPHY_INTERFACE_MODE_ULPI) {
> > > + pdata.flags |= CI_HDRC_OVERRIDE_PHY_CONTROL;
> > > + data->override_phy_control = true;
> > > + usb_phy_init(pdata.usb_phy);
> > > + }
> > > +
> > >   pdata.flags |= imx_platform_flag->flags;
> > >   if (pdata.flags & CI_HDRC_SUPPORTS_RUNTIME_PM)
> > >   data->supports_runtime_pm = true;
> > > @@ -341,6 +351,8 @@ static int ci_hdrc_imx_remove(struct platform_device 
> > > *pdev)
> > >   pm_runtime_put_noidle(&pdev->dev);
> > >   }
> > >   ci_hdrc_remove_device(data->ci_pdev);
> > > + if (data->override_phy_control)
> > > + usb_phy_shutdown(data->phy);
> > >   imx_disable_unprepare_clks(&pdev->dev);
> > >  
> > 
> > Sebastian, I have a question, do you have any USB or generic PHY drivers
> > for ULPI bus, any power controls are needed for your ULPI peripheral?
> 
> The devicetree for GE PPD is available in the mainline kernel:
> 
> $ grep -A9 "usbphy[23] {" arch/arm/boot/dts/imx53-ppd.dts
>   usbphy2: usbphy2 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = <&gpio4 4 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
>   clock-frequency = <2400>;
>   clocks = <&clks IMX5_CLK_CKO2>;
>   assigned-clocks = <&clks IMX5_CLK_CKO2_SEL>, <&clks 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = <&clks IMX5_CLK_OSC>;
>   };
> 
>   usbphy3: usbphy3 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = <&gpio2 19 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
> 
>   clock-frequency = <2400>;
>   clocks = <&clks IMX5_CLK_CKO2>;
>   assigned-clocks = <&clks IMX5_CLK_CKO2_SEL>, <&clks 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = <&clks IMX5_CLK_OSC>;
>   };
> 
> So currently the machine only uses drivers/usb/phy/phy-generic.c. Both
> USB phys are actually SMSC USB3315, which is also detected by the kernel:
> 
> root@csmon :~# cat /sys/bus/ulpi/devices/ci_hdrc.*.ulpi/uevent 
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> 
> So maybe drivers/usb/phy/phy-ulpi.c should be used, but I don't see
> a simple way to do so and using the generic PHY works.
> 

Using phy-generic.c is correct as long as it makes your design
work, thanks.

-- 

Best Regards,
Peter Chen


Re: [PATCH v2 06/16] arm64: dts: mt7622: add cpufreq related device nodes

2018-02-06 Thread Viresh Kumar
On 07-02-18, 14:16, Sean Wang wrote:
> On Wed, 2018-02-07 at 09:03 +0530, Viresh Kumar wrote:
> > On 06-02-18, 17:52, sean.w...@mediatek.com wrote:
> > >   cpus {
> > >   #address-cells = <2>;
> > >   #size-cells = <0>;
> > > @@ -26,6 +70,10 @@
> > >   device_type = "cpu";
> > >   compatible = "arm,cortex-a53", "arm,armv8";
> > >   reg = <0x0 0x0>;
> > > + clocks = <&infracfg CLK_INFRA_MUX1_SEL>,
> > > +  <&apmixedsys CLK_APMIXED_MAIN_CORE_EN>;
> > > + clock-names = "cpu", "intermediate";
> > > + operating-points-v2 = <&cpu_opp_table>;
> > >   enable-method = "psci";
> > >   clock-frequency = <13>;
> > >   };
> > > @@ -34,6 +82,7 @@
> > >   device_type = "cpu";
> > >   compatible = "arm,cortex-a53", "arm,armv8";
> > >   reg = <0x0 0x1>;
> > > + operating-points-v2 = <&cpu_opp_table>;
> > >   enable-method = "psci";
> > >   clock-frequency = <13>;
> > >   };
> > 
> > Sorry for not picking this earlier, but you should probably add the same 
> > clock
> > related properties for both cpu nodes here. Things will break if CPU1 is 
> > used by
> > the cpufreq core to bring the cpufreq policy online.
> > 
> > This can happen if cpufreq driver is a module, CPU0 is hotplugged out and 
> > then
> > the cpufreq driver is inserted.
> > 
> 
> mt7622 cpu0 does not support hotplug. do I still need to add same clock
> related properties for both cpu nodes here?

Normally we should always add these properties to all the CPUs, as that's the
real scenario hardware configuration wise.

But I am not sure if something else will break if you don't provide clocks in
CPU1.

@Rob @Mark: What do you suggest ?

-- 
viresh


arch/x86/tools/insn_decoder_test: warning: ffffffff810005de: 0f ff e8 ud0 %eax,%ebp

2018-02-06 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   ab2d92ad881da11331280aedf612d82e61cb6d41
commit: 10c91577d5e631773a6394e14cf60125389b71ae x86/tools: Standardize output format of insn_decode_test
date:   8 weeks ago
config: x86_64-randconfig-s3-02070914 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
git checkout 10c91577d5e631773a6394e14cf60125389b71ae
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
>> arch/x86/tools/insn_decoder_test: warning: 810005de: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810010d7: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001152: 0f ff bf 09 00 00 00 ud0 0x9(%rdi),%edi
   arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001275: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810019b2: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001afc: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001c23: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002502: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 8100267e: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810028a0: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002a94: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002b17: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002e20: 0f ff 83 cd 01 e8 17 ud0 0x17e801cd(%rbx),%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002eea: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report

Re: [PATCH v2 06/16] arm64: dts: mt7622: add cpufreq related device nodes

2018-02-06 Thread Sean Wang
On Wed, 2018-02-07 at 09:03 +0530, Viresh Kumar wrote:
> On 06-02-18, 17:52, sean.w...@mediatek.com wrote:
> > cpus {
> > #address-cells = <2>;
> > #size-cells = <0>;
> > @@ -26,6 +70,10 @@
> > device_type = "cpu";
> > compatible = "arm,cortex-a53", "arm,armv8";
> > reg = <0x0 0x0>;
> > +   clocks = <&infracfg CLK_INFRA_MUX1_SEL>,
> > +<&apmixedsys CLK_APMIXED_MAIN_CORE_EN>;
> > +   clock-names = "cpu", "intermediate";
> > +   operating-points-v2 = <&cpu_opp_table>;
> > enable-method = "psci";
> > clock-frequency = <13>;
> > };
> > @@ -34,6 +82,7 @@
> > device_type = "cpu";
> > compatible = "arm,cortex-a53", "arm,armv8";
> > reg = <0x0 0x1>;
> > +   operating-points-v2 = <&cpu_opp_table>;
> > enable-method = "psci";
> > clock-frequency = <13>;
> > };
> 
> Sorry for not picking this earlier, but you should probably add the same clock
> related properties for both cpu nodes here. Things will break if CPU1 is used 
> by
> the cpufreq core to bring the cpufreq policy online.
> 
> This can happen if cpufreq driver is a module, CPU0 is hotplugged out and then
> the cpufreq driver is inserted.
> 

mt7622 cpu0 does not support hotplug. Do I still need to add the same clock
related properties for both cpu nodes here?




Re: [PATCH] selftests/android: Fix line continuation in Makefile

2018-02-06 Thread Pintu Kumar
On Wed, Feb 7, 2018 at 5:22 AM, Daniel Díaz  wrote:
> The Makefile lacks a couple of line continuation backslashes
> in an `if' clause, which can make the subsequent rsync
> command go awry over the whole filesystem (`rsync -a / /`).
>
>   /bin/sh: -c: line 5: syntax error: unexpected end of file
>   make[1]: [all] Error 1 (ignored)
>   TEST=$DIR"_test.sh"; \
>   if [ -e $DIR/$TEST ]; then
>   /bin/sh: -c: line 2: syntax error: unexpected end of file
>   make[1]: [all] Error 1 (ignored)
>   rsync -a $DIR/$TEST $BUILD_TARGET/;
>   [...a myriad of:]
>   [  rsync: readlink_stat("...") failed: Permission denied (13)]
>   [  skipping non-regular file "..."]
>   [  rsync: opendir "..." failed: Permission denied (13)]
>   [and many other errors...]
>   fi
>   make[1]: fi: Command not found
>   make[1]: [all] Error 127 (ignored)
>   done
>   make[1]: done: Command not found
>   make[1]: [all] Error 127 (ignored)
>
> Signed-off-by: Daniel Díaz 
> ---
>  tools/testing/selftests/android/Makefile | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/android/Makefile 
> b/tools/testing/selftests/android/Makefile
> index 1a74922..f6304d2 100644
> --- a/tools/testing/selftests/android/Makefile
> +++ b/tools/testing/selftests/android/Makefile
> @@ -11,11 +11,11 @@ all:
> BUILD_TARGET=$(OUTPUT)/$$DIR;   \
> mkdir $$BUILD_TARGET  -p;   \
> make OUTPUT=$$BUILD_TARGET -C $$DIR $@;\
> -   #SUBDIR test prog name should be in the form: SUBDIR_test.sh
> +   #SUBDIR test prog name should be in the form: SUBDIR_test.sh \
> TEST=$$DIR"_test.sh"; \
> -   if [ -e $$DIR/$$TEST ]; then
> -   rsync -a $$DIR/$$TEST $$BUILD_TARGET/;
> -   fi
> +   if [ -e $$DIR/$$TEST ]; then \
> +   rsync -a $$DIR/$$TEST $$BUILD_TARGET/; \
> +   fi \
> done

Thanks for your patch.
However, I copied this Makefile from
tools/testing/selftests/futex/Makefile before modifying it.
If there is a problem with the backslashes, then the same problem must be
present in the futex Makefile as well.
Can you compare these two Makefiles and see whether there is any problem?

Also, could it be caused by the make version?
Can you check your make version?

Thank You!
Pintu

>
>  override define RUN_TESTS
> --
> 2.7.4
>


Re: [PATCH] ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204

2018-02-06 Thread Takashi Iwai
On Sat, 03 Feb 2018 15:42:40 +0100,
Lassi Ylikojola wrote:
> 
> Add quirk to ensure a sync endpoint is properly configured.
> This patch is a fix for same symptoms on Behringer UFX1204 as patch
> from Albertto Aquirre on Dec 8 2016 for Axe-Fx II.
> 
> Signed-off-by: Lassi Ylikojola 

The patch doesn't seem to apply cleanly to the latest tree.
Could you check it and repost a proper patch against the latest
Linus tree?


thanks,

Takashi


Re: [PATCH] ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute

2018-02-06 Thread Takashi Iwai
On Mon, 29 Jan 2018 06:37:55 +0100,
Kirill Marinushkin wrote:
> 
> The layout of the UAC2 Control request and response varies depending on
> the request type. With the current implementation, only the Layout 2
> Parameter Block (with the 2-byte sized RANGE attribute) is handled
> properly. For the Control requests with the 1-byte sized RANGE attribute
> (Bass Control, Mid Control, Tremble Control), the response is parsed
> incorrectly.
> 
> This commit:
> * fixes the wLength field value in the request
> * fixes parsing the range values from the response
> 
> Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
> Signed-off-by: Kirill Marinushkin 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: Jaejoong Kim 
> Cc: Bhumika Goyal 
> Cc: Stephen Barber 
> Cc: Julian Scheel 
> Cc: alsa-de...@alsa-project.org
> Cc: linux-kernel@vger.kernel.org

Sorry for the late reply, as I've been (and still am) off.

Does this bug actually hit on any real devices, or is it only a
logical error so far?  In the former case, a Cc to stable is
mandatory.

Anyway, I'll review and merge it properly once I'm back at
work.


thanks,

Takashi


Re: [PATCH v3 0/2] phy: rockchip-emmc: fixes emmc-phy power on failed with rk3399 SoCs

2018-02-06 Thread Kishon Vijay Abraham I


On Wednesday 07 February 2018 06:47 AM, Caesar Wang wrote:
> Kishon,
> 
> Can you help merge this in your  or next tree?  I'm hoping that we can land
> this somewhere.:-)

sure, I'll merge once -rc1 is tagged.

Thanks
Kishon

> 
> 
> Thanks,
> -Caesar
> On January 11, 2018 at 10:40, Caesar Wang wrote:
>> Hi Kishon,
>>
>> Since the Shawn isn't available, I take over this series patches for now.
>>
>> As the original bug had tracked on https://issuetracker.google.com/71561742.
>> In some cases, the mmc phy power on failed during booting up.
>> The log as below:
>> ...
>> [   2.375333] rockchip_emmc_phy_power: caldone timeout.
>> [2.377815] phy phy-ff77.syscon:phy@f780.4: phy poweron failed --> 
>> -110
>> ...
>> [2.489295] mmc0: mmc_select_hs400es failed, error -110
>> [2.489302] mmc0: error -110 whilst initialising MMC card
>> ..
>>
>> The actual emulate, the wait 5us for calpad busy trimming, that's no enough.
>> We need give the enough margin for it.
>>
>> Verified on url =
>> 
>> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-4.4
>> This series patches can apply and bring up with kernel-next on rk3399
>> chromebook.
>>
>> -Caesar
>>
>>
>> Changes in v3:
>> - As Doug commented on both upstream and gerrit.
>>Change "5, 50" to "0, 50", and the message of print.
>> - As Doug commented on https://patchwork.kernel.org/patch/10154797,
>>Change "1, 50" to "0, 50".
>>
>> Changes in v2:
>> - print the return valut with regmap_read_poll_timeout failing.
>> - As Brian commented on https://patchwork.kernel.org/patch/10139891/,
>>changed the note and added to print error value with
>>regmap_read_poll_timeout API.
>>
>> Shawn Lin (2):
>>phy: rockchip-emmc: retry calpad busy trimming
>>phy: rockchip-emmc: use regmap_read_poll_timeout to poll dllrdy
>>
>>   drivers/phy/rockchip/phy-rockchip-emmc.c | 60 
>> +++-
>>   1 file changed, 28 insertions(+), 32 deletions(-)
>>
> 
> 


Re: [PATCH v3] staging: android: ion: Add implementation of dma_buf_vmap and dma_buf_vunmap

2018-02-06 Thread Alexey Skidanov


On 02/07/2018 01:56 AM, Laura Abbott wrote:
> On 01/31/2018 10:10 PM, Alexey Skidanov wrote:
>>
>> On 01/31/2018 03:00 PM, Greg KH wrote:
>>> On Wed, Jan 31, 2018 at 02:03:42PM +0200, Alexey Skidanov wrote:
>>>> Any driver may access shared buffers, created by ion, using
>>>> dma_buf_vmap and
>>>> dma_buf_vunmap dma-buf API that maps/unmaps previously allocated
>>>> buffers into
>>>> the kernel virtual address space. The implementation of these API is
>>>> missing in
>>>> the current ion implementation.
>>>>
>>>> Signed-off-by: Alexey Skidanov 
>>>> ---
>>>
>>> No review from any other Intel developers? :(
>> Will add.
>>>
>>> Anyway, what in-tree driver needs access to these functions?
>> I'm not sure that there are any in-tree drivers using these functions
>> with ion as the buffer exporter, because they are not implemented in ion :)
>> But there are some in-tree drivers using these APIs (gpu drivers) with
>> other buffer exporters.
> 
> It's still not clear why you need to implement these APIs.
How else could the importing kernel module access the content of the buffer? :)
With the current ion implementation it's only possible with dma_buf_kmap,
mapping one page at a time. For pretty large buffers, that might have some
performance impact.
(Probably, page-by-page mapping is the only way to access large
buffers on 32-bit systems, where the vmalloc range is very small. By the
way, the current ion dma_map_kmap doesn't really map only 1 page at a
time - it uses the result of vmap(), which might fail on 32-bit systems.)
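
To illustrate the page-by-page access pattern described above, here is a
user-space sketch (not ion or dma-buf code). The buffer is modelled as an
array of separately allocated pages; read_paged() and SKETCH_PAGE_SIZE are
hypothetical names. Every access has to be split at page boundaries, which is
the bookkeeping that a single contiguous mapping would avoid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical page size used only by this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Copy up to `len` bytes, starting at byte offset `off`, out of a buffer
 * that is stored as `npages` separate pages. Returns the bytes copied,
 * which is less than `len` if the request runs past the last page.
 */
static size_t read_paged(unsigned char *const *pages, size_t npages,
			 size_t off, void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t done = 0;

	while (done < len) {
		size_t idx = (off + done) / SKETCH_PAGE_SIZE;
		size_t in_page = (off + done) % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;

		if (idx >= npages)
			break;			/* ran off the end of the buffer */
		if (chunk > len - done)
			chunk = len - done;	/* last, partial chunk */

		/* One "mapping" per page: copy only what lives in this page. */
		memcpy(out + done, pages[idx] + in_page, chunk);
		done += chunk;
	}
	return done;
}
```

With a contiguous mapping of the whole buffer the same read would be a single
memcpy() from base + off, with no per-page loop.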

> Are you planning to use Ion with GPU drivers? I'm especially
> interested in this if you have a non-Android use case.
Yes, my use case is the non-Android one. But not with GPU drivers.
> 
> Thanks,
> Laura

Thanks,
Alexey


Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Paul E. McKenney
On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
> On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> > So it is OK to kvmalloc() something and pass it to either kfree() or
> > kvfree(), and it had better be OK to kvmalloc() something and pass it
> > to kvfree().
> > 
> > Is it OK to kmalloc() something and pass it to kvfree()?
> 
> Yes, it absolutely is.
> 
> void kvfree(const void *addr)
> {
> 	if (is_vmalloc_addr(addr))
> 		vfree(addr);
> 	else
> 		kfree(addr);
> }
> 
> > If so, is it really useful to have two different names here, that is,
> > both kfree_rcu() and kvfree_rcu()?
> 
> I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
> vfree_rcu() available in the API for the symmetry of calling kmalloc()
> / kfree_rcu().
> 
> Personally, I would like us to rename kvfree() to just free(), and have
> malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
> fight yet.

But why not just have the existing kfree_rcu() API cover both kmalloc()
and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
anyone arguing that the RCU API has too few members.  ;-)

Thanx, Paul
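
For readers following along, the address-based dispatch in kvfree() can be
modelled in userspace. This is only a sketch under stated assumptions:
toy_kvmalloc(), toy_kvfree(), toy_is_arena_addr() and the static arena are
hypothetical names standing in for kvmalloc(), kvfree(), is_vmalloc_addr()
and the vmalloc range, and the size threshold is a simplification (the real
kvmalloc() tries kmalloc() first and falls back to vmalloc()).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vmalloc address range. */
#define TOY_ARENA_SIZE  (1u << 16)
#define TOY_SMALL_LIMIT 256u

static unsigned char toy_arena[TOY_ARENA_SIZE];
static size_t toy_arena_used;     /* bump pointer; arena space is never reused */
static unsigned toy_small_frees;  /* frees routed to the "kfree" path */
static unsigned toy_large_frees;  /* frees routed to the "vfree" path */

/* Models is_vmalloc_addr(): true if addr lies inside the toy arena. */
static bool toy_is_arena_addr(const void *addr)
{
	uintptr_t p = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)toy_arena;

	return p >= start && p < start + TOY_ARENA_SIZE;
}

/* Models kvmalloc(): small requests use malloc(), large ones the arena. */
static void *toy_kvmalloc(size_t size)
{
	void *p;

	if (size <= TOY_SMALL_LIMIT)
		return malloc(size);
	if (size > TOY_ARENA_SIZE - toy_arena_used)
		return NULL;
	p = toy_arena + toy_arena_used;
	toy_arena_used += size;
	return p;
}

/* Models kvfree(): the caller need not remember which path allocated addr. */
static void toy_kvfree(const void *addr)
{
	if (!addr)
		return;
	if (toy_is_arena_addr(addr)) {
		toy_large_frees++;	/* a real vfree() would unmap here */
	} else {
		toy_small_frees++;
		free((void *)addr);
	}
}
```

Because the free path decides from the address alone, a single deferred-free
primitive built on top of it would work for memory from either allocator,
which is the property the naming question in this thread turns on.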



Re: [PATCH v3 14/21] fpga: dfl: add fpga manager platform driver for FME

2018-02-06 Thread Wu Hao
On Tue, Feb 06, 2018 at 12:53:44PM -0600, Alan Tull wrote:
> On Tue, Feb 6, 2018 at 12:47 AM, Wu Hao  wrote:
> > On Mon, Feb 05, 2018 at 10:25:54PM -0600, Alan Tull wrote:
> >> On Mon, Feb 5, 2018 at 7:47 PM, Wu Hao  wrote:
> >> > On Mon, Feb 05, 2018 at 10:36:45AM -0800, Luebbers, Enno wrote:
> >> >> Hi Hao,
> >> >>
> >> >> On Sun, Feb 04, 2018 at 05:37:06PM +0800, Wu Hao wrote:
> >> >> > On Fri, Feb 02, 2018 at 04:26:26PM -0800, Luebbers, Enno wrote:
> >> >> > > Hi Hao, Alan,
> >> >> > >
> >> >> > > On Fri, Feb 02, 2018 at 05:42:13PM +0800, Wu Hao wrote:
> >> >> > > > On Thu, Feb 01, 2018 at 04:00:36PM -0600, Alan Tull wrote:
> >> >> > > > > On Mon, Nov 27, 2017 at 12:42 AM, Wu Hao wrote:
> >> >> > > > >
> >> >> > > > > Hi Hao,
> >> >> > > > >
> >> >> > > > > A few comments below.   Besides that, looks good.
> >> >> > > > >
> >> >> > > > > > This patch adds fpga manager driver for FPGA Management
> >> >> > > > > > Engine (FME). It implements fpga_manager_ops for FPGA Partial
> >> >> > > > > > Reconfiguration function.
> >> >> > > > > >
> >> >> > > > > > Signed-off-by: Tim Whisonant 
> >> >> > > > > > Signed-off-by: Enno Luebbers 
> >> >> > > > > > Signed-off-by: Shiva Rao 
> >> >> > > > > > Signed-off-by: Christopher Rauer 
> >> >> > > > > > Signed-off-by: Kang Luwei 
> >> >> > > > > > Signed-off-by: Xiao Guangrong 
> >> >> > > > > > Signed-off-by: Wu Hao 
> >> >> > > > > > 
> >> >> > > > > > v3: rename driver to dfl-fpga-fme-mgr
> >> >> > > > > > implemented status callback for fpga manager
> >> >> > > > > > rebased due to fpga api changes
> >> >> > > > > > ---
> >> >> > > > > >  .../ABI/testing/sysfs-platform-fpga-dfl-fme-mgr |   8 +
> >> >> > > > > >  drivers/fpga/Kconfig                            |   6 +
> >> >> > > > > >  drivers/fpga/Makefile                           |   1 +
> >> >> > > > > >  drivers/fpga/fpga-dfl-fme-mgr.c                 | 318 +
> >> >> > > > > >  drivers/fpga/fpga-dfl.h                         |  39 ++-
> >> >> > > > > >  5 files changed, 371 insertions(+), 1 deletion(-)
> >> >> > > > > >  create mode 100644 Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > >  create mode 100644 drivers/fpga/fpga-dfl-fme-mgr.c
> >> >> > > > > >
> >> >> > > > > > diff --git a/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > > new file mode 100644
> >> >> > > > > > index 000..2d4f917
> >> >> > > > > > --- /dev/null
> >> >> > > > > > +++ b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > > @@ -0,0 +1,8 @@
> >> >> > > > > > +What:          /sys/bus/platform/devices/fpga-dfl-fme-mgr.0/interface_id
> >> >> > > > > > +Date:          November 2017
> >> >> > > > > > +KernelVersion: 4.15
> >> >> > > > > > +Contact:       Wu Hao
> >> >> > > > > > +Description:   Read-only. It returns interface id of partial reconfiguration
> >> >> > > > > > +               hardware. Userspace could use this information to check if
> >> >> > > > > > +               current hardware is compatible with given image before FPGA
> >> >> > > > > > +               programming.
> >> >> > > > >
> >> >> > > > > I'm a little confused by this.  I can understand that the PR
> >> >> > > > > bitstream has a dependency on the FPGA's static image, but I
> >> >> > > > > don't understand the dependency of the bitstream on the hardware
> >> >> > > > > that is used to program the bitstream to the FPGA.
> >> >> > > >
> >> >> > > > Sorry for the confusion, the interface_id is used to indicate the
> >> >> > > > version of the hardware for partial reconfiguration (it's part of
> >> >> > > > the static image of the FPGA device). Will improve the description
> >> >> > > > on this.
> >> >> > > >
> >> >> > >
> >> >> > > The interface_id expresses the compatibility of the static region
> >> >> > > with PR bitstreams generated for it. It changes every time a new
> >> >> > > static region is generated.
> >> >> > >
> >> >> > > Would it make more sense to have the interface_id exposed as part
> >> >> > > of the FME device (which represents the static region)? I'm not
> >> >> > > sure - it kind of also makes sense here, where you would have all
> >> >> > > the information in one place (if the interface_id matches, I can
> >> >> > > use this component to program a bitstream).
> >> >> >
> >> >> > Hi Enno
> >> >> >
> >> >> > Yes, this interface is under fpga-dfl-fme-mgr.0, and
> >> >> > fpga-dfl-fme-mgr.0 is under fpga-dfl-fme.0. It's part of the FME
> >> >> > device for sure. From another point of view, it means if an

[PATCH v3] Documentation/ABI: update infiniband sysfs interfaces

2018-02-06 Thread Aishwarya Pant
Add documentation for core and hardware specific infiniband interfaces.
The descriptions have been collected from git commit logs, reading
through code and data sheets. Some drivers have incomplete doc and are
annotated with the comment '[to be documented]'.

Signed-off-by: Aishwarya Pant 
---
Changes in v3:
-  outbound -> inbound in description of port_rcv_constraint_errors
v2:
- Move infiniband interface from testing to stable
- Fix typos
- Update description of cap_mask, port_xmit_constraint_errors and
  port_rcv_constraint_errors
- Add doc for hw_counters
- Remove old documentation

 Documentation/ABI/stable/sysfs-class-infiniband  | 818 +++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 -
 Documentation/infiniband/sysfs.txt   | 129 +---
 3 files changed, 820 insertions(+), 143 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-class-infiniband
 delete mode 100644 Documentation/ABI/testing/sysfs-class-infiniband

diff --git a/Documentation/ABI/stable/sysfs-class-infiniband b/Documentation/ABI/stable/sysfs-class-infiniband
new file mode 100644
index ..f3acf3713a91
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-class-infiniband
@@ -0,0 +1,818 @@
+sysfs interface common for all infiniband devices
+-
+
+What:  /sys/class/infiniband//node_type
+What:  /sys/class/infiniband//node_guid
+What:  /sys/class/infiniband//sys_image_guid
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   linux-r...@vger.kernel.org
+Description:
+   node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
+   switch or router)
+
+   node_guid:  (RO) Node GUID
+
+   sys_image_guid: (RO) System image GUID
+
+
+What:  /sys/class/infiniband//node_desc
+Date:  Feb, 2006
+KernelVersion: v2.6.17
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RW) Update the node description with information such as the
+   node's hostname, so that IB network management software can tie
+   its view to the real world.
+
+
+What:  /sys/class/infiniband//fw_ver
+Date:  Jun, 2016
+KernelVersion: v4.10
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RO) Display firmware version
+
+
+What:  /sys/class/infiniband//ports//lid
+What:  /sys/class/infiniband//ports//rate
+What:  /sys/class/infiniband//ports//lid_mask_count
+What:  /sys/class/infiniband//ports//sm_sl
+What:  /sys/class/infiniband//ports//sm_lid
+What:  /sys/class/infiniband//ports//state
+What:  /sys/class/infiniband//ports//phys_state
+What:  /sys/class/infiniband//ports//cap_mask
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   linux-r...@vger.kernel.org
+Description:
+
+   lid:(RO) Port LID
+
+   rate:   (RO) Port data rate (active width * active
+   speed)
+
+   lid_mask_count: (RO) Port LID mask count
+
+   sm_sl:  (RO) Subnet manager SL for port's subnet
+
+   sm_lid: (RO) Subnet manager LID for port's subnet
+
+   state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
+   ACTIVE_DEFER)
+
+   phys_state: (RO) Port physical state (Sleep, Polling,
+   LinkUp, etc)
+
+   cap_mask:   (RO) Port capability mask. 2 bits here are
+   settable- IsCommunicationManagementSupported
+   (set when CM module is loaded) and IsSM (set via
+   open of issmN file).
+
+
+What:  /sys/class/infiniband//ports//link_layer
+Date:  Oct, 2010
+KernelVersion: v2.6.37
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RO) Link layer type information (Infiniband or Ethernet type)
+
+
+What:  /sys/class/infiniband//ports//counters/symbol_error
+What:  /sys/class/infiniband//ports//counters/port_rcv_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
+What:  /sys/class/infiniband//ports//counters/link_error_recovery
+What:  /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_contraint_errors
+What:  /sys/class/infiniband//ports//counters/local_link_integrity_errors
+What:  /sys/class/infiniband//ports//counters/excessive_buffer_overrun_errors
+What:  /sys/class/infiniband//ports//counters/port_xmit_data
+What:  /sys/class/infiniband//ports//counters/port_rcv_data
+What:  /sys/class/infin

Re: WARNING in kmalloc_slab (3)

2018-02-06 Thread Dmitry Vyukov
On Tue, Dec 12, 2017 at 10:22 PM, Eric Biggers  wrote:
> On Mon, Dec 04, 2017 at 12:26:32PM +0300, Dan Carpenter wrote:
>> On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote:
>> > On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter wrote:
>> > > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote:
>> > >> Looks like BLKTRACESETUP doesn't limit the '.buf_nr' parameter,
>> > >> allowing anyone who can open a block device to cause an extremely
>> > >> large kmalloc.  Here's a simplified reproducer:
>> > >>
>> > >
>> > > There are lots of places which allow people to allocate as much as they
>> > > want.  With syzkaller, you might want to just hard code a __GFP_NOWARN
>> > > in to disable it.
>> >
>> > Hi,
>> >
>> > Hard code it where?
>>
>> My idea was to just make warn_alloc() a no-op.
>>
>> >
>> > User-controllable allocation are supposed to use __GFP_NOWARN.
>>
>> No that's not right.  What we don't want is unprivileged users to use
>> all the memory and we don't want unprivileged users to spam
>> /var/log/messages.  But you have to have slightly elevated permissions
>> to open block devices right?  The warning is helpful.  Admins should
>> "don't do that" if they don't want the warning.
>
> WARN_ON() should only be used for kernel bugs.  printk can be a different
> story.  If it's a "userspace shouldn't do this" kind of thing, then if there
> is any message at all it should be a rate-limited printk that actually
> explains what the problem is, not a random WARN_ON() that can only be
> interpreted by kernel developers.
>
> And yes, the fact that anyone with read access to any block device, even e.g.
> a loop device, can cause the kernel to do an unbounded kmalloc *is* a bug.  It
> needs to have a reasonable limit.  It is not a problem on all systems, but on
> some systems "the admin" might give users read access to some block devices.



#syz fix: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE


Re: WARNING: kmalloc bug in relay_open_buf

2018-02-06 Thread Dmitry Vyukov
On Wed, Feb 7, 2018 at 12:21 AM, Andrew Morton
 wrote:
> On Tue, 06 Feb 2018 14:58:02 -0800 syzbot 
>  wrote:
>
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> e237f98a9c134c3d600353f21e07db915516875b (Mon Feb 5 21:35:56 2018 +)
>> Merge tag 'xfs-4.16-merge-5' of
>> git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
>>
>> C reproducer is attached.
>> syzkaller reproducer is attached.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+7525b19f9531f76b8...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> audit: type=1400 audit(1517939984.452:7): avc:  denied  { map } for
>> pid=4159 comm="syzkaller032522" path="/root/syzkaller032522586" dev="sda1"
>> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
>> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
>> WARNING: CPU: 0 PID: 4159 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
>> mm/slab_common.c:1012
>> Kernel panic - not syncing: panic_on_warn set ...
>
>
> David sent a fix today which I believe will address this.

Thanks
Let's tell syzbot about the fix:
#syz fix: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE

> From: David Rientjes 
> Subject: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
>
> chan->n_subbufs is set by the user and relay_create_buf() does a kmalloc()
> of chan->n_subbufs * sizeof(size_t *).
>
> kmalloc_slab() will generate a warning when this fails if
> chan->n_subbufs * sizeof(size_t *) > KMALLOC_MAX_SIZE.
>
> Limit chan->n_subbufs to the maximum allowed kmalloc() size.
>
> Link: http://lkml.kernel.org/r/alpine.deb.2.10.1802061216100.122...@chino.kir.corp.google.com
> Fixes: f6302f1bcd75 ("relay: prevent integer overflow in relay_open()")
> Signed-off-by: David Rientjes 
> Reviewed-by: Andrew Morton 
> Cc: Jens Axboe 
> Cc: Dave Jiang 
> Cc: Al Viro 
> Cc: Dan Carpenter 
> Signed-off-by: Andrew Morton 
> ---
>
>  kernel/relay.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN kernel/relay.c~kernel-relay-limit-kmalloc-size-to-kmalloc_max_size kernel/relay.c
> --- a/kernel/relay.c~kernel-relay-limit-kmalloc-size-to-kmalloc_max_size
> +++ a/kernel/relay.c
> @@ -163,7 +163,7 @@ static struct rchan_buf *relay_create_bu
>  {
>         struct rchan_buf *buf;
>
> -       if (chan->n_subbufs > UINT_MAX / sizeof(size_t *))
> +       if (chan->n_subbufs > KMALLOC_MAX_SIZE / sizeof(size_t *))
>                 return NULL;
>
>         buf = kzalloc(sizeof(struct rchan_buf), GFP_KERNEL);
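
The bound in the patch can be illustrated with a small userspace sketch.
The names here are assumptions for illustration only: TOY_KMALLOC_MAX_SIZE
stands in for the kernel's KMALLOC_MAX_SIZE (whose real value depends on
architecture and configuration), and subbuf_array_fits() is a hypothetical
helper, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for KMALLOC_MAX_SIZE; 4 MiB is only an example
 * value chosen for this sketch.
 */
#define TOY_KMALLOC_MAX_SIZE ((size_t)4 << 20)

/*
 * Returns true if an array of n_subbufs pointers fits in one allocation of
 * at most TOY_KMALLOC_MAX_SIZE bytes. Dividing the limit, rather than
 * multiplying n_subbufs, keeps the comparison free of integer overflow.
 */
static bool subbuf_array_fits(size_t n_subbufs)
{
	return n_subbufs <= TOY_KMALLOC_MAX_SIZE / sizeof(size_t *);
}
```

The old UINT_MAX bound only prevented the multiplication from overflowing;
the tighter bound also rejects sizes that the allocator would refuse with a
warning.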


Re: [PATCH net 1/1 v2] rtnetlink: require unique netns identifier

2018-02-06 Thread kbuild test robot
Hi Christian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:
https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-require-unique-netns-identifier/20180207-064207
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817de851: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817de85f: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817decc2: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817decf1: 0f ff c3 ud0 %ebx,%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817def6c: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817df332: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e1947: 0f ff 44 8b ad ud0 -0x53(%rbx,%rcx,4),%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 5 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2552: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2585: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e26d8: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2752: 0f ff 48 8d ud0 -0x73(%rax),%ecx
   arch/x86/tools/insn_decoder_test: warning: objdump says 4 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2801: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e305e: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e3559: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e3fd8: 0f ff 48 8b ud0

Re: [PATCH v2 2/3] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak


On 02/07/2018 03:25 AM, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jan 31, 2018 at 8:19 AM, Rajendra Nayak  wrote:
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> new file mode 100644
>> index ..02520f19e4ca
>> --- /dev/null
>> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> @@ -0,0 +1,277 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2018, The Linux Foundation. All rights reserved.
>> + */
>> +
>> +#include 
>> +
>> +/ {
>> +   model = "Qualcomm Technologies, Inc. SDM845";
> 
> I'm fairly certain that "model" doesn't belong in the SoC .dtsi file.
> Only in the board .dts file.
> 
> 
>> +   clocks {
>> +   xo_board: xo_board {
> 
> Just to make it explicit: see my comments in patch 3/3 in this series
> about using "_" in node names.  I believe this should be:
> 
>   xo_board: xo-board {
> 
> 
>> +   spmi_bus: qcom,spmi@c44 {
> 
> Drop the qcom in the node name.  AKA, I believe this should be:
> 
> spmi_bus: spmi@c44 {
> 
> Specifically the node name is supposed to be a generic component name
> then with an address.  I see that Rob Herring said the same thing when
> he reviewed v1 of this patch just now (it seems like people are still
> commenting there, so make sure you collect the latest feedback from
> there when re-spinning).

Yes, I'll make sure I fix this up based on Rob's review of v1.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [RFC PATCH] vfio/pci: Add ioeventfd support

2018-02-06 Thread Alexey Kardashevskiy
On 07/02/18 15:25, Alex Williamson wrote:
> On Wed, 7 Feb 2018 15:09:22 +1100
> Alexey Kardashevskiy  wrote:
>> On 07/02/18 11:08, Alex Williamson wrote:
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index e3301dbd27d4..07966a5f0832 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -503,6 +503,30 @@ struct vfio_pci_hot_reset {
>>>  
>>>  #define VFIO_DEVICE_PCI_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 13)
>>>  
>>> +/**
>>> + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 14,
>>> + *  struct vfio_device_ioeventfd)
>>> + *
>>> + * Perform a write to the device at the specified device fd offset, with
>>> + * the specified data and width when the provided eventfd is triggered.
>>> + *
>>> + * Return: 0 on success, -errno on failure.
>>> + */
>>> +struct vfio_device_ioeventfd {
>>> +       __u32   argsz;
>>> +       __u32   flags;
>>> +#define VFIO_DEVICE_IOEVENTFD_8         (1 << 0) /* 1-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_16        (1 << 1) /* 2-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_32        (1 << 2) /* 4-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_64        (1 << 3) /* 8-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK (0xf)
>>> +       __u64   offset;         /* device fd offset of write */
>>> +       __u64   data;           /* data to be written */
>>> +       __s32   fd;             /* -1 for de-assignment */
>>> +};
>>> +
>>> +#define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 14)  
>>
>>
>> Is this a first ioctl with endianness fixed to little-endian? I'd suggest
>> to comment on that as things like vfio_info_cap_header do use the host
>> endianness.
> 
> Look at our current read and write interface, we call leXX_to_cpu
> before calling iowriteXX there and I think a user would logically
> expect to use the same data format here as they would there.

If the data is "char data[8]" (i.e. bytestream), then it can be expected to
be device/bus endian (i.e. PCI == little endian), but if it is u64 - then I
am not so sure really, and this made me look around. It could be "__le64
data" too.

> Also note
> that iowriteXX does a cpu_to_leXX, so are we really defining the
> interface as little-endian or are we just trying to make ourselves
> endian neutral and counter that implicit conversion?  Thanks,

Defining it LE is fine, I just find it a bit confusing when
vfio_info_cap_header is host endian but vfio_device_ioeventfd is not.


-- 
Alexey
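
To make the width flags and the endianness question concrete, here is a
minimal userspace sketch. The flag values are copied from the proposed
header quoted in this thread; ioeventfd_width() and put_le() are
hypothetical helpers written for illustration and are not part of the
patch or of the vfio API. put_le() shows one way to produce a fixed
little-endian byte stream independent of host endianness, which is only
one possible reading of what the interface intends.

```c
#include <stdint.h>

/* Flag values copied from the proposed uapi header quoted in this thread. */
#define VFIO_DEVICE_IOEVENTFD_8         (1 << 0) /* 1-byte write */
#define VFIO_DEVICE_IOEVENTFD_16        (1 << 1) /* 2-byte write */
#define VFIO_DEVICE_IOEVENTFD_32        (1 << 2) /* 4-byte write */
#define VFIO_DEVICE_IOEVENTFD_64        (1 << 3) /* 8-byte write */
#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK (0xf)

/*
 * Hypothetical helper: returns the write width in bytes selected by flags,
 * or 0 if no size bit or more than one size bit is set.
 */
static unsigned int ioeventfd_width(uint32_t flags)
{
	uint32_t size = flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;

	/* Exactly one bit set means non-zero and a power of two. */
	if (!size || (size & (size - 1)))
		return 0;
	return size;	/* the bit values 1, 2, 4, 8 equal the byte counts */
}

/*
 * Hypothetical helper: store the low `width` bytes of data in little-endian
 * order, regardless of the host byte order.
 */
static void put_le(uint8_t *dst, uint64_t data, unsigned int width)
{
	for (unsigned int i = 0; i < width; i++)
		dst[i] = (uint8_t)(data >> (8 * i));
}
```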


Re: [PATCH 1/2] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak
[]..

>> +
>> +#include 
>> +
>> +/ {
>> +   model = "Qualcomm Technologies, Inc. SDM845";
> 
> This should only be in the board level file.

thanks, will fix.

> 
>> +
>> +   interrupt-parent = <&intc>;
>> +
>> +   #address-cells = <2>;
>> +   #size-cells = <2>;
>> +
>> +   chosen { };
>> +
>> +   memory {
>> +   device_type = "memory";
>> +   /* We expect the bootloader to fill in the reg */
> 
> The start address is variable? If not you should populate the base and
> have a unit-address.

sure, I'll check and update.

> 
>> +   reg = <0 0 0 0>;
>> +   };
>> +

[]..
>> +
>> +   soc: soc {
>> +   #address-cells = <1>;
>> +   #size-cells = <1>;
>> +   ranges = <0 0 0 0x>;
>> +   compatible = "simple-bus";
>> +
>> +   intc: interrupt-controller@17a0 {
>> +   compatible = "arm,gic-v3";
>> +   #interrupt-cells = <3>;
>> +   interrupt-controller;
>> +   #redistributor-regions = <1>;
>> +   redistributor-stride = <0x0 0x2>;
>> +   reg = <0x17a0 0x1>, /* GICD */
>> + <0x17a6 0x10>;/* GICR * 8 */
>> +   interrupts = ;
>> +   };
>> +
>> +   gcc: clock-controller@10 {
>> +   compatible = "qcom,gcc-sdm845";
> 
> sdm845-gcc is the preferred order.

This is still proposed as part of the GCC patch for sdm845 [1]
(which, it looks like, has neither you nor the DT list copied :/ ).
Also, looking at Documentation/devicetree/bindings/clock/qcom,gcc.txt,
I see we have followed the gcc- convention for compatibles all along :(

"qcom,gcc-apq8064"
"qcom,gcc-apq8084"
"qcom,gcc-ipq8064"
"qcom,gcc-ipq4019"
"qcom,gcc-ipq8074"
"qcom,gcc-msm8660"
"qcom,gcc-msm8916"
"qcom,gcc-msm8960"
"qcom,gcc-msm8974"
"qcom,gcc-msm8974pro"
"qcom,gcc-msm8974pro-ac"
"qcom,gcc-msm8994"
"qcom,gcc-msm8996"
"qcom,gcc-mdm9615"

[1] https://patchwork.kernel.org/patch/10193895/ 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Michael S. Tsirkin
On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host polls the free
> page vq after sending the starting cmd id, so the guest doesn't need to
> kick after filling an element to the vq.
> 
> Host may also request the guest to stop the reporting in advance by
> sending the stop cmd id to the guest via the configuration register.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> ---
>  drivers/virtio/virtio_balloon.c | 255 
> +++-
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c|   6 +
>  3 files changed, 232 insertions(+), 36 deletions(-)
> 
> Resend Change:
>   - Expose page_poisoning_enabled to kernel modules

The RESEND tag is for reposting unchanged patches.
You want to post a v27, and you want the mm patch
as a separate one, so you can get an ack on it from
someone on linux-mm.

In fact, I would probably add reporting the poison value as
a separate feature/couple of patches.
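
As a reading aid for the reporting sequence in the quoted commit message
(cmd id first, then the free pages, then the stop id), here is a minimal
userspace model. Everything in it is an assumption made for illustration:
toy_report_free_pages() is a hypothetical function, the flat output array
stands in for the free page vq, and TOY_REPORT_STOP_ID is an assumed value
for VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID, whose real definition is in
the uapi header of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the real constant is defined by the patch's uapi header. */
#define TOY_REPORT_STOP_ID 0u

/*
 * Hypothetical model of one reporting round: writes the host's cmd id,
 * then each free-page address, then the stop id into out[]. Returns the
 * number of elements written, or 0 if out[] is too small or cmd_id is
 * the stop id (meaning the host did not ask for a report).
 */
static size_t toy_report_free_pages(uint32_t cmd_id, const uint64_t *pages,
				    size_t n_pages, uint64_t *out,
				    size_t out_len)
{
	size_t n = 0;

	if (cmd_id == TOY_REPORT_STOP_ID)
		return 0;
	if (n_pages > SIZE_MAX - 2 || out_len < n_pages + 2)
		return 0;

	out[n++] = cmd_id;			/* first element: cmd id from host */
	for (size_t i = 0; i < n_pages; i++)
		out[n++] = pages[i];		/* free page hints */
	out[n++] = TOY_REPORT_STOP_ID;		/* tells host reporting is done */
	return n;
}
```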

> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..5476725 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,11 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&vb->stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, &vb->update_balloon_size_work);
> - spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(&vb->stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +&vb->update_balloon_size_work);
> + spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, &vb->cmd_id_received);
> + if (vb->cmd_id_received !=
> + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
> + spin_lock_irqsave(&vb->stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(vb->balloon_wq,
> +&vb->report_free_page_work);
> + spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> + }
> + 

Re: [PATCH 2/3] x86/tme: Detect if TME and MKTME is activated by BIOS

2018-02-06 Thread Kai Huang
On Wed, 2018-01-31 at 12:15 +0300, Kirill A. Shutemov wrote:
> IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has enabled
> TME and MKTME. It includes which encryption policy/algorithm is selected
> for TME or available for MKTME. For MKTME, the MSR also enumerates how
> many KeyIDs are available.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  arch/x86/kernel/cpu/intel.c | 83
> +
>  1 file changed, 83 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c
> b/arch/x86/kernel/cpu/intel.c
> index 6936d14d4c77..5b95fa484837 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -517,6 +517,86 @@ static void detect_vmx_virtcap(struct
> cpuinfo_x86 *c)
>   }
>  }
>  
> +#define MSR_IA32_TME_ACTIVATE	0x982

Should this MSR go into msr-index.h?

> +
> +#define TME_ACTIVATE_LOCKED(x)		(x & 0x1)
> +#define TME_ACTIVATE_ENABLED(x)		(x & 0x2)
> +
> +#define TME_ACTIVATE_POLICY(x)		((x >> 4) & 0xf)	/* Bits 7:4 */
> +#define TME_ACTIVATE_POLICY_AES_XTS	0
> +
> +#define TME_ACTIVATE_KEYID_BITS(x)	((x >> 32) & 0xf)	/* Bits 35:32 */
> +
> +#define TME_ACTIVATE_CRYPTO_ALGS(x)	((x >> 48) & 0xffff)	/* Bits 63:48 */
> +#define TME_ACTIVATE_CRYPTO_AES_XTS	1
> +
> +#define MKTME_ENABLED			0
> +#define MKTME_DISABLED			1
> +#define MKTME_UNINITIALIZED		2
> +static int mktme_status = MKTME_UNINITIALIZED;
> +
> +static void detect_tme(struct cpuinfo_x86 *c)
> +{
> + u64 tme_activate, tme_policy, tme_crypto_algs;
> + int keyid_bits = 0, nr_keyids = 0;
> + static u64 tme_activate_cpu0 = 0;
> +
> + rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
> +
> + if (mktme_status != MKTME_UNINITIALIZED) {
> + /* Broken BIOS? */
> + if (tme_activate != tme_activate_cpu0) {
> + pr_err_once("TME: configuation is
> inconsistent between CPUs\n");
> + mktme_status = MKTME_DISABLED;
> + }
> + goto out;

Why goto out here? If something goes wrong, I think it is pointless to
read the keyID bits. IMHO you should just set mktme_status to disabled
and clear tme_activate_cpu0.

> + }
> +
> + tme_activate_cpu0 = tme_activate;
> +
> + if (!TME_ACTIVATE_LOCKED(tme_activate) ||
> !TME_ACTIVATE_ENABLED(tme_activate)) {
> + pr_info("TME: not enabled by BIOS\n");
> + mktme_status = MKTME_DISABLED;
> + goto out;

I think it is pointless to read the keyID bits if TME is not even
enabled.
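
A minimal userspace sketch of the early-return flow suggested above, not the
patch itself. The struct tme_state container and detect_tme_sketch() are
hypothetical names used only for illustration; the bit positions and status
values are taken from the quoted patch. It deliberately ignores the
x86_phys_bits adjustment, which is discussed separately further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status values mirror the #defines in the quoted patch. */
enum mktme_state { MKTME_ENABLED, MKTME_DISABLED, MKTME_UNINITIALIZED };

/* Hypothetical container for the state the patch keeps in statics. */
struct tme_state {
	enum mktme_state status;
	uint64_t activate_cpu0;	/* MSR value seen on the first CPU */
	int keyid_bits;		/* stays 0 unless MKTME is usable */
};

#define SKETCH_TME_LOCKED(x)     ((x) & 0x1)
#define SKETCH_TME_ENABLED(x)    ((x) & 0x2)
#define SKETCH_TME_KEYID_BITS(x) (((x) >> 32) & 0xf)	/* Bits 35:32 */

/* Hypothetical helper: handle one CPU's MSR value, returning early on failure. */
static void detect_tme_sketch(struct tme_state *st, uint64_t tme_activate)
{
	assert(st != NULL);

	if (st->status != MKTME_UNINITIALIZED) {
		/* Secondary CPU: only check consistency with the first CPU. */
		if (tme_activate != st->activate_cpu0) {
			st->status = MKTME_DISABLED;
			st->activate_cpu0 = 0;
			st->keyid_bits = 0;
		}
		return;
	}

	st->activate_cpu0 = tme_activate;

	if (!SKETCH_TME_LOCKED(tme_activate) || !SKETCH_TME_ENABLED(tme_activate)) {
		st->status = MKTME_DISABLED;
		return;	/* keyID bits are never read on this path */
	}

	st->keyid_bits = (int)SKETCH_TME_KEYID_BITS(tme_activate);
	st->status = MKTME_ENABLED;
}
```

With this shape, the keyID field is only decoded once the MSR has been
found locked and enabled, which is the point of the two review comments.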

> + }
> +
> + pr_info("TME: enabled by BIOS\n");
> +
> + tme_policy = TME_ACTIVATE_POLICY(tme_activate);
> + if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS)
> + pr_warn("TME: Unknown policy is active: %#llx\n",
> tme_policy);
> +
> + tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
> + if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS)) {
> + pr_err("MKTME: No known encryption algorithm is
> supported: %#llx\n",
> + tme_crypto_algs);
> + mktme_status = MKTME_DISABLED;
> + }

The naming is a little confusing to me. tme_policy is the crypto
algorithm used by the TME keyID (0), while tme_crypto_algs is a bitmap of
the crypto algorithms supported for MK-TME. Perhaps better names are
needed, and likewise for TME_ACTIVATE_POLICY(x) and
TME_ACTIVATE_CRYPTO_ALGS(x) above?

> +out:
> + keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
> + nr_keyids = (1UL << keyid_bits) - 1;
> + if (nr_keyids) {
> + pr_info_once("MKTME: enabled by BIOS\n");
> + pr_info_once("MKTME: %d KeyIDs available\n",
> nr_keyids);
> + } else {
> + pr_info_once("MKTME: disabled by BIOS\n");
> + }
> +
> + if (mktme_status == MKTME_UNINITIALIZED) {
> + /* MKTME is usable */
> + mktme_status = MKTME_ENABLED;
> + }
> +
> + /*
> +  * Exclude KeyID bits from physical address bits.
> +  *
> +  * We have to do this even if we are not going to use KeyID
> bits
> +  * ourself. VM guests still have to know that these bits are
> not usable
> +  * for physical address.
> +  */
Currently KVM gets this information directly from CPUID and does not
consult c->x86_phys_bits. It may be reasonable for KVM to consult
c->x86_phys_bits for MK-TME, but IMHO the real reason we need to do this
is simply that c->x86_phys_bits has to reflect the number of bits that
are actually usable for physical addresses, so the comment could
probably be refined.

Thanks,
-Kai
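
For reference, the arithmetic behind that comment can be sketched in plain
userspace C. The helper names below are hypothetical; the field position
(bits 35:32) and the `(1UL << keyid_bits) - 1` formula come from the quoted
patch, and the physical address width in the usage note is only an example
value.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 35:32 of IA32_TME_ACTIVATE, as decoded in the quoted patch. */
static unsigned int sketch_keyid_bits(uint64_t tme_activate)
{
	return (unsigned int)((tme_activate >> 32) & 0xf);
}

/* Same formula as the patch: KeyID 0 is not counted. */
static unsigned long sketch_nr_keyids(unsigned int keyid_bits)
{
	return (1UL << keyid_bits) - 1;
}

/* Physical address bits left once the KeyID bits are carved out. */
static unsigned int sketch_usable_phys_bits(unsigned int phys_bits,
					    unsigned int keyid_bits)
{
	assert(keyid_bits <= phys_bits);
	return phys_bits - keyid_bits;
}
```

For example, with 6 KeyID bits and a hypothetical 46-bit physical address
width, 63 KeyIDs are available and 40 bits remain for addressing.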

> + c->x86_phys_bits -= keyid_bits;
> +}
> +
>  static void init_intel_energy_perf(struct cpuinfo_x86 *c)
>  {
>   u64 epb;
> @@ -687,6 +767,9 @@ static void init_intel(struct cpuinfo_x86 *c)
>   if (cpu_has(c, X86_FEATURE_VMX))
>   detect_vmx_virtcap(c);
>  
> + if (cpu_has(c, X86_FEATURE_TME))
> + 

Re: linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Stephen Rothwell
Hi Michael,

On Wed, 7 Feb 2018 04:57:42 +0200 "Michael S. Tsirkin"  wrote:
>
> On Wed, Feb 07, 2018 at 01:54:41PM +1100, Stephen Rothwell wrote:
> > 
> > On Wed, 7 Feb 2018 13:04:23 +1100 Stephen Rothwell  
> > wrote:  
> > >
> > > I have used the vhost tree from next-20180206 for today.  
> 
> That's
> commit d25cc43c6775bff6b8e3dad97c747954b805e421
> vhost: don't hold onto file pointer for VHOST_SET_LOG_FD
> 
> Right?

Correct.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v2 3/3] arm64: dts: sdm845: Add serial console support

2018-02-06 Thread Rajendra Nayak
[]..

>> @@ -10,4 +10,46 @@
>>  / {
>> model = "Qualcomm Technologies, Inc. SDM845 MTP";
>> compatible = "qcom,sdm845-mtp";
>> +
>> +   aliases {
>> +   serial0 = &qup_uart2;
>> +   };
>> +
>> +   chosen {
>> +   stdout-path = "serial0";
>> +   };
>> +
>> +   soc {
> 
> I don't know if there's an official position, but in general I'm seen
> people use the actual "soc" alias here.  AKA at the top level of this
> dts, just do:
> 
> &soc {
>   ...
> };
> 
> Normally doing stuff like that is useful so you don't need to
> replicate the whole hierarchy.  In this case that's not a huge
> savings, but it can be nice to be consistent.  In the very least it
> saves you a level of indentation.
> 
> 
>> +   serial@a84000 {
>> +   status = "okay";
>> +   };
> 
> Similarly here you can use the alias from the sdm845.dtsi file to
> avoid replicating the hierarchy.  AKA at the top level do:
> 
> &qup_uart2 {
>   status = "okay";
> };
> 
> In this case it saves you 2 levels of indentation.

Right. Andy/Bjorn, are there any preferences here?
I see we don't do this for the other board files, and I am not sure
there is a specific reason for how it is currently done or whether we
need to stick to it.

> 
>> +
>> +   pinctrl@340 {
>> +   qup_uart2_default: qup_uart2_default {
> 
> I'm not sure how persnickety I should be, but according to
> :
> 
>   node names use dash "-" instead of underscore "_"
> 
> ...but, of course, labels can't use dashes (and the same page says to
> use underscore for labels).  This is why, in rk3288 for instance, you
> see:
> 
> i2c2_xfer: i2c2-xfer {
>   rockchip,pins = <6 9 RK_FUNC_1 &pcfg_pull_none>,
>   <6 10 RK_FUNC_1 &pcfg_pull_none>;
> };
> 
> AKA the label and the node name are the same but the label uses "_"
> and the node names use "-".

Sure, I'll fix these up.

[]
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
>> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> index 02520f19e4ca..c4ce70840acf 100644
>> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> @@ -4,6 +4,7 @@
>>   */
>>
>>  #include 
>> +#include 
>>
>>  / {
>> model = "Qualcomm Technologies, Inc. SDM845";
>> @@ -273,5 +274,25 @@
>> cell-index = <0>;
>> };
>>
>> +   qup_1: qcom,geni_se@ac {
>> +   compatible = "qcom,geni-se-qup";
>> +   reg = <0xac 0x6000>;
> 
> I think you may have mentioned this in another context, but this
> doesn't match the current bindings.  Some clocks should be here.
> ...and it looks like the uart should be a subnode.

right, these were tested with the v1 set for serial. I will update them.

regards
Rajendra

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [RFC PATCH] vfio/pci: Add ioeventfd support

2018-02-06 Thread Alex Williamson
On Wed, 7 Feb 2018 15:09:22 +1100
Alexey Kardashevskiy  wrote:
> On 07/02/18 11:08, Alex Williamson wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index e3301dbd27d4..07966a5f0832 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -503,6 +503,30 @@ struct vfio_pci_hot_reset {
> >  
> >  #define VFIO_DEVICE_PCI_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 13)
> >  
> > +/**
> > + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 14,
> > + *  struct vfio_device_ioeventfd)
> > + *
> > + * Perform a write to the device at the specified device fd offset, with
> > + * the specified data and width when the provided eventfd is triggered.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_ioeventfd {
> > +   __u32   argsz;
> > +   __u32   flags;
> > +#define VFIO_DEVICE_IOEVENTFD_8    (1 << 0) /* 1-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_16   (1 << 1) /* 2-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_32   (1 << 2) /* 4-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_64   (1 << 3) /* 8-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK    (0xf)
> > +   __u64   offset; /* device fd offset of write */
> > +   __u64   data;   /* data to be written */
> > +   __s32   fd; /* -1 for de-assignment */
> > +};
> > +
> > +#define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 14)  
> 
> 
> Is this a first ioctl with endianness fixed to little-endian? I'd suggest
> to comment on that as things like vfio_info_cap_header do use the host
> endianness.

Look at our current read and write interface, we call leXX_to_cpu
before calling iowriteXX there and I think a user would logically
expect to use the same data format here as they would there.  Also note
that iowriteXX does a cpu_to_leXX, so are we really defining the
interface as little-endian or are we just trying to make ourselves
endian neutral and counter that implicit conversion?  Thanks,

Alex
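
To illustrate the point about the two conversions countering each other,
here is a small userspace model. The sim_* helpers are hypothetical
stand-ins, not the kernel's leXX_to_cpu/cpu_to_leXX macros, and the host
byte order is passed in as a flag so both cases can be tried on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static uint32_t sim_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v >> 8) & 0x0000ff00u) | (v >> 24);
}

/* On a little-endian host both conversions are no-ops; on big-endian both swap. */
static uint32_t sim_le32_to_cpu(uint32_t v, int host_is_be)
{
	return host_is_be ? sim_swab32(v) : v;
}

static uint32_t sim_cpu_to_le32(uint32_t v, int host_is_be)
{
	return host_is_be ? sim_swab32(v) : v;
}

/* Models the write path: one conversion by the caller, one inside the write. */
static uint32_t sim_write_path(uint32_t user_data, int host_is_be)
{
	uint32_t cpu_val;

	assert(host_is_be == 0 || host_is_be == 1);
	cpu_val = sim_le32_to_cpu(user_data, host_is_be);	/* before the write */
	return sim_cpu_to_le32(cpu_val, host_is_be);		/* implicit in the write */
}
```

In this model the value that reaches the device equals the user-supplied
data on either kind of host, which is the "endian neutral" reading of the
interface.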


Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Matthew Wilcox
On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> So it is OK to kvmalloc() something and pass it to either kfree() or
> kvfree(), and it had better be OK to kvmalloc() something and pass it
> to kvfree().
> 
> Is it OK to kmalloc() something and pass it to kvfree()?

Yes, it absolutely is.

void kvfree(const void *addr)
{
if (is_vmalloc_addr(addr))
vfree(addr);
else
kfree(addr);
}

> If so, is it really useful to have two different names here, that is,
> both kfree_rcu() and kvfree_rcu()?

I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
vfree_rcu() available in the API for the symmetry of calling kmalloc()
/ kfree_rcu().

Personally, I would like us to rename kvfree() to just free(), and have
malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
fight yet.
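
A userspace model of the dispatch in kvfree() above, for anyone who wants
to experiment with it. Everything prefixed sim_ is hypothetical: a static
buffer stands in for the vmalloc address range, malloc()/free() stand in
for kmalloc()/kfree(), and counters record which path was taken.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vmalloc address range. */
static unsigned char sim_vmalloc_area[4096];
static int sim_vfree_calls, sim_kfree_calls;

static int sim_is_vmalloc_addr(const void *addr)
{
	uintptr_t p = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)sim_vmalloc_area;

	return p >= start && p < start + sizeof(sim_vmalloc_area);
}

static void sim_vfree(const void *addr)
{
	assert(sim_is_vmalloc_addr(addr));	/* must come from the fake range */
	sim_vfree_calls++;
}

static void sim_kfree(const void *addr)
{
	free((void *)addr);			/* malloc() plays the role of kmalloc() */
	sim_kfree_calls++;
}

/* Same shape as kvfree(): pick the free routine from the address alone. */
static void sim_kvfree(const void *addr)
{
	if (sim_is_vmalloc_addr(addr))
		sim_vfree(addr);
	else
		sim_kfree(addr);
}
```

Because the decision depends only on the address, a pointer from the
kmalloc-like allocator is handled correctly by sim_kvfree(), mirroring the
"yes, it absolutely is" answer above.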


Re: [PATCH 1/2] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak


On 02/07/2018 12:24 AM, Bjorn Andersson wrote:
> On Thu 25 Jan 08:32 PST 2018, Rajendra Nayak wrote:
>> +spmi_bus: qcom,spmi@c44 {
> [..]
>> +};
>> +
> 
> While we have the chance, please remove this empty line.

Will do. Thanks for the review.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [tip:sched/urgent] sched/rt: Up the root domain ref count when passing it around via IPIs

2018-02-06 Thread Steven Rostedt

I see this was just applied to Linus's tree. This one should probably be
tagged for stable as well.

-- Steve


On Tue, 6 Feb 2018 03:54:42 -0800
"tip-bot for Steven Rostedt (VMware)"  wrote:

> Commit-ID:  364f56653708ba8bcdefd4f0da2a42904baa8eeb
> Gitweb: 
> https://git.kernel.org/tip/364f56653708ba8bcdefd4f0da2a42904baa8eeb
> Author: Steven Rostedt (VMware) 
> AuthorDate: Tue, 23 Jan 2018 20:45:38 -0500
> Committer:  Ingo Molnar 
> CommitDate: Tue, 6 Feb 2018 10:20:33 +0100
> 
> sched/rt: Up the root domain ref count when passing it around via IPIs
> 
> When issuing an IPI RT push, where an IPI is sent to each CPU that has more
> than one RT task scheduled on it, it references the root domain's rto_mask,
> that contains all the CPUs within the root domain that has more than one RT
> task in the runable state. The problem is, after the IPIs are initiated, the
> rq->lock is released. This means that the root domain that is associated to
> the run queue could be freed while the IPIs are going around.
> 
> Add a sched_get_rd() and a sched_put_rd() that will increment and decrement
> the root domain's ref count respectively. This way when initiating the IPIs,
> the scheduler will up the root domain's ref count before releasing the
> rq->lock, ensuring that the root domain does not go away until the IPI round
> is complete.
> 
> Reported-by: Pavan Kondeti 
> Signed-off-by: Steven Rostedt (VMware) 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Andrew Morton 
> Cc: Linus Torvalds 
> Cc: Mike Galbraith 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic")
> Link: 
> http://lkml.kernel.org/r/CAEU1=pkiho35dzna8eqqnskw1fr1y1zrq5y66x117mg06sq...@mail.gmail.com
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/rt.c   |  9 +++--
>  kernel/sched/sched.h|  2 ++
>  kernel/sched/topology.c | 13 +
>  3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 2fb627d..89a086e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1990,8 +1990,11 @@ static void tell_cpu_to_push(struct rq *rq)
>  
>   rto_start_unlock(&rq->rd->rto_loop_start);
>  
> - if (cpu >= 0)
> + if (cpu >= 0) {
> + /* Make sure the rd does not get freed while pushing */
> + sched_get_rd(rq->rd);
>   irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> + }
>  }
>  
>  /* Called from hardirq context */
> @@ -2021,8 +2024,10 @@ void rto_push_irq_work_func(struct irq_work *work)
>  
>   raw_spin_unlock(&rd->rto_lock);
>  
> - if (cpu < 0)
> + if (cpu < 0) {
> + sched_put_rd(rd);
>   return;
> + }
>  
>   /* Try the next RT overloaded CPU */
>   irq_work_queue_on(&rd->rto_push_work, cpu);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 2e95505..fb5fc45 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -691,6 +691,8 @@ extern struct mutex sched_domains_mutex;
>  extern void init_defrootdomain(void);
>  extern int sched_init_domains(const struct cpumask *cpu_map);
>  extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
> +extern void sched_get_rd(struct root_domain *rd);
> +extern void sched_put_rd(struct root_domain *rd);
>  
>  #ifdef HAVE_RT_PUSH_IPI
>  extern void rto_push_irq_work_func(struct irq_work *work);
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 034cbed..519b024 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -259,6 +259,19 @@ void rq_attach_root(struct rq *rq, struct root_domain 
> *rd)
>   call_rcu_sched(&old_rd->rcu, free_rootdomain);
>  }
>  
> +void sched_get_rd(struct root_domain *rd)
> +{
> + atomic_inc(&rd->refcount);
> +}
> +
> +void sched_put_rd(struct root_domain *rd)
> +{
> + if (!atomic_dec_and_test(&rd->refcount))
> + return;
> +
> + call_rcu_sched(&rd->rcu, free_rootdomain);
> +}
> +
>  static int init_rootdomain(struct root_domain *rd)
>  {
>   if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))
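
The get/put pattern in the quoted commit can be sketched in userspace with
C11 atomics. This is only a model under stated assumptions: the sim_ names
are hypothetical, free() is called directly where the kernel patch defers
the free through call_rcu_sched(), and the initial reference count of 1
represents the owner's reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct sim_root_domain {
	atomic_int refcount;
};

static int sim_rd_freed;	/* counts frees, for demonstration only */

static struct sim_root_domain *sim_rd_alloc(void)
{
	struct sim_root_domain *rd = malloc(sizeof(*rd));

	assert(rd != NULL);
	atomic_init(&rd->refcount, 1);	/* owner's reference */
	return rd;
}

/* Take a reference before the lock protecting rd is dropped. */
static void sim_get_rd(struct sim_root_domain *rd)
{
	atomic_fetch_add(&rd->refcount, 1);
}

/* Drop a reference; whoever drops the last one frees the object. */
static void sim_put_rd(struct sim_root_domain *rd)
{
	/* atomic_fetch_sub() returns the old value; 1 means this was the last ref. */
	if (atomic_fetch_sub(&rd->refcount, 1) != 1)
		return;

	free(rd);	/* the kernel patch uses call_rcu_sched() here instead */
	sim_rd_freed++;
}
```

The reference taken at the start of the IPI round keeps the object alive
even if the owner drops its own reference while the round is in flight.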



Re: [tip:sched/urgent] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()

2018-02-06 Thread Steven Rostedt

I see this was just applied to Linus's tree. It probably should be
tagged for stable as well.

-- Steve


On Tue, 6 Feb 2018 03:54:16 -0800
"tip-bot for Steven Rostedt (VMware)"  wrote:

> Commit-ID:  ad0f1d9d65938aec72a698116cd73a980916895e
> Gitweb: 
> https://git.kernel.org/tip/ad0f1d9d65938aec72a698116cd73a980916895e
> Author: Steven Rostedt (VMware) 
> AuthorDate: Tue, 23 Jan 2018 20:45:37 -0500
> Committer:  Ingo Molnar 
> CommitDate: Tue, 6 Feb 2018 10:20:33 +0100
> 
> sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
> 
> When the rto_push_irq_work_func() is called, it looks at the RT overloaded
> bitmask in the root domain via the runqueue (rq->rd). The problem is that
> during CPU up and down, nothing here stops rq->rd from changing between
> taking the rq->rd->rto_lock and releasing it. That means the lock that is
> released is not the same lock that was taken.
> 
> Instead of using this_rq()->rd to get the root domain, as the irq work is
> part of the root domain, we can simply get the root domain from the irq work
> that is passed to the routine:
> 
>  container_of(work, struct root_domain, rto_push_work)
> 
> This keeps the root domain consistent.
> 
> Reported-by: Pavan Kondeti 
> Signed-off-by: Steven Rostedt (VMware) 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Andrew Morton 
> Cc: Linus Torvalds 
> Cc: Mike Galbraith 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic")
> Link: 
> http://lkml.kernel.org/r/CAEU1=pkiho35dzna8eqqnskw1fr1y1zrq5y66x117mg06sq...@mail.gmail.com
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/rt.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 862a513..2fb627d 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
>   * the rt_loop_next will cause the iterator to perform another scan.
>   *
>   */
> -static int rto_next_cpu(struct rq *rq)
> +static int rto_next_cpu(struct root_domain *rd)
>  {
> - struct root_domain *rd = rq->rd;
>   int next;
>   int cpu;
>  
> @@ -1985,7 +1984,7 @@ static void tell_cpu_to_push(struct rq *rq)
>* Otherwise it is finishing up and an ipi needs to be sent.
>*/
>   if (rq->rd->rto_cpu < 0)
> - cpu = rto_next_cpu(rq);
> + cpu = rto_next_cpu(rq->rd);
>  
>   raw_spin_unlock(&rq->rd->rto_lock);
>  
> @@ -1998,6 +1997,8 @@ static void tell_cpu_to_push(struct rq *rq)
>  /* Called from hardirq context */
>  void rto_push_irq_work_func(struct irq_work *work)
>  {
> + struct root_domain *rd =
> + container_of(work, struct root_domain, rto_push_work);
>   struct rq *rq;
>   int cpu;
>  
> @@ -2013,18 +2014,18 @@ void rto_push_irq_work_func(struct irq_work *work)
>   raw_spin_unlock(&rq->lock);
>   }
>  
> - raw_spin_lock(&rq->rd->rto_lock);
> + raw_spin_lock(&rd->rto_lock);
>  
>   /* Pass the IPI to the next rt overloaded queue */
> - cpu = rto_next_cpu(rq);
> + cpu = rto_next_cpu(rd);
>  
> - raw_spin_unlock(&rq->rd->rto_lock);
> + raw_spin_unlock(&rd->rto_lock);
>  
>   if (cpu < 0)
>   return;
>  
>   /* Try the next RT overloaded CPU */
> - irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> + irq_work_queue_on(&rd->rto_push_work, cpu);
>  }
>  #endif /* HAVE_RT_PUSH_IPI */
>  
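
A small userspace illustration of the container_of() idea used in the
quoted commit. The macro below is a simplified version (the kernel's adds
type checking), and the sim_ structures are hypothetical cut-down stand-ins
for struct irq_work and struct root_domain.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define sim_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sim_irq_work {
	int pending;
};

struct sim_root_domain {
	int rto_cpu;
	struct sim_irq_work rto_push_work;	/* embedded, not a pointer */
};

/* Recover the root domain from the work item handed to the callback. */
static struct sim_root_domain *sim_rd_from_work(struct sim_irq_work *work)
{
	assert(work != NULL);
	return sim_container_of(work, struct sim_root_domain, rto_push_work);
}
```

Since the work item is embedded in the root domain, the pointer recovered
this way always names the domain that queued the work, regardless of what
rq->rd points to at the time the callback runs.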



Re: [PATCH 1/2] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak
[]..

>>> + };
>>> +
>>> + gcc: clock-controller@10 {
>>> + compatible = "qcom,gcc-sdm845";
>>> + reg = <0x10 0x1f>;
>>> + #clock-cells = <1>;
>>> + #reset-cells = <1>;
>>> + };
>>> +
>>> + tlmm: pinctrl@0340 {
>>
>> Drop leading zeroes please.
> 
> Build dtbs with W=2 and fix the warnings so reviewers don't have to
> waste their time on these issues.

Noted. Thanks Rob.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH 2/2] arm64: dts: sdm845: Add serial console support

2018-02-06 Thread Rajendra Nayak


On 02/07/2018 01:19 AM, Doug Anderson wrote:
> Hi,
> 
> On Tue, Feb 6, 2018 at 11:06 AM, Bjorn Andersson
>  wrote:
>> On Tue 06 Feb 10:37 PST 2018, Doug Anderson wrote:
>>
>>> Hi,
>>>
>>> On Fri, Jan 26, 2018 at 2:18 PM, Stephen Boyd  wrote:
>>>> On 01/25, Rajendra Nayak wrote:
>>>>> diff --git a/arch/arm64/boot/dts/qcom/sdm845-pins.dtsi 
>>>>> b/arch/arm64/boot/dts/qcom/sdm845-pins.dtsi
>>>>> new file mode 100644
>>>>> index ..b97f99e6f4b4
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/boot/dts/qcom/sdm845-pins.dtsi
>>>>> @@ -0,0 +1,32 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>> +/*
>>>>> + * Copyright (c) 2018, The Linux Foundation. All rights reserved.
>>>>> + */
>>>>> +
>>>>> +&tlmm {
>>>>
>>>> I'm not the maintainer, but I find this approach to the pins
>>>> really annoying. I have to flip to another file to figure out how
>>>> a board has configured the pins. And we may bring in a bunch of
>>>> settings that we don't ever use on some board too. Why can't we
>>>> put the settings in the board file directly?
>>>
>>> I'm not so familiar with how things work with Qualcomm, but in general
>>> I think putting this in the "board" file is a bad idea.  I'd be OK
>>> with putting this directly in the SoC file (though it might get
>>> unwieldy?), but not moving things to the board file as was done with
>>> v2 of this patch.
>>>
>>> Said another way: nearly board that uses SDM845 that uses UART2 will
>>> have the same definitions for these pins so we shouldn't be
>>> duplicating it across every board, right?
>>>
>>
>> We've run into several cases where different boards uses the same
>> function but requires board specific electrical configuration.
>>
>> So what we decided was to keep the pinmux in the soc-file (where e.g.
>> the uart definition is) and then extend it with the board specific
>> electrical properties (the pinconf), in the board files.
>>
>> This does come with the complexity of having the pinctrl nodes split in
>> two places, but the responsibilities of the two parts is clear and we
>> remove the need for all board files to ensure the appropriate pinmux is
>> in place.
>>
>>
>> NB. We did discuss adding "sane defaults" for the pinconf in the soc
>> dtsi, but we end up spending considerable time debugging issues stemming
>> from not having the right pinconf; so better make this explicit and say
>> that the board has to specify it's config.
> 
> Whoops, saw your responses _after_ I sent my response to v2.  In any
> case this makes sense to me then!  On Rockchip boards I've been
> involved in we often added "sane defaults", but I can see how that
> could be confusing in different ways.  I'm happy with your choice and
> it seems like a happy medium.  The sdm845.dtsi file can have the main
> definition of the nodes and can thus refer to the nodes.  Then you
> just add the extra bit in the board file.
> 
> What you propose is not what happened in v2 of the series, though.
> In v2 _both_ the pinconf and the pinmux moved to the board file.
> That's wrong.

got it. I'll fix this up in my v3. Thanks for the review.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


linux-next: Tree for Feb 7

2018-02-06 Thread Stephen Rothwell
Hi all,

Please do not add any v4.17 material to your linux-next included branches
until after v4.16-rc1 has been released.

Changes since 20180206:

The btrfs-kdave tree gained conflicts against Linus' tree and a build
failure so I used the version from next-20180206.

The kvm tree gained a conflict against the arm64 tree.

The vhost tree gained a build failure so I used the version from
next-20180206.

Non-merge commits (relative to Linus' tree): 1216
 1208 files changed, 39489 insertions(+), 13744 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (b46dc8ae17a4 media: videobuf2: fix up for "media: 
annotate ->poll() instances")
Merging fixes/master (b46dc8ae17a4 media: videobuf2: fix up for "media: 
annotate ->poll() instances")
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from 
.gitignore)
Merging arc-current/for-curr (053823335956 arc: dts: use 'atmel' as 
manufacturer for at24 in axs10x_mb)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem 
entry)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1b689a95ce74 powerpc/pseries: include 
linux/types.h in asm/hvcall.h)
Merging sparc/master (aebb48f5e465 sparc64: fix typo in 
CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (176bfb406d73 Merge branch 'be2net-patch-set')
Merging bpf/master (41b0530eca69 Merge branch 'bpf-sockmap-fixes')
Merging ipsec/master (545d8ae7afff xfrm: fix boolean assignment in 
xfrm_get_type_offload)
Merging netfilter/master (992cfc7c5d10 netfilter: nft_flow_offload: no need to 
flush entries on module removal)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a9e6d44ddecc ssb: Do not disable PCI host on 
non-Mips)
Merging mac80211/master (c4de37ee2b55 mac80211: mesh: fix wrong mesh TTL offset 
calculation)
Merging rdma-fixes/for-rc (ae59c3f0b6cf RDMA/mlx5: Fix out-of-bound access 
while querying AH)
Merging sound-current/for-linus (f87a5843cc10 ALSA: hda - Fix headset mic 
detection problem for two Dell machines)
Merging pci-current/for-linus (838cda369707 x86/PCI: Enable AMD 64-bit window 
on resume)
Merging driver-core.current/driver-core-linus (35277995e179 Merge branch 
'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging tty.current/tty-linus (4bf772b14675 Merge tag 'drm-for-v4.16' of 
git://people.freedesktop.org/~airlied/linux)
Merging usb.current/usb-linus (4bf772b14675 Merge tag 'drm-for-v4.16' of 
git://people.freedesktop.org/~airlied/linux)
Merging usb-gadget-fixes/fixes (b2cd1df66037 Linux 4.15-rc7)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new 
device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (2b88212c

[PATCH] ACPI/IORT: Remove linker section for IORT entries again

2018-02-06 Thread Jia He
Commit 316ca8804ea8 ("ACPI/IORT: Remove linker section for IORT entries
probing") removed the IORT entries from vmlinux.lds.h, but
commit 2fcc112af37f ("clocksource/drivers: Rename clksrc table to timer")
brought the line back by mistake.

The stray line does no harm beyond adding some useless symbols, so remove
it again.

Signed-off-by: Jia He 
---
 include/asm-generic/vmlinux.lds.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 1ab0e52..58b1dab 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -589,7 +589,6 @@
IRQCHIP_OF_MATCH_TABLE()\
ACPI_PROBE_TABLE(irqchip)   \
ACPI_PROBE_TABLE(timer) \
-   ACPI_PROBE_TABLE(iort)  \
EARLYCON_TABLE()
 
 #define INIT_TEXT  \
-- 
2.7.4



Re: [RFC PATCH] vfio/pci: Add ioeventfd support

2018-02-06 Thread Alexey Kardashevskiy
On 07/02/18 11:08, Alex Williamson wrote:
> The ioeventfd here is actually irqfd handling of an ioeventfd such as
> supported in KVM.  A user is able to pre-program a device write to
> occur when the eventfd triggers.  This is yet another instance of
> eventfd-irqfd triggering between KVM and vfio.  The impetus for this
> is high frequency writes to pages which are virtualized in QEMU.
> Enabling this near-direct write path for selected registers within
> the virtualized page can improve performance and reduce overhead.
> Specifically this is initially targeted at NVIDIA graphics cards where
> the driver issues a write to an MMIO register within a virtualized
> region in order to allow the MSI interrupt to re-trigger.
> 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/pci/vfio_pci.c |   33 +++
>  drivers/vfio/pci/vfio_pci_private.h |   14 +++
>  drivers/vfio/pci/vfio_pci_rdwr.c|  165 
> ---
>  include/uapi/linux/vfio.h   |   24 +
>  4 files changed, 224 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index f041b1a6cf66..c8e7297a61a3 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -302,6 +302,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>  {
>   struct pci_dev *pdev = vdev->pdev;
>   struct vfio_pci_dummy_resource *dummy_res, *tmp;
> + struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
>   int i, bar;
>  
>   /* Stop the device from further DMA */
> @@ -311,6 +312,14 @@ static void vfio_pci_disable(struct vfio_pci_device 
> *vdev)
>   VFIO_IRQ_SET_ACTION_TRIGGER,
>   vdev->irq_type, 0, 0, NULL);
>  
> + /* Device closed, don't need mutex here */
> + list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
> +  &vdev->ioeventfds_list, next) {
> + vfio_virqfd_disable(&ioeventfd->virqfd);
> + list_del(&ioeventfd->next);
> + kfree(ioeventfd);
> + }
> +
>   vdev->virq_disabled = false;
>  
>   for (i = 0; i < vdev->num_regions; i++)
> @@ -1039,6 +1048,28 @@ static long vfio_pci_ioctl(void *device_data,
>  
>   kfree(groups);
>   return ret;
> + } else if (cmd == VFIO_DEVICE_IOEVENTFD) {
> + struct vfio_device_ioeventfd ioeventfd;
> + int count;
> +
> + minsz = offsetofend(struct vfio_device_ioeventfd, fd);
> +
> + if (copy_from_user(&ioeventfd, (void __user*)arg, minsz))
> + return -EFAULT;
> +
> + if (ioeventfd.argsz < minsz)
> + return -EINVAL;
> +
> + if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
> + return -EINVAL;
> +
> + count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
> +
> + if (hweight8(count) != 1 || ioeventfd.fd < -1)
> + return -EINVAL;
> +
> + return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
> +   ioeventfd.data, count, ioeventfd.fd);
>   }
>  
>   return -ENOTTY;
> @@ -1217,6 +1248,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   vdev->irq_type = VFIO_PCI_NUM_IRQS;
>   mutex_init(&vdev->igate);
>   spin_lock_init(&vdev->irqlock);
> + mutex_init(&vdev->ioeventfds_lock);
> + INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  
>   ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
>   if (ret) {
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1c78a0..23797622396e 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -29,6 +29,15 @@
>  #define PCI_CAP_ID_INVALID   0xFF/* default raw access */
>  #define PCI_CAP_ID_INVALID_VIRT  0xFE/* default virt access 
> */
>  
> +struct vfio_pci_ioeventfd {
> + struct list_head next;
> + struct virqfd   *virqfd;
> + loff_t  pos;
> + uint64_t data;
> + int bar;
> + int count;
> +};
> +
>  struct vfio_pci_irq_ctx {
>   struct eventfd_ctx  *trigger;
>   struct virqfd   *unmask;
> @@ -95,6 +104,8 @@ struct vfio_pci_device {
>   struct eventfd_ctx  *err_trigger;
>   struct eventfd_ctx  *req_trigger;
>   struct list_head dummy_resources_list;
> + struct mutex ioeventfds_lock;
> + struct list_head ioeventfds_list;
>  };
>  
>  #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> @@ -120,6 +131,9 @@ extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device 
> *vdev, char __user *buf,
>  extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user 
> *buf,
> 
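
The flag validation in the quoted ioctl handler can be tried out in
userspace with the sketch below. The SIM_ constants copy the bit layout
from the proposed uapi header, but sim_ioeventfd_width() is a hypothetical
helper, and the power-of-two test stands in for the patch's
hweight8(count) != 1 check.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_IOEVENTFD_8		(1u << 0)	/* 1-byte write */
#define SIM_IOEVENTFD_16	(1u << 1)	/* 2-byte write */
#define SIM_IOEVENTFD_32	(1u << 2)	/* 4-byte write */
#define SIM_IOEVENTFD_64	(1u << 3)	/* 8-byte write */
#define SIM_IOEVENTFD_SIZE_MASK	0xfu

/* Returns the write width in bytes, or -1 if the flags or fd are invalid. */
static int sim_ioeventfd_width(uint32_t flags, int32_t fd)
{
	uint32_t count;

	/* Reject any bit outside the size mask. */
	if (flags & ~SIM_IOEVENTFD_SIZE_MASK)
		return -1;

	count = flags & SIM_IOEVENTFD_SIZE_MASK;

	/* Exactly one size bit must be set: nonzero and a power of two. */
	if (count == 0 || (count & (count - 1)) != 0)
		return -1;

	/* -1 is allowed (de-assignment); anything lower is not a valid fd. */
	if (fd < -1)
		return -1;

	/* The flag values 1, 2, 4, 8 double as the width in bytes. */
	return (int)count;
}
```

One consequence of this encoding is that the masked flag value can be
passed on directly as the byte count, which is what the quoted handler does
with `count`.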

[PATCH 1/2] staging: android: ion: Cleanup ion_page_pool_alloc_pages

2018-02-06 Thread Yisheng Xie
ion_page_pool_alloc_pages() calls alloc_pages() to allocate pages for the
page pools. It returns NULL if alloc_pages() returns NULL, and otherwise
returns the pages that alloc_pages() allocated. So we can simply return
the result of alloc_pages() without checking it.

Signed-off-by: Yisheng Xie 
---
 drivers/staging/android/ion/ion_page_pool.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/android/ion/ion_page_pool.c 
b/drivers/staging/android/ion/ion_page_pool.c
index e3a6e32..6d2caf0 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -11,13 +11,9 @@
 
 #include "ion.h"
 
-static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
+static inline struct page *ion_page_pool_alloc_pages(struct ion_page_pool 
*pool)
 {
-   struct page *page = alloc_pages(pool->gfp_mask, pool->order);
-
-   if (!page)
-   return NULL;
-   return page;
+   return alloc_pages(pool->gfp_mask, pool->order);
 }
 
 static void ion_page_pool_free_pages(struct ion_page_pool *pool,
-- 
1.7.12.4



[PATCH 2/2] staging: android: ion: Combine cache and uncache pools

2018-02-06 Thread Yisheng Xie
Now we call dma_map in the dma_buf API callbacks and handle explicit
caching through the dma_buf sync API, which puts the cached and uncached
pools in the same handling flow, so they can be combined.

Signed-off-by: Yisheng Xie 
---
 drivers/staging/android/ion/ion.c |  5 --
 drivers/staging/android/ion/ion.h | 13 +
 drivers/staging/android/ion/ion_page_pool.c   |  5 +-
 drivers/staging/android/ion/ion_system_heap.c | 76 +--
 4 files changed, 16 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c 
b/drivers/staging/android/ion/ion.c
index 461b193..c094be2 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -33,11 +33,6 @@
 static struct ion_device *internal_dev;
 static int heap_id;
 
-bool ion_buffer_cached(struct ion_buffer *buffer)
-{
-   return !!(buffer->flags & ION_FLAG_CACHED);
-}
-
 /* this function should only be called while dev->lock is held */
 static void ion_buffer_add(struct ion_device *dev,
   struct ion_buffer *buffer)
diff --git a/drivers/staging/android/ion/ion.h 
b/drivers/staging/android/ion/ion.h
index 1bc443f..ea08978 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -185,14 +185,6 @@ struct ion_heap {
 };
 
 /**
- * ion_buffer_cached - this ion buffer is cached
- * @buffer:buffer
- *
- * indicates whether this ion buffer is cached
- */
-bool ion_buffer_cached(struct ion_buffer *buffer);
-
-/**
  * ion_device_add_heap - adds a heap to the ion device
  * @heap:  the heap to add
  */
@@ -302,7 +294,6 @@ size_t ion_heap_freelist_shrink(struct ion_heap *heap,
  * @gfp_mask:  gfp_mask to use from alloc
  * @order: order of pages in the pool
  * @list:  plist node for list of pools
- * @cached:it's cached pool or not
  *
  * Allows you to keep a pool of pre allocated pages to use from your heap.
  * Keeping a pool of pages that is ready for dma, ie any cached mapping have
@@ -312,7 +303,6 @@ size_t ion_heap_freelist_shrink(struct ion_heap *heap,
 struct ion_page_pool {
int high_count;
int low_count;
-   bool cached;
struct list_head high_items;
struct list_head low_items;
struct mutex mutex;
@@ -321,8 +311,7 @@ struct ion_page_pool {
struct plist_node list;
 };
 
-struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order,
-  bool cached);
+struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order);
 void ion_page_pool_destroy(struct ion_page_pool *pool);
 struct page *ion_page_pool_alloc(struct ion_page_pool *pool);
 void ion_page_pool_free(struct ion_page_pool *pool, struct page *page);
diff --git a/drivers/staging/android/ion/ion_page_pool.c 
b/drivers/staging/android/ion/ion_page_pool.c
index 6d2caf0..db8f614 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -123,8 +123,7 @@ int ion_page_pool_shrink(struct ion_page_pool *pool, gfp_t 
gfp_mask,
return freed;
 }
 
-struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order,
-  bool cached)
+struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order)
 {
struct ion_page_pool *pool = kmalloc(sizeof(*pool), GFP_KERNEL);
 
@@ -138,8 +137,6 @@ struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, 
unsigned int order,
pool->order = order;
mutex_init(&pool->mutex);
plist_node_init(&pool->list, order);
-   if (cached)
-   pool->cached = true;
 
return pool;
 }
diff --git a/drivers/staging/android/ion/ion_system_heap.c 
b/drivers/staging/android/ion/ion_system_heap.c
index bc19cdd..701eb9f 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -41,31 +41,16 @@ static inline unsigned int order_to_size(int order)
 
 struct ion_system_heap {
struct ion_heap heap;
-   struct ion_page_pool *uncached_pools[NUM_ORDERS];
-   struct ion_page_pool *cached_pools[NUM_ORDERS];
+   struct ion_page_pool *pools[NUM_ORDERS];
 };
 
-/**
- * The page from page-pool are all zeroed before. We need do cache
- * clean for cached buffer. The uncached buffer are always non-cached
- * since it's allocated. So no need for non-cached pages.
- */
 static struct page *alloc_buffer_page(struct ion_system_heap *heap,
  struct ion_buffer *buffer,
  unsigned long order)
 {
-   bool cached = ion_buffer_cached(buffer);
-   struct ion_page_pool *pool;
-   struct page *page;
+   struct ion_page_pool *pool = heap->pools[order_to_index(order)];
 
-   if (!cached)
-   pool = heap->uncached_pools[order_to_index(order)];
-   else
-

Re: [PATCH] crypto: s5p-sss.c: Fix kernel Oops in AES-ECB mode

2018-02-06 Thread Anand Moon
Hi Kamil

On 6 February 2018 at 22:40, Kamil Konieczny
 wrote:
>
> On 06.02.2018 17:48, Anand Moon wrote:
>> Hi Kamil,
>>
>> Thanks for providing the fix to this issue.
>>
>> On 5 February 2018 at 23:10, Kamil Konieczny
>>  wrote:
>>>
>>> In AES-ECB mode crypt is done with key only, so any use of IV
>>> can cause kernel Oops, as reported by Anand Moon.
>>
>> If possible could you avoid the name in commit message.
>
> This is added after '---' line, so it will not appear in commit message.
>

I know about '---' delimiter, but to be precise this will be part of
commit message.

>>> Fixed it by using IV only in AES-CBC and AES-CTR.
>>>
>>> Signed-off-by: Kamil Konieczny 
>>> Reported-by: Anand Moon 
>>
>> [snip]
>>
>> Please add my Tested-by. Tested on Odroid HC2.
>>
>> Tested-by: Anand Moon 
>
> This will add you name in commit message,
> additionally with 'Reported-by:' line.
>
>> Below are the result at my end.
>>
>> aes-cbc-essiv:sha256 (128 bit key)
>> WRITE:
>> 100+0 records in
>> 100+0 records out
>> 838860800 bytes (839 MB, 800 MiB) copied, 11.7225 s, 71.6 MB/s
>> [...]
>
> is it from 'cryptsetup benchmark' ? benchmark did not cause oops.
> Please test with luksFormat, ie. use
>
> cryptsetup luksFormat --debug -q -d /tmp/testkey.key \
>   --cipher aes-cbc-essiv:sha256 -h sha256 -s 128 /tmp/test.bin
>
[snip]

Below is the output of aes-cbc-essiv:sha256 and aes-ctr-plain

root@odroid:~# fallocate -l 128MiB /tmp/test.bin
root@odroid:~# dd if=/dev/urandom of=/tmp/testkey.key bs=128 count=1
1+0 records in
1+0 records out
128 bytes copied, 0.000231043 s, 554 kB/s
root@odroid:~# sync
root@odroid:~# cryptsetup luksFormat --debug -q -d /tmp/testkey.key \
>   --cipher aes-cbc-essiv:sha256 -h sha256 -s 128 /tmp/test.bin
# cryptsetup 1.6.6 processing "cryptsetup luksFormat --debug -q -d
/tmp/testkey.key --cipher aes-cbc-essiv:sha256 -h sha256 -s 128
/tmp/test.bin"
# Running command luksFormat.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Allocating crypt device /tmp/test.bin context.
# Trying to open and read device /tmp/test.bin.
# Initialising device-mapper backend library.
# Timeout set to 0 miliseconds.
# Iteration time set to 1000 miliseconds.
# File descriptor passphrase entry requested.
# Formatting device /tmp/test.bin as type LUKS1.
# Crypto backend (gcrypt 1.6.5) initialized.
# Detected kernel Linux 4.15.0-xu4krck armv7l.
# Topology info for /tmp/test.bin not supported, using default offset
1048576 bytes.
# Checking if cipher aes-cbc-essiv:sha256 is usable.
# Using userspace crypto wrapper to access keyslot area.
# Generating LUKS header version 1 using hash sha256, aes,
cbc-essiv:sha256, MK 16 bytes
# KDF pbkdf2, hash sha256: 160824 iterations per second.
# Data offset 2048, UUID fe5c0d54-9add-4454-a4cd-98eed8f2b75c, digest
iterations 19625
# Updating LUKS header of size 1024 on device /tmp/test.bin
# Key length 16, device size 262144 sectors, header size 1029 sectors.
# Reading LUKS header of size 1024 from device /tmp/test.bin
# Key length 16, device size 262144 sectors, header size 1029 sectors.
# Adding new keyslot -1 using volume key.
# Calculating data for key slot 0
# KDF pbkdf2, hash sha256: 161220 iterations per second.
# Key slot 0 use 78720 password iterations.
# Using hash sha256 for AF in key slot 0, 4000 stripes
# Updating key slot 0 [0x1000] area.
# Using userspace crypto wrapper to access keyslot area.
# Key slot 0 was enabled in LUKS header.
# Updating LUKS header of size 1024 on device /tmp/test.bin
# Key length 16, device size 262144 sectors, header size 1029 sectors.
# Reading LUKS header of size 1024 from device /tmp/test.bin
# Key length 16, device size 262144 sectors, header size 1029 sectors.
# Releasing crypt device /tmp/test.bin context.
# Releasing device-mapper backend.
# Unlocking memory.
Command successful.
root@odroid:~#
root@odroid:~#
root@odroid:~# fallocate -l 128MiB /tmp/test.bin
root@odroid:~# dd if=/dev/urandom of=/tmp/testkey.key bs=128 count=1
1+0 records in
1+0 records out
128 bytes copied, 0.000324001 s, 395 kB/s
root@odroid:~# sync
root@odroid:~# cryptsetup luksFormat --debug -q -d /tmp/testkey.key \
>   --cipher aes-ctr-plain -h sha256 -s 128 /tmp/test.bin
# cryptsetup 1.6.6 processing "cryptsetup luksFormat --debug -q -d
/tmp/testkey.key --cipher aes-ctr-plain -h sha256 -s 128
/tmp/test.bin"
# Running command luksFormat.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Allocating crypt device /tmp/test.bin context.
# Trying to open and read device /tmp/test.bin.
# Initialising device-mapper backend library.
# Timeout set to 0 miliseconds.
# Iteration time set to 1000 miliseconds.
# File descriptor passphrase entry requested.
# Formatting device /tmp/test.bin as type LUKS1.
# Crypto backend (gcrypt 1.6.5) initialized.
# Detected kernel Linux 4.15.0-xu4krck armv7l.
# Topology info for /tmp/test.bin not supported, using default offset
1048576 bytes.
# Checking if cipher aes-ctr-p

Re: [PATCH 12/15] ARM: dts: ipq4019: Add qcom-ipq4019-ap.dk07.1-c2 board file

2018-02-06 Thread Sricharan R
Hi Abhishek,



>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2017, The Linux Foundation. All rights reserved.
>> +
>> +#include "qcom-ipq4019-ap.dk07.1.dtsi"
>> +
>> +/ {
>> +    model = "Qualcomm Technologies, Inc. IPQ40xx/AP-DK07.1-C2";
> 
>  s/IPQ40xx/IPQ4019
> 
 ok

>> +
>> +    soc {
>> +    pcie0: pci@4000 {
>> +    status = "disabled";
>> +    };
> 
>  We can disable in base dtsi itself.
> 

 hmm, as mentioned in the previous patch, feels better to enable
 it only in the board file specifically and not to touch this here
 and the common dtsi.

>> +
>> +    pinctrl@100 {
>> +    serial_1_pins: serial1_pinmux {
>> +    mux {
>> +    pins = "gpio8", "gpio9";
>> +    function = "blsp_uart1";
>> +    bias-disable;
>> +    };
>> +    };
>> +
>> +    spi_0_pins: spi_0_pinmux {
>> +    mux {
>> +    pins = "gpio13", "gpio14",
>> "gpio15";
>> +    function = "blsp_spi0";
>> +    bias-disable;
>> +    };
>> +    cs1 {
>> +    pins = "gpio12";
>> +    function = "gpio";
>> +    };
>> +    host_int1 {
>> +    pins = "gpio10";
>> +    function = "gpio";
>> +    input;
>> +    };
>> +    cs2 {
>> +    pins = "gpio45";
>> +    function = "gpio";
>> +    };
>> +    host_int2 {
>> +    pins = "gpio61";
>> +    function = "gpio";
>> +    input;
>> +    };
>> +    rst {
>> +    pins = "gpio36";
>> +    function = "gpio";
>> +    output-high;
>> +    };
> 
>  Normally spi pins should contains spi protocol related pins
>  could you please explain what is the role of host_pin and rst
>  pins and which driver will use these.
> 

 hmm, the additional pins were required for zigbee connected as the
 spidev device. So the right approach is probably to have the additional
 pins required for the device populated under the spi's child node.
 
>> +    };
>> +    };
>> +
>> +    serial@78b {
>> +    pinctrl-0 = <&serial_1_pins>;
>> +    pinctrl-names = "default";
>> +    status = "ok";
>> +    };
>> +
>> +    spi_0: spi@78b5000 { /* BLSP1 QUP1 */
>> +    pinctrl-0 = <&spi_0_pins>;
>> +    pinctrl-names = "default";
>> +    status = "ok";
> 
>  From pinmux, it looks like multiple gpio based cs are being
>  used so do we need to specify cs-gpios like dk01-c2.
> 

 ok, let me check.

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


[PATCH] memremap: fix softlockup reports at teardown

2018-02-06 Thread Dan Williams
The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.

Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" 
Cc: Michal Hocko 
Cc: Christoph Hellwig 
Cc: 
Fixes: 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()...")
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 4849be5f9b3c..4dd4274cabe2 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -275,8 +275,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap)
return (res->start + resource_size(res)) >> PAGE_SHIFT;
 }
 
+static unsigned long pfn_next(unsigned long pfn)
+{
+   if (pfn % 1024 == 0)
+   cond_resched();
+   return pfn + 1;
+}
+
 #define for_each_device_pfn(pfn, map) \
-   for (pfn = pfn_first(map); pfn < pfn_end(map); pfn++)
+   for (pfn = pfn_first(map); pfn < pfn_end(map); pfn = pfn_next(pfn))
 
 static void devm_memremap_pages_release(void *data)
 {
@@ -337,10 +344,10 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
resource_size_t align_start, align_size, align_end;
struct vmem_altmap *altmap = pgmap->altmap_valid ?
&pgmap->altmap : NULL;
+   struct resource *res = &pgmap->res;
unsigned long pfn, pgoff, order;
pgprot_t pgprot = PAGE_KERNEL;
-   int error, nid, is_ram, i = 0;
-   struct resource *res = &pgmap->res;
+   int error, nid, is_ram;
 
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -409,8 +416,6 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
list_del(&page->lru);
page->pgmap = pgmap;
percpu_ref_get(pgmap->ref);
-   if (!(++i % 1024))
-   cond_resched();
}
 
devm_add_action(dev, devm_memremap_pages_release, pgmap);
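
The idea of folding the periodic reschedule into the iterator can be tried
outside the kernel. The following is only a userspace sketch, not the kernel
code: sched_yield() stands in for cond_resched(), and yield_count, walk() and
for_each_pfn() are hypothetical names added so the behaviour can be observed.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical counter so the sketch can be observed; not in the patch. */
static unsigned long yield_count;

/* Advance the cursor; on every pfn that is a multiple of 1024, yield. */
static unsigned long pfn_next(unsigned long pfn)
{
	if (pfn % 1024 == 0) {
		sched_yield();	/* userspace stand-in for cond_resched() */
		yield_count++;
	}
	return pfn + 1;
}

/* Every user of the macro gets the periodic yield for free. */
#define for_each_pfn(pfn, first, end) \
	for (pfn = (first); pfn < (end); pfn = pfn_next(pfn))

/* Walk [first, end) and return how many pfns were visited. */
static unsigned long walk(unsigned long first, unsigned long end)
{
	unsigned long pfn, visited = 0;

	assert(first <= end);
	for_each_pfn(pfn, first, end)
		visited++;
	return visited;
}
```

Because the yield lives in the step expression, a second loop (such as a
teardown path) cannot forget it, which is the point the commit message makes.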



Re: [PATCH v2 06/16] arm64: dts: mt7622: add cpufreq related device nodes

2018-02-06 Thread Viresh Kumar
On 06-02-18, 17:52, sean.w...@mediatek.com wrote:
>   cpus {
>   #address-cells = <2>;
>   #size-cells = <0>;
> @@ -26,6 +70,10 @@
>   device_type = "cpu";
>   compatible = "arm,cortex-a53", "arm,armv8";
>   reg = <0x0 0x0>;
> + clocks = <&infracfg CLK_INFRA_MUX1_SEL>,
> +  <&apmixedsys CLK_APMIXED_MAIN_CORE_EN>;
> + clock-names = "cpu", "intermediate";
> + operating-points-v2 = <&cpu_opp_table>;
>   enable-method = "psci";
>   clock-frequency = <13>;
>   };
> @@ -34,6 +82,7 @@
>   device_type = "cpu";
>   compatible = "arm,cortex-a53", "arm,armv8";
>   reg = <0x0 0x1>;
> + operating-points-v2 = <&cpu_opp_table>;
>   enable-method = "psci";
>   clock-frequency = <13>;
>   };

Sorry for not picking this earlier, but you should probably add the same clock
related properties for both cpu nodes here. Things will break if CPU1 is used by
the cpufreq core to bring the cpufreq policy online.

This can happen if cpufreq driver is a module, CPU0 is hotplugged out and then
the cpufreq driver is inserted.

-- 
viresh


[PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c | 255 +++-
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_poison.c|   6 +
 3 files changed, 232 insertions(+), 36 deletions(-)

Resend Change:
- Expose page_poisoning_enabled to kernel modules

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..5476725 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(&vb->stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  &vb->update_balloon_size_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, &vb->cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  &vb->report_free_page_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err
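
The reporting order described in the commit message (cmd id first, then the
free pages, then the stop id) can be modelled with a flat array. This is a toy
model only, not the driver: build_report() is a hypothetical helper, the real
vq carries buffers rather than plain integers, and SKETCH_STOP_ID is a
placeholder because the value of VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is
not shown in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder: the real stop id is defined in the uapi header. */
#define SKETCH_STOP_ID 0u

/*
 * Fill out[] with the sequence the guest would queue:
 *   [cmd_id, pfn_0, ..., pfn_{n-1}, stop id]
 * Returns the number of elements written, or 0 if out[] is too small.
 */
static size_t build_report(uint32_t cmd_id, const uint64_t *free_pfns,
			   size_t n, uint64_t *out, size_t cap)
{
	size_t i, len = 0;

	if (cap < n + 2)
		return 0;

	out[len++] = cmd_id;		/* first element: cmd id from host */
	for (i = 0; i < n; i++)
		out[len++] = free_pfns[i];	/* the hints themselves */
	out[len++] = SKETCH_STOP_ID;	/* last element: reporting is done */
	return len;
}
```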

Re: Can RCU stall lead to hard lockups?

2018-02-06 Thread Paul E. McKenney
On Tue, Feb 06, 2018 at 08:55:04PM -0600, Serge E. Hallyn wrote:
> On Tue, Feb 06, 2018 at 06:53:37PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 06, 2018 at 08:33:03PM -0600, Serge E. Hallyn wrote:
> > > On Sat, Feb 03, 2018 at 12:50:32PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Feb 02, 2018 at 05:44:30PM -0600, Serge E. Hallyn wrote:
> > > > > Quoting Paul E. McKenney (paul...@linux.vnet.ibm.com):
> > > > > > On Tue, Jan 09, 2018 at 06:11:14AM -0800, Tejun Heo wrote:
> > > > > > > Hello, Paul.
> > > > > > > 
> > > > > > > On Mon, Jan 08, 2018 at 08:24:25PM -0800, Paul E. McKenney wrote:
> > > > > > > > > I don't know the RCU code at all but it *looks* like the 
> > > > > > > > > first CPU is
> > > > > > > > > taking a sweet while flushing printk buffer while holding a 
> > > > > > > > > lock (the
> > > > > > > > > console is IPMI serial console, which faithfully emulates 
> > > > > > > > > 115200 baud
> > > > > > > > > rate), and everyone else seems stuck waiting for that 
> > > > > > > > > spinlock in
> > > > > > > > > rcu_check_callbacks().
> > > > > > > > > 
> > > > > > > > > Does this sound possible?
> > > > > > > > 
> > > > > > > > 115200 baud?  Ouch!!!  That -will- result in trouble from 
> > > > > > > > console
> > > > > > > > printing, and often also in RCU CPU stall warnings.
> > > > > > > 
> > > > > > > It could even be slower than 115200, and we occasionally see RCU
> > > > > > > stall warnings caused by printk storms, for example, while the 
> > > > > > > kernel
> > > > > > > is trying to dump a lot of info after an OOM.  That's an issue we
> > > > > > > probably want to improve from printk side; however, they don't 
> > > > > > > usually
> > > > > > > lead to NMI hard lockup detector kicking in and crashing the 
> > > > > > > machine,
> > > > > > > which is the peculiarity here.
> > > > > > > 
> > > > > > > Hmmm... show_state_filter(), the function which dumps all task
> > > > > > > backtraces, share a similar problem and it avoids it by explicitly
> > > > > > > calling touch_nmi_watchdog().  Maybe we can do something like the
> > > > > > > following from RCU too?
> > > > > > 
> > > > > > If this fixes things for you, I would welcome such a patch.
> > > > > 
> > > > > Hi - would this also be relevant to 4.9-stable and 4.4-stable, or
> > > > > has something elsewhere changed after 4.9 that actually triggers this?
> > > > 
> > > > As far as I can tell, slow console lines have been prone to RCU CPU 
> > > > stall
> > > > warnings for a very long time.
> > > 
> > > Ok, thanks Paul.
> > > 
> > > Tejun were you going to push this?
> > 
> > I have it queued for the next merge window.  3eea9623926f ("rcu: Call
> > touch_nmi_watchdog() while printing stall warnings") in -rcu.
> 
> D'oh - thanks!

Not a problem at all!  Had I lost this commit, it would not have been the
first time.  ;-)

Thanx, Paul
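
For a feel of why a 115200 baud console hurts, here is a back-of-the-envelope
helper. It is a sketch under one stated assumption: 8N1 framing, that is 10
bits on the wire per byte. console_flush_ms() is a hypothetical name, not a
kernel function.

```c
#include <assert.h>

/*
 * Milliseconds needed to push nbytes through a serial line,
 * assuming 10 wire bits per byte (8N1). Integer division truncates.
 */
static unsigned long long console_flush_ms(unsigned long long nbytes,
					   unsigned long long baud)
{
	assert(baud > 0);
	return nbytes * 10ULL * 1000ULL / baud;
}
```

With these assumptions 11520 bytes take one full second, so a task dump of a
few hundred kilobytes keeps the printing CPU busy for tens of seconds.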



Re: [PATCH] arm64: Enable SPRD_TIMER

2018-02-06 Thread Baolin Wang
On 7 February 2018 at 10:31, Chunyan Zhang  wrote:
> Hi Baolin,
>
> On 6 February 2018 at 18:36, Baolin Wang  wrote:
>> Enable Spreadtrum timer driver for Spreadtrum platform, which will be used
>> as tick broadcast device.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>>  arch/arm64/Kconfig.platforms |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
>> index fbedbd8..3e0bbb0 100644
>> --- a/arch/arm64/Kconfig.platforms
>> +++ b/arch/arm64/Kconfig.platforms
>> @@ -224,6 +224,7 @@ config ARCH_TEGRA
>>
>>  config ARCH_SPRD
>> bool "Spreadtrum SoC platform"
>> +   select SPRD_TIMER
>
> Do we have to select SPRD_TIMER here? SC9836, SC9860 have been working
> with a minimum system without SPRD TIMER.

Yes, we need to register this timer as the tick broadcast device;
otherwise an hrtimer will be registered as the broadcast device,
which will affect the No-Hz behavior of the CPU that the
broadcast-hrtimer is attached to.

-- 
Baolin.wang
Best Regards


Re: [PATCH 01/18] tracing: Add function based events

2018-02-06 Thread Steven Rostedt
On Mon, 5 Feb 2018 10:00:50 -0500
Steven Rostedt  wrote:

> On Mon, 5 Feb 2018 09:24:23 +0100
> Jiri Olsa  wrote:
> 
> 
> > should this be done under 'func_event_mutex' ?  
> 
> Probably.

I think we only need to protect adding to the list.

> 
> > 
> > I tried and crashed the system by running 2 scripts with:
> > 
> >   echo 'ip_rcv(u64 skb, u64 dev)' > 
> > /sys/kernel/debug/tracing/function_events
> >   echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' >> 
> > /sys/kernel/debug/tracing/function_events
> >   echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' >> 
> > /sys/kernel/debug/tracing/function_events
> >  
> 

There's no reason that we can't have more than one function event
attached to the same function. I'm adding this:

diff --git a/kernel/trace/trace_event_ftrace.c 
b/kernel/trace/trace_event_ftrace.c
index b145639eac45..928168fc2025 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -1275,12 +1275,6 @@ static int create_function_event(int argc, char **argv)
if (state != FUNC_STATE_END)
goto fail;
 
-   ret = -EALREADY;
-   list_for_each_entry(fe, &func_events, list) {
-   if (strcmp(fe->func, func_event->func) == 0)
-   goto fail;
-   }
-
ret = ftrace_set_filter(&func_event->ops, func_event->func,
strlen(func_event->func), 0);
if (ret < 0)
@@ -1290,7 +1284,9 @@ static int create_function_event(int argc, char **argv)
if (ret < 0)
goto fail;
 
+   mutex_lock(&func_event_mutex);
list_add_tail(&func_event->list, &func_events);
+   mutex_unlock(&func_event_mutex);
return 0;
  fail:
free_func_event(func_event);

-- Steve
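
The shape of that change (no duplicate check, but the list append done under
the mutex) can be sketched in userspace. This is an illustration only: a
pthread mutex stands in for the kernel mutex, a hand-rolled singly linked list
stands in for list_add_tail(), and add_func_event() and count_func_events()
are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct func_event {
	struct func_event *next;
	char func[64];
};

static struct func_event *func_events;			/* list head */
static struct func_event **func_events_tail = &func_events;
static pthread_mutex_t func_event_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Append unconditionally: several events on the same function are fine. */
static int add_func_event(const char *func)
{
	struct func_event *fe = calloc(1, sizeof(*fe));

	if (!fe)
		return -1;
	strncpy(fe->func, func, sizeof(fe->func) - 1);

	/* Only the list manipulation needs the lock. */
	pthread_mutex_lock(&func_event_mutex);
	*func_events_tail = fe;
	func_events_tail = &fe->next;
	pthread_mutex_unlock(&func_event_mutex);
	return 0;
}

/* Count the events attached to a given function name. */
static int count_func_events(const char *func)
{
	struct func_event *fe;
	int n = 0;

	pthread_mutex_lock(&func_event_mutex);
	for (fe = func_events; fe; fe = fe->next)
		if (strcmp(fe->func, func) == 0)
			n++;
	pthread_mutex_unlock(&func_event_mutex);
	return n;
}
```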


Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout

2018-02-06 Thread Ivan Khoronzhuk
On Tue, Feb 06, 2018 at 07:17:06PM -0600, Grygorii Strashko wrote:
> It was discovered that simple program which indefinitely sends 200b UDP
> packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network
> watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog
> timeout is triggered due to race between cpsw_ndo_start_xmit() and
> cpsw_tx_handler() [NAPI]
> 
> cpsw_ndo_start_xmit()
>   if (unlikely(!cpdma_check_free_tx_desc(txch))) {
>   txq = netdev_get_tx_queue(ndev, q_idx);
>   netif_tx_stop_queue(txq);
> 
> ^^ as per [1] barrier has to be used after set_bit() otherwise new value
> might not be visible to other cpus
>   }
> 
> cpsw_tx_handler()
>   if (unlikely(netif_tx_queue_stopped(txq)))
>   netif_tx_wake_queue(txq);
> 
> and when it happens ndev TX queue became disabled forever while driver's HW
> TX queue is empty.
I'm sure it fixes the test case somehow, but there is some strangeness
(I thought about this some X months ago):
1. If there is no free desc, then there is a bunch of descs on the queue
ready to be sent.
2. If the wake-up was missed for one of these descs, then the next one will
wake the queue, because there is a bunch of them in flight. So, if the desc
at the top of the sent queue missed enabling the queue, then the next one
will most likely enable it anyway. Then how could it happen? The described
race is possible only on the last descriptor. Yes, the packets are small and
the speed is high, so the possibility is very small, but then the following
situation is also possible:
- packets are sent fast
- all packets were sent, but no descriptors have been freed yet by the sw
interrupt (NAPI)
- when the interrupt started NAPI, the queue was enabled; all subsequent
interrupts are throttled as long as NAPI has not finished its work.
- when a new packet is submitted, no free descs are present yet (NAPI has not
freed any yet), but all packets are sent, so no one can wake the tx queue:
no interrupt will arise, because interrupts are disabled while NAPI is about
to free the first descriptor, and the h/w queue to be sent is empty...
- how can it happen, as submitting a packet and handling a packet are both
done under the channel lock? Not exactly: the period between handling and
freeing the descriptor to the pool is not under the channel lock, here:

spin_unlock_irqrestore(&chan->lock, flags);
if (unlikely(status & CPDMA_DESC_TD_COMPLETE))
cb_status = -ENOSYS;
else
cb_status = status;

__cpdma_chan_free(chan, desc, outlen, cb_status);
return status;

unlock_ret:
spin_unlock_irqrestore(&chan->lock, flags);
return status;

And:
__cpdma_chan_free(chan, desc, outlen, cb_status);
-> cpdma_desc_free(pool, desc, 1);

As a result, the queue deadlocks as you've described.
Just a thought, not checked, but theoretically possible.
What do you think?

> 
> Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue()
> calls and double check for free TX descriptors after stopping ndev TX queue
> - if there are free TX descriptors wake up ndev TX queue.
> 
> [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
> Signed-off-by: Grygorii Strashko 
> ---
>  drivers/net/ethernet/ti/cpsw.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 10d7cbe..3805b13 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1638,6 +1638,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff 
> *skb,
>   q_idx = q_idx % cpsw->tx_ch_num;
>  
>   txch = cpsw->txv[q_idx].ch;
> + txq = netdev_get_tx_queue(ndev, q_idx);
>   ret = cpsw_tx_packet_submit(priv, skb, txch);
>   if (unlikely(ret != 0)) {
>   cpsw_err(priv, tx_err, "desc submit failed\n");
> @@ -1648,15 +1649,26 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff 
> *skb,
>* tell the kernel to stop sending us tx frames.
>*/
>   if (unlikely(!cpdma_check_free_tx_desc(txch))) {
> - txq = netdev_get_tx_queue(ndev, q_idx);
>   netif_tx_stop_queue(txq);
> +
> + /* Barrier, so that stop_queue visible to other cpus */
> + smp_mb__after_atomic();
> +
> + if (cpdma_check_free_tx_desc(txch))
> + netif_tx_wake_queue(txq);
>   }
>  
>   return NETDEV_TX_OK;
>  fail:
>   ndev->stats.tx_dropped++;
> - txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb));
>   netif_tx_stop_queue(txq);
> +
> + /* Barrier, so that stop_queue visible to other cpus */
> + smp_mb__after_atomic();
> +
> + if (cpdma_check_free_tx_desc(txch))
> + netif_tx_wake_queue(txq);
> +
>   return NETDEV_TX_BUSY;
>  }
>  
> -- 
> 2.10.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majord...@vger.kernel.org
> More majordomo 
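
The stop, barrier, re-check pattern from the patch can be modelled with C11
atomics. This is a hedged userspace model, not driver code: struct txq_model,
xmit_tail() and tx_complete() are hypothetical names, and
atomic_thread_fence() is only a stand-in for smp_mb__after_atomic(). The plain
atomic_load()/atomic_store() calls used here are already sequentially
consistent, so the fence is kept purely to mark where the kernel barrier sits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct txq_model {
	atomic_bool stopped;	/* models the ndev TX queue stopped bit */
	atomic_int free_descs;	/* models the free descriptor pool */
};

/* Producer side: runs after a packet has been submitted. */
static void xmit_tail(struct txq_model *q)
{
	if (atomic_load(&q->free_descs) > 0)
		return;

	atomic_store(&q->stopped, true);		/* stop the queue */
	atomic_thread_fence(memory_order_seq_cst);	/* make it visible */

	/* Re-check: completion may have freed a desc in the meantime. */
	if (atomic_load(&q->free_descs) > 0)
		atomic_store(&q->stopped, false);	/* wake the queue */
}

/* Completion side: frees one descriptor, then wakes a stopped queue. */
static void tx_complete(struct txq_model *q)
{
	atomic_fetch_add(&q->free_descs, 1);
	if (atomic_load(&q->stopped))
		atomic_store(&q->stopped, false);
}
```

Without the re-check, a completion that runs between the first test and the
stop would leave the queue stopped with nothing left to wake it.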

Re: [PATCH] atm: idt77252: Replace mdelay with usleep_range in idt77252_preset

2018-02-06 Thread Maciej W. Rozycki
On Fri, 26 Jan 2018, Jia-Ju Bai wrote:

> diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
> index 0277f36..cea4bf2 100644
> --- a/drivers/atm/idt77252.c
> +++ b/drivers/atm/idt77252.c
> @@ -3563,7 +3563,7 @@ static int idt77252_preset(struct idt77252_dev *card)
>  
>   /* Software reset */
>   writel(SAR_CFG_SWRST, SAR_REG_CFG);
> - mdelay(1);
> + usleep_range(500, 1000);
>   writel(0, SAR_REG_CFG);
>  
>   IPRINTK("%s: Software resetted.\n", card->name);

 This is only called from the driver's ->probe method, so it looks to me 
indeed safe to sleep here.  A similar, more extensive clean-up seems due 
for 77252 older brother's driver nicstar.c.

 Out of curiosity I have looked up the SAR manual and it requires the 
SWRST bit to be asserted for at least 2 PCI clock cycles for the reset to 
be valid, so having the lower bound of .5ms still looks completely safe if 
not an overkill to me for real world applications where PCI is driven in 
the MHz clock range.
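
To put numbers on the margin described above, here is a small sketch of the arithmetic. It assumes the conventional 33 MHz PCI clock, which is an assumption on top of the "MHz clock range" wording; pci_cycles_in_delay() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Number of whole PCI clock cycles that elapse during delay_us
 * microseconds.  MHz is cycles per microsecond, so the product
 * is directly a cycle count.
 */
static uint64_t pci_cycles_in_delay(uint64_t delay_us, uint64_t pci_clock_mhz)
{
	return delay_us * pci_clock_mhz;
}
```

With the lower bound of usleep_range(500, 1000) and a 33 MHz clock this gives 16500 cycles, several thousand times the 2 cycles the manual asks for.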

Reviewed-by: Maciej W. Rozycki 

  Maciej


[GIT PULL] gcc-plugins updates for v4.16-rc1

2018-02-06 Thread Kees Cook
Hi Linus,

Please pull these gcc-plugins changes for v4.16-rc1. This is a small
set of changes entirely in support of the coming gcc 8 release.

Thanks!

-Kees

The following changes since commit d8a5b80568a9cb66810e75b182018e9edb68e8ff:

  Linux 4.15 (2018-01-28 13:20:33 -0800)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/gcc-plugins-v4.16-rc1

for you to fetch changes up to b86729109c5fd0a480300f40608aac68764b5adf:

  gcc-plugins: Use dynamic initializers (2018-02-05 17:27:46 -0800)


- update includes for gcc 8 (Valdis Kletnieks)
- update initializers for gcc 8


Kees Cook (1):
  gcc-plugins: Use dynamic initializers

Valdis Kletnieks (1):
  gcc-plugins: Add include required by GCC release 8

 scripts/gcc-plugins/gcc-common.h  |  4 ++
 scripts/gcc-plugins/latent_entropy_plugin.c   | 17 ++
 scripts/gcc-plugins/randomize_layout_plugin.c | 75 ---
 scripts/gcc-plugins/structleak_plugin.c   | 19 +++
 4 files changed, 37 insertions(+), 78 deletions(-)

-- 
Kees Cook
Pixel Security


Re: linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Michael S. Tsirkin
On Wed, Feb 07, 2018 at 01:54:41PM +1100, Stephen Rothwell wrote:
> Hi Michael,
> 
> On Wed, 7 Feb 2018 13:04:23 +1100 Stephen Rothwell wrote:
> >
> > After merging the vhost tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> 
> ERROR: "page_poisoning_enabled" [drivers/virtio/virtio_balloon.ko] undefined!
> 
> > Caused by commit
> > 
> >   96bcd04462b9 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> > 
> > I have used the vhost tree from next-20180206 for today.

That's
commit d25cc43c6775bff6b8e3dad97c747954b805e421
vhost: don't hold onto file pointer for VHOST_SET_LOG_FD

Right?
Sounds good, and I reverted my tree to the same hash.

> -- 
> Cheers,
> Stephen Rothwell


Re: Can RCU stall lead to hard lockups?

2018-02-06 Thread Serge E. Hallyn
On Tue, Feb 06, 2018 at 06:53:37PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 06, 2018 at 08:33:03PM -0600, Serge E. Hallyn wrote:
> > On Sat, Feb 03, 2018 at 12:50:32PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 02, 2018 at 05:44:30PM -0600, Serge E. Hallyn wrote:
> > > > Quoting Paul E. McKenney (paul...@linux.vnet.ibm.com):
> > > > > On Tue, Jan 09, 2018 at 06:11:14AM -0800, Tejun Heo wrote:
> > > > > > Hello, Paul.
> > > > > > 
> > > > > > On Mon, Jan 08, 2018 at 08:24:25PM -0800, Paul E. McKenney wrote:
> > > > > > > > I don't know the RCU code at all but it *looks* like the first CPU is
> > > > > > > > taking a sweet while flushing printk buffer while holding a lock (the
> > > > > > > > console is IPMI serial console, which faithfully emulates 115200 baud
> > > > > > > > rate), and everyone else seems stuck waiting for that spinlock in
> > > > > > > > rcu_check_callbacks().
> > > > > > > > 
> > > > > > > > Does this sound possible?
> > > > > > > 
> > > > > > > 115200 baud?  Ouch!!!  That -will- result in trouble from console
> > > > > > > printing, and often also in RCU CPU stall warnings.
> > > > > > 
> > > > > > It could even be slower than 115200, and we occasionally see RCU
> > > > > > stall warnings caused by printk storms, for example, while the kernel
> > > > > > is trying to dump a lot of info after an OOM.  That's an issue we
> > > > > > probably want to improve from printk side; however, they don't usually
> > > > > > lead to NMI hard lockup detector kicking in and crashing the machine,
> > > > > > which is the peculiarity here.
> > > > > > 
> > > > > > Hmmm... show_state_filter(), the function which dumps all task
> > > > > > backtraces, share a similar problem and it avoids it by explicitly
> > > > > > calling touch_nmi_watchdog().  Maybe we can do something like the
> > > > > > following from RCU too?
> > > > > 
> > > > > If this fixes things for you, I would welcome such a patch.
> > > > 
> > > > Hi - would this also be relevant to 4.9-stable and 4.4-stable, or
> > > > has something elsewhere changed after 4.9 that actually triggers this?
> > > 
> > > As far as I can tell, slow console lines have been prone to RCU CPU stall
> > > warnings for a very long time.
> > 
> > Ok, thanks Paul.
> > 
> > Tejun were you going to push this?
> 
> I have it queued for the next merge window.  3eea9623926f ("rcu: Call
> touch_nmi_watchdog() while printing stall warnings") in -rcu.

D'oh - thanks!

-serge
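
For a sense of scale on the 115200 baud point in the quoted discussion, the flush time for a burst of console output can be estimated as below. This is a back-of-the-envelope sketch that assumes 8N1 framing (10 bits on the wire per byte), which the thread does not state; console_flush_ms() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Milliseconds needed to push nbytes through a serial line running at
 * 'baud' bits per second, rounded up.  Assumes 8N1 framing: one start
 * bit, eight data bits and one stop bit per byte.
 */
static uint64_t console_flush_ms(uint64_t nbytes, uint64_t baud)
{
	const uint64_t bits_per_byte = 10;

	return (nbytes * bits_per_byte * 1000 + baud - 1) / baud;
}
```

At 115200 baud the line carries 11520 bytes per second, so 64 KiB of backtraces takes roughly 5.7 seconds, all of it spent with the lock held in the scenario described.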


Re: linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Stephen Rothwell
Hi Michael,

On Wed, 7 Feb 2018 13:04:23 +1100 Stephen Rothwell wrote:
>
> After merging the vhost tree, today's linux-next build (x86_64
> allmodconfig) failed like this:

ERROR: "page_poisoning_enabled" [drivers/virtio/virtio_balloon.ko] undefined!

> Caused by commit
> 
>   96bcd04462b9 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> 
> I have used the vhost tree from next-20180206 for today.

-- 
Cheers,
Stephen Rothwell


Re: Can RCU stall lead to hard lockups?

2018-02-06 Thread Paul E. McKenney
On Tue, Feb 06, 2018 at 08:33:03PM -0600, Serge E. Hallyn wrote:
> On Sat, Feb 03, 2018 at 12:50:32PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 02, 2018 at 05:44:30PM -0600, Serge E. Hallyn wrote:
> > > Quoting Paul E. McKenney (paul...@linux.vnet.ibm.com):
> > > > On Tue, Jan 09, 2018 at 06:11:14AM -0800, Tejun Heo wrote:
> > > > > Hello, Paul.
> > > > > 
> > > > > On Mon, Jan 08, 2018 at 08:24:25PM -0800, Paul E. McKenney wrote:
> > > > > > > I don't know the RCU code at all but it *looks* like the first CPU is
> > > > > > > taking a sweet while flushing printk buffer while holding a lock (the
> > > > > > > console is IPMI serial console, which faithfully emulates 115200 baud
> > > > > > > rate), and everyone else seems stuck waiting for that spinlock in
> > > > > > > rcu_check_callbacks().
> > > > > > > 
> > > > > > > Does this sound possible?
> > > > > > 
> > > > > > 115200 baud?  Ouch!!!  That -will- result in trouble from console
> > > > > > printing, and often also in RCU CPU stall warnings.
> > > > > 
> > > > > It could even be slower than 115200, and we occasionally see RCU
> > > > > stall warnings caused by printk storms, for example, while the kernel
> > > > > is trying to dump a lot of info after an OOM.  That's an issue we
> > > > > probably want to improve from printk side; however, they don't usually
> > > > > lead to NMI hard lockup detector kicking in and crashing the machine,
> > > > > which is the peculiarity here.
> > > > > 
> > > > > Hmmm... show_state_filter(), the function which dumps all task
> > > > > backtraces, share a similar problem and it avoids it by explicitly
> > > > > calling touch_nmi_watchdog().  Maybe we can do something like the
> > > > > following from RCU too?
> > > > 
> > > > If this fixes things for you, I would welcome such a patch.
> > > 
> > > Hi - would this also be relevant to 4.9-stable and 4.4-stable, or
> > > has something elsewhere changed after 4.9 that actually triggers this?
> > 
> > As far as I can tell, slow console lines have been prone to RCU CPU stall
> > warnings for a very long time.
> 
> Ok, thanks Paul.
> 
> Tejun were you going to push this?

I have it queued for the next merge window.  3eea9623926f ("rcu: Call
touch_nmi_watchdog() while printing stall warnings") in -rcu.

Thanx, Paul

> > > thanks,
> > > -serge
> > > 
> > > > Thanx, Paul
> > > > 
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index db85ca3..3c4c4d3 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > @@ -561,8 +561,14 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
> > > > >   }
> > > > >   t = list_entry(rnp->gp_tasks->prev,
> > > > >  struct task_struct, rcu_node_entry);
> > > > > > - list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry)
> > > > > > + list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
> > > > > + touch_nmi_watchdog();
> > > > > + /*
> > > > > +  * We could be printing a lot of these messages while
> > > > > +  * holding a spinlock.  Avoid triggering hard lockup.
> > > > > +  */
> > > > >   sched_show_task(t);
> > > > > + }
> > > > >   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > >  }
> > > > > 
> > > > > > @@ -1678,6 +1684,12 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
> > > > >   char *ticks_title;
> > > > >   unsigned long ticks_value;
> > > > > 
> > > > > + /*
> > > > > +  * We could be printing a lot of these messages while holding a
> > > > > +  * spinlock.  Avoid triggering hard lockup.
> > > > > +  */
> > > > > + touch_nmi_watchdog();
> > > > > +
> > > > >   if (rsp->gpnum == rdp->gpnum) {
> > > > >   ticks_title = "ticks this GP";
> > > > >   ticks_value = rdp->ticks_this_gp;
> > > > > 
> > > 
> 
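
The effect of the touch_nmi_watchdog() calls in the quoted patch can be illustrated with a toy model. Everything here is hypothetical (wd_model, wd_touch, wd_tick, dump_tasks_model) and only mimics the idea of a detector that fires when too much time passes without a touch; it is not how the kernel's NMI watchdog is implemented.

```c
#include <stdbool.h>

/* Toy detector: fires once 'limit' ticks pass without a touch. */
struct wd_model {
	unsigned int idle_ticks;
	unsigned int limit;
	bool fired;
};

static void wd_touch(struct wd_model *wd)
{
	wd->idle_ticks = 0;
}

static void wd_tick(struct wd_model *wd)
{
	if (++wd->idle_ticks >= wd->limit)
		wd->fired = true;
}

/*
 * Print 'ntasks' backtraces, each costing 'ticks_per_task' ticks of
 * slow console output.  With 'touch' set, the detector is reset before
 * every task, as the patched loop does.  Returns whether it fired.
 */
static bool dump_tasks_model(struct wd_model *wd, unsigned int ntasks,
			     unsigned int ticks_per_task, bool touch)
{
	for (unsigned int i = 0; i < ntasks; i++) {
		if (touch)
			wd_touch(wd);
		for (unsigned int t = 0; t < ticks_per_task; t++)
			wd_tick(wd);
	}
	return wd->fired;
}
```

The total time spent printing is the same in both cases; touching per task only bounds the longest stretch the detector sees to the cost of one backtrace.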



Re: linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Michael S. Tsirkin
On Wed, Feb 07, 2018 at 01:04:23PM +1100, Stephen Rothwell wrote:
> Hi Michael,
> 
> After merging the vhost tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> 
> Caused by commit
> 
>   96bcd04462b9 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> 
> I have used the vhost tree from next-20180206 for today.
> 
> -- 
> Cheers,
> Stephen Rothwell

Thanks, I'll revert to that too.

-- 
MST


Re: Can RCU stall lead to hard lockups?

2018-02-06 Thread Serge E. Hallyn
On Sat, Feb 03, 2018 at 12:50:32PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 02, 2018 at 05:44:30PM -0600, Serge E. Hallyn wrote:
> > Quoting Paul E. McKenney (paul...@linux.vnet.ibm.com):
> > > On Tue, Jan 09, 2018 at 06:11:14AM -0800, Tejun Heo wrote:
> > > > Hello, Paul.
> > > > 
> > > > On Mon, Jan 08, 2018 at 08:24:25PM -0800, Paul E. McKenney wrote:
> > > > > > I don't know the RCU code at all but it *looks* like the first CPU is
> > > > > > taking a sweet while flushing printk buffer while holding a lock (the
> > > > > > console is IPMI serial console, which faithfully emulates 115200 baud
> > > > > > rate), and everyone else seems stuck waiting for that spinlock in
> > > > > > rcu_check_callbacks().
> > > > > > 
> > > > > > Does this sound possible?
> > > > > 
> > > > > 115200 baud?  Ouch!!!  That -will- result in trouble from console
> > > > > printing, and often also in RCU CPU stall warnings.
> > > > 
> > > > It could even be slower than 115200, and we occasionally see RCU
> > > > stall warnings caused by printk storms, for example, while the kernel
> > > > is trying to dump a lot of info after an OOM.  That's an issue we
> > > > probably want to improve from printk side; however, they don't usually
> > > > lead to NMI hard lockup detector kicking in and crashing the machine,
> > > > which is the peculiarity here.
> > > > 
> > > > Hmmm... show_state_filter(), the function which dumps all task
> > > > backtraces, share a similar problem and it avoids it by explicitly
> > > > calling touch_nmi_watchdog().  Maybe we can do something like the
> > > > following from RCU too?
> > > 
> > > If this fixes things for you, I would welcome such a patch.
> > 
> > Hi - would this also be relevant to 4.9-stable and 4.4-stable, or
> > has something elsewhere changed after 4.9 that actually triggers this?
> 
> As far as I can tell, slow console lines have been prone to RCU CPU stall
> warnings for a very long time.
> 
>   Thanx, Paul

Ok, thanks Paul.

Tejun were you going to push this?

> > thanks,
> > -serge
> > 
> > >   Thanx, Paul
> > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index db85ca3..3c4c4d3 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -561,8 +561,14 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
> > > > }
> > > > t = list_entry(rnp->gp_tasks->prev,
> > > >struct task_struct, rcu_node_entry);
> > > > -   list_for_each_entry_continue(t, &rnp->blkd_tasks, 
> > > > rcu_node_entry)
> > > > +   list_for_each_entry_continue(t, &rnp->blkd_tasks, 
> > > > rcu_node_entry) {
> > > > +   touch_nmi_watchdog();
> > > > +   /*
> > > > +* We could be printing a lot of these messages while
> > > > +* holding a spinlock.  Avoid triggering hard lockup.
> > > > +*/
> > > > sched_show_task(t);
> > > > +   }
> > > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  }
> > > > 
> > > > > @@ -1678,6 +1684,12 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
> > > > char *ticks_title;
> > > > unsigned long ticks_value;
> > > > 
> > > > +   /*
> > > > +* We could be printing a lot of these messages while holding a
> > > > +* spinlock.  Avoid triggering hard lockup.
> > > > +*/
> > > > +   touch_nmi_watchdog();
> > > > +
> > > > if (rsp->gpnum == rdp->gpnum) {
> > > > ticks_title = "ticks this GP";
> > > > ticks_value = rdp->ticks_this_gp;
> > > > 
> > 


Re: [PATCH] arm64: Enable SPRD_TIMER

2018-02-06 Thread Chunyan Zhang
Hi Baolin,

On 6 February 2018 at 18:36, Baolin Wang  wrote:
> Enable Spreadtrum timer driver for Spreadtrum platform, which will be used
> as tick broadcast device.
>
> Signed-off-by: Baolin Wang 
> ---
>  arch/arm64/Kconfig.platforms |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
> index fbedbd8..3e0bbb0 100644
> --- a/arch/arm64/Kconfig.platforms
> +++ b/arch/arm64/Kconfig.platforms
> @@ -224,6 +224,7 @@ config ARCH_TEGRA
>
>  config ARCH_SPRD
> bool "Spreadtrum SoC platform"
> +   select SPRD_TIMER

Do we have to select SPRD_TIMER here? SC9836, SC9860 have been working
with a minimum system without SPRD TIMER.

Thanks,
Chunyan

> help
>   Support for Spreadtrum ARM based SoCs
>
> --
> 1.7.9.5
>


[PATCH v2] x86/nospec: Fixup array_index_nospec_mask() asm constraint

2018-02-06 Thread Dan Williams
Allow the compiler to handle @size as an immediate value or memory
directly rather than allocating a register.

Reported-by: Linus Torvalds 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Thomas Gleixner 
Signed-off-by: Dan Williams 
---
v2: use the 'g' constraint since CMP handles memory targets (Linus)

 arch/x86/include/asm/barrier.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 30d406146016..e1259f043ae9 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -40,7 +40,7 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 
asm ("cmp %1,%2; sbb %0,%0;"
:"=r" (mask)
-   :"r"(size),"r" (index)
+   :"g"(size),"r" (index)
:"cc");
return mask;
 }
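
For readers less familiar with the cmp/sbb idiom in the hunk above: cmp sets the carry flag when index is below size (unsigned), and sbb of a register with itself then yields 0 minus carry, that is all ones or zero. The portable sketch below computes the same value in plain C. It is modeled on the branch-free form of the generic fallback as best I recall it; index_mask_nospec_model() is a hypothetical name, and the sketch relies on arithmetic right shift of negative values, which gcc and clang provide but ISO C leaves implementation-defined.

```c
#include <limits.h>

#define BITS_PER_LONG_MODEL (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, without a branch.
 *
 * For index < size (with size <= LONG_MAX), neither 'index' nor
 * 'size - 1 - index' has its top bit set, so the OR is non-negative as
 * a long, its complement is negative, and the arithmetic shift smears
 * the sign bit into every position.  For index >= size the subtraction
 * wraps, the top bit of the OR is set, and the result is 0.
 */
static unsigned long index_mask_nospec_model(unsigned long index,
					     unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG_MODEL - 1);
}
```

A caller would then use array[index & mask], so that an out-of-bounds index collapses to element 0 even on a mispredicted path.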



Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

2018-02-06 Thread Huang, Ying
Minchan Kim  writes:

> On Tue, Feb 06, 2018 at 09:34:44PM +0800, huang ying wrote:
>> On Tue, Feb 6, 2018 at 5:02 PM, Minchan Kim  wrote:
>> > On Tue, Feb 06, 2018 at 04:39:18PM +0800, Huang, Ying wrote:
>> >> Hi, Minchan,
>> >>
>> >> Minchan Kim  writes:
>> >>
>> >> > Hi Huang,
>> >> >
>> >> > On Tue, Feb 06, 2018 at 02:54:04PM +0800, Huang, Ying wrote:
>> >> >> From: Huang Ying 
>> >> >>
>> >> >> It was reported by Sergey Senozhatsky that if THP (Transparent Huge
>> >> >> Page) and frontswap (via zswap) are both enabled, when memory goes low
>> >> >> so that swap is triggered, segfault and memory corruption will occur
>> >> >> in random user space applications as follow,
>> >> >>
>> >> >> kernel: urxvt[338]: segfault at 20 ip 7fc08889ae0d sp 
>> >> >> 7ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>> >> >>  #0  0x7fc08889ae0d _int_malloc (libc.so.6)
>> >> >>  #1  0x7fc08889c2f3 malloc (libc.so.6)
>> >> >>  #2  0x560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>> >> >>  #3  0x560e6005e75c n/a (urxvt)
>> >> >>  #4  0x560e6007d9f1 
>> >> >> _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>> >> >>  #5  0x560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>> >> >>  #6  0x560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>> >> >>  #7  0x560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>> >> >>  #8  0x560e6005cb55 ev_run (urxvt)
>> >> >>  #9  0x560e6003b9b9 main (urxvt)
>> >> >>  #10 0x7fc08883af4a __libc_start_main (libc.so.6)
>> >> >>  #11 0x560e6003f9da _start (urxvt)
>> >> >>
>> >> >> After bisection, it was found the first bad commit is
>> >> >> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
>> >> >> out").
>> >> >>
>> >> >> The root cause is as follow.
>> >> >>
>> >> >> When the pages are written to storage device during swapping out in
>> >> >> swap_writepage(), zswap (frontswap) is tried to compress the pages
>> >> >> instead to improve the performance.  But zswap (frontswap) will treat
>> >> >> THP as normal page, so only the head page is saved.  After swapping
>> >> >> in, tail pages will not be restored to its original contents, so cause
>> >> >> the memory corruption in the applications.
>> >> >>
>> >> >> This is fixed via splitting THP at the begin of swapping out if
>> >> >> frontswap is enabled.  To avoid frontswap to be enabled at runtime,
>> >> >> whether the page is THP is checked before using frontswap during
>> >> >> swapping out too.
>> >> >
>> >> > Nice catch, Huang. However, before the adding a new dependency between
>> >> > frontswap and vmscan that I want to avoid if it is possible, let's think
>> >> > whether frontswap can support THP page or not.
>> >> > Can't we handle it with some loop to handle all of subpages of THP page?
>> >> > It might be not hard?
>> >>
>> >> Yes.  That could be an optimization over this patch.  This patch is just
>> >> a simple fix to make things work and be suitable for stable tree.
>> >
>> > Yup, it would be more complex than this patch. However, this patch introduces
>> > a new dependency to vmscan.c. IOW, we have been good without knowing frontswap
>> > in vmscan.c but from now on, we should be aware of that, which is unfortunate.
>> >
>> > Can't we simply do it like this if you want to keep it simple and rely on someone
>> > who makes frontswap THP-aware later?
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 42fe5653814a..4bf1725407aa 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>> >
>> > /* Only single cluster request supported */
>> > WARN_ON_ONCE(n_goal > 1 && cluster);
>> > +#ifdef CONFIG_FRONTSWAP
>> > +   /* Now, frontswap doesn't support THP page */
>> > +   if (frontswap_enabled() && cluster)
>> > +   return 0;
>> > +#endif
>> > avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>> > if (avail_pgs <= 0)
>> > goto noswap;
>> >
>> 
>> This can avoid introduce dependency on frontswap in vmscan.c.  But
>> IMHO it doesn't look like the right place to place the logic.
>> vmscan.c is the place we put policy to determine whether to split THP.
>
> It adds split policy in vmscan.c like you said.
>
> shrink_page_list already relies on swap_file.c to decide split a THP page.
> IOW, if THP swap is not available, split the THP.
> It's totally same logic. I don't see any difference at all.
>
> shrink_page_list:
>
> if (!add_to_swap(page)) {
>   if (PageTransHuge(page))
>   goto activate_locked;
>   if (split_huge_page_to_list(page, page_list))
>   goto activate_locked;
>   count_vm_event(THP_SWPOUT_FALLBACK);
>   if (!add_to_swap(page))
>   goto activate_locked;
> }

OK.  I will change the code as you suggested.

Best Regards,
Huang, Ying
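
The root cause described in the quoted commit message, where only the head page of a THP is saved, and the "loop over all subpages" idea raised in reply can be illustrated with a tiny userspace model. All names and sizes here are hypothetical stand-ins (4 subpages of 8 bytes instead of 512 pages of 4 KiB); nothing below is frontswap or zswap API.

```c
#include <stddef.h>
#include <string.h>

enum { NR_SUBPAGES = 4, SUBPAGE_BYTES = 8 };	/* tiny stand-ins */

struct huge_page_model {
	unsigned char sub[NR_SUBPAGES][SUBPAGE_BYTES];
};

/* Backing store with room for every subpage. */
struct backing_model {
	unsigned char sub[NR_SUBPAGES][SUBPAGE_BYTES];
};

/* Behaviour described in the report: treat the huge page as one page. */
static void store_head_only(struct backing_model *b,
			    const struct huge_page_model *p)
{
	memcpy(b->sub[0], p->sub[0], SUBPAGE_BYTES);
}

/* The alternative floated in the thread: handle every subpage in a loop. */
static void store_each_subpage(struct backing_model *b,
			       const struct huge_page_model *p)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		memcpy(b->sub[i], p->sub[i], SUBPAGE_BYTES);
}

/* Count subpages whose contents would survive a store/load round trip. */
static size_t intact_subpages(const struct backing_model *b,
			      const struct huge_page_model *p)
{
	size_t ok = 0;

	for (size_t i = 0; i < NR_SUBPAGES; i++)
		if (memcmp(b->sub[i], p->sub[i], SUBPAGE_BYTES) == 0)
			ok++;
	return ok;
}
```

In the head-only case every tail subpage comes back with stale contents, which matches the random user-space corruption in the report.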


Re: [GIT PULL REQUEST] watchdog - v4.16 merge window

2018-02-06 Thread Guenter Roeck

On 02/05/2018 10:42 AM, Guenter Roeck wrote:

On Mon, Feb 05, 2018 at 09:50:54AM -0800, Linus Torvalds wrote:

On Mon, Feb 5, 2018 at 2:20 AM, Wim Van Sebroeck  wrote:


   git://www.linux-watchdog.org/linux-watchdog.git


Hmm. I really want to know why I should pull this. You have the
shortlog and the diffstat, but you don't actually have a descriptive
blurb about _what_ I'm pulling.

Please give me a summary of what this contains so that I can have a
good merge message.


How about the following ? It matches Wim's tree, which I pulled and
compared against my own copy. Only difference is the signed tag
with a brief summary on top.



Ok, for my part I am out of ideas. Any suggestion how to proceed, anyone,
let me know.

Guenter


Thanks,
Guenter

---
Subject: [GIT PULL] watchdog updates for v4.16

Hi Linus,

Please pull watchdog updates for Linux v4.16 from signed tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git watchdog-for-linus-v4.16

Thanks,
Guenter
--

The following changes since commit f3b5ad89de16f5d42e8ad36fbdf85f705c1ae051:

   Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma (2017-12-16 13:43:08 -0800)

are available in the git repository at:

   git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git tags/watchdog-for-linus-v4.16

for you to fetch changes up to 592a547adf686dbec7687e816ecdf9101fe227f5:

   documentation: watchdog: remove documentation of w83697hf_wdt/w83697ug_wdt (2018-02-03 11:09:54 +0100)


watchdog updates for v4.16 commit window

- Converted to use watchdog subsystem: sp5100_tco, i6300esb, xen_wdt
- New drivers: Spreadtrum, Realtek RTD1295
- Added r8a77970 support to renesas-wdt
- Added jz4780 support to jz4740
- Removed at32ap700x_wdt driver
- Removed obsolete documentation
- Various bug fixes in watchdog core
- Cleanup and bug fixes in several drivers
- Added Guenter Roeck as co-maintainer


Andreas Färber (2):
   dt-bindings: watchdog: Add Realtek RTD1295
   watchdog: Add Realtek RTD1295

Andrew Jeffery (4):
   watchdog: aspeed: Retain watchdog enabled state
   watchdog: aspeed: Fix 'Apseed' typo in Kconfig
   watchdog: aspeed: Remove specific reference to AST2400 in Kconfig
   watchdog: aspeed: Move init to arch_initcall

André Draszik (2):
   watchdog: mt7621: set WDOG_HW_RUNNING bit when appropriate
   watchdog: mt7621: switch to using managed devm_watchdog_register_device()

Arnd Bergmann (2):
   watchdog: xen: use time64_t for timeouts
   watchdog: hpwdt: fix unused variable warning

Benjamin Gaignard (1):
   watchdog: stm32: Fix copyright

Chris Packham (1):
   watchdog: orion: fix typo

Christophe Leroy (3):
   watchdog: mpc8xxx: use the core worker function
   watchdog: core: make sure the watchdog worker always works
   watchdog: core: make sure the watchdog_worker is not deferred

Corentin Labbe (6):
   watchdog: sunxi_wdt: use of_device_get_match_data
   watchdog: document watchdog_init_timeout() wdd parameter
   watchdog: remove at32ap700x_wdt
   documentation: watchdog: remove documentation of at32ap700x_wdt
   documentation: watchdog: remove documentation for ixp2000
   documentation: watchdog: remove documentation of w83697hf_wdt/w83697ug_wdt

David Lechner (1):
   watchdog: davinci_wdt: add restart function

Eric Long (2):
   dt-bindings: watchdog: Add Spreadtrum watchdog documentation
   watchdog: Add Spreadtrum watchdog driver

Geert Uytterhoeven (1):
   dt-bindings: watchdog: renesas-wdt: Add support for the r8a77970 wdt

Greg Kroah-Hartman (1):
   watchdog: pcwd_usb: remove unneeded DRIVER_LICENSE #define

Guenter Roeck (15):
   watchdog: Fix potential kref imbalance when opening watchdog
   watchdog: Fix kref imbalance seen if handle_boot_enabled=0
   MAINTAINERS: Add Guenter Roeck as co-maintainer of watchdog subsystem
   watchdog: sp5100_tco: Always use SP5100_IO_PM_{INDEX_REG,DATA_REG}
   watchdog: sp5100_tco: Fix watchdog disable bit
   watchdog: sp5100_tco: Use request_muxed_region where possible
   watchdog: sp5100_tco: Use standard error codes
   watchdog: sp5100_tco: Clean up sp5100_tco_setupdevice
   watchdog: sp5100_tco: Match PCI device early
   watchdog: sp5100_tco: Use dev_ print functions where possible
   watchdog: sp5100_tco: Clean up function and variable names
   watchdog: sp5100_tco: Convert to use watchdog subsystem
   watchdog: sp5100_tco: Use bit operations
   watchdog: sp5100-tco: Abort if watchdog is disabled by hardware
   watchdog: sp5100_tco: Add support for recent FCH versions

Gustavo A. R. Silva (9):
   watchdog: advantechwdt: mark expected switch fall-through
   watchdog: alim1535_wdt: mark expected switch fall-through
   watchdog: 

Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Paul E. McKenney
On Tue, Feb 06, 2018 at 01:19:29PM +0300, Kirill Tkhai wrote:
> kvmalloc() has recently come into wide use in the kernel.
> Some of these allocations have to be freed after an RCU
> grace period, and this patchset introduces a generic
> primitive for doing this.
> 
> Everything is actually done in [1/2]. Patch [2/2] is only
> added to give the new kvfree_rcu() its first user.
> 
> Patch [1/2] transforms kfree_rcu(), its sub-definitions
> and its sub-functions into kvfree_rcu() form. The most
> significant change is in __rcu_reclaim(), where kvfree()
> is used instead of kfree(). Since kvfree() can handle
> memory allocated via kmalloc(), vmalloc() and kvmalloc(),
> kfree_rcu() and vfree_rcu() may simply be defined
> through this new kvfree_rcu().

Interesting.

So it is OK to kvmalloc() something and pass it to either kfree() or
kvfree(), and it had better be OK to kvmalloc() something and pass it
to kvfree().

Is it OK to kmalloc() something and pass it to kvfree()?

If so, is it really useful to have two different names here, that is,
both kfree_rcu() and kvfree_rcu()?
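
On the question of whether a kmalloc()ed pointer may be handed to kvfree(): kvfree() picks the free routine from the address itself, so the caller does not need to remember which allocator was used. A userspace sketch of that dispatch follows. The arena, is_vmalloc_addr_model() and kvfree_path_model() are hypothetical stand-ins; the real is_vmalloc_addr() checks the kernel's vmalloc address range rather than a static array.

```c
#include <stdbool.h>
#include <stdint.h>

enum { ARENA_SIZE = 4096 };

/* Stand-in for the vmalloc address range. */
static unsigned char vmalloc_arena_model[ARENA_SIZE];

static bool is_vmalloc_addr_model(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t base = (uintptr_t)vmalloc_arena_model;

	return addr >= base && addr < base + ARENA_SIZE;
}

enum free_path { FREE_VIA_KFREE, FREE_VIA_VFREE };

/* Which routine a kvfree()-style dispatcher would pick for 'p'. */
static enum free_path kvfree_path_model(const void *p)
{
	return is_vmalloc_addr_model(p) ? FREE_VIA_VFREE : FREE_VIA_KFREE;
}
```

Any pointer outside the arena takes the kfree() path, which is why passing kmalloc() memory to a kvfree()-based primitive works.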

Also adding Jesper and Rao on CC for their awareness.

Thanx, Paul

> ---
> 
> Kirill Tkhai (2):
>   rcu: Transform kfree_rcu() into kvfree_rcu()
>   mm: Use kvfree_rcu() in update_memcg_params()
> 
> 
>  include/linux/rcupdate.h   |   31 +--
>  include/linux/rcutiny.h|4 ++--
>  include/linux/rcutree.h|2 +-
>  include/trace/events/rcu.h |   12 ++--
>  kernel/rcu/rcu.h   |8 
>  kernel/rcu/tree.c  |   14 +++---
>  kernel/rcu/tree_plugin.h   |   10 +-
>  mm/slab_common.c   |   10 +-
>  8 files changed, 43 insertions(+), 48 deletions(-)
> 
> --
> Signed-off-by: Kirill Tkhai 
> 



Re: [PATCHv2 1/1] ext4: don't put symlink in pagecache into highmem

2018-02-06 Thread Theodore Ts'o
On Tue, Feb 06, 2018 at 03:38:09PM -0800, Eric Biggers wrote:
> I don't think backporting this change for other filesystems is particularly
> important, since if I understand correctly, the reasons that Al made the change
> originally were:
> 
> - to allow following symlinks in RCU mode, but that's not implemented in old
>   kernels

Yup.

> - to prevent a process from using up all kmaps and deadlocking the system, which
>   I'm not sure is a real problem (someone would need to try to put together a
>   reproducer), but if so it would probably just be a local denial of service.
.. and *that's* only a problem on 32-bit systems.  And aside from
Android, it's unclear to me how much we need to support 32-bit systems
on upstream LTS kernels.  I suppose there might be Raspberry Pis which
are 32-bits and which might want to use btrfs.  Personally I'm not
sure we should care all that much, but others who care more about LTS
kernels and 32-bit systems might have a different opinion.

> Also if we actually backported the full commit there are follow-on fixes such as
> e8ecde25f5e that would be needed as well but might be missed.

Good point.

- Ted


Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-06 Thread jianchao.wang
Hi Keith

Sorry for bothering you again.

On 02/07/2018 10:03 AM, jianchao.wang wrote:
> Hi Keith
> 
> Thanks for your time and kindly response on this.
> 
> On 02/06/2018 11:13 PM, Keith Busch wrote:
>> On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote:
>>> Hi Keith
>>>
>>> Thanks for your kindly response.
>>>
>>> On 02/05/2018 11:13 PM, Keith Busch wrote:
>>>> but how many requests are you letting enter to their demise by
>>>> freezing on the wrong side of the reset?
>>>
>>> There are only two differences between this patch and the original one.
>>> 1. Don't freeze the queue for the reset case. At the moment, the outstanding
>>>    requests will be requeued back to blk-mq queues.
>>>    The new entered requests during reset will also stay in blk-mq queues.
>>>    All these requests will not enter the nvme driver layer
>>>    due to quiescent request_queues. And they will be issued after the reset
>>>    is completed successfully.
>>> 2. Drain the request queue before nvme_dev_disable. This is nearly the same
>>>    as the previous rule, which will also unquiesce the queue
>>>    and let the requests be drained. The only difference is that this
>>>    patch will invoke wait_freeze in nvme_dev_disable instead
>>>    of nvme_reset_work.
>>>
>>> We don't sacrifice any request. This patch does the same thing as the
>>> previous one and makes things clearer.
>>
>> No, what you're proposing is quite different.

What's the difference? Could you please point it out?
I have shared my understanding below.
But I don't actually see the difference you mean.
Or are you referring to the 4th patch? If so, I explain that below as well.

Really appreciate your precious time to explain this. :)
Many thanks
Jianchao

>>
>> By "enter", I'm referring to blk_queue_enter. 
> When a request is allocated, it will hold a request_queue->q_usage_counter 
> until it is freed.
> Please refer to 
> blk_mq_get_request -> blk_queue_enter_live
> blk_mq_free_request -> blk_exit_queue
> 
> Regarding to 'request enters into an hctx', I cannot get the point.
> I think you should mean it enter into nvme driver layer.
> 
>> Once a request enters
>> into an hctx, it can not be backed out to re-enter a new hctx if the
>> original one is invalidated.
> 
> I also cannot get the point here. We certainly will not issue a request which
> has been issued to other hctx.
> What this patch and also the original one do is disable/shut down the
> controller, then cancel and requeue or fail the outstanding requests.
> 
> The requeue mechanism will ensure the requests are inserted into the ctx
> that req->mq_ctx->cpu points to.
> 
>>
>> Prior to a reset, all requests that have entered the queue are committed
>> to that hctx, 
> 
> A request could be on
> - the blk-mq per-cpu ctx->rq_list or IO scheduler list,
> - the hctx->dispatch list, or
> - request_queue->requeue_list (will be inserted into the 1st case again)
> 
> When requests are issued, they will be dequeued from the 1st or 2nd case and
> submitted to the nvme driver layer.
> These requests are the _outstanding_ ones.
> 
> When the request queue is quiesced, requests stay in the blk-mq per-cpu
> ctx->rq_list, IO scheduler list or hctx->dispatch list, and cannot be
> issued to the driver layer any more.
> When the request queue is frozen, it gates bios out of generic_make_request,
> so new requests cannot enter the blk-mq layer any more, and certainly not
> the nvme driver layer.
> 
> For the reset case, the nvme controller will be back soon, so we needn't freeze
> the queue; just quiescing is enough.
> The outstanding ones will be canceled and _requeued_ to
> request_queue->requeue_list, then they will be inserted into the
> blk-mq layer again by requeue_work. When reset_work completes and starts the
> queues again, all the requests will be issued again. :)
> 
> For the shutdown case, freezing and quiescing is safer. Also we will wait for
> them to be completed if the controller is still alive. If it is dead, we need
> to fail them directly instead of requeuing them; otherwise an IO hang will
> occur, because the controller will be offline for some time.
>   
> 
> and we can't do anything about that. The only thing we can
>> do is prevent new requests from entering until we're sure that hctx is
>> valid on the other side of the reset.
>>
> yes, that's is what this patch does.
> 
> Add some explaining about the 4th patch nvme-pci: break up nvme_timeout and 
> nvme_dev_disable here.
> Also thanks for your time to look into this. That's really appreciated!
> 
> The key point is blk_abort_request. It will force the request to be expired 
> and then handle the request
> in timeout work context. It is safe to race with the irq completion path. 
> This is the most important reason
> to use blk_abort_request. 
> We don't _complete_ the request or _rearm_ the time, but just set a CANCELED 
> flag. So request will not be freed.
> Then these requests cannot be taken away by irq completion path and time out 
> path is also b

linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Stephen Rothwell
Hi Michael,

After merging the vhost tree, today's linux-next build (x86_64
allmodconfig) failed like this:


Caused by commit

  96bcd04462b9 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

I have used the vhost tree from next-20180206 for today.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-06 Thread jianchao.wang
Hi Keith

Thanks for your time and kind response on this.

On 02/06/2018 11:13 PM, Keith Busch wrote:
> On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote:
>> Hi Keith
>>
>> Thanks for your kindly response.
>>
>> On 02/05/2018 11:13 PM, Keith Busch wrote:
>>>  but how many requests are you letting enter to their demise by
>>> freezing on the wrong side of the reset?
>>
>> There are only two difference with this patch from the original one.
>> 1. Don't freeze the queue for the reset case. At the moment, the outstanding 
>> requests will be requeued back to blk-mq queues.
>>The new entered requests during reset will also stay in blk-mq queues. 
>> All this requests will not enter into nvme driver layer
>>due to quiescent request_queues. And they will be issued after the reset 
>> is completed successfully.
>> 2. Drain the request queue before nvme_dev_disable. This is nearly same with 
>> the previous rule which will also unquiesce the queue
>>and let the requests be able to be drained. The only difference is this 
>> patch will invoke wait_freeze in nvme_dev_disable instead
>>of nvme_reset_work.
>>
>> We don't sacrifice any request. This patch do the same thing with the 
>> previous one and make things clearer.
> 
> No, what you're proposing is quite different.
> 
> By "enter", I'm referring to blk_queue_enter. 
When a request is allocated, it holds a reference on
request_queue->q_usage_counter until it is freed.
Please refer to:
blk_mq_get_request -> blk_queue_enter_live
blk_mq_free_request -> blk_exit_queue

Regarding 'request enters into an hctx', I don't get the point.
I think you mean that it enters the nvme driver layer.

> Once a request enters
> into an hctx, it can not be backed out to re-enter a new hctx if the
> original one is invalidated.

I don't get the point here either. We certainly will not issue a request
that has already been issued to another hctx.
What this patch, like the original one, does is disable/shut down the
controller, then cancel and requeue (or fail) the outstanding requests.

The requeue mechanism ensures that the requests are inserted into the ctx
that req->mq_ctx->cpu points to.

> 
> Prior to a reset, all requests that have entered the queue are committed
> to that hctx, 

A request could be on one of the following:
- the blk-mq per-cpu ctx->rq_list or the IO scheduler list,
- the hctx->dispatch list, or
- request_queue->requeue_list (from which it is inserted into the 1st case again)

When requests are issued, they are dequeued from the 1st or 2nd case and
submitted to the nvme driver layer.
These requests are the _outstanding_ ones.

When the request queue is quiesced, requests stay on the blk-mq per-cpu
ctx->rq_list, the IO scheduler list or the hctx->dispatch list, and cannot
be issued to the driver layer any more.
When the request queue is frozen, bios are gated at generic_make_request,
so new requests cannot enter the blk-mq layer any more, and certainly not
the nvme driver layer.

For the reset case, the nvme controller will be back soon, so we needn't
freeze the queue; quiescing is enough.
The outstanding requests will be canceled and _requeued_ to
request_queue->requeue_list, then inserted into the blk-mq layer again by
requeue_work. When reset_work completes and starts the queues again, all
the requests will be issued again. :)

For the shutdown case, doing both freezing and quiescing is safer. We will
also wait for the requests to complete if the controller is still alive. If
it is dead, we need to fail them directly instead of requeueing them;
otherwise an IO hang will occur, because the controller
will be offline for some time.
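
To make the distinction concrete, here is a toy user-space model of the
behaviour described above. It is not kernel code, and every name in it is
made up for illustration: freezing gates entry, quiescing gates dispatch,
the reset path requeues outstanding requests, and the dead-controller
shutdown path fails them.

```c
#include <stdbool.h>

/* Toy model of one request queue; all names here are illustrative. */
struct toy_queue {
	bool frozen;     /* new requests may not enter the block layer */
	bool quiesced;   /* queued requests may not reach the driver */
	int queued;      /* waiting in blk-mq (ctx/scheduler/dispatch lists) */
	int outstanding; /* issued to the driver, not yet completed */
};

/* Entering the queue is gated only by freezing. */
static bool toy_enter(struct toy_queue *q)
{
	if (q->frozen)
		return false;
	q->queued++;
	return true;
}

/* Dispatch is gated only by quiescing; returns how many were issued. */
static int toy_dispatch(struct toy_queue *q)
{
	int issued;

	if (q->quiesced)
		return 0;
	issued = q->queued;
	q->outstanding += issued;
	q->queued = 0;
	return issued;
}

/* Reset path: cancel outstanding requests and keep them for a retry. */
static void toy_cancel_and_requeue(struct toy_queue *q)
{
	q->queued += q->outstanding;
	q->outstanding = 0;
}

/* Dead-controller shutdown path: fail them; returns how many failed. */
static int toy_cancel_and_fail(struct toy_queue *q)
{
	int failed = q->outstanding;

	q->outstanding = 0;
	return failed;
}
```

In this model a quiesced but unfrozen queue still accepts new requests
during a reset; they simply wait until the queue is started again.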
  

> and we can't do anything about that. The only thing we can
> do is prevent new requests from entering until we're sure that hctx is
> valid on the other side of the reset.
> 
Yes, that is what this patch does.

Let me add some explanation of the 4th patch, "nvme-pci: break up
nvme_timeout and nvme_dev_disable", here.
Also, thanks for taking the time to look into this. It is really appreciated!

The key point is blk_abort_request. It forces the request to expire and
then handles the request in the timeout work context. It is safe to race
with the irq completion path. This is the most important reason to use
blk_abort_request.
We don't _complete_ the request or _rearm_ the timer, but just set a
CANCELED flag, so the request will not be freed.
These requests then cannot be taken away by the irq completion path, and
the timeout path is also avoided (there are no outstanding requests any more).
So we say 'all the outstanding requests are grabbed'. Once we have closed
the controller completely, we can complete these requests safely. This is
the core idea of the 4th patch.

Many thanks
Jianchao
 


RE: [RFC, PATCH v1] platform/x86: intel-vbtn: Convert to pure ACPI driver

2018-02-06 Thread Mario.Limonciello
>Yeah, that's fixed in the branch.
>P.S. Apologies for top-posting, wrote from phone in order to not waste time
Andy,

I tested your review branch on an XPS 9365, which uses the power button through
intel-vbtn. I confirmed it works properly both in normal use in the OS and
during S2I.

Thanks,



Re: [PATCH 1/2] of_pci_irq: add a check to fallback to standard device tree parsing

2018-02-06 Thread Ryder Lee
Hi, Arnd

On Wed, 2018-02-07 at 09:31 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-02-06 at 13:42 +0800, Ryder Lee wrote:
> > Thanks for explanation.
> > 
> > So I guess the better way to achieve my aim - one IRQ per slot that is
> > connected to all INTx and get propagated through the bridges (and for
> > those root ports own interrupts (PME ..)} is to add interrupt-map
> > properties in both parent and root port nodes.
> > 
> > Something like this: https://patchwork.kernel.org/patch/9970923// ,right?
> 
> Yup.
> 
> Cheers,
> Ben.
> 

Do you have any thoughts on the original approach? If you are okay with
it, I will resend the DT patch.

Thanks.



Re: [PATCH] x86/nospec: Fixup array_index_nospec_mask() asm constraint

2018-02-06 Thread Dan Williams
On Tue, Feb 6, 2018 at 5:50 PM, Linus Torvalds
 wrote:
> On Tue, Feb 6, 2018 at 5:33 PM, Dan Williams  wrote:
>> Allow the compiler to handle @size as an immediate value rather than
>> allocating a register.
>
> Actually, maybe that "ir" should be "g".
>
> Because it's fine if it's a memory location too. "cmp" takes pretty
> much anything, as long as the thing we compare _to_ is a register.

Ok, no worries, I'll do a v2. In fact you suggested 'g' in your initial
version and I lost that along the way while wrestling with why the
compiler miscompiled it until I put it in a static inline rather than
a macro.


Re: [PATCH] char: nvram: disable on ARM

2018-02-06 Thread Alexandre Belloni
On 06/02/2018 at 23:55:02 +0100, Arnd Bergmann wrote:
> * arch/arm/kernel/time.c has this code
> 
> #if defined(CONFIG_RTC_DRV_CMOS) || defined(CONFIG_RTC_DRV_CMOS_MODULE) || \
> defined(CONFIG_NVRAM) || defined(CONFIG_NVRAM_MODULE)
> /* this needs a better home */
> DEFINE_SPINLOCK(rtc_lock);
> EXPORT_SYMBOL(rtc_lock);
> #endif  /* pc-style 'CMOS' RTC support */
> 
> That can be adapted now, or maybe we could move all definitions into
> a common place (that needs some more planning).
> 

Yes, on arm, the rtc_lock is mostly there to please
drivers/rtc/rtc-cmos.c. Maybe we could make the locking in this driver
x86 and PPC specific.

If we can get rid of arch/powerpc/platforms/chrp/time.c and
arch/powerpc/platforms/maple/time.c (so much duplicated code), then it
is x86 only.

> * similarly, this line in nvram.c can be simplified:
> #if defined(CONFIG_ATARI)
> #  define MACH ATARI
> #elif defined(__i386__) || defined(__x86_64__) || defined(__arm__)  /* and ?? */
> #  define MACH PC
> #else
> #  error Cannot build nvram driver for this machine configuration.
> #endif
> 
> * GENERIC_NVRAM is not really generic, instead this seems to be the
>   chardev that is used for 32-bit powerpc (powermac, 85xx, 86xx), while
>   64-bit powerpc (cell, maple, opal, pseries) use code from
>   arch/powerpc/kernel/nvram_64.c, with the same underlying arch hooks.
>   The nvram_64 code appears to be mostly a superset of the 32-bit
>   generic_nvram one.
> 
> * The code in drivers/char/nvram is not used at all when
>GENERIC_NVRAM is set, and half the code in there is different
>between x86 and atari.
> 
> * most of the external interface in include/linux/nvram.h is
>   unused, the rest tends to be architecture specific
> 
> * The procfs file appears to be completely useless on any 64-bit
>x86 machine, this is what I see:
> 
> $ cat /proc/driver/nvram
> Checksum status: valid
> # floppies : 0
> Floppy 0 type  : none
> Floppy 1 type  : none
> HD 0 type  : none
> HD 1 type  : none
> HD type 48 data: 0/0/0 C/H/S, precomp 0, lz 0
> HD type 49 data: 156/0/0 C/H/S, precomp 0, lz 0
> DOS base memory: 635 kB
> Extended memory: 65535 kB (configured), 65535 kB (tested)
> Gfx adapter: EGA, VGA, ... (with BIOS)
> FPU: not installed
> 

I really don't think anyone is using that but I don't really know much
about x86 and the specification this may be part of.

I see the info may be used in drivers/video/fbdev/ and
drivers/platform/x86/thinkpad_acpi.c.

-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


[RFCv3 05/17] media: Document the media request API

2018-02-06 Thread Alexandre Courbot
From: Laurent Pinchart 

The media request API is made of a new ioctl to implement request
management. Document it.

Signed-off-by: Laurent Pinchart 
[acour...@chromium.org: adapt for newest API]
Signed-off-by: Alexandre Courbot 
---
 Documentation/media/uapi/mediactl/media-funcs.rst  |   1 +
 .../media/uapi/mediactl/media-ioc-request-cmd.rst  | 142 +
 2 files changed, 143 insertions(+)
 create mode 100644 Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst

diff --git a/Documentation/media/uapi/mediactl/media-funcs.rst b/Documentation/media/uapi/mediactl/media-funcs.rst
index 076856501cdb..e3a45d82ffcb 100644
--- a/Documentation/media/uapi/mediactl/media-funcs.rst
+++ b/Documentation/media/uapi/mediactl/media-funcs.rst
@@ -15,4 +15,5 @@ Function Reference
 media-ioc-g-topology
 media-ioc-enum-entities
 media-ioc-enum-links
+media-ioc-request-cmd
 media-ioc-setup-link
diff --git a/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst b/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst
new file mode 100644
index 000000000000..ced76ff3498d
--- /dev/null
+++ b/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst
@@ -0,0 +1,142 @@
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _media_ioc_request_cmd:
+
+***************************
+ioctl MEDIA_IOC_REQUEST_CMD
+***************************
+
+Name
+====
+
+MEDIA_IOC_REQUEST_CMD - Manage media device requests
+
+
+Synopsis
+========
+
+.. c:function:: int ioctl( int fd, MEDIA_IOC_REQUEST_CMD, struct media_request_cmd *argp )
+:name: MEDIA_IOC_REQUEST_CMD
+
+
+Arguments
+=========
+
+``fd``
+File descriptor returned by :ref:`open() `.
+
+``argp``
+
+
+Description
+===========
+
+The MEDIA_IOC_REQUEST_CMD ioctl allows applications to manage media device
+requests. A request is an object that can group media device configuration
+parameters, including subsystem-specific parameters, in order to apply all the
+parameters atomically. Applications are responsible for allocating and
+deleting requests, filling them with configuration parameters and submitting
+them.
+
+Request operations are performed by calling the MEDIA_IOC_REQUEST_CMD ioctl
+with a pointer to a struct :c:type:`media_request_cmd` with the cmd field set
+to the appropriate command. :ref:`media-request-command` lists the commands
+supported by the ioctl.
+
+The struct :c:type:`media_request_cmd` request field contains the file
+descriptor of the request on which the command operates. For the
+``MEDIA_REQ_CMD_ALLOC`` command the field is set to zero by applications and
+filled by the driver. For all other commands the field is set by applications
+and left untouched by the driver.
+
+To allocate a new request applications use the ``MEDIA_REQ_CMD_ALLOC``
+command. The driver will allocate a new request and return its FD in the
+request field. After allocation, the request is "empty", which means that it
+does not hold any state of its own, and that the hardware's state will not be
+affected by it unless it is passed as argument to V4L2 or media controller
+commands.
+
+Requests are reference-counted. A newly allocated request is referenced
+by the returned file descriptor, and can be later referenced by
+subsystem-specific operations. Requests will thus be automatically deleted
+when they're no longer used after the returned file descriptor is closed.
+
+If a request isn't needed applications can delete it by calling ``close()``
+on it. The driver will drop the file handle reference. The request will not
+be usable through the MEDIA_IOC_REQUEST_CMD ioctl anymore, but will only be
+deleted when the last reference is released. If no other reference exists when
+``close()`` is invoked the request will be deleted immediately.
+
+After creating a request applications should fill it with configuration
+parameters. This is performed through subsystem-specific request APIs outside
+the scope of the media controller API. See the appropriate subsystem APIs for
+more information, including how they interact with the MEDIA_IOC_REQUEST_CMD
+ioctl.
+
+Once a request contains all the desired configuration parameters it can be
+submitted using the ``MEDIA_REQ_CMD_SUBMIT`` command. This will let the
+buffers queued for the request be passed to their respective drivers, which
+will then apply the request's parameters before processing them.
+
+Once a request has been queued applications are not allowed to modify its
+configuration parameters until the request has been fully processed. Any
+attempt to do so will result in the related subsystem API returning an error.
+The application that submitted the request can wait for its completion by
+polling on the request's file descriptor.
+
+Once a request has completed, it can be reused. The ``MEDIA_REQ_CMD_REINIT``
+command will bring it back to its initial state, so it can be prepared and
+submitted again.
+
+.. c:type:: media_request_cmd
+
+.. flat-table:: struct media_request_cmd
+:header-rows: 
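
The request lifecycle described in the documentation text above can be
modelled as a small state machine. This is only a user-space sketch. The
toy_* names and the use of -EBUSY are assumptions made for the example,
since the text only says that modifying a queued request returns an error.
Requiring a request to be idle before submission is likewise an assumption.

```c
#include <errno.h>

/* Illustrative states; these names are not taken from the patch. */
enum toy_req_state { TOY_REQ_IDLE, TOY_REQ_QUEUED, TOY_REQ_COMPLETED };

struct toy_request {
	enum toy_req_state state;
	int nparams; /* number of configuration parameters stored */
};

/* Models MEDIA_REQ_CMD_ALLOC: a new request starts out empty. */
static void toy_req_alloc(struct toy_request *r)
{
	r->state = TOY_REQ_IDLE;
	r->nparams = 0;
}

/* Parameters cannot be changed while the request is queued. */
static int toy_req_set_param(struct toy_request *r)
{
	if (r->state == TOY_REQ_QUEUED)
		return -EBUSY;
	r->nparams++;
	return 0;
}

/* Models MEDIA_REQ_CMD_SUBMIT. */
static int toy_req_submit(struct toy_request *r)
{
	if (r->state != TOY_REQ_IDLE)
		return -EBUSY;
	r->state = TOY_REQ_QUEUED;
	return 0;
}

/* Called by the "driver" once the request has been fully processed. */
static void toy_req_complete(struct toy_request *r)
{
	if (r->state == TOY_REQ_QUEUED)
		r->state = TOY_REQ_COMPLETED;
}

/* Models MEDIA_REQ_CMD_REINIT: back to the initial, empty state. */
static int toy_req_reinit(struct toy_request *r)
{
	if (r->state == TOY_REQ_QUEUED)
		return -EBUSY;
	toy_req_alloc(r);
	return 0;
}
```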

[RFCv3 02/17] videodev2.h: Add request_fd field to v4l2_buffer

2018-02-06 Thread Alexandre Courbot
From: Hans Verkuil 

When queuing buffers allow for passing the request that should
be associated with this buffer.

Signed-off-by: Hans Verkuil 
[acour...@chromium.org: make request ID 32-bit]
Signed-off-by: Alexandre Courbot 
---
 drivers/media/common/videobuf2/videobuf2-v4l2.c | 3 ++-
 drivers/media/usb/cpia2/cpia2_v4l.c | 2 +-
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c   | 9 ++---
 drivers/media/v4l2-core/v4l2-ioctl.c| 4 ++--
 include/media/videobuf2-v4l2.h  | 2 ++
 include/uapi/linux/videodev2.h  | 3 ++-
 6 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index fac3cd6f901d..0034f4d190f2 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -203,7 +203,7 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
b->timestamp = ns_to_timeval(vb->timestamp);
b->timecode = vbuf->timecode;
b->sequence = vbuf->sequence;
-   b->reserved2 = 0;
+   b->request_fd = vbuf->request_fd;
b->reserved = 0;
 
if (q->is_multiplanar) {
@@ -320,6 +320,7 @@ static int __fill_vb2_buffer(struct vb2_buffer *vb,
}
vb->timestamp = 0;
vbuf->sequence = 0;
+   vbuf->request_fd = b->request_fd;
 
if (V4L2_TYPE_IS_MULTIPLANAR(b->type)) {
if (b->memory == VB2_MEMORY_USERPTR) {
diff --git a/drivers/media/usb/cpia2/cpia2_v4l.c b/drivers/media/usb/cpia2/cpia2_v4l.c
index a1c59f19cf2d..54c5aa0ecd26 100644
--- a/drivers/media/usb/cpia2/cpia2_v4l.c
+++ b/drivers/media/usb/cpia2/cpia2_v4l.c
@@ -948,7 +948,7 @@ static int cpia2_dqbuf(struct file *file, void *fh, struct v4l2_buffer *buf)
buf->sequence = cam->buffers[buf->index].seq;
buf->m.offset = cam->buffers[buf->index].data - cam->frame_buffer;
buf->length = cam->frame_size;
-   buf->reserved2 = 0;
+   buf->request_fd = 0;
buf->reserved = 0;
memset(&buf->timecode, 0, sizeof(buf->timecode));
 
diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
index 5198c9eeb348..32bf47489a2e 100644
--- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
+++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
@@ -386,7 +386,7 @@ struct v4l2_buffer32 {
__s32   fd;
} m;
__u32   length;
-   __u32   reserved2;
+   __s32   request_fd;
__u32   reserved;
 };
 
@@ -486,6 +486,7 @@ static int get_v4l2_buffer32(struct v4l2_buffer __user *kp,
 {
u32 type;
u32 length;
+   s32 request_fd;
enum v4l2_memory memory;
struct v4l2_plane32 __user *uplane32;
struct v4l2_plane __user *uplane;
@@ -500,7 +501,9 @@ static int get_v4l2_buffer32(struct v4l2_buffer __user *kp,
get_user(memory, &up->memory) ||
put_user(memory, &kp->memory) ||
get_user(length, &up->length) ||
-   put_user(length, &kp->length))
+   put_user(length, &kp->length) ||
+   get_user(request_fd, &up->request_fd) ||
+   put_user(request_fd, &kp->request_fd))
return -EFAULT;
 
if (V4L2_TYPE_IS_OUTPUT(type))
@@ -604,7 +607,7 @@ static int put_v4l2_buffer32(struct v4l2_buffer __user *kp,
assign_in_user(&up->timestamp.tv_usec, &kp->timestamp.tv_usec) ||
copy_in_user(&up->timecode, &kp->timecode, sizeof(kp->timecode)) ||
assign_in_user(&up->sequence, &kp->sequence) ||
-   assign_in_user(&up->reserved2, &kp->reserved2) ||
+   assign_in_user(&up->request_fd, &kp->request_fd) ||
assign_in_user(&up->reserved, &kp->reserved) ||
get_user(length, &kp->length) ||
put_user(length, &up->length))
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index e5109e5b8bf5..2f40ac0cdf6e 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -437,13 +437,13 @@ static void v4l_print_buffer(const void *arg, bool write_only)
const struct v4l2_plane *plane;
int i;
 
-   pr_cont("%02ld:%02d:%02d.%08ld index=%d, type=%s, flags=0x%08x, field=%s, sequence=%d, memory=%s",
+   pr_cont("%02ld:%02d:%02d.%08ld index=%d, type=%s, request_fd=%u, flags=0x%08x, field=%s, sequence=%d, memory=%s",
p->timestamp.tv_sec / 3600,
(int)(p->timestamp.tv_sec / 60) % 60,
(int)(p->timestamp.tv_sec % 60),
(long)p->timestamp.tv_usec,
p->index,
-   prt_names(p->type, v4l2_type_names),
+   prt_names(p->type, v4l2_type_names), p->request_fd,
p->flags, prt_

[RFCv3 06/17] v4l2-ctrls: v4l2_ctrl_add_handler: add from_other_dev

2018-02-06 Thread Alexandre Courbot
From: Hans Verkuil 

Add a 'bool from_other_dev' argument: set to true if the two
handlers refer to different devices (e.g. it is true when
inheriting controls from a subdev into a main v4l2 bridge
driver).

This will be used later when implementing support for the
request API since we need to skip such controls.
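
As a rough illustration of the intent, here is a user-space sketch in which
inherited controls are tagged when they are added and then skipped when a
request handler is built. The toy_* types and the fixed-size array are
simplifications made up for this example. They are not the kernel data
structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REFS 8

struct toy_ref {
	int ctrl_id;
	bool from_other_dev; /* control was inherited from another device */
};

struct toy_handler {
	struct toy_ref refs[TOY_MAX_REFS];
	size_t nrefs;
};

/* Inherit every control of 'add' into 'hdl', tagging the new refs. */
static int toy_add_handler(struct toy_handler *hdl,
			   const struct toy_handler *add,
			   bool from_other_dev)
{
	for (size_t i = 0; i < add->nrefs; i++) {
		if (hdl->nrefs == TOY_MAX_REFS)
			return -1;
		hdl->refs[hdl->nrefs].ctrl_id = add->refs[i].ctrl_id;
		hdl->refs[hdl->nrefs].from_other_dev = from_other_dev;
		hdl->nrefs++;
	}
	return 0;
}

/* Build a request handler from 'from', skipping inherited controls. */
static size_t toy_request_clone(struct toy_handler *req,
				const struct toy_handler *from)
{
	req->nrefs = 0;
	for (size_t i = 0; i < from->nrefs; i++) {
		if (from->refs[i].from_other_dev)
			continue;
		req->refs[req->nrefs++] = from->refs[i];
	}
	return req->nrefs;
}
```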

TODO: check drivers/staging/media/imx/imx-media-fim.c change.

Signed-off-by: Hans Verkuil 
Signed-off-by: Alexandre Courbot 
---
 drivers/media/dvb-frontends/rtl2832_sdr.c|  5 +--
 drivers/media/pci/bt8xx/bttv-driver.c|  2 +-
 drivers/media/pci/cx23885/cx23885-417.c  |  2 +-
 drivers/media/pci/cx88/cx88-blackbird.c  |  2 +-
 drivers/media/pci/cx88/cx88-video.c  |  2 +-
 drivers/media/pci/saa7134/saa7134-empress.c  |  4 +--
 drivers/media/pci/saa7134/saa7134-video.c|  2 +-
 drivers/media/platform/exynos4-is/fimc-capture.c |  2 +-
 drivers/media/platform/rcar-vin/rcar-v4l2.c  |  3 +-
 drivers/media/platform/rcar_drif.c   |  2 +-
 drivers/media/platform/soc_camera/soc_camera.c   |  3 +-
 drivers/media/platform/vivid/vivid-ctrls.c   | 46 
 drivers/media/usb/cx231xx/cx231xx-417.c  |  2 +-
 drivers/media/usb/cx231xx/cx231xx-video.c|  4 +--
 drivers/media/usb/msi2500/msi2500.c  |  2 +-
 drivers/media/usb/tm6000/tm6000-video.c  |  2 +-
 drivers/media/v4l2-core/v4l2-ctrls.c | 11 +++---
 drivers/media/v4l2-core/v4l2-device.c|  3 +-
 drivers/staging/media/imx/imx-media-dev.c|  2 +-
 drivers/staging/media/imx/imx-media-fim.c|  2 +-
 include/media/v4l2-ctrls.h   |  4 ++-
 21 files changed, 58 insertions(+), 49 deletions(-)

diff --git a/drivers/media/dvb-frontends/rtl2832_sdr.c b/drivers/media/dvb-frontends/rtl2832_sdr.c
index c6e78d870ccd..6064d28224e8 100644
--- a/drivers/media/dvb-frontends/rtl2832_sdr.c
+++ b/drivers/media/dvb-frontends/rtl2832_sdr.c
@@ -1394,7 +1394,8 @@ static int rtl2832_sdr_probe(struct platform_device *pdev)
case RTL2832_SDR_TUNER_E4000:
v4l2_ctrl_handler_init(&dev->hdl, 9);
if (subdev)
-   v4l2_ctrl_add_handler(&dev->hdl, subdev->ctrl_handler, NULL);
+   v4l2_ctrl_add_handler(&dev->hdl, subdev->ctrl_handler,
+ NULL, true);
break;
case RTL2832_SDR_TUNER_R820T:
case RTL2832_SDR_TUNER_R828D:
@@ -1423,7 +1424,7 @@ static int rtl2832_sdr_probe(struct platform_device *pdev)
v4l2_ctrl_handler_init(&dev->hdl, 2);
if (subdev)
v4l2_ctrl_add_handler(&dev->hdl, subdev->ctrl_handler,
- NULL);
+ NULL, true);
break;
default:
v4l2_ctrl_handler_init(&dev->hdl, 0);
diff --git a/drivers/media/pci/bt8xx/bttv-driver.c b/drivers/media/pci/bt8xx/bttv-driver.c
index b366a7e1d976..91874f775d37 100644
--- a/drivers/media/pci/bt8xx/bttv-driver.c
+++ b/drivers/media/pci/bt8xx/bttv-driver.c
@@ -4211,7 +4211,7 @@ static int bttv_probe(struct pci_dev *dev, const struct pci_device_id *pci_id)
/* register video4linux + input */
if (!bttv_tvcards[btv->c.type].no_video) {
v4l2_ctrl_add_handler(&btv->radio_ctrl_handler, hdl,
-   v4l2_ctrl_radio_filter);
+   v4l2_ctrl_radio_filter, false);
if (btv->radio_ctrl_handler.error) {
result = btv->radio_ctrl_handler.error;
goto fail2;
diff --git a/drivers/media/pci/cx23885/cx23885-417.c b/drivers/media/pci/cx23885/cx23885-417.c
index a71f3c7569ce..762823871c78 100644
--- a/drivers/media/pci/cx23885/cx23885-417.c
+++ b/drivers/media/pci/cx23885/cx23885-417.c
@@ -1527,7 +1527,7 @@ int cx23885_417_register(struct cx23885_dev *dev)
dev->cxhdl.priv = dev;
dev->cxhdl.func = cx23885_api_func;
cx2341x_handler_set_50hz(&dev->cxhdl, tsport->height == 576);
-   v4l2_ctrl_add_handler(&dev->ctrl_handler, &dev->cxhdl.hdl, NULL);
+   v4l2_ctrl_add_handler(&dev->ctrl_handler, &dev->cxhdl.hdl, NULL, false);
 
/* Allocate and initialize V4L video device */
dev->v4l_device = cx23885_video_dev_alloc(tsport,
diff --git a/drivers/media/pci/cx88/cx88-blackbird.c b/drivers/media/pci/cx88/cx88-blackbird.c
index 0e0952e60795..39f69d89a663 100644
--- a/drivers/media/pci/cx88/cx88-blackbird.c
+++ b/drivers/media/pci/cx88/cx88-blackbird.c
@@ -1183,7 +1183,7 @@ static int cx8802_blackbird_probe(struct cx8802_driver *drv)
err = cx2341x_handler_init(&dev->cxhdl, 36);
if (err)
goto fail_core;
-   v4l2_ctrl_add_handler(&dev->cxhdl.hdl, &core->video_hdl, NULL);
+   v4l2_ctrl_add_handler(&dev->cxhdl.hdl, &core->video_hdl, NULL, false);
 
 

[RFCv3 03/17] media: videobuf2: add support for requests

2018-02-06 Thread Alexandre Courbot
Make vb2 aware of requests. Drivers can specify whether a given queue
can accept requests or not. Queues that accept requests will block on a
buffer that is part of a request until that request is submitted.
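
The blocking rule can be pictured with a small user-space model. It is a
sketch only: the toy_* names are invented for this example, and a plain
array stands in for the queued list. Buffers are handed to the driver in
order, and the walk stops at the first buffer whose request has not been
submitted yet.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	bool submitted;
};

struct toy_buffer {
	struct toy_request *request; /* NULL if not tied to a request */
	bool in_driver;              /* already handed to the driver */
};

/*
 * Hand queued buffers to the "driver" in queue order. Stop at the first
 * buffer that belongs to a request which has not been submitted, so that
 * later buffers cannot overtake it. Returns how many buffers were handed
 * over by this call.
 */
static size_t toy_enqueue_ready(struct toy_buffer *bufs, size_t n)
{
	size_t passed = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_buffer *b = &bufs[i];

		if (b->in_driver)
			continue;
		if (b->request && !b->request->submitted)
			break;
		b->in_driver = true;
		passed++;
	}
	return passed;
}
```

Calling toy_enqueue_ready() again after the request has been submitted
plays the role of the unblock notifier in the patch: the blocked buffer
and everything queued behind it are then passed on.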

Signed-off-by: Alexandre Courbot 
---
 drivers/media/common/videobuf2/videobuf2-core.c | 133 ++--
 drivers/media/common/videobuf2/videobuf2-v4l2.c |  28 -
 include/media/videobuf2-core.h  |  15 ++-
 3 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index f7109f827f6e..c1b9ccbdecb3 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -930,6 +931,17 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
vb->state = state;
}
atomic_dec(&q->owned_by_drv_count);
+   if (vb->request) {
+   struct media_request *req = vb->request;
+
+   if (atomic_dec_and_test(&req->buf_cpt))
+   media_request_complete(vb->request);
+
+   /* release reference acquired during qbuf */
+   vb->request = NULL;
+   media_request_put(req);
+   }
+
spin_unlock_irqrestore(&q->done_lock, flags);
 
trace_vb2_buf_done(q, vb);
@@ -1306,6 +1318,53 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
 }
 EXPORT_SYMBOL_GPL(vb2_core_prepare_buf);
 
+/*
+ * vb2_check_buf_req_status() - Validate request state of a buffer
+ * @vb:buffer to check
+ *
+ * Returns true if a buffer is ready to be passed to the driver request-wise.
+ * This means that neither this buffer nor any previously-queued buffer is
+ * associated to a request that is not yet submitted.
+ *
+ * If this function returns false, then the buffer shall not be passed to its
+ * driver since the request state is not completely built yet. In that case,
+ * this function will register a notifier to be called when the request is
+ * submitted and the queue can be unblocked.
+ *
+ * This function must be called with req_lock held.
+ */
+static bool vb2_check_buf_req_status(struct vb2_buffer *vb)
+{
+   struct media_request *req = vb->request;
+   struct vb2_queue *q = vb->vb2_queue;
+   int ret = false;
+
+   mutex_lock(&q->req_lock);
+
+   if (!req) {
+   ret = !q->waiting_req;
+   goto done;
+   }
+
+   mutex_lock(&req->lock);
+   if (req->state == MEDIA_REQUEST_STATE_SUBMITTED) {
+   mutex_unlock(&req->lock);
+   ret = !q->waiting_req;
+   goto done;
+   }
+
+   if (!q->waiting_req) {
+   q->waiting_req = true;
+   atomic_notifier_chain_register(&req->submit_notif,
+  &q->req_blk);
+   }
+   mutex_unlock(&req->lock);
+
+done:
+   mutex_unlock(&q->req_lock);
+   return ret;
+}
+
 /*
  * vb2_start_streaming() - Attempt to start streaming.
  * @q: videobuf2 queue
@@ -1326,8 +1385,11 @@ static int vb2_start_streaming(struct vb2_queue *q)
 * If any buffers were queued before streamon,
 * we can now pass them to driver for processing.
 */
-   list_for_each_entry(vb, &q->queued_list, queued_entry)
+   list_for_each_entry(vb, &q->queued_list, queued_entry) {
+   if (!vb2_check_buf_req_status(vb))
+   break;
__enqueue_in_driver(vb);
+   }
 
/* Tell the driver to start streaming */
q->start_streaming_called = 1;
@@ -1369,7 +1431,46 @@ static int vb2_start_streaming(struct vb2_queue *q)
return ret;
 }
 
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
+/*
+ * vb2_unblock_requests() - unblock a queue waiting for a request submission
+ * @nb:notifier block that has been registered
+ * @action:unused
+ * @data:  request that has been submitted
+ *
+ * This is a callback function that is registered when
+ * vb2_check_buf_req_status() returns false. It is invoked when the request
+ * blocking the queue has been submitted. This means its buffers (and all
+ * following valid buffers) can be passed to drivers.
+ */
+static int vb2_unblock_requests(struct notifier_block *nb, unsigned long action,
+   void *data)
+{
+   struct vb2_queue *q = container_of(nb, struct vb2_queue, req_blk);
+   struct media_request *req = data;
+   struct vb2_buffer *vb;
+   bool found_request = false;
+
+   mutex_lock(&q->req_lock);
+   atomic_notifier_chain_unregister(&req->submit_notif, &q->req_blk);
+   q->waiting_req = false;
+   mutex_unlock(&q->req_lock);
+
+   list_for_each_entry(vb, &q->queued_list, queued_entry) {
+   /* A

[RFCv3 08/17] v4l2-ctrls: add core request API

2018-02-06 Thread Alexandre Courbot
From: Hans Verkuil 

Add the four core request functions:

v4l2_ctrl_request_init() initializes a new (empty) request.
v4l2_ctrl_request_clone() resets a request based on another request
(or clears it if that request is NULL).
v4l2_ctrl_request_get(): increase refcount
v4l2_ctrl_request_put(): decrease refcount and delete if it reaches 0.
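
Two ideas from this patch are sketched below in plain user-space C. The
first is the get/put reference counting listed above. The second is the way
handler_new_ref() in the diff sizes each reference with extra trailing
space when the handler is a request. The toy_* names are made up for the
example, and calloc() stands in for kzalloc().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for a control reference. */
struct toy_ref {
	size_t req_size; /* bytes of trailing storage, 0 if none */
	void *p_req;     /* points just past the struct, or NULL */
};

/* One allocation holds the ref and, for requests, its value storage. */
static struct toy_ref *toy_ref_new(size_t elems, size_t elem_size,
				   bool is_request)
{
	size_t extra = is_request ? elems * elem_size : 0;
	struct toy_ref *ref = calloc(1, sizeof(*ref) + extra);

	if (!ref)
		return NULL;
	ref->req_size = extra;
	if (extra)
		ref->p_req = &ref[1]; /* first byte after the struct */
	return ref;
}

struct toy_request {
	unsigned int refcount;
	bool deleted;
};

/* A new request starts with one reference, held by its creator. */
static void toy_request_init(struct toy_request *req)
{
	req->refcount = 1;
	req->deleted = false;
}

static void toy_request_get(struct toy_request *req)
{
	req->refcount++;
}

/* Returns true when this put dropped the last reference. */
static bool toy_request_put(struct toy_request *req)
{
	if (--req->refcount)
		return false;
	req->deleted = true; /* real code would free the request here */
	return true;
}
```

Keeping the value storage in the same allocation as the reference means a
single free() releases both.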

Signed-off-by: Hans Verkuil 
[acour...@chromium.org: turn v4l2_ctrl_request_alloc into init function]
Signed-off-by: Alexandre Courbot 
---
 drivers/media/v4l2-core/v4l2-ctrls.c | 106 ++-
 include/media/v4l2-ctrls.h   |   7 +++
 2 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index 1ff8fc59fff5..c692a6d925c6 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -1878,6 +1878,7 @@ EXPORT_SYMBOL(v4l2_ctrl_find);
 /* Allocate a new v4l2_ctrl_ref and hook it into the handler. */
 static int handler_new_ref(struct v4l2_ctrl_handler *hdl,
   struct v4l2_ctrl *ctrl,
+  struct v4l2_ctrl_ref **ctrl_ref,
   bool from_other_dev)
 {
struct v4l2_ctrl_ref *ref;
@@ -1885,6 +1886,10 @@ static int handler_new_ref(struct v4l2_ctrl_handler *hdl,
u32 id = ctrl->id;
u32 class_ctrl = V4L2_CTRL_ID2WHICH(id) | 1;
int bucket = id % hdl->nr_of_buckets;   /* which bucket to use */
+   unsigned int sz_extra = 0;
+
+   if (ctrl_ref)
+   *ctrl_ref = NULL;
 
/*
 * Automatically add the control class if it is not yet present and
@@ -1898,11 +1903,16 @@ static int handler_new_ref(struct v4l2_ctrl_handler *hdl,
if (hdl->error)
return hdl->error;
 
-   new_ref = kzalloc(sizeof(*new_ref), GFP_KERNEL);
+   if (hdl->is_request)
+   sz_extra = ctrl->elems * ctrl->elem_size;
+   new_ref = kzalloc(sizeof(*new_ref) + sz_extra, GFP_KERNEL);
if (!new_ref)
return handler_set_err(hdl, -ENOMEM);
new_ref->ctrl = ctrl;
new_ref->from_other_dev = from_other_dev;
+   if (sz_extra)
+   new_ref->p_req.p = &new_ref[1];
+
if (ctrl->handler == hdl) {
/* By default each control starts in a cluster of its own.
   new_ref->ctrl is basically a cluster array with one
@@ -1942,6 +1952,8 @@ static int handler_new_ref(struct v4l2_ctrl_handler *hdl,
/* Insert the control node in the hash */
new_ref->next = hdl->buckets[bucket];
hdl->buckets[bucket] = new_ref;
+   if (ctrl_ref)
+   *ctrl_ref = new_ref;
 
 unlock:
mutex_unlock(hdl->lock);
@@ -2083,7 +2095,7 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
ctrl->type_ops->init(ctrl, idx, ctrl->p_new);
}
 
-   if (handler_new_ref(hdl, ctrl, false)) {
+   if (handler_new_ref(hdl, ctrl, NULL, false)) {
kvfree(ctrl);
return NULL;
}
@@ -2276,7 +2288,7 @@ int v4l2_ctrl_add_handler(struct v4l2_ctrl_handler *hdl,
/* Filter any unwanted controls */
if (filter && !filter(ctrl))
continue;
-   ret = handler_new_ref(hdl, ctrl, from_other_dev);
+   ret = handler_new_ref(hdl, ctrl, NULL, from_other_dev);
if (ret)
break;
}
@@ -2685,6 +2697,94 @@ int v4l2_querymenu(struct v4l2_ctrl_handler *hdl, struct v4l2_querymenu *qm)
 }
 EXPORT_SYMBOL(v4l2_querymenu);
 
+int v4l2_ctrl_request_init(struct v4l2_ctrl_handler *hdl)
+{
+   int err;
+
+   err = v4l2_ctrl_handler_init(hdl, 0);
+   if (err)
+   return err;
+   hdl->is_request = true;
+   kref_init(&hdl->ref);
+
+   return 0;
+}
+EXPORT_SYMBOL(v4l2_ctrl_request_init);
+
+int v4l2_ctrl_request_clone(struct v4l2_ctrl_handler *hdl,
+   const struct v4l2_ctrl_handler *from,
+   bool (*filter)(const struct v4l2_ctrl *ctrl))
+{
+   struct v4l2_ctrl_ref *ref;
+   int err;
+
+   if (WARN_ON(!hdl || hdl == from))
+   return -EINVAL;
+
+   if (hdl->error)
+   return hdl->error;
+
+   WARN_ON(hdl->lock != &hdl->_lock);
+   v4l2_ctrl_handler_free(hdl);
+   err = v4l2_ctrl_handler_init(hdl, (from->nr_of_buckets - 1) * 8);
+   hdl->is_request = true;
+   if (err)
+   return err;
+   if (!from)
+   return 0;
+
+   mutex_lock(from->lock);
+   list_for_each_entry(ref, &from->ctrl_refs, node) {
+   struct v4l2_ctrl *ctrl = ref->ctrl;
+   struct v4l2_ctrl_ref *new_ref;
+
+   /* Skip refs inherited from other devices */
+   if (ref->from_other_dev)
+   continue;
+   /* And buttons and control classes */
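
For readers following the handler_new_ref() hunk above: the patch reserves ctrl->elems * ctrl->elem_size extra bytes in the same kzalloc() as the ref and points p_req.p at &new_ref[1]. Below is a minimal userspace sketch of that "struct plus trailing storage" pattern. It is not the kernel code: struct ref_sketch and ref_alloc() are hypothetical names, calloc() stands in for kzalloc(), and the overflow guard is an addition of this sketch rather than something the patch does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct v4l2_ctrl_ref; only the fields needed here. */
struct ref_sketch {
	int id;
	void *p_req;	/* stays NULL when no per-request storage was reserved */
};

/*
 * Allocate a ref and, when is_request is set, reserve elems * elem_size
 * extra bytes in the same zeroed allocation, directly after the struct.
 */
static struct ref_sketch *ref_alloc(int id, int is_request,
				    size_t elems, size_t elem_size)
{
	size_t sz_extra = 0;
	struct ref_sketch *ref;

	if (is_request) {
		/* Reject sizes whose product or sum would overflow size_t. */
		if (elem_size && elems > (SIZE_MAX - sizeof(*ref)) / elem_size)
			return NULL;
		sz_extra = elems * elem_size;
	}

	ref = calloc(1, sizeof(*ref) + sz_extra);
	if (!ref)
		return NULL;

	ref->id = id;
	if (sz_extra)
		ref->p_req = &ref[1];	/* first byte past the struct */
	return ref;
}
```

A single free() of the ref releases the value storage as well, which is the appeal of the layout. One caveat: &ref[1] is only guaranteed to be aligned as strictly as the struct itself, so a payload type with stricter alignment would need padding.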

[RFCv3 10/17] v4l2-ctrls: support g/s_ext_ctrls for requests

2018-02-06 Thread Alexandre Courbot
From: Hans Verkuil 

The v4l2_g/s_ext_ctrls functions now support control handlers that
represent requests.

Signed-off-by: Hans Verkuil 
Signed-off-by: Alexandre Courbot 
---
 drivers/media/v4l2-core/v4l2-ctrls.c | 37 
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index 9090a49eef91..4fa7adef7531 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -1528,6 +1528,13 @@ static int new_to_user(struct v4l2_ext_control *c,
return ptr_to_user(c, ctrl, ctrl->p_new);
 }
 
+/* Helper function: copy the request value back to the caller */
+static int req_to_user(struct v4l2_ext_control *c,
+  struct v4l2_ctrl_ref *ref)
+{
+   return ptr_to_user(c, ref->ctrl, ref->p_req);
+}
+
 /* Helper function: copy the initial control value back to the caller */
 static int def_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl)
 {
@@ -1647,6 +1654,14 @@ static void cur_to_new(struct v4l2_ctrl *ctrl)
ptr_to_ptr(ctrl, ctrl->p_cur, ctrl->p_new);
 }
 
+/* Copy the new value to the request value */
+static void new_to_req(struct v4l2_ctrl_ref *ref)
+{
+   if (!ref)
+   return;
+   ptr_to_ptr(ref->ctrl, ref->ctrl->p_new, ref->p_req);
+}
+
 /* Return non-zero if one or more of the controls in the cluster has a new
value that differs from the current value. */
 static int cluster_changed(struct v4l2_ctrl *master)
@@ -2971,7 +2986,8 @@ int v4l2_g_ext_ctrls(struct v4l2_ctrl_handler *hdl, struct v4l2_ext_controls *cs
struct v4l2_ctrl *ctrl);
struct v4l2_ctrl *master;
 
-   ctrl_to_user = def_value ? def_to_user : cur_to_user;
+   ctrl_to_user = def_value ? def_to_user :
+  (hdl->is_request ? NULL : cur_to_user);
 
if (helpers[i].mref == NULL)
continue;
@@ -2997,8 +3013,12 @@ int v4l2_g_ext_ctrls(struct v4l2_ctrl_handler *hdl, struct v4l2_ext_controls *cs
u32 idx = i;
 
do {
-   ret = ctrl_to_user(cs->controls + idx,
-  helpers[idx].ref->ctrl);
+   if (ctrl_to_user)
+   ret = ctrl_to_user(cs->controls + idx,
+   helpers[idx].ref->ctrl);
+   else
+   ret = req_to_user(cs->controls + idx,
+   helpers[idx].ref);
idx = helpers[idx].next;
} while (!ret && idx);
}
@@ -3271,7 +3291,16 @@ static int try_set_ext_ctrls(struct v4l2_fh *fh, struct v4l2_ctrl_handler *hdl,
} while (!ret && idx);
 
if (!ret)
-   ret = try_or_set_cluster(fh, master, set, 0);
+   ret = try_or_set_cluster(fh, master,
+!hdl->is_request && set, 0);
+   if (!ret && hdl->is_request && set) {
+   for (j = 0; j < master->ncontrols; j++) {
+   struct v4l2_ctrl_ref *ref =
+   find_ref(hdl, master->cluster[j]->id);
+
+   new_to_req(ref);
+   }
+   }
 
/* Copy the new values back to userspace. */
if (!ret) {
-- 
2.16.0.rc1.238.g530d649a79-goog
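
As a reading aid for the v4l2_g_ext_ctrls() hunks above: the copy-out helper is chosen once, and a NULL function pointer is used as the signal to fall back to the per-request value held in the ref. The sketch below reproduces only that selection logic in userspace. The types ctrl_sketch and ctrl_ref_sketch and the function get_value() are hypothetical, and plain ints stand in for the kernel's union v4l2_ctrl_ptr and its copy_to_user() path.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct v4l2_ctrl and v4l2_ctrl_ref. */
struct ctrl_sketch {
	int cur;	/* current value */
	int def;	/* default value */
};

struct ctrl_ref_sketch {
	struct ctrl_sketch *ctrl;
	int req;	/* value stored in the request */
};

typedef int (*ctrl_to_user_fn)(int *out, const struct ctrl_sketch *ctrl);

static int cur_to_user(int *out, const struct ctrl_sketch *ctrl)
{
	*out = ctrl->cur;
	return 0;
}

static int def_to_user(int *out, const struct ctrl_sketch *ctrl)
{
	*out = ctrl->def;
	return 0;
}

static int req_to_user(int *out, const struct ctrl_ref_sketch *ref)
{
	*out = ref->req;
	return 0;
}

/*
 * Mirror of the selection in the patch: the default value wins if it was
 * asked for, otherwise a request handler reads from the ref and a regular
 * handler reads the current value.
 */
static int get_value(int *out, const struct ctrl_ref_sketch *ref,
		     int def_value, int is_request)
{
	ctrl_to_user_fn fn = def_value ? def_to_user :
			     (is_request ? NULL : cur_to_user);

	if (fn)
		return fn(out, ref->ctrl);
	return req_to_user(out, ref);
}
```

The NULL sentinel is needed because req_to_user() takes the ref rather than the control, so it cannot share the function-pointer type of the other two helpers.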



[RFCv3 12/17] v4l2: add request API support

2018-02-06 Thread Alexandre Courbot
Add a v4l2 request entity data structure that takes care of storing the
request-related state of a V4L2 device; in this case, its controls.

Signed-off-by: Alexandre Courbot 
---
 drivers/media/v4l2-core/Makefile   |  2 +-
 drivers/media/v4l2-core/v4l2-request.c | 54 ++
 include/media/v4l2-request.h   | 34 +
 3 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 drivers/media/v4l2-core/v4l2-request.c
 create mode 100644 include/media/v4l2-request.h

diff --git a/drivers/media/v4l2-core/Makefile b/drivers/media/v4l2-core/Makefile
index 80de2cb9c476..1113dea1f4f9 100644
--- a/drivers/media/v4l2-core/Makefile
+++ b/drivers/media/v4l2-core/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_V4L2_FWNODE) += v4l2-fwnode.o
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videodev-objs += vb2-trace.o v4l2-trace.o
 endif
-videodev-$(CONFIG_MEDIA_CONTROLLER) += v4l2-mc.o
+videodev-$(CONFIG_MEDIA_CONTROLLER) += v4l2-mc.o v4l2-request.o
 
 obj-$(CONFIG_VIDEO_V4L2) += videodev.o
 obj-$(CONFIG_VIDEO_V4L2) += v4l2-common.o
diff --git a/drivers/media/v4l2-core/v4l2-request.c b/drivers/media/v4l2-core/v4l2-request.c
new file mode 100644
index ..7bc29d3cc332
--- /dev/null
+++ b/drivers/media/v4l2-core/v4l2-request.c
@@ -0,0 +1,54 @@
+/*
+ * Media requests support for V4L2
+ *
+ * Copyright (C) 2018, The Chromium OS Authors.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+
+struct media_request_entity_data *media_request_v4l2_entity_data_alloc(
+   struct v4l2_ctrl_handler *hdl)
+{
+   struct media_request_v4l2_entity_data *data;
+   int ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+
+   ret = v4l2_ctrl_request_init(&data->ctrls);
+   if (ret) {
+   kfree(data);
+   return ERR_PTR(ret);
+   }
+
+   ret = v4l2_ctrl_request_clone(&data->ctrls, hdl, NULL);
+   if (ret) {
+   kfree(data);
+   return ERR_PTR(ret);
+   }
+
+   return &data->base;
+}
+EXPORT_SYMBOL_GPL(media_request_v4l2_entity_data_alloc);
+
+void
+media_request_v4l2_entity_data_free(struct media_request_entity_data *_data)
+{
+   struct media_request_v4l2_entity_data *data;
+
+   data = to_v4l2_entity_data(_data);
+
+   v4l2_ctrl_handler_free(&data->ctrls);
+   kfree(data);
+}
+EXPORT_SYMBOL_GPL(media_request_v4l2_entity_data_free);
diff --git a/include/media/v4l2-request.h b/include/media/v4l2-request.h
new file mode 100644
index ..db38dc5fc460
--- /dev/null
+++ b/include/media/v4l2-request.h
@@ -0,0 +1,34 @@
+/*
+ * Media requests support for V4L2
+ *
+ * Copyright (C) 2018, The Chromium OS Authors.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _MEDIA_REQUEST_V4L2_H
+#define _MEDIA_REQUEST_V4L2_H
+
+#include 
+#include 
+
+struct media_request_v4l2_entity_data {
+   struct media_request_entity_data base;
+
+   struct v4l2_ctrl_handler ctrls;
+};
+#define to_v4l2_entity_data(d) \
+   container_of(d, struct media_request_v4l2_entity_data, base)
+
+struct media_request_entity_data *media_request_v4l2_entity_data_alloc(
+   struct v4l2_ctrl_handler *hdl);
+void media_request_v4l2_entity_data_free(struct media_request_entity_data *data);
+
+#endif
-- 
2.16.0.rc1.238.g530d649a79-goog
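
The header above embeds the generic struct media_request_entity_data as the "base" member and recovers the V4L2-specific wrapper with container_of(). A minimal userspace sketch of that embed-and-recover pattern follows. All names in it (entity_data_base, v4l2_entity_data_sketch, to_sketch, sketch_alloc, sketch_free) are hypothetical, offsetof() stands in for the kernel's container_of(), and an int stands in for the control handler.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic part, playing the role of media_request_entity_data. */
struct entity_data_base {
	int entity_id;
};

/* Hypothetical specific wrapper; the generic part is embedded as "base". */
struct v4l2_entity_data_sketch {
	struct entity_data_base base;
	int ctrls;	/* stand-in for struct v4l2_ctrl_handler */
};

/* Userspace equivalent of container_of(d, struct ..., base). */
#define to_sketch(d) \
	((struct v4l2_entity_data_sketch *)((char *)(d) - \
		offsetof(struct v4l2_entity_data_sketch, base)))

/* Allocate the wrapper but hand out only the generic part. */
static struct entity_data_base *sketch_alloc(int initial_ctrls)
{
	struct v4l2_entity_data_sketch *data = calloc(1, sizeof(*data));

	if (!data)
		return NULL;	/* check before touching the object */
	data->ctrls = initial_ctrls;
	return &data->base;
}

/* Recover the wrapper from the generic pointer and release it. */
static void sketch_free(struct entity_data_base *base)
{
	if (base)
		free(to_sketch(base));
}
```

The sketch tests the allocation result before using the object. In the patch as posted, the return value of kzalloc() is passed on to v4l2_ctrl_request_init() as &data->ctrls without such a check.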



[RFCv3 14/17] v4l2-ctrls: support requests in EXT_CTRLS ioctls

2018-02-06 Thread Alexandre Courbot
Read and use the request_fd field of struct v4l2_ext_controls to apply
VIDIOC_G_EXT_CTRLS or VIDIOC_S_EXT_CTRLS to a request when asked by
userspace.

Signed-off-by: Alexandre Courbot 
---
 drivers/media/v4l2-core/v4l2-ioctl.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 235acdde3111..cbaefcad9694 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -2158,6 +2159,24 @@ static int v4l_g_ext_ctrls(const struct v4l2_ioctl_ops *ops,
test_bit(V4L2_FL_USES_V4L2_FH, &vfd->flags) ? fh : NULL;
 
p->error_idx = p->count;
+
+   if (p->request_fd > 0) {
+   struct media_request *req = NULL;
+   struct media_request_entity_data *_data;
+   struct media_request_v4l2_entity_data *data;
+   int ret;
+
+   req = check_request(p->request_fd, file, fh, &_data);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+   data = to_v4l2_entity_data(_data);
+
+   ret = v4l2_g_ext_ctrls(&data->ctrls, p);
+
+   media_request_put(req);
+   return ret;
+   }
+
if (vfh && vfh->ctrl_handler)
return v4l2_g_ext_ctrls(vfh->ctrl_handler, p);
if (vfd->ctrl_handler)
@@ -2177,6 +2196,23 @@ static int v4l_s_ext_ctrls(const struct v4l2_ioctl_ops *ops,
test_bit(V4L2_FL_USES_V4L2_FH, &vfd->flags) ? fh : NULL;
 
p->error_idx = p->count;
+   if (p->request_fd > 0) {
+   struct media_request *req = NULL;
+   struct media_request_entity_data *_data;
+   struct media_request_v4l2_entity_data *data;
+   int ret;
+
+   req = check_request(p->request_fd, file, fh, &_data);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+   data = to_v4l2_entity_data(_data);
+
+   ret = v4l2_s_ext_ctrls(vfh, &data->ctrls, p);
+
+   media_request_put(req);
+   return ret;
+   }
+
if (vfh && vfh->ctrl_handler)
return v4l2_s_ext_ctrls(vfh, vfh->ctrl_handler, p);
if (vfd->ctrl_handler)
-- 
2.16.0.rc1.238.g530d649a79-goog



Re: [PATCH] x86/nospec: Fixup array_index_nospec_mask() asm constraint

2018-02-06 Thread Linus Torvalds
On Tue, Feb 6, 2018 at 5:33 PM, Dan Williams  wrote:
> Allow the compiler to handle @size as an immediate value rather than
> allocating a register.

Actually, maybe that "ir" should be "g".

Because it's fine if it's a memory location too. "cmp" takes pretty
much anything, as long as the thing we compare _to_ is a register.

  Linus
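
For context, here is a sketch of what the suggestion could look like. It assumes the usual shape of the helper, a cmp followed by sbb, with the "g" constraint on @size so the compiler may pick a register, a memory operand or an immediate. The function name index_mask_sketch is hypothetical, the asm path is only taken on x86-64 with a GNU-compatible compiler, and the plain C fallback is an addition of this sketch so that it builds elsewhere.

```c
#include <stddef.h>

/*
 * Return ~0UL when index < size and 0 otherwise, without a conditional
 * branch on the x86-64 path.
 */
static inline unsigned long index_mask_sketch(unsigned long index,
					      unsigned long size)
{
	unsigned long mask;

#if defined(__x86_64__) && defined(__GNUC__)
	/*
	 * AT&T syntax: "cmp %1,%2" computes index - size and sets the carry
	 * flag when index < size (unsigned); "sbb %0,%0" then turns the
	 * carry flag into 0 or all ones. Only the operand compared against
	 * (index) has to live in a register.
	 */
	__asm__ volatile ("cmp %1,%2; sbb %0,%0"
			  : "=r" (mask)
			  : "g" (size), "r" (index)
			  : "cc");
#else
	/* Portable fallback; gives no speculation guarantees. */
	mask = index < size ? ~0UL : 0UL;
#endif
	return mask;
}
```

A caller would AND the mask into the index before using it, for example idx &= index_mask_sketch(idx, nr_entries), so that an out-of-bounds index collapses to 0.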


[RFCv3 16/17] media: vim2m: add media device

2018-02-06 Thread Alexandre Courbot
Request API requires a media node. Add one to the vim2m driver so we can
use requests with it.

Signed-off-by: Alexandre Courbot 
---
 drivers/media/platform/vim2m.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/media/platform/vim2m.c b/drivers/media/platform/vim2m.c
index 065483e62db4..e0eb60310717 100644
--- a/drivers/media/platform/vim2m.c
+++ b/drivers/media/platform/vim2m.c
@@ -140,6 +140,9 @@ static struct vim2m_fmt *find_format(struct v4l2_format *f)
 struct vim2m_dev {
struct v4l2_device  v4l2_dev;
struct video_device vfd;
+#ifdef CONFIG_MEDIA_CONTROLLER
+   struct media_device mdev;
+#endif
 
atomic_t num_inst;
struct mutex dev_mutex;
@@ -1001,6 +1004,13 @@ static int vim2m_probe(struct platform_device *pdev)
 
spin_lock_init(&dev->irqlock);
 
+#ifdef CONFIG_MEDIA_CONTROLLER
+   dev->mdev.dev = &pdev->dev;
+   strlcpy(dev->mdev.model, "vim2m", sizeof(dev->mdev.model));
+   media_device_init(&dev->mdev);
+   dev->v4l2_dev.mdev = &dev->mdev;
+#endif
+
ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
if (ret)
return ret;
@@ -1034,6 +1044,13 @@ static int vim2m_probe(struct platform_device *pdev)
goto err_m2m;
}
 
+#ifdef CONFIG_MEDIA_CONTROLLER
+   /* Register the media device node */
+   ret = media_device_register(&dev->mdev);
+   if (ret)
+   goto err_m2m;
+#endif
+
return 0;
 
 err_m2m:
@@ -1050,6 +1067,13 @@ static int vim2m_remove(struct platform_device *pdev)
struct vim2m_dev *dev = platform_get_drvdata(pdev);
 
v4l2_info(&dev->v4l2_dev, "Removing " MEM2MEM_NAME);
+
+#ifdef CONFIG_MEDIA_CONTROLLER
+   if (media_devnode_is_registered(dev->mdev.devnode))
+   media_device_unregister(&dev->mdev);
+   media_device_cleanup(&dev->mdev);
+#endif
+
v4l2_m2m_release(dev->m2m_dev);
del_timer_sync(&dev->timer);
video_unregister_device(&dev->vfd);
-- 
2.16.0.rc1.238.g530d649a79-goog


