RE: [PATCH] sd: remove redundant check for BLK_DEF_MAX_SECTORS

2016-06-07 Thread Long Li
Hi Martin,

Thanks for looking into this. The problem I'm trying to solve is that I want
a lower-layer driver to be able to set max_sectors larger than
BLK_DEF_MAX_SECTORS. In Hyper-V we use a 2MB maximum transfer I/O size; in a
future version the maximum transfer I/O size will increase to 8MB.

The implementation in sd.c caps max_sectors at BLK_DEF_MAX_SECTORS. Because
sd_revalidate_disk is called late in the SCSI disk initialization process,
there is no way for a lower-layer driver to set this value to its larger
optimal size.

The reason I think sd.c may not need to set max_sectors is that this value
may already have been set twice before the code in sd.c is reached:
1. When the disk device is first scanned or re-scanned (in scsi_scan.c),
which eventually calls __scsi_init_queue() and uses the max_sectors from the
scsi_host_template.
2. In slave_configure of the scsi_host_template, when the lower-layer driver
implements this function in its template; it can change the value there.

Long

> -----Original Message-----
> From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
> Sent: Monday, June 6, 2016 8:42 PM
> To: Long Li
> Cc: Tom Yan; James E.J. Bottomley; Martin K. Petersen;
> linux-s...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] sd: remove redundant check for
> BLK_DEF_MAX_SECTORS
> 
> > "Long" == Long Li  writes:
> 
> Long,
> 
> Long> The reason is that max_sectors already has a value at this point;
> Long> the default value is SCSI_DEFAULT_MAX_SECTORS
> Long> (include/scsi/scsi_host.h). The lower layer host driver can change
> Long> this value in its template.
> 
> The LLD sets max_hw_sectors, which indicates the capabilities of the
> controller DMA hardware, whereas the max_sectors limit is set by sd to
> either follow advice from the device or--if not provided--use the block
> layer default. max_sectors governs the size of READ/WRITE requests and
> does not reflect the capabilities of the DMA hardware.
> 
> Long> I think the drivers that care about this value have already set it,
> Long> so it's better not to change it again. If they want max_sectors to be
> Long> set by sd, they can use BLOCK LIMITS VPD to tell it to do so.
> 
> Most drivers don't have the luxury of being able to generate VPDs for their
> attached target devices :)
> 
> --
> Martin K. Petersen    Oracle Linux Engineering
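
To make the distinction between the two limits concrete, here is a minimal user-space sketch of the selection Martin describes: take the device-advised size if there is one, otherwise a block-layer default, and never exceed the controller limit. It is not the sd.c code. The function name, the parameter names and the default of 2560 sectors are assumptions chosen for illustration.

```c
#include <assert.h>

/* Assumed block-layer default in 512-byte sectors; illustrative only. */
#define SKETCH_BLK_DEF_MAX_SECTORS 2560u

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * dev_opt_sectors: transfer size advised by the device (0 if none reported).
 * max_hw_sectors:  hard limit of the controller, as set by the LLD.
 * Returns the soft limit that would govern READ/WRITE request size.
 */
unsigned int sketch_pick_max_sectors(unsigned int dev_opt_sectors,
				     unsigned int max_hw_sectors)
{
	unsigned int soft;

	assert(max_hw_sectors > 0);
	/* Follow the device's advice if present, else use the default. */
	soft = dev_opt_sectors ? dev_opt_sectors : SKETCH_BLK_DEF_MAX_SECTORS;
	/* The soft limit may never exceed what the hardware can do. */
	return sketch_min(soft, max_hw_sectors);
}
```

With a 2MB hardware limit (4096 sectors) and no advice from the device, this model still returns the default, which is the behaviour Long is trying to change.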


[PATCH net] Driver: Vmxnet3: segCnt can be 1 for LRO packets

2016-06-07 Thread Shrikrishna Khare
The device emulation may send segCnt of 1 for LRO packets.

Signed-off-by: Shrikrishna Khare 
Signed-off-by: Jin Heo 
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
 drivers/net/vmxnet3/vmxnet3_int.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index db8022a..6f399b2 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1369,7 +1369,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd;
 
segCnt = rcdlro->segCnt;
-   BUG_ON(segCnt <= 1);
+   BUG_ON(segCnt == 0);
mss = rcdlro->mss;
if (unlikely(segCnt <= 1))
segCnt = 0;
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index c482539..3d2b64e 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.7.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.8.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x01040700
+#define VMXNET3_DRIVER_VERSION_NUM  0x01040800
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
2.8.2
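
For readers following the hunk in vmxnet3_rq_rx_complete(), the effect of the changed check can be sketched in user space as below. This is only a model of the quoted lines: the function name is hypothetical, and a return value of -1 stands in for the BUG_ON() that the driver would hit.

```c
#include <assert.h>

/*
 * Models the patched logic: a segCnt of 0 is still treated as a bug,
 * a segCnt of 1 is now accepted and reset to 0 by the existing
 * "segCnt <= 1" branch, and larger counts pass through unchanged.
 */
int sketch_normalize_seg_cnt(unsigned int seg_cnt)
{
	if (seg_cnt == 0)
		return -1;	/* stands in for BUG_ON(segCnt == 0) */
	if (seg_cnt <= 1)
		return 0;	/* the quoted code sets segCnt = 0 here */
	return (int)seg_cnt;
}
```

Before the patch, the input 1 would have taken the first branch as well.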



Re: [PATCH v13 10/10] kprobes: Add arm64 case in kprobe example module

2016-06-07 Thread Huang Shijie
On Thu, Jun 02, 2016 at 11:26:24PM -0400, David Long wrote:
> From: Sandeepa Prabhu 
> 
> Add info prints in sample kprobe handlers for ARM64
> 
> Signed-off-by: Sandeepa Prabhu 
> ---
>  samples/kprobes/kprobe_example.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/samples/kprobes/kprobe_example.c 
> b/samples/kprobes/kprobe_example.c
> index ed0ca0c..aad8e6f 100644
> --- a/samples/kprobes/kprobe_example.c
> +++ b/samples/kprobes/kprobe_example.c
> @@ -46,6 +46,10 @@ static int handler_pre(struct kprobe *p, struct pt_regs 
> *regs)
>   " ex1 = 0x%lx\n",
>   p->symbol_name, p->addr, regs->pc, regs->ex1);
>  #endif
> +#ifdef CONFIG_ARM64
> + pr_info("pre_handler: p->addr = 0x%p, pc = 0x%lx\n",
> + p->addr, (long)regs->pc);
Please add "p->symbol_name" to the log, just like the line above.

> +#endif
>  
>   /* A dump_stack() here will give a stack backtrace */
>   return 0;
> @@ -71,6 +75,10 @@ static void handler_post(struct kprobe *p, struct pt_regs 
> *regs,
>   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
>   p->symbol_name, p->addr, regs->ex1);
>  #endif
> +#ifdef CONFIG_ARM64
> + pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx\n",
> + p->addr, (long)regs->pc);
> +#endif
>  }
Ditto.

thanks
Huang Shijie
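
A rough user-space sketch of the requested log format, with the symbol name printed first as in the printk() calls for the other architectures, could look like this. The struct, the function name and the use of snprintf() with %lx in place of the kernel's pr_info() and %p are all assumptions made so the example can run outside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two struct kprobe fields used in the log. */
struct sketch_probe {
	const char *symbol_name;
	unsigned long addr;
};

/* Formats the pre-handler message with the "<symbol>" prefix included. */
int sketch_format_pre(char *buf, size_t len,
		      const struct sketch_probe *p, unsigned long pc)
{
	return snprintf(buf, len,
			"<%s> pre_handler: p->addr = 0x%lx, pc = 0x%lx\n",
			p->symbol_name, p->addr, pc);
}
```

The same prefix would apply to the post_handler message.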



Re: [PATCH 0/2] Proper ro_after_init implementation on s390

2016-06-07 Thread Heiko Carstens
On Tue, Jun 07, 2016 at 11:11:17AM -0700, Kees Cook wrote:
> On Tue, Jun 7, 2016 at 11:07 AM, Heiko Carstens
>  wrote:
> > On Tue, Jun 07, 2016 at 08:49:14AM -0700, Kees Cook wrote:
> >> > Heiko Carstens (2):
> >> >   vmlinux.lds.h: allow arch specific handling of ro_after_init data 
> >> > section
> >> >   s390/mm: add proper __ro_after_init support
> >> >
> >> >  arch/s390/include/asm/cache.h |  3 ---
> >> >  arch/s390/include/asm/sections.h  |  1 +
> >> >  arch/s390/kernel/vmlinux.lds.S| 12 +++-
> >> >  arch/s390/mm/init.c   |  7 ---
> >> >  arch/s390/mm/vmem.c   |  7 +++
> >> >  include/asm-generic/vmlinux.lds.h | 10 +-
> >> >  6 files changed, 28 insertions(+), 12 deletions(-)
> >>
> >> Awesome! This looks great to me! Have you had a chance to look through
> >> any of the arch/s390/ __init code for variables that should be marked
> >> __ro_after_init?
> >
> > Not yet, and actually I'm a bit reluctant to do that, since any wrong
> > annotation will lead to kernel crashes sooner or later ;)
> > However I'll look into this as well.
> 
> Yup, though the good news is it's usually discovered very quickly. :)

Eventually it might make sense to add something like
DEBUG_SECTION_MISMATCH, which would only report on _write_ accesses from
non-init sections.

Not sure if this can be done easily and without the need of a new compiler
feature. The new problem class I'm afraid of is more or less the same that
we had when non-init code referenced (already freed) initdata objects.
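
As a user-space analogy only (not the s390 implementation), the write-during-init, read-only-afterwards idea can be sketched with mmap() and mprotect(). All names below are hypothetical. Once the page is sealed, a write from "non-init" code would fault, which mirrors the crash risk of a wrong annotation mentioned above.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical "ro after init" variable, placed on its own page. */
static int *sketch_param;

/* Writes the value during "init", then makes the page read-only. */
int sketch_init_then_seal(int value)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	sketch_param = p;
	*sketch_param = value;		/* allowed: still in the init phase */
	if (mprotect(p, (size_t)page, PROT_READ) != 0)
		return -1;		/* could not seal the page */
	return 0;
}

/* Reads remain valid after sealing; a write here would raise SIGSEGV. */
int sketch_read_param(void)
{
	assert(sketch_param != NULL);
	return *sketch_param;
}
```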



Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

2016-06-07 Thread Horng-Shyang Liao
Hi Matthias,

On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
> 
> On 03/06/16 15:11, Matthias Brugger wrote:
> >
> >
> [...]
> 
> >> +
> >> +smp_mb(); /* modify jump before enable thread */
> >> +}
> >> +
> >> +cmdq_thread_writel(thread, task->pa_base +
> >> task->command_size,
> >> +   CMDQ_THR_END_ADDR);
> >> +cmdq_thread_resume(thread);
> >> +}
> >> +list_move_tail(&task->list_entry, &thread->task_busy_list);
> >> +spin_unlock_irqrestore(>exec_lock, flags);
> >> +}
> >> +
> >> +static void cmdq_handle_error_done(struct cmdq *cmdq,
> >> +   struct cmdq_thread *thread, u32 irq_flag)
> >> +{
> >> +struct cmdq_task *task, *tmp, *curr_task = NULL;
> >> +u32 curr_pa;
> >> +struct cmdq_cb_data cmdq_cb_data;
> >> +bool err;
> >> +
> >> +if (irq_flag & CMDQ_THR_IRQ_ERROR)
> >> +err = true;
> >> +else if (irq_flag & CMDQ_THR_IRQ_DONE)
> >> +err = false;
> >> +else
> >> +return;
> >> +
> >> +curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> >> +
> >> +list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> >> + list_entry) {
> >> +if (curr_pa >= task->pa_base &&
> >> +curr_pa < (task->pa_base + task->command_size))
> >
> > What are you checking here? It seems as if you make some implicit
> > assumptions about pa_base and the order of execution of commands in
> > the thread. Is it safe to do so? Does dma_alloc_coherent give any
> > guarantees about dma_handle?
> 
>  1. Check what is the current running task in this GCE thread.
>  2. Yes.
>  3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
> 
> >>>
> >>> Yes, physical addresses might be continuous, but AFAIK there is no
> >>> guarantee that the dma_handle address is steadily growing when
> >>> calling dma_alloc_coherent. And if I understand the code correctly,
> >>> you use this assumption to decide if the task picked from
> >>> task_busy_list is currently executing. So I think this mechanism is
> >>> not working.
> >>
> >> I don't use dma_handle address, and just use physical addresses.
> >> From CPU's point of view, tasks are linked by the busy list.
> >> From GCE's point of view, tasks are linked by the JUMP command.
> >>
> >>> In which cases does the HW thread raise an interrupt?
> >>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
> >>
> >> GCE will raise interrupt if any task is done or error.
> >> However, GCE is fast, so CPU may get multiple done tasks
> >> when it is running ISR.
> >>
> >> In case of error, that GCE thread will pause and raise interrupt.
> >> So, CPU may get multiple done tasks and one error task.
> >>
> >
> > I think we should reimplement the ISR mechanism. Can't we just read
> > CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
> > cmdq_handle_error_done to the thread_fn? You will need to pass
> > information from the handler to thread_fn, but that shouldn't be an
> > issue. AFAIK interrupts are disabled in the handler, so we should stay
> > there as short as possible. Traversing task_busy_list is expensive, so
> > we need to do it in a thread context.
> 
>  Actually, our initial implementation is similar to your suggestion,
>  but display needs CMDQ to return callback function very precisely,
>  else display will drop frame.
>  For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
>  and CMDQ needs to call callback function in ISR.
>  If we defer callback to workqueue, the time interval may be larger than
>  32 ms sometimes.
> 
> >>>
> >>> I think the problem is that you implemented the workqueue as an ordered
> >>> workqueue, so there is no parallel processing. I'm still not sure why
> >>> you need the workqueue to be ordered. Can you please explain?
> >>
> >> The order should be kept.
> >> Let me use mouse cursor as an example.
> >> If task 1 means move mouse cursor to point A, task 2 means point B,
> >> and task 3 means point C, our expected result is A -> B -> C.
> >> If the order is not kept, the result could become A -> C -> B.
> >>
> >
> > Got it, thanks for the clarification.
> >
> 
> I think a way to get rid of the workqueue is to use a timer, which gets 
> programmed to the time a timeout in the first task in the busy list 
> would happen. Every time we update the busy list (e.g. because a task 
> got finished by the thread), we update the timer. When the 
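
The range check discussed earlier in this thread (is curr_pa inside the command buffer of a given task?) can be sketched in isolation as follows. This is a simplified model using an array instead of the driver's busy list. The struct and function names are hypothetical, and the comparison is written with a subtraction to avoid overflow in pa_base + command_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of a task to the two fields the check uses. */
struct sketch_task {
	uint32_t pa_base;	/* start address of the command buffer */
	uint32_t command_size;	/* buffer length in bytes */
};

/*
 * Returns the index of the task whose buffer
 * [pa_base, pa_base + command_size) contains curr_pa, or -1 if none.
 */
int sketch_find_running(const struct sketch_task *tasks, size_t n,
			uint32_t curr_pa)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (curr_pa >= tasks[i].pa_base &&
		    curr_pa - tasks[i].pa_base < tasks[i].command_size)
			return (int)i;
	}
	return -1;
}
```

The check only needs each buffer to be contiguous. It does not depend on buffers being allocated at increasing addresses, since every task is tested against its own range.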

Re: [LKP] [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

2016-06-07 Thread Ye Xiaolong
On Tue, Jun 07, 2016 at 05:56:27PM -0400, Johannes Weiner wrote:
>On Tue, Jun 07, 2016 at 12:48:17PM +0800, Ye Xiaolong wrote:
>> FYI, below is the comparison info between 3ed3a4f, 795ae7a0, v4.7-rc2 and the
>> revert commit (eaa7f0d).
>
>Thanks for running this.
>
>Alas, I still can not make heads or tails of this, or reproduce it
>locally for that matter.
>
>With this test run, there seems to be a significant increase in system time:
>
>>  92.03 ±  0%  +5.6%  97.23 ± 11% +30.5% 120.08 ±  1% 
>> +30.0% 119.61 ±  0%  pixz.time.system_time
>
>Would it be possible to profile the testruns using perf? Maybe we can
>find out where the kernel is spending the extra time.
>
>But just to make sure I'm looking at the right code, can you first try
>the following patch on top of Linus's current tree and see if that
>gets performance back to normal? It's a partial revert of the
>watermarks that singles out the fair zone allocator:

It seems that this patch doesn't help to get performance back.
I've attached the comparison result among 3ed3a4f, 795ae7a0, v4.7-rc2 and
1fe49ba5 ("mm: revert fairness batching to before the watermarks were
boosted") with perf profile information.  You can find it by searching for
'perf-profile'.

Thanks,
Xiaolong

>
>From 2015eaad688486d65fcf86185e213fff8506b3fe Mon Sep 17 00:00:00 2001
>From: Johannes Weiner 
>Date: Tue, 7 Jun 2016 17:45:03 -0400
>Subject: [PATCH] mm: revert fairness batching to before the watermarks were
> boosted
>
>Signed-off-by: Johannes Weiner 
>---
> include/linux/mmzone.h | 2 ++
> mm/page_alloc.c| 6 --
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>index 02069c2..4565b92 100644
>--- a/include/linux/mmzone.h
>+++ b/include/linux/mmzone.h
>@@ -327,6 +327,8 @@ struct zone {
>   /* zone watermarks, access with *_wmark_pages(zone) macros */
>   unsigned long watermark[NR_WMARK];
> 
>+  unsigned long fairbatch;
>+
>   unsigned long nr_reserved_highatomic;
> 
>   /*
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 6903b69..33387ab 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -2889,7 +2889,7 @@ static void reset_alloc_batches(struct zone 
>*preferred_zone)
> 
>   do {
>   mod_zone_page_state(zone, NR_ALLOC_BATCH,
>-  high_wmark_pages(zone) - low_wmark_pages(zone) -
>+  zone->fairbatch -
>   atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
>   clear_bit(ZONE_FAIR_DEPLETED, &zone->flags);
>   } while (zone++ != preferred_zone);
>@@ -6842,6 +6842,8 @@ static void __setup_per_zone_wmarks(void)
>   zone->watermark[WMARK_MIN] = tmp;
>   }
> 
>+  zone->fairbatch = tmp >> 2;
>+
>   /*
>* Set the kswapd watermarks distance according to the
>* scale factor in proportion to available memory, but
>@@ -6855,7 +6857,7 @@ static void __setup_per_zone_wmarks(void)
>   zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
> 
>   __mod_zone_page_state(zone, NR_ALLOC_BATCH,
>-  high_wmark_pages(zone) - low_wmark_pages(zone) -
>+  zone->fairbatch -
>   atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
> 
>   spin_unlock_irqrestore(&zone->lock, flags);
>-- 
>2.8.2
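
The arithmetic in the quoted patch can be modelled in user space as below: the batch size is fixed at a quarter of the min watermark value, and the counter is reset by adding the difference between that target and its current value. The struct and function names are hypothetical, and the counter is a plain long rather than a per-zone vmstat item.

```c
#include <assert.h>

/* Hypothetical reduction of struct zone to the fields the patch touches. */
struct sketch_zone {
	unsigned long wmark_min;	/* models watermark[WMARK_MIN] */
	unsigned long fairbatch;	/* field added by the patch */
	long alloc_batch;		/* models the NR_ALLOC_BATCH counter */
};

/* Mirrors "zone->fairbatch = tmp >> 2" from __setup_per_zone_wmarks(). */
void sketch_setup(struct sketch_zone *z, unsigned long tmp)
{
	z->wmark_min = tmp;
	z->fairbatch = tmp >> 2;
}

/* The delta handed to mod_zone_page_state(): target minus current value. */
long sketch_batch_delta(const struct sketch_zone *z)
{
	return (long)z->fairbatch - z->alloc_batch;
}

/* Applying the delta brings the counter back to exactly fairbatch. */
void sketch_reset_batch(struct sketch_zone *z)
{
	z->alloc_batch += sketch_batch_delta(z);
}
```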
=
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/ivb43/pixz

commit: 
  3ed3a4f0ddffece942bb2661924d87be4ce63cb7
  795ae7a0de6b834a0cc202aa55c190ef81496665
  v4.7-rc2
  1fe49ba5002a50aefd5b6c4913e61eff86ac7253

3ed3a4f0ddffece9  795ae7a0de6b834a0cc202aa55  v4.7-rc2                    1fe49ba5002a50aefd5b6c4913
----------------  --------------------------  --------------------------  --------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
              :4             0%           :7            50%          2:4             0%           :4  kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
              :4            50%          2:7             0%           :4             0%           :4  kmsg.Spurious_LAPIC_timer_interrupt_on_cpu
              :4             0%           :7            14%          1:4            25%          1:4  kmsg.igb#:#:#:exceed_max#second

         %stddev        %change      %stddev        %change      %stddev        %change      %stddev
  78505362 ±  0%          -9.2%  71298182 ±  0%       -11.8%  69280014 ±  0%        -9.1%  71350485 ±  0%  pixz.throughput
   5586220 ±  2%          -1.6%

[PATCH 5/5] Staging: comedi: dmm32at: Prefer using the BIT macro

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes all occurrences of (1<
---
 drivers/staging/comedi/drivers/dmm32at.c | 86 
 1 file changed, 43 insertions(+), 43 deletions(-)
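
For anyone unfamiliar with the macro, the substitution made throughout this patch can be checked in user space with the sketch below. The BIT() definition here is written out locally for illustration and is assumed to match the kernel's (1UL << (nr)). The SKETCH_* names are hypothetical.

```c
#include <assert.h>

/* Local definition for illustration; assumed to mirror the kernel macro. */
#ifndef BIT
#define BIT(nr) (1UL << (nr))
#endif

/* The same register bit written in the old and the new style. */
#define SKETCH_OLD_FIFOEN (1 << 3)
#define SKETCH_NEW_FIFOEN BIT(3)

/* Returns 1 when both spellings yield the same value. */
int sketch_same_value(void)
{
	return SKETCH_OLD_FIFOEN == SKETCH_NEW_FIFOEN;
}
```

The values are identical; the only difference under this definition is the type, unsigned long instead of int.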

diff --git a/drivers/staging/comedi/drivers/dmm32at.c b/drivers/staging/comedi/drivers/dmm32at.c
index 958c0d4..4ca6104 100644
--- a/drivers/staging/comedi/drivers/dmm32at.c
+++ b/drivers/staging/comedi/drivers/dmm32at.c
@@ -46,73 +46,73 @@
 #define DMM32AT_AI_START_CONV_REG  0x00
 #define DMM32AT_AI_LSB_REG 0x00
 #define DMM32AT_AUX_DOUT_REG   0x01
-#define DMM32AT_AUX_DOUT2  (1 << 2)  /* J3.42 - OUT2 (OUT2EN) */
-#define DMM32AT_AUX_DOUT1  (1 << 1)  /* J3.43 */
-#define DMM32AT_AUX_DOUT0  (1 << 0)  /* J3.44 - OUT0 (OUT0EN) */
+#define DMM32AT_AUX_DOUT2  BIT(2)  /* J3.42 - OUT2 (OUT2EN) */
+#define DMM32AT_AUX_DOUT1  BIT(1)  /* J3.43 */
+#define DMM32AT_AUX_DOUT0  BIT(0)  /* J3.44 - OUT0 (OUT0EN) */
 #define DMM32AT_AI_MSB_REG 0x01
 #define DMM32AT_AI_LO_CHAN_REG 0x02
 #define DMM32AT_AI_HI_CHAN_REG 0x03
 #define DMM32AT_AUX_DI_REG 0x04
-#define DMM32AT_AUX_DI_DACBUSY (1 << 7)
-#define DMM32AT_AUX_DI_CALBUSY (1 << 6)
-#define DMM32AT_AUX_DI3 (1 << 3)  /* J3.45 - ADCLK (CLKSEL) */
-#define DMM32AT_AUX_DI2 (1 << 2)  /* J3.46 - GATE12 (GT12EN) */
-#define DMM32AT_AUX_DI1 (1 << 1)  /* J3.47 - GATE0 (GT0EN) */
-#define DMM32AT_AUX_DI0 (1 << 0)  /* J3.48 - CLK0 (SRC0) */
+#define DMM32AT_AUX_DI_DACBUSY BIT(7)
+#define DMM32AT_AUX_DI_CALBUSY BIT(6)
+#define DMM32AT_AUX_DI3 BIT(3)  /* J3.45 - ADCLK (CLKSEL) */
+#define DMM32AT_AUX_DI2 BIT(2)  /* J3.46 - GATE12 (GT12EN) */
+#define DMM32AT_AUX_DI1 BIT(1)  /* J3.47 - GATE0 (GT0EN) */
+#define DMM32AT_AUX_DI0 BIT(0)  /* J3.48 - CLK0 (SRC0) */
 #define DMM32AT_AO_LSB_REG 0x04
 #define DMM32AT_AO_MSB_REG 0x05
 #define DMM32AT_AO_MSB_DACH(x) ((x) << 6)
 #define DMM32AT_FIFO_DEPTH_REG 0x06
 #define DMM32AT_FIFO_CTRL_REG  0x07
-#define DMM32AT_FIFO_CTRL_FIFOEN   (1 << 3)
-#define DMM32AT_FIFO_CTRL_SCANEN   (1 << 2)
-#define DMM32AT_FIFO_CTRL_FIFORST  (1 << 1)
+#define DMM32AT_FIFO_CTRL_FIFOEN   BIT(3)
+#define DMM32AT_FIFO_CTRL_SCANEN   BIT(2)
+#define DMM32AT_FIFO_CTRL_FIFORST  BIT(1)
 #define DMM32AT_FIFO_STATUS_REG0x07
-#define DMM32AT_FIFO_STATUS_EF (1 << 7)
-#define DMM32AT_FIFO_STATUS_HF (1 << 6)
-#define DMM32AT_FIFO_STATUS_FF (1 << 5)
-#define DMM32AT_FIFO_STATUS_OVF(1 << 4)
-#define DMM32AT_FIFO_STATUS_FIFOEN (1 << 3)
-#define DMM32AT_FIFO_STATUS_SCANEN (1 << 2)
+#define DMM32AT_FIFO_STATUS_EF BIT(7)
+#define DMM32AT_FIFO_STATUS_HF BIT(6)
+#define DMM32AT_FIFO_STATUS_FF BIT(5)
+#define DMM32AT_FIFO_STATUS_OVFBIT(4)
+#define DMM32AT_FIFO_STATUS_FIFOEN BIT(3)
+#define DMM32AT_FIFO_STATUS_SCANEN BIT(2)
 #define DMM32AT_FIFO_STATUS_PAGE_MASK  (3 << 0)
 #define DMM32AT_CTRL_REG   0x08
-#define DMM32AT_CTRL_RESETA(1 << 5)
-#define DMM32AT_CTRL_RESETD(1 << 4)
-#define DMM32AT_CTRL_INTRST(1 << 3)
+#define DMM32AT_CTRL_RESETABIT(5)
+#define DMM32AT_CTRL_RESETDBIT(4)
+#define DMM32AT_CTRL_INTRSTBIT(3)
 #define DMM32AT_CTRL_PAGE_8254 (0 << 0)
-#define DMM32AT_CTRL_PAGE_8255 (1 << 0)
+#define DMM32AT_CTRL_PAGE_8255 BIT(0)
 #define DMM32AT_CTRL_PAGE_CALIB(3 << 0)
 #define DMM32AT_AI_STATUS_REG  0x08
-#define DMM32AT_AI_STATUS_STS  (1 << 7)
-#define DMM32AT_AI_STATUS_SD1  (1 << 6)
-#define DMM32AT_AI_STATUS_SD0  (1 << 5)
+#define DMM32AT_AI_STATUS_STS  BIT(7)
+#define DMM32AT_AI_STATUS_SD1  BIT(6)
+#define DMM32AT_AI_STATUS_SD0  BIT(5)
 #define DMM32AT_AI_STATUS_ADCH_MASK(0x1f << 0)
 #define DMM32AT_INTCLK_REG 0x09
-#define DMM32AT_INTCLK_ADINT   (1 << 7)
-#define DMM32AT_INTCLK_DINT(1 << 6)
-#define DMM32AT_INTCLK_TINT(1 << 5)
-#define DMM32AT_INTCLK_CLKEN   (1 << 1)  /* 1=see below  0=software */
-#define DMM32AT_INTCLK_CLKSEL  (1 << 0)  /* 1=OUT2  0=EXTCLK */
+#define DMM32AT_INTCLK_ADINT   BIT(7)
+#define DMM32AT_INTCLK_DINTBIT(6)
+#define DMM32AT_INTCLK_TINTBIT(5)
+#define DMM32AT_INTCLK_CLKEN   BIT(1)  /* 1=see below  0=software */
+#define DMM32AT_INTCLK_CLKSEL  BIT(0)  /* 1=OUT2  0=EXTCLK */
 #define 

[PATCH 5/5] Staging: comedi: dmm32at: Prefer using the BIT macro

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes all occurrences of (1<
---
 drivers/staging/comedi/drivers/dmm32at.c | 86 
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dmm32at.c b/drivers/staging/comedi/drivers/dmm32at.c
index 958c0d4..4ca6104 100644
--- a/drivers/staging/comedi/drivers/dmm32at.c
+++ b/drivers/staging/comedi/drivers/dmm32at.c
@@ -46,73 +46,73 @@
 #define DMM32AT_AI_START_CONV_REG  0x00
 #define DMM32AT_AI_LSB_REG 0x00
 #define DMM32AT_AUX_DOUT_REG   0x01
-#define DMM32AT_AUX_DOUT2  (1 << 2)  /* J3.42 - OUT2 (OUT2EN) */
-#define DMM32AT_AUX_DOUT1  (1 << 1)  /* J3.43 */
-#define DMM32AT_AUX_DOUT0  (1 << 0)  /* J3.44 - OUT0 (OUT0EN) */
+#define DMM32AT_AUX_DOUT2  BIT(2)  /* J3.42 - OUT2 (OUT2EN) */
+#define DMM32AT_AUX_DOUT1  BIT(1)  /* J3.43 */
+#define DMM32AT_AUX_DOUT0  BIT(0)  /* J3.44 - OUT0 (OUT0EN) */
 #define DMM32AT_AI_MSB_REG 0x01
 #define DMM32AT_AI_LO_CHAN_REG 0x02
 #define DMM32AT_AI_HI_CHAN_REG 0x03
 #define DMM32AT_AUX_DI_REG 0x04
-#define DMM32AT_AUX_DI_DACBUSY (1 << 7)
-#define DMM32AT_AUX_DI_CALBUSY (1 << 6)
-#define DMM32AT_AUX_DI3 (1 << 3)  /* J3.45 - ADCLK (CLKSEL) */
-#define DMM32AT_AUX_DI2 (1 << 2)  /* J3.46 - GATE12 (GT12EN) */
-#define DMM32AT_AUX_DI1 (1 << 1)  /* J3.47 - GATE0 (GT0EN) */
-#define DMM32AT_AUX_DI0 (1 << 0)  /* J3.48 - CLK0 (SRC0) */
+#define DMM32AT_AUX_DI_DACBUSY BIT(7)
+#define DMM32AT_AUX_DI_CALBUSY BIT(6)
+#define DMM32AT_AUX_DI3 BIT(3)  /* J3.45 - ADCLK (CLKSEL) */
+#define DMM32AT_AUX_DI2 BIT(2)  /* J3.46 - GATE12 (GT12EN) */
+#define DMM32AT_AUX_DI1 BIT(1)  /* J3.47 - GATE0 (GT0EN) */
+#define DMM32AT_AUX_DI0 BIT(0)  /* J3.48 - CLK0 (SRC0) */
 #define DMM32AT_AO_LSB_REG 0x04
 #define DMM32AT_AO_MSB_REG 0x05
 #define DMM32AT_AO_MSB_DACH(x) ((x) << 6)
 #define DMM32AT_FIFO_DEPTH_REG 0x06
 #define DMM32AT_FIFO_CTRL_REG  0x07
-#define DMM32AT_FIFO_CTRL_FIFOEN   (1 << 3)
-#define DMM32AT_FIFO_CTRL_SCANEN   (1 << 2)
-#define DMM32AT_FIFO_CTRL_FIFORST  (1 << 1)
+#define DMM32AT_FIFO_CTRL_FIFOEN   BIT(3)
+#define DMM32AT_FIFO_CTRL_SCANEN   BIT(2)
+#define DMM32AT_FIFO_CTRL_FIFORST  BIT(1)
 #define DMM32AT_FIFO_STATUS_REG 0x07
-#define DMM32AT_FIFO_STATUS_EF (1 << 7)
-#define DMM32AT_FIFO_STATUS_HF (1 << 6)
-#define DMM32AT_FIFO_STATUS_FF (1 << 5)
-#define DMM32AT_FIFO_STATUS_OVF (1 << 4)
-#define DMM32AT_FIFO_STATUS_FIFOEN (1 << 3)
-#define DMM32AT_FIFO_STATUS_SCANEN (1 << 2)
+#define DMM32AT_FIFO_STATUS_EF BIT(7)
+#define DMM32AT_FIFO_STATUS_HF BIT(6)
+#define DMM32AT_FIFO_STATUS_FF BIT(5)
+#define DMM32AT_FIFO_STATUS_OVF BIT(4)
+#define DMM32AT_FIFO_STATUS_FIFOEN BIT(3)
+#define DMM32AT_FIFO_STATUS_SCANEN BIT(2)
 #define DMM32AT_FIFO_STATUS_PAGE_MASK  (3 << 0)
 #define DMM32AT_CTRL_REG   0x08
-#define DMM32AT_CTRL_RESETA (1 << 5)
-#define DMM32AT_CTRL_RESETD (1 << 4)
-#define DMM32AT_CTRL_INTRST (1 << 3)
+#define DMM32AT_CTRL_RESETA BIT(5)
+#define DMM32AT_CTRL_RESETD BIT(4)
+#define DMM32AT_CTRL_INTRST BIT(3)
 #define DMM32AT_CTRL_PAGE_8254 (0 << 0)
-#define DMM32AT_CTRL_PAGE_8255 (1 << 0)
+#define DMM32AT_CTRL_PAGE_8255 BIT(0)
 #define DMM32AT_CTRL_PAGE_CALIB (3 << 0)
 #define DMM32AT_AI_STATUS_REG  0x08
-#define DMM32AT_AI_STATUS_STS  (1 << 7)
-#define DMM32AT_AI_STATUS_SD1  (1 << 6)
-#define DMM32AT_AI_STATUS_SD0  (1 << 5)
+#define DMM32AT_AI_STATUS_STS  BIT(7)
+#define DMM32AT_AI_STATUS_SD1  BIT(6)
+#define DMM32AT_AI_STATUS_SD0  BIT(5)
 #define DMM32AT_AI_STATUS_ADCH_MASK (0x1f << 0)
 #define DMM32AT_INTCLK_REG 0x09
-#define DMM32AT_INTCLK_ADINT   (1 << 7)
-#define DMM32AT_INTCLK_DINT (1 << 6)
-#define DMM32AT_INTCLK_TINT (1 << 5)
-#define DMM32AT_INTCLK_CLKEN   (1 << 1)  /* 1=see below  0=software */
-#define DMM32AT_INTCLK_CLKSEL  (1 << 0)  /* 1=OUT2  0=EXTCLK */
+#define DMM32AT_INTCLK_ADINT   BIT(7)
+#define DMM32AT_INTCLK_DINT BIT(6)
+#define DMM32AT_INTCLK_TINT BIT(5)
+#define DMM32AT_INTCLK_CLKEN   BIT(1)  /* 1=see below  0=software */
+#define DMM32AT_INTCLK_CLKSEL  BIT(0)  /* 1=OUT2  0=EXTCLK */
 #define DMM32AT_CTRDIO_CFG_REG 0x0a
-#define DMM32AT_CTRDIO_CFG_FREQ12  (1 << 7)  /* CLK12 1=100KHz 0=10MHz */
-#define DMM32AT_CTRDIO_CFG_FREQ0   (1 << 6)  /* CLK0  1=10KHz  
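
For context on the substitution in this patch: in kernels of this era, BIT(nr) is defined in include/linux/bitops.h as (1UL << (nr)). The user-space sketch below uses a local stand-in (SKETCH_BIT and the OLD_/NEW_ names are hypothetical, chosen only for this illustration) to show that both spellings select the same register bit. One difference worth keeping in mind is that the kernel macro yields an unsigned long, whereas (1 << n) is a plain int.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT(); hypothetical name for this sketch. */
#define SKETCH_BIT(nr) (1UL << (nr))

/* Old and new spellings of two fields touched by the patch. */
#define OLD_FIFO_CTRL_FIFOEN (1 << 3)
#define NEW_FIFO_CTRL_FIFOEN SKETCH_BIT(3)
#define OLD_INTCLK_ADINT     (1 << 7)
#define NEW_INTCLK_ADINT     SKETCH_BIT(7)

/* Return nonzero if the given flag is set in an 8-bit register value. */
static int reg_flag_set(uint8_t reg, unsigned long flag)
{
	assert(flag != 0 && flag <= 0x80); /* flag must fit in 8 bits */
	return (reg & flag) != 0;
}
```

Calling reg_flag_set(0x88, NEW_INTCLK_ADINT) and reg_flag_set(0x88, OLD_INTCLK_ADINT) gives the same answer, which is why the conversion is a pure style change for single-bit fields.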

Re: [PATCH] KVM: s390: fix build failure

2016-06-07 Thread Heiko Carstens
On Wed, Jun 08, 2016 at 07:17:35AM +0200, Christian Borntraeger wrote:
> On 06/07/2016 11:49 PM, Sudip Mukherjee wrote:
> > etr_ptff definitions are moved and renamed but we missed updating them
> > here and as a result s390 defconfig and allmodconfig was failing with
> > the error:
> > arch/s390/kvm/kvm-s390.c:230:45: error: 'ETR_PTFF_QAF' undeclared
> > 
> > Fixes: cc8f94656487 ("s390/time: move PTFF definitions")
> > Signed-off-by: Sudip Mukherjee 
> 
> Thank you for the report and patch.
> 
> This is linux-next only. It's a conflict between my kvms390 queue and 
> Martin's s390 queue. We cannot apply this directly as it would break
> the build of my tree when not merged in next. (and it does not apply
> on Martin's tree).
> 
> I will have a look how to fix that up.

We could ask Stephen Rothwell to apply the patch only to linux-next? ;)

> > ---
> > 
> > s390 defconfig build log is at:
> > https://travis-ci.org/sudipm-mukherjee/parport/jobs/135776067
> > 
> >  arch/s390/kvm/kvm-s390.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index fa51aef..3039eaf 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -29,7 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -227,7 +227,9 @@ static void kvm_s390_cpu_feat_init(void)
> >  	}
> > 
> >  	if (test_facility(28)) /* TOD-clock steering */
> > -		etr_ptff(kvm_s390_available_subfunc.ptff, ETR_PTFF_QAF);
> > +		ptff(kvm_s390_available_subfunc.ptff,
> > +		     sizeof(kvm_s390_available_subfunc.ptff),
> > +		     PTFF_QAF);
> > 
> >  	if (test_facility(17)) { /* MSA */
> >  		__cpacf_query(CPACF_KMAC, kvm_s390_available_subfunc.kmac);
> > 
> 



Re: [PATCH v6 3/6] crypto: AF_ALG -- add asymmetric cipher interface

2016-06-07 Thread Stephan Mueller
On Tuesday, 7 June 2016, 17:28:07, Mat Martineau wrote:

Hi Mat,

> > +   used = ctx->used;
> > +
> > +   /* convert iovecs of output buffers into scatterlists */
> > +   while (iov_iter_count(>msg_iter)) {
> > +   /* make one iovec available as scatterlist */
> > +   err = af_alg_make_sg(>rsgl[cnt], >msg_iter,
> > +iov_iter_count(>msg_iter));
> > +   if (err < 0)
> > +   goto unlock;
> > +   usedpages += err;
> > +   /* chain the new scatterlist with previous one */
> > +   if (cnt)
> > +   af_alg_link_sg(>rsgl[cnt - 1], >rsgl[cnt]);
> > +
> > +   iov_iter_advance(>msg_iter, err);
> > +   cnt++;
> > +   }
> > +
> > +   /* ensure output buffer is sufficiently large */
> > +   if (usedpages < akcipher_calcsize(ctx)) {
> > +   err = -EMSGSIZE;
> > +   goto unlock;
> > +   }
> 
> Why is the size of the output buffer enforced here instead of depending on
> the algorithm implementation?

akcipher_calcsize calls crypto_akcipher_maxsize to get the maximum size the 
algorithm generates as output during its operation.

The code ensures that the caller provided at least that amount of memory for 
the kernel to store its data in. This check therefore is present to ensure the 
kernel does not overstep memory boundaries in user space.
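
A minimal user-space sketch of that check follows. It does not use the kernel API: struct out_buf, total_len() and check_output_space() are hypothetical names, and max_out merely stands in for the value a maxsize query such as crypto_akcipher_maxsize would report. It only illustrates the rule described above: the summed length of the caller's buffers must be at least the maximum output size, otherwise the request is rejected with -EMSGSIZE before any operation runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one caller-supplied output buffer (one iovec). */
struct out_buf {
	size_t len;
};

/* Sum the space offered across all output buffers. */
static size_t total_len(const struct out_buf *bufs, size_t n)
{
	size_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += bufs[i].len;
	return sum;
}

/*
 * Reject the request up front if the caller offered less space than the
 * algorithm may produce. max_out plays the role of the maxsize query.
 */
static int check_output_space(const struct out_buf *bufs, size_t n,
			      size_t max_out)
{
	assert(bufs != NULL || n == 0);
	if (total_len(bufs, n) < max_out)
		return -EMSGSIZE;
	return 0;
}
```

With two 128-byte buffers, a 256-byte maximum passes and a 257-byte maximum is refused, regardless of how much output a particular operation would actually have produced.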

What is your concern?

Thanks

Ciao
Stephan


[PATCH 3/5] Staging: comedi: das800: fix comment issue

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes up a WARNING: 'Block comments use a trailing */ on a
separate line' found by the checkpatch.pl tool.

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das800.c | 102 
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das800.c b/drivers/staging/comedi/drivers/das800.c
index b02f122..0680d87 100644
--- a/drivers/staging/comedi/drivers/das800.c
+++ b/drivers/staging/comedi/drivers/das800.c
@@ -1,56 +1,56 @@
 /*
-comedi/drivers/das800.c
-Driver for Keitley das800 series boards and compatibles
-Copyright (C) 2000 Frank Mori Hess 
-
-COMEDI - Linux Control and Measurement Device Interface
-Copyright (C) 2000 David A. Schleef 
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-*/
+ * comedi/drivers/das800.c
+ * Driver for Keitley das800 series boards and compatibles
+ * Copyright (C) 2000 Frank Mori Hess 
+ *
+ * COMEDI - Linux Control and Measurement Device Interface
+ * Copyright (C) 2000 David A. Schleef 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
 /*
-Driver: das800
-Description: Keithley Metrabyte DAS800 (& compatibles)
-Author: Frank Mori Hess 
-Devices: [Keithley Metrabyte] DAS-800 (das-800), DAS-801 (das-801),
-  DAS-802 (das-802),
-  [Measurement Computing] CIO-DAS800 (cio-das800),
-  CIO-DAS801 (cio-das801), CIO-DAS802 (cio-das802),
-  CIO-DAS802/16 (cio-das802/16)
-Status: works, cio-das802/16 untested - email me if you have tested it
-
-Configuration options:
-  [0] - I/O port base address
-  [1] - IRQ (optional, required for timed or externally triggered conversions)
-
-Notes:
-   IRQ can be omitted, although the cmd interface will not work without it.
-
-   All entries in the channel/gain list must use the same gain and be
-   consecutive channels counting upwards in channel number (these are
-   hardware limitations.)
-
-   I've never tested the gain setting stuff since I only have a
-   DAS-800 board with fixed gain.
-
-   The cio-das802/16 does not have a fifo-empty status bit!  Therefore
-   only fifo-half-full transfers are possible with this card.
-
-cmd triggers supported:
-   start_src:  TRIG_NOW | TRIG_EXT
-   scan_begin_src: TRIG_FOLLOW
-   scan_end_src:   TRIG_COUNT
-   convert_src:TRIG_TIMER | TRIG_EXT
-   stop_src:   TRIG_NONE | TRIG_COUNT
-*/
+ * Driver: das800
+ * Description: Keithley Metrabyte DAS800 (& compatibles)
+ * Author: Frank Mori Hess 
+ * Devices: [Keithley Metrabyte] DAS-800 (das-800), DAS-801 (das-801),
+ * DAS-802 (das-802),
+ * [Measurement Computing] CIO-DAS800 (cio-das800),
+ * CIO-DAS801 (cio-das801), CIO-DAS802 (cio-das802),
+ * CIO-DAS802/16 (cio-das802/16)
+ * Status: works, cio-das802/16 untested - email me if you have tested it
+ *
+ * Configuration options:
+ * [0] - I/O port base address
+ *  [1] - IRQ (optional, required for timed or externally triggered conversions)
+ *
+ * Notes:
+ * IRQ can be omitted, although the cmd interface will not work without it.
+ *
+ * All entries in the channel/gain list must use the same gain and be
+ * consecutive channels counting upwards in channel number (these are
+ * hardware limitations.)
+ *
+ * I've never tested the gain setting stuff since I only have a
+ * DAS-800 board with fixed gain.
+ *
+ * The cio-das802/16 does not have a fifo-empty status bit!  Therefore
+ * only fifo-half-full transfers are possible with this card.
+ *
+ * cmd triggers supported:
+ * start_src:  TRIG_NOW | TRIG_EXT
+ * scan_begin_src: TRIG_FOLLOW
+ * scan_end_src:   TRIG_COUNT
+ * convert_src:TRIG_TIMER | TRIG_EXT
+ * stop_src:   TRIG_NONE | TRIG_COUNT
+ */
 
 #include 
 #include 
-- 
1.9.1



[PATCH 1/5] Staging: comedi: das16: fix blank line

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes up a blank line after function/struct/union/enum check found
by the checkpatch.pl tool

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das16.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/comedi/drivers/das16.c b/drivers/staging/comedi/drivers/das16.c
index fd8e0b7..4d6e581 100644
--- a/drivers/staging/comedi/drivers/das16.c
+++ b/drivers/staging/comedi/drivers/das16.c
@@ -198,6 +198,7 @@ enum {
 	das16_pg_1601,
 	das16_pg_1602,
 };
+
 static const int *const das16_gainlists[] = {
 	NULL,
 	das16jr_gainlist,
-- 
1.9.1



[PATCH 2/5] Staging: comedi: das16: fix Block comment

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes up a WARNING: 'Block comments use a trailing */ on a
separate line' found by the checkpatch.pl tool.

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das16.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das16.c b/drivers/staging/comedi/drivers/das16.c
index 4d6e581..ef345dc 100644
--- a/drivers/staging/comedi/drivers/das16.c
+++ b/drivers/staging/comedi/drivers/das16.c
@@ -429,8 +429,10 @@ static const struct das16_board das16_boards[] = {
 	},
 };
 
-/* Period for timer interrupt in jiffies.  It's a function
- * to deal with possibility of dynamic HZ patches  */
+/*
+ * Period for timer interrupt in jiffies.  It's a function
+ * to deal with possibility of dynamic HZ patches
+ */
 static inline int timer_period(void)
 {
 	return HZ / 20;
-- 
1.9.1



[PATCH 4/5] Staging: comedi: das800: Prefer unsigned int instead of unsigned

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya
This fixes up a WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
found by the checkpatch.pl tool.

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das800.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das800.c b/drivers/staging/comedi/drivers/das800.c
index 0680d87..ef48c48 100644
--- a/drivers/staging/comedi/drivers/das800.c
+++ b/drivers/staging/comedi/drivers/das800.c
@@ -218,7 +218,7 @@ struct das800_private {
 };
 
 static void das800_ind_write(struct comedi_device *dev,
-			     unsigned val, unsigned reg)
+			     unsigned int val, unsigned int reg)
 {
 	/*
 	 * Select dev->iobase + 2 to be desired register
@@ -228,7 +228,7 @@ static void das800_ind_write(struct comedi_device *dev,
 	outb(val, dev->iobase + 2);
 }
 
-static unsigned das800_ind_read(struct comedi_device *dev, unsigned reg)
+static unsigned int das800_ind_read(struct comedi_device *dev, unsigned int reg)
 {
 	/*
 	 * Select dev->iobase + 7 to be desired register
-- 
1.9.1
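
A note on why a change like this is safe: in C, `unsigned` and `unsigned int` are two spellings of the same type, so the patch alters no behavior. The sketch below (C11; IS_UNSIGNED_INT, bare_add and full_add are hypothetical names used only for illustration) checks this at compile time.

```c
#include <assert.h>

/* Hypothetical helper: 1 if the expression has type unsigned int, else 0. */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Declared with the bare spelling, as in the old das800 prototypes. */
static unsigned bare_add(unsigned a, unsigned b)
{
	return a + b;
}

/* Declared with the spelling checkpatch.pl prefers. */
static unsigned int full_add(unsigned int a, unsigned int b)
{
	return a + b;
}

/* Compile-time check that the bare spelling names unsigned int. */
static_assert(IS_UNSIGNED_INT((unsigned)0), "unsigned is unsigned int");
```

Because the types are identical, callers of either function need no changes; only the readability of the declaration differs.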



Re: [PATCH 1/3] usb: pci: Remove unnecessary pci_set_drvdata().

2016-06-07 Thread Greg KH
On Wed, May 11, 2016 at 06:08:15PM +0530, Sandhya Bankar wrote:
> Unnecessary [platform|pci]_set_drvdata() have been removed since the driver 
> core clears the driver data to NULL after device release or on probe failure. 
> There is no need to manually clear the
> device driver data to NULL.

Please fix your changelog text to be wrapped at 72 columns like it is
supposed to be.

thanks,

greg k-h


Re: NVMe over Fabrics target implementation

2016-06-07 Thread Nicholas A. Bellinger
On Tue, 2016-06-07 at 12:55 +0200, Christoph Hellwig wrote:
> There is absolutely no point in dragging in an overcomplicated configfs 
> structure for a very simple protocol which also is very different from
> SCSI in its nitty-gritty details.

Please be more specific wrt the two individual points that have been
raised.

>  Keeping the nvme target self-contained
> allows it to be both much simpler and much easier to understand, as well
> as much better testable - see the amount of test coverage we could easily
> add for example.

I disagree.

> 
> Or to put it the other way around - if there was any major synergy in
> reusing the SCSI target code that just shows we're missing functionality
> in the block layer or configfs.
> 

To reiterate the points again.

*) Extensible to multiple types of backend drivers.

nvme-target needs a way to absorb new backend drivers that
does not affect existing configfs group layout or attributes.

Looking at the nvmet/configfs layout as-is, there are no multiple
backend types defined, nor a way to control backend feature bits
exposed to nvme namespaces at runtime.

What is being proposed is a way to share target-core backends via
existing configfs symlinks across SCSI and NVMe targets.

Which means:

   - All I/O state + memory submission is done at RCU protected
 se_device level via sbc_ops
   - percpu reference counting is done outside of target-core
   - Absorb all nvmet/io-cmd optimizations into target_core_iblock.c
   - Base starting point for features in SCSI + NVMe that span
 across multiple endpoints and instances (reservations + APTPL, 
 multipath, copy-offload across fabric types)

Using target-core backends means we get features like T10-PI and
sbc_ops->write_same for free that don't exist in nvmet, and can
utilize a common set of backend drivers for SCSI and NVMe via an
existing configfs ABI and python userspace community.

And to the second, and more important, point: defining a configfs ABI
that works both for today's requirements and well into the 2020s
without breaking user-space compatibility.

As-is, the initial design using top level nvmet configfs symlinks of
subsystem groups into individual port + host groups does not scale.

That is, it currently does:

  - Sequential list lookup under global rw_mutex of top-level nvmet_port
    and nvmet_host symlink ->allow_link() and ->drop_link() configfs
    callbacks.
  - nvmet_fabrics_ops->add_port() callback invoked under the same global
    rw_mutex.

This is very bad for several reasons.

As-is, this blocks all other configfs port + host operations from
occurring even during normal operation, which makes it quite useless for
any type of multi-tenant target environment where the individual target
endpoints *must* be able to operate independently.

Seriously, there is never a good reason why configfs group or item
callbacks should be performing list lookup under a global lock at
this level.

Why does it ever make sense for $SUBSYSTEM_NQN_0 with $PORT_DRIVER_FOO
to block operation of $SUBSYSTEM_NQN_1 with $PORT_DRIVER_BAR..?

A simple example where this design breaks down quickly is an NVMf
ops->add_port() call that requires a HW reset, or say a reload of
firmware that can take multiple seconds. (qla2xxx comes to mind).

There is a simple test to highlight this limitation.  Take any
nvme-target driver that is capable of multiple ports, and introduce
a sleep(5) into each ops->add_port() call.

Now create 256 different subsystem NQNs with 256 different ports
across four different user-space processes.

What happens to other subsystems, ports and host groups configfs
symlinks when this occurs..?

What happens to the other user-space processes..?
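
As a rough back-of-the-envelope for the test above, here is a sketch in
plain userspace C.  It assumes each ops->add_port() call holds the global
lock for the full 5 seconds, and that in the alternative case the four
processes do not contend with each other at all.  The helper names are
made up for illustration and are not part of nvmet.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Worst-case seconds until the last add_port() returns when every call
 * is serialized behind one global lock (hypothetical helper).
 */
static unsigned int serialized_wait_secs(unsigned int nr_ports,
					 unsigned int secs_per_call)
{
	return nr_ports * secs_per_call;
}

/*
 * Same workload if each of nr_workers processes could make progress
 * independently, with the ports split evenly between them.
 */
static unsigned int independent_wait_secs(unsigned int nr_ports,
					  unsigned int secs_per_call,
					  unsigned int nr_workers)
{
	unsigned int per_worker;

	assert(nr_workers > 0);
	/* round up so a remainder still costs one more call */
	per_worker = (nr_ports + nr_workers - 1) / nr_workers;
	return per_worker * secs_per_call;
}

/* Print both estimates for the 256 port / 4 process case. */
static void print_estimates(void)
{
	printf("global lock: %u s, independent: %u s\n",
	       serialized_wait_secs(256, 5),
	       independent_wait_secs(256, 5, 4));
}
```

Under those assumptions the global lock case comes to 256 * 5 = 1280
seconds before the last port is up, versus 64 * 5 = 320 seconds if the
four processes did not block each other.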



Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin
On Tue, 2016-06-07 at 22:49 -0600, Jens Axboe wrote:
> On 06/06/2016 03:21 PM, Christoph Hellwig wrote:
> > From: Ming Lin 
> > 
> > For some protocols like NVMe over Fabrics we need to be able to
> > send
> > initialization commands to a specific queue.
> > 
> > Based on an earlier patch from Christoph Hellwig .
> > 
> > Signed-off-by: Ming Lin 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >   block/blk-mq.c | 33 +
> >   include/linux/blk-mq.h |  2 ++
> >   2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 29cbc1b..7bb45ed 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct
> > request_queue *q, int rw,
> >   }
> >   EXPORT_SYMBOL(blk_mq_alloc_request);
> > 
> > +struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> > int rw,
> > +   unsigned int flags, unsigned int hctx_idx)
> > +{
> > +   struct blk_mq_hw_ctx *hctx;
> > +   struct blk_mq_ctx *ctx;
> > +   struct request *rq;
> > +   struct blk_mq_alloc_data alloc_data;
> > +   int ret;
> > +
> > +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> > +   if (ret)
> > +   return ERR_PTR(ret);
> > +
> > +   hctx = q->queue_hw_ctx[hctx_idx];
> > +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> > +
> > +   blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
> > +
> > +   rq = __blk_mq_alloc_request(&alloc_data, rw);
> > +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> > +           __blk_mq_run_hw_queue(hctx);
> > +
> > +           rq = __blk_mq_alloc_request(&alloc_data, rw);
> > +   }
> 
> Why are we duplicating this code here? If NOWAIT isn't set, then
> we'll
> always return a request. bt_get() will run the queue for us, if it
> needs
> to. blk_mq_alloc_request() does this too, and I'm guessing that code
> was
> just copied. I'll fix that up. Looks like this should just be:
> 
>   rq = __blk_mq_alloc_request(&alloc_data, rw);
>   if (rq)
>           return rq;
> 
>   blk_queue_exit(q);
>   return ERR_PTR(-EWOULDBLOCK);
> 
> for this case.

Yes,

But the bt_get() reminds me that this patch actually has a problem.

blk_mq_alloc_request_hctx() ->
  __blk_mq_alloc_request() ->
    blk_mq_get_tag() -> 
      __blk_mq_get_tag() ->
        bt_get() ->
          blk_mq_put_ctx(data->ctx);

Here are blk_mq_get_ctx() and blk_mq_put_ctx().

static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{
        return __blk_mq_get_ctx(q, get_cpu());
}

static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
{
        put_cpu();
}

blk_mq_alloc_request_hctx() calls __blk_mq_get_ctx() instead
of blk_mq_get_ctx(). The reason is that the "hctx" could belong to another
CPU, so blk_mq_get_ctx() doesn't work.

But then the put_cpu() in blk_mq_put_ctx() above will trigger a WARNING,
because we didn't call get_cpu() in blk_mq_alloc_request_hctx().
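
To make the imbalance concrete, here is a small userspace model of it.
This is only a sketch: model_get_cpu(), model_put_cpu() and the two
counters are hypothetical stand-ins for the kernel's preempt accounting,
not the real implementation, and the exact warning the kernel prints
depends on the debug options enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-in for the per-task preempt counter. */
static int preempt_count;
/* Number of unbalanced put calls seen so far. */
static int imbalance_warnings;

/* Models get_cpu(): disable preemption, return the current CPU id. */
static int model_get_cpu(void)
{
	assert(preempt_count >= 0);
	preempt_count++;
	return 0;	/* pretend we always run on CPU 0 */
}

/* Models put_cpu(): re-enable preemption, warn on underflow. */
static void model_put_cpu(void)
{
	if (preempt_count == 0) {
		imbalance_warnings++;
		fprintf(stderr, "WARNING: put_cpu() without get_cpu()\n");
		return;
	}
	preempt_count--;
}

/* Balanced path, like blk_mq_get_ctx() followed by blk_mq_put_ctx(). */
static int balanced_path(void)
{
	int before = imbalance_warnings;

	(void)model_get_cpu();
	model_put_cpu();
	return imbalance_warnings - before;
}

/*
 * Unbalanced path: the ctx is picked by an explicit CPU number, as with
 * __blk_mq_get_ctx(q, cpu), so nothing disabled preemption before the put.
 */
static int unbalanced_path(void)
{
	int before = imbalance_warnings;

	model_put_cpu();
	return imbalance_warnings - before;
}
```

In this model balanced_path() reports no warnings, while unbalanced_path()
reports one, which is the situation the call chain above runs into.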


Re: [PATCH 3.10 000/143] 3.10.102-stable review

2016-06-07 Thread Willy Tarreau
On Tue, Jun 07, 2016 at 05:52:52PM -0700, Guenter Roeck wrote:
> Here we are;
> 
> Build results:
>   total: 123 pass: 123 fail: 0
> Qemu test results:
>   total: 75 pass: 75 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Excellent, thank you Guenter!

Willy


Re: [PATCH] KVM: s390: fix build failure

2016-06-07 Thread Christian Borntraeger
On 06/07/2016 11:49 PM, Sudip Mukherjee wrote:
> etr_ptff definitions are moved and renamed but we missed updating them
> here and as a result s390 defconfig and allmodconfig was failing with
> the error:
> arch/s390/kvm/kvm-s390.c:230:45: error: 'ETR_PTFF_QAF' undeclared
> 
> Fixes: cc8f94656487 ("s390/time: move PTFF definitions")
> Signed-off-by: Sudip Mukherjee 

Thank you for the report and patch.

This is linux-next only. It's a conflict between my kvms390 queue and
Martin's s390 queue. We cannot apply this directly as it would break
the build of my tree when not merged in next (and it does not apply
on Martin's tree).

I will have a look at how to fix that up.


> ---
> 
> s390 defconfig build log is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/135776067
> 
>  arch/s390/kvm/kvm-s390.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fa51aef..3039eaf 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -29,7 +29,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -227,7 +227,9 @@ static void kvm_s390_cpu_feat_init(void)
>   }
> 
>   if (test_facility(28)) /* TOD-clock steering */
> - etr_ptff(kvm_s390_available_subfunc.ptff, ETR_PTFF_QAF);
> + ptff(kvm_s390_available_subfunc.ptff,
> +  sizeof(kvm_s390_available_subfunc.ptff),
> +  PTFF_QAF);
> 
>   if (test_facility(17)) { /* MSA */
>   __cpacf_query(CPACF_KMAC, kvm_s390_available_subfunc.kmac);
> 



Re: [PATCH] mm/zsmalloc: add trace events for zs_compact

2016-06-07 Thread Minchan Kim
On Wed, Jun 08, 2016 at 09:48:30AM +0800, Ganesh Mahendran wrote:
> Hi, Minchan:
> 
> 2016-06-08 8:16 GMT+08:00 Minchan Kim :
> > Hello Ganesh,
> >
> > On Tue, Jun 07, 2016 at 04:56:44PM +0800, Ganesh Mahendran wrote:
> >> Currently zsmalloc is widely used in android device.
> >> Sometimes, we want to see how frequently zs_compact is
> >> triggered or how may pages freed by zs_compact(), or which
> >> zsmalloc pool is compacted.
> >>
> >> Most of the time, user can get the brief information from
> >> trace_mm_shrink_slab_[start | end], but in some senario,
> >> they do not use zsmalloc shrinker, but trigger compaction manually.
> >> So add some trace events in zs_compact is convenient. Also we
> >> can add some zsmalloc specific information(pool name, total compact
> >> pages, etc) in zsmalloc trace.
> >
> > Sorry, I cannot understand what's the problem now and what you want to
> > solve. Could you elaborate it a bit?
> >
> > Thanks.
> 
> We have backported the zs_compact() to our product(kernel 3.18).
> It is usefull for a longtime running device.
> But there is not a convenient way to get the detailed information
> of zs_comapct() which is usefull for  performance optimization.
> Information about how much time zs_compact used, which pool is
> compacted, how many page freed, etc.

You can see how many pages were freed by object compaction via the
mm_stat file of each device, /sys/block/zram<id>/mm_stat. And you can
use the function_graph tracer to see how much time zs_compact() took.
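
For reference, a minimal sketch of pulling that counter out of one
mm_stat line from userspace.  It assumes pages_compacted is the seventh
whitespace-separated column, which is my reading of the zram
documentation and should be checked against Documentation/blockdev/zram.txt
for the kernel in use.  parse_pages_compacted() is a made-up helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line of mm_stat and store the seventh column (assumed to be
 * pages_compacted) in *out.  Returns 0 on success, -1 if the line has
 * fewer than seven numeric columns.  Extra trailing columns are ignored.
 */
static int parse_pages_compacted(const char *line, unsigned long long *out)
{
	unsigned long long v[7];
	int n;

	assert(line && out);
	n = sscanf(line, "%llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
	if (n != 7)
		return -1;

	*out = v[6];
	return 0;
}
```

Reading the file twice and subtracting the two values gives the pages
freed by compaction in between, without needing a new trace event.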


> With these information, we will know what is going on in zs_comapct.
> And draw the relation between free mem and zs_comapct.
> 
> >
> >>
> >> This patch add two trace events for zs_compact(), below the trace log:
> >> -
> >> root@land:/ # cat /d/tracing/trace
> >>  kswapd0-125   [007] ...1   174.176979: zsmalloc_compact_start: 
> >> pool zram0
> >>  kswapd0-125   [007] ...1   174.181967: zsmalloc_compact_end: pool 
> >> zram0: 608 pages compacted(total 1794)
> >>  kswapd0-125   [000] ...1   184.134475: zsmalloc_compact_start: 
> >> pool zram0
> >>  kswapd0-125   [000] ...1   184.135010: zsmalloc_compact_end: pool 
> >> zram0: 62 pages compacted(total 1856)
> >>  kswapd0-125   [003] ...1   226.927221: zsmalloc_compact_start: 
> >> pool zram0
> >>  kswapd0-125   [003] ...1   226.928575: zsmalloc_compact_end: pool 
> >> zram0: 250 pages compacted(total 2106)
> >> -
> >>
> >> Signed-off-by: Ganesh Mahendran 
> >> ---
> >>  include/trace/events/zsmalloc.h | 56 
> >> +
> >>  mm/zsmalloc.c   | 10 
> >>  2 files changed, 66 insertions(+)
> >>  create mode 100644 include/trace/events/zsmalloc.h
> >>
> >> diff --git a/include/trace/events/zsmalloc.h 
> >> b/include/trace/events/zsmalloc.h
> >> new file mode 100644
> >> index 0000000..3b6f14e
> >> --- /dev/null
> >> +++ b/include/trace/events/zsmalloc.h
> >> @@ -0,0 +1,56 @@
> >> +#undef TRACE_SYSTEM
> >> +#define TRACE_SYSTEM zsmalloc
> >> +
> >> +#if !defined(_TRACE_ZSMALLOC_H) || defined(TRACE_HEADER_MULTI_READ)
> >> +#define _TRACE_ZSMALLOC_H
> >> +
> >> +#include 
> >> +#include 
> >> +
> >> +TRACE_EVENT(zsmalloc_compact_start,
> >> +
> >> + TP_PROTO(const char *pool_name),
> >> +
> >> + TP_ARGS(pool_name),
> >> +
> >> + TP_STRUCT__entry(
> >> + __field(const char *, pool_name)
> >> + ),
> >> +
> >> + TP_fast_assign(
> >> + __entry->pool_name = pool_name;
> >> + ),
> >> +
> >> + TP_printk("pool %s",
> >> +   __entry->pool_name)
> >> +);
> >> +
> >> +TRACE_EVENT(zsmalloc_compact_end,
> >> +
> >> + TP_PROTO(const char *pool_name, unsigned long pages_compacted,
> >> + unsigned long pages_total_compacted),
> >> +
> >> + TP_ARGS(pool_name, pages_compacted, pages_total_compacted),
> >> +
> >> + TP_STRUCT__entry(
> >> + __field(const char *, pool_name)
> >> + __field(unsigned long, pages_compacted)
> >> + __field(unsigned long, pages_total_compacted)
> >> + ),
> >> +
> >> + TP_fast_assign(
> >> + __entry->pool_name = pool_name;
> >> + __entry->pages_compacted = pages_compacted;
> >> + __entry->pages_total_compacted = pages_total_compacted;
> >> + ),
> >> +
> >> + TP_printk("pool %s: %ld pages compacted(total %ld)",
> >> +   __entry->pool_name,
> >> +   __entry->pages_compacted,
> >> +   __entry->pages_total_compacted)
> >> +);
> >> +
> >> +#endif /* _TRACE_ZSMALLOC_H */
> >> +
> >> +/* This part must be outside protection */
> >> +#include 
> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> >> index 213d0e1..441b9f7 100644
> >> --- a/mm/zsmalloc.c
> >> +++ b/mm/zsmalloc.c
> >> @@ -30,6 +30,8 @@
> >>
> >>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >>
> >> +#define CREATE_TRACE_POINTS

Re: [PATCH 09/10] x86, asm: Use CC_SET()/CC_OUT() and static_cpu_has() in archrandom.h

2016-06-07 Thread Andy Lutomirski
On Tue, Jun 7, 2016 at 4:31 PM, H. Peter Anvin  wrote:
> Use CC_SET()/CC_OUT() and static_cpu_has().  This produces code good
> enough to eliminate ad hoc use of alternatives in ,
> greatly simplifying the code.

Looks reasonable.


RE: [PATCH 2/2] aer: add support aer interrupt with none MSI/MSI-X/INTx mode

2016-06-07 Thread Po Liu
Hi Bjorn,

Thanks for the kind reply. All of this is helpful.

>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  On Wed, June 08, 2016 6:47 AM
>  
>  On Tue, Jun 07, 2016 at 10:07:40AM +, Po Liu wrote:
>  > Hi Bjorn,
>  >
>  > >  -Original Message-
>  > >
>  > >  On Mon, Jun 06, 2016 at 10:01:44AM -0400, Murali Karicheri wrote:
>  > >  > On 06/06/2016 03:32 AM, Po Liu wrote:
>  > >  > > Hi Bjorn,
>  > >  > > I confirm we met same problem with KeyStone base on DesignWare
>  > > design.
>  > >  > >
>  > >  > >
>  > >  > > Best regards,
>  > >  > > Liu Po
>  > >  > >
>  > >  > >>  -Original Message-
>  > >  > >>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  Sent: Saturday, June 04, 2016 11:49 AM
>  > >  > >>  To: Murali Karicheri
>  > >  > >>  Cc: Po Liu; linux-...@vger.kernel.org;
>  > >  > >>  linux-arm-ker...@lists.infradead.org;
>  > >  > >>  linux-kernel@vger.kernel.org; devicet...@vger.kernel.org;
>  > >  > >>  Arnd Bergmann; Roy Zang; Marc Zyngier; Stuart Yoder;
>  > >  > >>  Yang-Leo Li; Minghuan Lian; Bjorn Helgaas; Shawn Guo;
>  > >  > >>  Mingkai Hu; Rob Herring
>  > >  > >>  Subject: Re: [PATCH 2/2] aer: add support aer interrupt with
>  > >  > >>  none MSI/MSI-X/INTx mode
>  > >  > >>
>  > >  > >>  On Fri, Jun 03, 2016 at 01:31:11PM -0400, Murali Karicheri wrote:
>  > >  > >>  > Po,
>  > >  > >>  >
>  > >  > >>  > Sorry to hijack your discussion, but the problem seems to be
>  > >  > >>  > same for Keystone PCI controller which is also designware
>  > >  > >>  > (old version) based.
>  > >  > >>  >
>  > >  > >>  > On 06/03/2016 12:09 AM, Bjorn Helgaas wrote:
>  > >  > >>  > > On Thu, Jun 02, 2016 at 11:37:28AM -0400, Murali Karicheri wrote:
>  > >  > >>  > >> On 06/02/2016 09:55 AM, Bjorn Helgaas wrote:
>  > >  > >>  > >>> On Thu, Jun 02, 2016 at 05:01:19AM +, Po Liu wrote:
>  > >  > >>  > >  -Original Message-
>  > >  > >>  > >  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  > >  Sent: Thursday, June 02, 2016 11:48 AM
>  > >  > >>  > >  To: Po Liu
>  > >  > >>  > >  Cc: linux-...@vger.kernel.org;
>  > >  > >>  > >  linux-arm-ker...@lists.infradead.org;
>  > >  > >>  > >  linux-kernel@vger.kernel.org; devicet...@vger.kernel.org;
>  > >  > >>  > >  Arnd Bergmann; Roy Zang; Marc Zyngier; Stuart Yoder;
>  > >  > >>  > >  Yang-Leo Li; Minghuan Lian; Bjorn Helgaas; Shawn Guo;
>  > >  > >>  > >  Mingkai Hu; Rob Herring
>  > >  > >>  > >  Subject: Re: [PATCH 2/2] aer: add support aer interrupt
>  > >  > >>  > >  with none MSI/MSI-X/INTx mode
>  > >  > >>  > >
>  > >  > >>  > >  [+cc Rob]
>  > >  > >>  > >
>  > >  > >>  > >  Hi Po,
>  > >  > >>  > >
>  > >  > >>  > >  On Thu, May 26, 2016 at 02:00:06PM +0800, Po Liu wrote:
>  > >  > >>  > >  > On some platforms, root port doesn't support
>  > >  > >>  > >  > MSI/MSI-X/INTx in RC mode.
>  > >  > >>  > >  > When chip support the aer interrupt with none
>  > >  > >>  > >  > MSI/MSI-X/INTx mode, maybe there is interrupt line for
>  > >  > >>  > >  > aer pme etc. Search the interrupt number in the fdt file.
>  > >  > >>  > >
>  > >  > >>  > >  My understanding is that AER interrupt signaling can be
>  > >  > >>  > >  done via INTx, MSI, or MSI-X (PCIe spec r3.0, sec
>  > >  > >>  > >  6.2.4.1.2).  Apparently your device doesn't support MSI or
>  > >  > >>  > >  MSI-X.  Are you saying it doesn't support INTx either?
>  > >  > >>  > >  How is the interrupt you're requesting here different
>  > >  > >>  > >  from INTx?
>  > >  > >>  >
>  > >  > >>  >  Layerscape use none of MSI or MSI-X or INTx to indicate the
>  > >  > >>  >  devices or root error in RC mode. But use an independent
>  > >  > >>  >  SPI interrupt(arm interrupt controller) line.
>  > >  > >>  > >>>
>  > >  > >>  > >>> The Root Port is a PCI device and should follow the normal
>  > >  > >>  > >>> PCI rules for interrupts.  As far as I understand, that
>  > >  > >>  > >>> means it should use MSI, MSI-X, or INTx.  If your Root
>  > >  > >>  > >>> Port doesn't use MSI or MSI-X, it should use INTx, the
>  > >  > >>  > >>> PCI_INTERRUPT_PIN register should tell us which (INTA/
>  > >  > >>  > >>> INTB/etc.), and PCI_COMMAND_INTX_DISABLE should work to
>  > >  > >>  > >>> disable it.
>  > >  > >>  > >>> That's all from the PCI point of view, of course.
>  > >  > >>  > >>
>  > >  > >>  > >> I am faced with the same issue on Keystone PCI hardware and
>  > >  > >>  > >> it has been on my TODO list for quite some time. Keystone
>  > >  > >>  > >> PCI hardware also doesn't use MSI or MSI-X or INTx for
>  > >  > >>  > >> reporting errors received at the root port, but use a
>  > >  > >>  > >> platform interrupt instead (not complaint to PCI standard
>  > >  > >>  > >> as per PCI base spec). So I would need similar change to
>  > >  > >>  > >> have the error interrupt passed to the aer driver. So there
>  > >  > >>  > >> are hardware out there like Keystone which requires to
>  > >  > >>  > >> support this through platform IRQ.
>  > >  > >>  > >
>  > >  > >>  > 

RE: [PATCH 2/2] aer: add support aer interrupt with none MSI/MSI-X/INTx mode

2016-06-07 Thread Po Liu
Hi Bjorn,

Thanks for the kindly reply. All these are helpful.

>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  On Wed, June 08, 2016 6:47 AM
>  
>  On Tue, Jun 07, 2016 at 10:07:40AM +, Po Liu wrote:
>  > Hi Bjorn,
>  >
>  > >  -Original Message-
>  > >
>  > >  On Mon, Jun 06, 2016 at 10:01:44AM -0400, Murali Karicheri wrote:
>  > >  > On 06/06/2016 03:32 AM, Po Liu wrote:
>  > >  > > Hi Bjorn,
>  > >  > > I confirm we met same problem with KeyStone base on DesignWare
>  > >  > > design.
>  > >  > >
>  > >  > >
>  > >  > > Best regards,
>  > >  > > Liu Po
>  > >  > >
>  > >  > >>  -Original Message-
>  > >  > >>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  Sent: Saturday, June 04, 2016 11:49 AM
>  > >  > >>  To: Murali Karicheri
>  > >  > >>  Cc: Po Liu; linux-...@vger.kernel.org;
>  > >  > >>  linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org;
>  > >  > >>  devicet...@vger.kernel.org; Arnd Bergmann; Roy Zang; Marc Zyngier;
>  > >  > >>  Stuart Yoder; Yang-Leo Li; Minghuan Lian; Bjorn Helgaas; Shawn Guo;
>  > >  > >>  Mingkai Hu; Rob Herring
>  > >  > >>  Subject: Re: [PATCH 2/2] aer: add support aer interrupt with none
>  > >  > >>  MSI/MSI-X/INTx mode
>  > >  > >>
>  > >  > >>  On Fri, Jun 03, 2016 at 01:31:11PM -0400, Murali Karicheri wrote:
>  > >  > >>  > Po,
>  > >  > >>  >
>  > >  > >>  > Sorry to hijack your discussion, but the problem seems to be same
>  > >  > >>  > for Keystone PCI controller which is also designware (old version)
>  > >  > >>  > based.
>  > >  > >>  >
>  > >  > >>  > On 06/03/2016 12:09 AM, Bjorn Helgaas wrote:
>  > >  > >>  > > On Thu, Jun 02, 2016 at 11:37:28AM -0400, Murali Karicheri wrote:
>  > >  > >>  > >> On 06/02/2016 09:55 AM, Bjorn Helgaas wrote:
>  > >  > >>  > >>> On Thu, Jun 02, 2016 at 05:01:19AM +, Po Liu wrote:
>  > >  > >>  > >>>> >  -Original Message-
>  > >  > >>  > >>>> >  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  > >>>> >  Sent: Thursday, June 02, 2016 11:48 AM
>  > >  > >>  > >>>> >  To: Po Liu
>  > >  > >>  > >>>> >  Cc: linux-...@vger.kernel.org;
>  > >  > >>  > >>>> >  linux-arm-ker...@lists.infradead.org;
>  > >  > >>  > >>>> >  linux-kernel@vger.kernel.org; devicet...@vger.kernel.org;
>  > >  > >>  > >>>> >  Arnd Bergmann; Roy Zang; Marc Zyngier; Stuart Yoder;
>  > >  > >>  > >>>> >  Yang-Leo Li; Minghuan Lian; Bjorn Helgaas; Shawn Guo;
>  > >  > >>  > >>>> >  Mingkai Hu; Rob Herring
>  > >  > >>  > >>>> >  Subject: Re: [PATCH 2/2] aer: add support aer interrupt with
>  > >  > >>  > >>>> >  none MSI/MSI-X/INTx mode
>  > >  > >>  > >>>> >
>  > >  > >>  > >>>> >  [+cc Rob]
>  > >  > >>  > >>>> >
>  > >  > >>  > >>>> >  Hi Po,
>  > >  > >>  > >>>> >
>  > >  > >>  > >>>> >  On Thu, May 26, 2016 at 02:00:06PM +0800, Po Liu wrote:
>  > >  > >>  > >>>> >  > On some platforms, root port doesn't support
>  > >  > >>  > >>>> >  > MSI/MSI-X/INTx in RC mode.
>  > >  > >>  > >>>> >  > When chip support the aer interrupt with none
>  > >  > >>  > >>>> >  > MSI/MSI-X/INTx mode, maybe there is interrupt line for
>  > >  > >>  > >>>> >  > aer pme etc. Search the interrupt number in the fdt file.
>  > >  > >>  > >>>> >
>  > >  > >>  > >>>> >  My understanding is that AER interrupt signaling can be
>  > >  > >>  > >>>> >  done via INTx, MSI, or MSI-X (PCIe spec r3.0, sec
>  > >  > >>  > >>>> >  6.2.4.1.2).  Apparently your device doesn't support MSI or
>  > >  > >>  > >>>> >  MSI-X.  Are you saying it doesn't support INTx either?
>  > >  > >>  > >>>> >  How is the interrupt you're requesting here different from
>  > >  > >>  > >>>> >  INTx?
>  > >  > >>  > >>>>
>  > >  > >>  > >>>> Layerscape use none of MSI or MSI-X or INTx to indicate the
>  > >  > >>  > >>>> devices or root error in RC mode. But use an independent SPI
>  > >  > >>  > >>>> interrupt(arm interrupt controller) line.
>  > >  > >>  > >>>
>  > >  > >>  > >>> The Root Port is a PCI device and should follow the normal PCI
>  > >  > >>  > >>> rules for interrupts.  As far as I understand, that means it
>  > >  > >>  > >>> should use MSI, MSI-X, or INTx.  If your Root Port doesn't use
>  > >  > >>  > >>> MSI or MSI-X, it should use INTx, the PCI_INTERRUPT_PIN
>  > >  > >>  > >>> register should tell us which (INTA/INTB/etc.), and
>  > >  > >>  > >>> PCI_COMMAND_INTX_DISABLE should work to disable it.
>  > >  > >>  > >>> That's all from the PCI point of view, of course.
>  > >  > >>  > >>
>  > >  > >>  > >> I am faced with the same issue on Keystone PCI hardware and it
>  > >  > >>  > >> has been on my TODO list for quite some time. Keystone PCI
>  > >  > >>  > >> hardware also doesn't use MSI or MSI-X or INTx for reporting
>  > >  > >>  > >> errors received at the root port, but use a platform interrupt
>  > >  > >>  > >> instead (not complaint to PCI standard as per PCI base spec).
>  > >  > >>  > >> So I would need similar change to have the error interrupt
>  > >  > >>  > >> passed to the aer driver. So there are hardware out there like
>  > >  > >>  > >> Keystone which requires to support this through platform IRQ.
>  > >  > >>  > >
>  > >  > >>  > 
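
For reference, the INTx rules Bjorn describes map onto two fields of the PCI configuration header. The following is a minimal userspace sketch, not code from this thread: the helper names `intx_pin_name` and `intx_usable` are hypothetical, while the value of `PCI_COMMAND_INTX_DISABLE` (bit 10 of the Command register) and the Interrupt Pin encoding (0 = none, 1..4 = INTA..INTD) follow the PCI configuration header layout.

```c
#include <assert.h>   /* only needed for the usage checks shown after the block */
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of the PCI Command register: when set, the function may not assert INTx. */
#define PCI_COMMAND_INTX_DISABLE 0x400

/*
 * Hypothetical helper: translate the Interrupt Pin register value into a
 * name. 0 means the function uses no INTx pin; values above 4 are reserved
 * and are reported as "none" here.
 */
const char *intx_pin_name(uint8_t interrupt_pin)
{
    static const char *const names[] = { "none", "INTA", "INTB", "INTC", "INTD" };

    return interrupt_pin <= 4 ? names[interrupt_pin] : "none";
}

/*
 * Hypothetical helper: a function can signal via INTx only if it has a pin
 * assigned and the INTx Disable bit in the Command register is clear.
 */
bool intx_usable(uint8_t interrupt_pin, uint16_t command)
{
    return interrupt_pin >= 1 && interrupt_pin <= 4 &&
           !(command & PCI_COMMAND_INTX_DISABLE);
}
```

A function whose Interrupt Pin register reads 0 has no INTx pin, so `intx_usable(0, command)` is false whatever the Command register holds.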

Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

2016-06-07 Thread Horng-Shyang Liao
Hi Matthias,

On Tue, 2016-06-07 at 19:04 +0200, Matthias Brugger wrote:
> 
> On 30/05/16 05:19, HS Liao wrote:
> > This patch is first version of Mediatek Command Queue(CMDQ) driver. The
> > CMDQ is used to help read/write registers with critical time limitation,
> > such as updating display configuration during the vblank. It controls
> > Global Command Engine (GCE) hardware to achieve this requirement.
> > Currently, CMDQ only supports display related hardwares, but we expect
> > it can be extended to other hardwares for future requirements.
> >
> > Signed-off-by: HS Liao 
> > Signed-off-by: CK Hu 
> > ---
> 
> [...]
> 
> > +static void cmdq_handle_error_done(struct cmdq *cmdq,
> > +                                   struct cmdq_thread *thread, u32 irq_flag)
> > +{
> > +   struct cmdq_task *task, *tmp, *curr_task = NULL;
> > +   u32 curr_pa;
> > +   struct cmdq_cb_data cmdq_cb_data;
> > +   bool err;
> > +
> > +   if (irq_flag & CMDQ_THR_IRQ_ERROR)
> > +       err = true;
> > +   else if (irq_flag & CMDQ_THR_IRQ_DONE)
> > +       err = false;
> > +   else
> > +       return;
> > +
> > +   curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> > +
> > +   list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> > +                            list_entry) {
> > +       if (curr_pa >= task->pa_base &&
> > +           curr_pa < (task->pa_base + task->command_size))
> > +           curr_task = task;
> > +       if (task->cb.cb) {
> > +           cmdq_cb_data.err = curr_task ? err : false;
> > +           cmdq_cb_data.data = task->cb.data;
> > +           task->cb.cb(cmdq_cb_data);
> > +       }
> 
> I think this is not right. If we got an IRQ_DONE, then the current task 
> is in execution, we should not call the callback until it has finished.

Thanks for your finding. This is a bug from CMDQ v6.
I will fix it in next version (CMDQ v9).

> 
> Regards,
> Matthias

Thanks,
HS
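
One possible reading of Matthias's comment, sketched in plain C. Everything here is a hypothetical stand-in (`struct task`, `complete_tasks`); it is neither the driver's code nor the eventual v9 fix. The assumption is that tasks queued ahead of the one the hardware is executing have completed, while the task containing the current address gets its callback only when an error was flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's task bookkeeping. */
struct task {
    uint32_t pa_base;       /* start address of the task's command buffer */
    uint32_t command_size;  /* size of the command buffer in bytes */
    bool finished;          /* set where the real driver would call the callback */
    bool err;               /* error status that would be passed to the callback */
};

/*
 * Walk the tasks in submission order. A task the hardware has moved past is
 * finished without error. The task that contains curr_pa is still executing,
 * so it is finished only if an error was flagged; otherwise it is left alone.
 * Tasks after the current one have not started. Returns the number of tasks
 * marked finished.
 */
size_t complete_tasks(struct task *tasks, size_t n, uint32_t curr_pa, bool err)
{
    size_t fired = 0;

    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];
        bool is_current = curr_pa >= t->pa_base &&
                          curr_pa < t->pa_base + t->command_size;

        if (is_current && !err)
            break;              /* still running: defer its callback */

        t->finished = true;
        t->err = is_current ? err : false;
        fired++;

        if (is_current)
            break;              /* later tasks have not run yet */
    }
    return fired;
}
```

With three queued tasks and the current address inside the second one, a DONE interrupt finishes only the first task, whereas an ERROR interrupt finishes the first two and reports the error on the second.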



Re: [PATCH 04/10] x86, asm: define CC_SET() and CC_OUT() macros

2016-06-07 Thread Andy Lutomirski
On Tue, Jun 7, 2016 at 4:31 PM, H. Peter Anvin  wrote:
> From: "H. Peter Anvin" 
>
> The CC_SET() and CC_OUT() macros can be used together to take
> advantage of the new __GCC_ASM_FLAG_OUTPUTS__ feature in gcc 6+ while
> remaining backwards compatible.  CC_SET() generates a SET instruction
> on older compilers; CC_OUT() makes sure the output is received in the
> correct variable.

Nice.

Reviewed-by: Andy Lutomirski 
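
A minimal sketch of how such a macro pair can be used. The macro bodies below are an assumption about how the pair could be written to match the description above (a flag output via `=@cc` when `__GCC_ASM_FLAG_OUTPUTS__` is defined, a SETcc instruction otherwise), not a copy of the patch. `sub_is_zero` is a hypothetical example function, and the non-x86 branch exists only so the sketch builds elsewhere.

```c
#include <assert.h>   /* only needed for the usage checks shown after the block */
#include <stdbool.h>

#ifdef __GCC_ASM_FLAG_OUTPUTS__
/* Compiler can read the flags directly: emit only a comment in the asm. */
# define CC_SET(c) "\n\t/* output condition code " #c " */\n"
# define CC_OUT(c) "=@cc" #c
#else
/* Older compiler: materialise the flag with SETcc into a named operand. */
# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
# define CC_OUT(c) [_cc_ ## c] "=qm"
#endif

/* Hypothetical example: returns true when a - b leaves the zero flag set. */
bool sub_is_zero(unsigned long a, unsigned long b)
{
#if defined(__x86_64__) || defined(__i386__)
    bool zero;

    __asm__ volatile("sub %2, %1"
                     CC_SET(z)
                     : CC_OUT(z) (zero), "+r" (a)
                     : "r" (b));
    return zero;
#else
    return a == b;  /* portable fallback so the sketch builds on other targets */
#endif
}
```

The caller is the same in both configurations, e.g. `assert(sub_is_zero(5, 5));`. Only the generated code differs: with flag outputs the compiler can branch on the flag directly instead of going through a SETcc and a test of the resulting byte.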


Re: [PATCH v2 1/6] power: Introduce Broadcom kona reset driver

2016-06-07 Thread Sebastian Reichel
Hi,

On Tue, Jun 07, 2016 at 12:40:41PM -0700, Chris Brand wrote:
> On Mon, Jun 6, 2016 at 6:50 PM, Sebastian Reichel  wrote:
> > Hi,
> >
> > On Mon, Jun 06, 2016 at 09:42:03AM -0700, Chris Brand wrote:
> >> On Thu, Jun 2, 2016 at 7:38 PM, Sebastian Reichel  wrote:
> >> > Feel free to queue it via arm-soc with
> >> >
> >> > Acked-By: Sebastian Reichel 
> >> >
> >> > If I didn't overlook it, it's missing DT documentation, though.
> >>
> >> Thanks, Sebastian. Because this is effectively a move of code from
> >> arch/arm rather than new code, there's already dt documentation in
> >> Documentation/devicetree/bindings/reset/brcm,bcm21664-resetmgr.txt
> >
> > > Ok. That directory is usually used for peripheral reset controllers.
> > > Board/System reset controllers are usually documented in
> > > .../bindings/power/reset (following kernel structure
> > > [drivers/reset and drivers/power/reset]).
> >
> > -- Sebastian
> 
> Would you like me to send a separate patch to move that file ?

That would be nice, thanks!

-- Sebastian




linux-next: Tree for Jun 8

2016-06-07 Thread Stephen Rothwell
Hi all,

News: there will be no linux-next releases on Friday or Monday, so the
release following tomorrow's will be next-20160614.

Changes since 20160607:

Removed tree: drm-vc4 (merged into the bcm2835 tree)

Dropped tree: amlogic (build failure)

My fixes tree contains:

  of: silence warnings due to max() usage

The amlogic tree still had its build failure so I dropped it for today.

The net-next tree gained a conflict against the net tree.

The clockevents tree still had its build failure so I used the version
from next-20160606.

I applied a supplied merge fix for a semantic conflict between the s390
and kvms390 trees.

Non-merge commits (relative to Linus' tree): 1881
 1809 files changed, 74075 insertions(+), 32709 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 232 trees (counting Linus' and 34 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (43c082e72745 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fixes/master (b31033aacbd0 of: silence warnings due to max() usage)
Merging kbuild-current/rc-fixes (b36fad65d61f kbuild: Initialize exported variables)
Merging arc-current/for-curr (ed6aefed726a Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff")
Merging arm-current/fixes (e2dfb4b88014 ARM: fix PTRACE_SETVFPREGS on SMP systems)
Merging m68k-current/for-linus (9a6462763b17 m68k/mvme16x: Include generic )
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging powerpc-fixes/fixes (8a934efe9434 powerpc/pseries: Fix PCI config address for DDW)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (6b15d6650c53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (a03e6fe56971 act_police: fix a crash during removal)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (3ec10d3a2ba5 ipvs: update real-server binding of outgoing connections in SIP-pe)
Merging wireless-drivers/master (182fd9eecb28 MAINTAINERS: Add file patterns for wireless device tree bindings)
Merging mac80211/master (6fe04128f158 mac80211: fix fast_tx header alignment)
Merging sound-current/for-linus (f90d83b30170 ALSA: hda - Fix headset mic detection problem for Dell machine)
Merging pci-current/for-linus (1a695a905c18 Linux 4.7-rc1)
Merging driver-core.current/driver-core-linus (1a695a905c18 Linux 4.7-rc1)
Merging tty.current/tty-linus (1a695a905c18 Linux 4.7-rc1)
Merging usb.current/usb-linus (7b2c17f82954 usb: musb: Stop bulk endpoint while queue is rotated)
Merging usb-gadget-fixes/fixes (50c763f8c1ba usb: dwc3: Set the ClearPendIN bit on Clear Stall EP command)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (1a695a905c18 Linux 4.7-rc1)
Merging char-misc.current/char-misc-linus (1a695a905c18 Linux 4.7-rc1)
Merging input-current/for-linus (540c26087bfb Input: xpad - fix rumble on Xbox One controllers with 2015 firmware)
Merging crypto-current/master (ab6a11a7c8ef crypto: ccp - Fix AES XTS error for request sizes

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Jens Axboe

On 06/06/2016 03:21 PM, Christoph Hellwig wrote:

> From: Ming Lin
>
> For some protocols like NVMe over Fabrics we need to be able to send
> initialization commands to a specific queue.
>
> Based on an earlier patch from Christoph Hellwig .
>
> Signed-off-by: Ming Lin
> Signed-off-by: Christoph Hellwig
> ---
>   block/blk-mq.c | 33 +
>   include/linux/blk-mq.h |  2 ++
>   2 files changed, 35 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 29cbc1b..7bb45ed 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
>   }
>   EXPORT_SYMBOL(blk_mq_alloc_request);
>
> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> +       unsigned int flags, unsigned int hctx_idx)
> +{
> +   struct blk_mq_hw_ctx *hctx;
> +   struct blk_mq_ctx *ctx;
> +   struct request *rq;
> +   struct blk_mq_alloc_data alloc_data;
> +   int ret;
> +
> +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> +   if (ret)
> +       return ERR_PTR(ret);
> +
> +   hctx = q->queue_hw_ctx[hctx_idx];
> +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> +
> +   blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
> +
> +   rq = __blk_mq_alloc_request(&alloc_data, rw);
> +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> +       __blk_mq_run_hw_queue(hctx);
> +
> +       rq =  __blk_mq_alloc_request(&alloc_data, rw);
> +   }


Why are we duplicating this code here? If NOWAIT isn't set, then we'll
always return a request. bt_get() will run the queue for us, if it needs
to. blk_mq_alloc_request() does this too, and I'm guessing that code was
just copied. I'll fix that up. Looks like this should just be:

    rq = __blk_mq_alloc_request(&alloc_data, rw);
    if (rq)
        return rq;

    blk_queue_exit(q);
    return ERR_PTR(-EWOULDBLOCK);

for this case.

--
Jens Axboe
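
The `ERR_PTR(-EWOULDBLOCK)` return in the snippet above relies on the kernel's error-pointer convention. As a userspace sketch under stated assumptions (the lowercase helpers `err_ptr`, `ptr_err`, `is_err` and the `struct queue` model are hypothetical stand-ins, not the kernel's definitions), the control flow Jens proposes looks like this: take a queue reference, try one allocation, and on failure drop the reference and return the error encoded in the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* errors are encoded in the top 4095 pointer values */

/* Userspace stand-ins for the kernel's error-pointer helpers. */
void *err_ptr(long error)        { return (void *)(intptr_t)error; }
long ptr_err(const void *ptr)    { return (long)(intptr_t)ptr; }
bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct request { int tag; };

/* Hypothetical queue model: one request slot plus a reference count. */
struct queue {
    int usage;            /* stands in for the queue usage reference */
    bool has_free_tag;    /* whether an allocation attempt would succeed */
    struct request rq;
};

/* Single allocation attempt; on failure the queue reference is dropped. */
struct request *alloc_request_hctx(struct queue *q)
{
    assert(q != NULL);

    q->usage++;                        /* models blk_queue_enter() */
    if (q->has_free_tag) {
        q->has_free_tag = false;
        return &q->rq;                 /* success: reference stays held */
    }
    q->usage--;                        /* models blk_queue_exit() */
    return err_ptr(-EWOULDBLOCK);
}
```

Callers then test the result with `is_err()` rather than comparing against NULL, and recover the errno with `ptr_err()`.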



Re: [PATCH v10 6/7] usb: pci-quirks: add Intel USB drcfg mux device

2016-06-07 Thread Greg Kroah-Hartman
On Thu, Jun 02, 2016 at 09:37:28AM +0800, Lu Baolu wrote:
> In some Intel platforms, a single usb port is shared between USB host
> and device controllers. The shared port is under control of a switch
> which is defined in the Intel vendor defined extended capability for
> xHCI.
> 
> This patch adds the support to detect and create the platform device
> for the port mux switch.

Why do you need a platform device for this?  You do nothing with this
device, why create it at all?

And why is it a platform device, isn't it really a PCI device?  Why
would you ever find a "platform" device below a PCI device?  Don't abuse
platform devices for things that aren't.  It makes me want to delete
that whole interface more and more...

greg k-h


Re: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-06-07 Thread Ganapatrao Kulkarni
On Wed, Jun 8, 2016 at 7:46 AM, Leizhen (ThunderTown)
 wrote:
>
>
> On 2016/6/7 22:01, Ganapatrao Kulkarni wrote:
>> On Tue, Jun 7, 2016 at 6:27 PM, Leizhen (ThunderTown)
>>  wrote:
>>>
>>>
>>> On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
>>>> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei wrote:
>>>>> Some numa nodes may have no memory. For example:
>>>>> 1. cpu0 on node0
>>>>> 2. cpu1 on node1
>>>>> 3. device0 access the memory from node0 and node1 take the same time.
>>>>
>>>> i am wondering, if access to both nodes is same, then why you need numa.
>>>> the example you are quoting is against the basic principle of "numa"
>>>> what is device0 here? cpu?
>>> The device0 can also be a cpu. I drew a simple diagram:
>>>
>>>    cpu0      cpu1      cpu2/device0
>>>     |         |         |
>>>     |         |         |
>>>    DDR0      DDR1      No DIMM slots or no DIMM plugged
>>>  (node0)   (node1)    (node2)
>>>
>>
>> thanks for the clarification. your example is for 3 node system, where
>> third node is memory less node.
>> do you see any issue in supporting this topology with existing code?
> If opened HAVE_MEMORYLESS_NODES, it will pick the nearest node for the cpus on
> memoryless node.

I see a couple of architectures have enabled HAVE_MEMORYLESS_NODES, but I
don't see any code in the arch-specific numa code for this.
Does that mean the core code will take care of this?

>
> For example, in include/linux/topology.h
> #ifdef CONFIG_HAVE_MEMORYLESS_NODES
> ...
> static inline int cpu_to_mem(int cpu)
> {
> return per_cpu(_numa_mem_, cpu);
> }
> ...
> #else
> ...
> static inline int cpu_to_mem(int cpu)
> {
> return cpu_to_node(cpu);
> }
> ...
> #endif
>
>> I think, this use case should be supported with present code.
>>
>
> So, we can not simply classify device0 to node0 or node1, but we can
> define a node2 which distances to node0 and node1 are the same.
>
> Signed-off-by: Zhen Lei 
> ---
>  arch/arm64/Kconfig  |  4 
>  arch/arm64/kernel/smp.c |  1 +
>  arch/arm64/mm/numa.c| 43 +--
>  3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 05c1bf1..5904a62 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> +config HAVE_MEMORYLESS_NODES
> +   def_bool y
> +   depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index d099306..9e15297 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
> }
>
> bootcpu_valid = true;
> +   early_map_cpu_to_node(0, of_node_to_nid(dn));
>
> /*
>  * cpu_logical_map has already been
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index df5c842..d73b0a0 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
> nid = 0;
>
> cpu_to_node_map[cpu] = nid;
> +
> +   /*
> +* We should set the numa node of cpu0 as soon as possible, because it
> +* has already been set up online before. cpu_to_node(0) will soon be
> +* called.
> +*/
> +   if (!cpu)
> +   set_cpu_numa_node(cpu, nid);
>  }
>
>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
> @@ -215,6 +223,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
> return ret;
>  }
>
> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
> +{
> +   int i, best_nid, distance;
> +   u64 pa;
> +   DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> +   bitmap_zero(nodes_map, MAX_NUMNODES);
> +   bitmap_set(nodes_map, nid, 1);
> +
> +find_nearest_node:
> +   best_nid = NUMA_NO_NODE;
> +   distance = INT_MAX;
> +
> +   for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> +   if (numa_distance[nid][i] < distance) {
> +   best_nid = i;
> +   distance = numa_distance[nid][i];
> +   }
> +
> +   pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
> +   if (!pa) {
> +   BUG_ON(best_nid == NUMA_NO_NODE);
> +   bitmap_set(nodes_map, best_nid, 1);
> +   goto find_nearest_node;
> +   }
> +
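
The quoted alloc_node_data_from_nearest_node() hunk is cut off here. As a
rough illustration of the selection step it performs, the following
stand-alone C sketch picks the closest node that can satisfy an allocation.
It is not the kernel code: MAX_NODES, NO_NODE, the distance table and the
has_memory[] flags are all assumptions made for the example, and the flags
stand in for a memblock allocation attempt succeeding or failing.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 3	/* hypothetical: node0, node1, node2 */
#define NO_NODE   (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical distances: node2 is equally far from node0 and node1. */
static const int distance[MAX_NODES][MAX_NODES] = {
	{ 10, 15, 20 },
	{ 15, 10, 20 },
	{ 20, 20, 10 },
};

/*
 * Return the node closest to nid, other than nid itself, that has memory.
 * Returns NO_NODE when no other node has memory. Ties go to the
 * lowest-numbered node because the comparison is strict.
 */
static int nearest_node_with_memory(int nid, const bool has_memory[MAX_NODES])
{
	int best_nid = NO_NODE;
	int best_distance = INT_MAX;

	for (int i = 0; i < MAX_NODES; i++) {
		if (i == nid || !has_memory[i])
			continue;
		if (distance[nid][i] < best_distance) {
			best_nid = i;
			best_distance = distance[nid][i];
		}
	}

	return best_nid;
}
```

With node2 memoryless and equidistant from node0 and node1, as in the
diagram earlier in this message, the sketch returns node0 for node2.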

Re: [PATCH v10 1/7] regulator: fixed: add support for ACPI interface

2016-06-07 Thread Greg Kroah-Hartman
On Thu, Jun 02, 2016 at 09:37:23AM +0800, Lu Baolu wrote:
> Add support to retrieve fixed voltage configure information through
> ACPI interface. This is needed for Intel Bay Trail devices, where a
> GPIO is used to control the USB vbus.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/regulator/fixed.c | 46 ++
>  1 file changed, 46 insertions(+)

Can't do anything with this until I get an ack from the "owners" of this
file.

And what happened to the acks from other Intel developers for this whole
patch series, I don't see that here :(

greg k-h


[PATCH v3 2/2] ARM: at91/dt: sama5d2: Use new compatible for ohci node

2016-06-07 Thread Wenyou Yang
Use the "atmel,sama5d2-ohci" compatible so that the ports can be forcibly
suspended during sleep to save power.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2:
 - Use the new compatible for ohci-node.

 arch/arm/boot/dts/sama5d2.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sama5d2.dtsi b/arch/arm/boot/dts/sama5d2.dtsi
index 78996bd..03d6724 100644
--- a/arch/arm/boot/dts/sama5d2.dtsi
+++ b/arch/arm/boot/dts/sama5d2.dtsi
@@ -232,7 +232,7 @@
};
 
usb1: ohci@0040 {
-   compatible = "atmel,at91rm9200-ohci", "usb-ohci";
+   compatible = "atmel,sama5d2-ohci", "usb-ohci";
reg = <0x0040 0x10>;
interrupts = <41 IRQ_TYPE_LEVEL_HIGH 2>;
clocks = <_clk>, <_clk>, <>;
-- 
2.7.4



[PATCH v3 1/2] usb: ohci-at91: Forcibly suspend ports while USB suspend

2016-06-07 Thread Wenyou Yang
To save power, as a workaround, forcibly suspend the USB PORTA/B/C by
setting the SUSPEND_A/B/C bits of the OHCI Interrupt Configuration
Register in the SFRs when the OHCI USB controller suspends.

The suspend operation must be done before the USB clock is disabled, and
the resume operation after the USB clock is enabled.

Signed-off-by: Wenyou Yang 
---

Changes in v3:
 - Make the compatible description more precise.

Changes in v2:
 - Add a compatible to support forcibly suspending the ports.
 - Add soc/at91/at91_sfr.h to accommodate the defines.
 - Add error checking for .sfr_regmap.
 - Remove unnecessary regmap_read() statement.

 .../devicetree/bindings/usb/atmel-usb.txt  |  6 +-
 drivers/usb/host/ohci-at91.c   | 80 +-
 include/soc/at91/at91_sfr.h| 29 
 3 files changed, 112 insertions(+), 3 deletions(-)
 create mode 100644 include/soc/at91/at91_sfr.h

diff --git a/Documentation/devicetree/bindings/usb/atmel-usb.txt b/Documentation/devicetree/bindings/usb/atmel-usb.txt
index 5883b73..888deaa 100644
--- a/Documentation/devicetree/bindings/usb/atmel-usb.txt
+++ b/Documentation/devicetree/bindings/usb/atmel-usb.txt
@@ -3,8 +3,10 @@ Atmel SOC USB controllers
 OHCI
 
 Required properties:
- - compatible: Should be "atmel,at91rm9200-ohci" for USB controllers
-   used in host mode.
+ - compatible: Should be one of the following
+  "atmel,at91rm9200-ohci" for USB controllers used in host mode.
+  "atmel,sama5d2-ohci" for USB controllers used in host mode
+  on SAMA5D2 which can force to suspend.
  - reg: Address and length of the register set for the device
  - interrupts: Should contain ehci interrupt
  - clocks: Should reference the peripheral, host and system clocks
diff --git a/drivers/usb/host/ohci-at91.c b/drivers/usb/host/ohci-at91.c
index d177372..54e8feb 100644
--- a/drivers/usb/host/ohci-at91.c
+++ b/drivers/usb/host/ohci-at91.c
@@ -21,8 +21,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
+#include 
 
 #include "ohci.h"
 
@@ -45,12 +48,18 @@ struct at91_usbh_data {
u8 overcurrent_changed[AT91_MAX_USBH_PORTS];
 };
 
+struct ohci_at91_caps {
+   bool suspend_ctrl;
+};
+
 struct ohci_at91_priv {
struct clk *iclk;
struct clk *fclk;
struct clk *hclk;
bool clocked;
bool wakeup;/* Saved wake-up state for resume */
+   const struct ohci_at91_caps *caps;
+   struct regmap *sfr_regmap;
 };
 /* interface and function clocks; sometimes also an AHB clock */
 
@@ -132,6 +141,17 @@ static void at91_stop_hc(struct platform_device *pdev)
 
 /*-*/
 
+struct regmap *at91_dt_syscon_sfr(void)
+{
+   struct regmap *regmap;
+
+   regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
+   if (IS_ERR(regmap))
+   regmap = NULL;
+
+   return regmap;
+}
+
 static void usb_hcd_at91_remove (struct usb_hcd *, struct platform_device *);
 
 /* configure so an HC device and id are always provided */
@@ -197,6 +217,17 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver,
goto err;
}
 
+   ohci_at91->caps = (const struct ohci_at91_caps *)
+ of_device_get_match_data(&pdev->dev);
+   if (!ohci_at91->caps)
+   return -ENODEV;
+
+   if (ohci_at91->caps->suspend_ctrl) {
+   ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
+   if (!ohci_at91->sfr_regmap)
+   dev_warn(dev, "failed to find sfr node\n");
+   }
+
board = hcd->self.controller->platform_data;
ohci = hcd_to_ohci(hcd);
ohci->num_ports = board->ports;
@@ -440,8 +471,17 @@ static irqreturn_t ohci_hcd_at91_overcurrent_irq(int irq, void *data)
return IRQ_HANDLED;
 }
 
+static const struct ohci_at91_caps at91rm9200_caps = {
+   .suspend_ctrl = false,
+};
+
+static const struct ohci_at91_caps sama5d2_caps = {
+   .suspend_ctrl = true,
+};
+
 static const struct of_device_id at91_ohci_dt_ids[] = {
-   { .compatible = "atmel,at91rm9200-ohci" },
+   { .compatible = "atmel,at91rm9200-ohci", .data = &at91rm9200_caps },
+   { .compatible = "atmel,sama5d2-ohci", .data = &sama5d2_caps },
{ /* sentinel */ }
 };
 
@@ -581,6 +621,38 @@ static int ohci_hcd_at91_drv_remove(struct platform_device *pdev)
return 0;
 }
 
+static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable)
+{
+   u32 regval;
+   int ret;
+
+   if (!regmap)
+   return -EINVAL;
+
+   ret = regmap_read(regmap, SFR_OHCIICR, &regval);
+   if (ret)
+   return ret;
+
+   if (enable)
+   regval &= ~SFR_OHCIICR_USB_SUSPEND;
+   else
+   regval |= SFR_OHCIICR_USB_SUSPEND;
+
+   regmap_write(regmap, SFR_OHCIICR, regval);
+
+   return 0;
+}
+

[PATCH v3 0/2] ARM: ohci-at91: Add support to forcibly suspend ports while sleep

2016-06-07 Thread Wenyou Yang
To save power, add a new compatible to support forcibly suspending the
USB PORTA/B/C via the OHCI Interrupt Configuration Register in the SFRs.

Changes in v3:
 - Make the compatible description more precise.

Changes in v2:
 - Add a compatible to support forcibly suspending the ports.
 - Add soc/at91/at91_sfr.h to accommodate the defines.
 - Add error checking for .sfr_regmap.
 - Remove unnecessary regmap_read() statement.
 - Use the new compatible for ohci-node.

Wenyou Yang (2):
  usb: ohci-at91: Forcibly suspend ports while USB suspend
  ARM: at91/dt: sama5d2: Use new compatible for ohci node

 .../devicetree/bindings/usb/atmel-usb.txt  |  6 +-
 arch/arm/boot/dts/sama5d2.dtsi |  2 +-
 drivers/usb/host/ohci-at91.c   | 80 +-
 include/soc/at91/at91_sfr.h| 29 
 4 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 include/soc/at91/at91_sfr.h

-- 
2.7.4



[PATCH] mmc: dw_mmc: remove UBSAN warning in dw_mci_setup_bus()

2016-06-07 Thread Seung-Woo Kim
This patch removes the following UBSAN warnings in dw_mci_setup_bus().
The warnings are caused by shifting a 32-bit variable by more than 31
bits, so this patch shifts only when the shift count is less than 32.

  UBSAN: Undefined behaviour in drivers/mmc/host/dw_mmc.c:1102:14
  shift exponent 250 is too large for 32-bit type 'unsigned int'
  Call trace:
  [] dump_backtrace+0x0/0x380
  [] show_stack+0x14/0x20
  [] dump_stack+0xe0/0x120
  [] ubsan_epilogue+0x18/0x68
  [] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
  [] dw_mci_setup_bus+0x3a0/0x438
  [...]

  UBSAN: Undefined behaviour in drivers/mmc/host/dw_mmc.c:1132:27
  shift exponent 250 is too large for 32-bit type 'unsigned int'
  Call trace:
  [] dump_backtrace+0x0/0x380
  [] show_stack+0x14/0x20
  [] dump_stack+0xe0/0x120
  [] ubsan_epilogue+0x18/0x68
  [] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
  [] dw_mci_setup_bus+0x384/0x438
  [] dw_mci_set_ios+0x184/0x798
  [] mmc_power_up+0x11c/0x260
  [] mmc_start_host+0x88/0x100
  [] mmc_add_host+0x6c/0x128
  [] dw_mci_probe+0x1088/0x1750
  [] dw_mci_pltfm_register+0x108/0x178
  [] dw_mci_exynos_probe+0x4c/0x88
  [] platform_drv_probe+0x78/0x180
  [] driver_probe_device+0x144/0x460
  [] __driver_attach+0xf4/0x140
  [] bus_for_each_dev+0xf0/0x160
  [] driver_attach+0x34/0x58
  [] bus_add_driver+0x2c0/0x398
  [] driver_register+0xbc/0x1e0
  [] __platform_driver_register+0x84/0xa8
  [] dw_mci_exynos_pltfm_driver_init+0x18/0x20
  [] do_one_initcall+0xa0/0x2c8
  [] kernel_init_freeable+0x52c/0x5dc
  [] kernel_init+0x1c/0xf8
  [] ret_from_fork+0x10/0x40

Signed-off-by: Seung-Woo Kim 
---
 drivers/mmc/host/dw_mmc.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 2cc6123..dff045e 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1099,7 +1099,8 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit)
 
div = (host->bus_hz != clock) ? DIV_ROUND_UP(div, 2) : 0;
 
-   if ((clock << div) != slot->__clk_old || force_clkinit)
+   if (((div < 32) ? (clock << div) : 0) != slot->__clk_old ||
+   force_clkinit)
dev_info(&slot->mmc->class_dev,
 "Bus speed (slot %d) = %dHz (slot req %dHz, actual %dHZ div = %d)\n",
 slot->id, host->bus_hz, clock,
@@ -1129,7 +1130,7 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit)
mci_send_cmd(slot, sdmmc_cmd_bits, 0);
 
/* keep the clock with reflecting clock dividor */
-   slot->__clk_old = clock << div;
+   slot->__clk_old = (div < 32) ? (clock << div) : 0;
}
 
host->current_speed = clock;
-- 
1.7.4.1



[PATCH 1/1] perf/x86/intel: Add extended event constraints for Knights Landing

2016-06-07 Thread Lukasz Odzioba
On Knights Landing processors we need to filter OFFCORE_RESPONSE
events by the config1 parameter to make sure that each event ends up
on an appropriate PMC, as the specification requires.

On Knights Landing:
MSR_OFFCORE_RSP_1 bits 8, 11, 14 can be used only on PMC1
MSR_OFFCORE_RSP_0 bit 38 can be used only on PMC0

This patch introduces INTEL_EEVENT_CONSTRAINT, whose third parameter
specifies the extended config bits allowed only on the given PMCs.

Patch depends on "Change offcore response masks for Knights Landing"

Reported-by: Andi Kleen 
Acked-by: Andi Kleen 
Signed-off-by: Lukasz Odzioba 
---
 arch/x86/events/core.c |  3 ++-
 arch/x86/events/intel/core.c   | 17 ++---
 arch/x86/events/intel/uncore.c |  2 +-
 arch/x86/events/perf_event.h   | 41 -
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 33787ee..a4be71c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -122,6 +122,7 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
continue;
if (event->attr.config1 & ~er->valid_mask)
return -EINVAL;
+
/* Check if the extra msrs can be safely accessed*/
if (!er->extra_msr_access)
return -ENXIO;
@@ -1736,7 +1737,7 @@ static int __init init_hw_perf_events(void)
 
unconstrained = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
-  0, x86_pmu.num_counters, 0, 0);
+  0, x86_pmu.num_counters, 0, 0, 0);
 
x86_pmu_format_group.attrs = x86_pmu.format_attrs;
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7c66695..794f5c8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -177,6 +177,17 @@ static struct event_constraint intel_slm_event_constraints[] __read_mostly =
EVENT_CONSTRAINT_END
 };
 
+static struct event_constraint intel_knl_event_constraints[] __read_mostly = {
+   FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+   FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+   FIXED_EVENT_CONSTRAINT(0x0300, 2), /* pseudo CPU_CLK_UNHALTED.REF */
+   /* MSR_OFFCORE_RSP_1 bits 8, 11, 14 can be used only on PMC1 */
+   INTEL_EEVENT_CONSTRAINT(0x02b7, 2, 0x4900),
+   /* MSR_OFFCORE_RSP_0 bit 38 can be used only on PMC0 */
+   INTEL_EEVENT_CONSTRAINT(0x01b7, 1, 1ull<<38),
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint intel_skl_event_constraints[] = {
FIXED_EVENT_CONSTRAINT(0x00c0, 0),  /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1),  /* CPU_CLK_UNHALTED.CORE */
@@ -2284,16 +2295,16 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  struct perf_event *event)
 {
struct event_constraint *c;
-
if (x86_pmu.event_constraints) {
for_each_event_constraint(c, x86_pmu.event_constraints) {
if ((event->hw.config & c->cmask) == c->code) {
+   if (c->emask && !(c->emask & event->attr.config1))
+   continue;
event->hw.flags |= c->flags;
return c;
}
}
}
-
return &unconstrained;
 }
 
@@ -3784,7 +3795,7 @@ __init int intel_pmu_init(void)
   knl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_knl();
 
-   x86_pmu.event_constraints = intel_slm_event_constraints;
+   x86_pmu.event_constraints = intel_knl_event_constraints;
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
x86_pmu.extra_regs = intel_knl_extra_regs;
 
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index fce7406..fc5b866 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -839,7 +839,7 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
type->pmus = pmus;
type->unconstrainted = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-   0, type->num_counters, 0, 0);
+   0, type->num_counters, 0, 0, 0);
 
if (type->event_descs) {
for (i = 0; type->event_descs[i].attr.attr.name; i++);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8bd764d..47241ed5 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -52,6 +52,7 @@ struct event_constraint {
int weight;
int overlap;
int flags;
+   u64 emask;
 };
 /*
  * struct 

Re: [PATCH 06/10] drm/amdgpu: use drm_crtc_vblank_{on,off}()

2016-06-07 Thread Michel Dänzer
On 07.06.2016 23:07, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Replace the legacy drm_vblank_{on,off}() with the new helper functions.
> 
> Signed-off-by: Gustavo Padovan 

Patches 6 & 8-10 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: Files leak from nfsd in 4.7.1-rc1 (and more?)

2016-06-07 Thread Oleg Drokin

On Jun 7, 2016, at 10:22 PM, Oleg Drokin wrote:

> 
> On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote:
> 
>>> That said, this code is quite subtle. I'd need to look over it in more
>>> detail before I offer up any fixes. I'd also appreciate it if anyone
>>> else wants to sanity check my analysis there.
>>> 
>> Yeah, I think you're right. It's fine since r/w opens have a distinct
>> slot, even though the refcounting just tracks the number of read and
>> write references. So yeah, the leak probably is in an error path
>> someplace, or maybe a race someplace.
> 
> So I noticed that set_access is always called locked, but clear_access is not,
> this does not sound right.
> 
> So I placed this strategic WARN_ON:
> @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
>goto out_put_access;
>spin_lock(&fp->fi_lock);
>if (!fp->fi_fds[oflag]) {
> +WARN_ON(!test_access(open->op_share_access, stp));
>fp->fi_fds[oflag] = filp;
>filp = NULL;
> 
> This is right in the place where nfsd set the access flag already, discovered
> that the file is not opened and went on to open it, yet some parallel thread
> came in and cleared the flag by the time we got the file opened.
> It did trigger (but there are 30 minutes left till test finish, so I don't
> know yet if this will correspond to the problem at hand yet, so below is
> speculation).

Duh, I looked for a warning, but did not cross reference, and it was not this
one that hit yet.

Though apparently I am hitting some of the "impossible" warnings, so you might
want to look into that anyway.

status = nfsd4_process_open2(rqstp, resfh, open);
WARN(status && open->op_created,
 "nfsd4_process_open2 failed to open newly-created file! status=%u\n",
 be32_to_cpu(status));

and

filp = find_readable_file(fp);
if (!filp) {
/* We should always have a readable file here */
WARN_ON_ONCE(1);
locks_free_lock(fl);
return -EBADF;
}



Re: [PATCH V3 8/9] cpufreq: Keep policy->freq_table sorted in ascending order

2016-06-07 Thread Viresh Kumar
On 08-06-16, 02:38, Rafael J. Wysocki wrote:
> On Tuesday, June 07, 2016 09:58:07 AM Viresh Kumar wrote:
> > On 06-06-16, 23:56, Rafael J. Wysocki wrote:
> > > Since you are adding new code, you can write it so it doesn't do
> > > unnecessary checks from the start.
> > 
> > Hmm, I will do all that in this series only now.
> > 
> > > While at it, the "if ((freq < policy->min) || (freq > policy->max))"
> > > checks in cpufreq_find_index_l() and cpufreq_find_index_h() don't look
> > > good to me, because they very well may cause those function to return
> > > -EINVAL even when there's a valid table and that may cause
> > > acpi_cpufreq_fast_switch() to do bad things.
> > 
> > Hmm. So, the checks are for sure required here, otherwise we may end up
> > returning a frequency which we aren't allowed to. Also note that 'freq' here
> > isn't the target-freq, but the entry in the freq-table.
> > 
> > This routine should be returning a valid freq within the ranges specified by
> > policy->min/max.
> 
> Which in principle may not be possible if the range doesn't include any
> frequency in the table, eg. min == max and between the table entries.

By within ranges I meant policy->min <= freq <= policy->max, and that's how all
our checks are. So even if the table has only a single valid frequency, we will
return that one.

> However, the CPU has to run at *some* frequency, even if there's none in the
> min/max range.

I completely agree. But the error will be fired only if there is no frequency
within ranges we can switch to. And that's a bug somewhere else then.

> And if we are sure that there is at least one valid frequency between min
> and max, please note that target_freq has already been clamped between them,

Yeah, it's already clamped by the freq-change helpers in cpufreq core, but others
may not be doing it properly.

> > Also note that these routines shall *never* return -EINVAL, otherwise it is
> > mostly a bug we are hitting.
> 
> So make them explicitly return a valid frequency every time.

I thought about returning index 0 on such errors; will that be fine? Anyway, the
new patches have added a WARN() for such cases.

> > We have enough checks in place to make sure that there is at least one valid
> > entry in the freq-table which is >= policy->min and <= policy->max.
> 
> That assuming that the driver will always do the right thing in its ->verify
> callback.

Yeah.

-- 
viresh
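
To make the "always return a valid index" idea from this exchange concrete, here is a minimal userspace sketch. It is not the code from the series: `struct toy_policy` and `toy_find_index_l()` are hypothetical names, the table is a plain ascending array, and the fallback rule (nearest entry at or below the range, else index 0) is only one possible choice.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the policy limits. */
struct toy_policy {
	unsigned int min;	/* lowest allowed frequency, kHz */
	unsigned int max;	/* highest allowed frequency, kHz */
};

/*
 * Walk an ascending table (n >= 1) and return the index of the lowest
 * entry that is >= target and lies within [min, max].
 *
 * Instead of reporting an error when no such entry exists, fall back to
 * the last entry that was at or below policy->max, so the caller always
 * gets a usable index.
 */
static size_t toy_find_index_l(const unsigned int *table, size_t n,
			       const struct toy_policy *policy,
			       unsigned int target)
{
	size_t i, fallback = 0;

	for (i = 0; i < n; i++) {
		unsigned int freq = table[i];

		if (freq < policy->min) {
			fallback = i;	/* closest entry below the range */
			continue;
		}
		if (freq > policy->max)
			break;		/* past the range, stop looking */

		fallback = i;		/* highest in-range entry so far */
		if (freq >= target)
			return i;
	}

	return fallback;
}
```

With a table of 800000, 1600000 and 2400000 kHz and limits of 1000000..2000000, a target of 1200000 yields index 1. With limits of 1000000..1200000, which contain no table entry, the sketch returns index 0 rather than an error.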


[PATCH v2] udp reuseport: fix packet of same flow hashed to different socket

2016-06-07 Thread Su Xuemin
From: "Su, Xuemin" 

There is a corner case in which udp packets belonging to the same
flow are hashed to different sockets when hslot->count changes from 10
to 11:

1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
and always passes 'daddr' to udp_ehashfn().

2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
INADDR_ANY instead of some specific addr.

That means when hslot->count changes from 10 to 11, the hash calculated by
udp_ehashfn() also changes, and the udp packets belonging to the same
flow will be hashed to a different socket.

This is easily reproduced:
1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
2) From the same host send udp packets to 127.0.0.1:40000, record the
socket index which receives the packets.
3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
is 40000 + UDP_HASH_SIZE(4096), this makes the new socket put into the
same hslot as the aforementioned 10 sockets, and makes the hslot->count
change from 10 to 11.
4) From the same host send udp packets to 127.0.0.1:40000, and the socket
index which receives the packets will be different from the one received
in step 2.
This should not happen as the socket bound to 0.0.0.0:44096 should not
change the behavior of the sockets bound to 0.0.0.0:40000.

The fix here is that when searching udp_table->hash, if the socket
supports reuseport, pass inet_sk(sk)->inet_rcv_saddr to udp_ehashfn()
instead of daddr. When the sockets are bound to some specific addr,
inet_sk(sk)->inet_rcv_saddr should be equal to daddr, and when the sockets
are bound to INADDR_ANY, this will pass INADDR_ANY to udp_ehashfn(), as
is done when searching udp_table->hash2.

It's the same case for IPv6, and this patch also fixes that.

Signed-off-by: Su, Xuemin 
---
Patch v1 did not fix the code in IPv6. Thanks to Eric Dumazet for
pointing that out.
And I use this tree to generate this patch, hope it's correct:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

 net/ipv4/udp.c | 4 +++-
 net/ipv6/udp.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d56c055..57c38f6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -577,7 +577,9 @@ begin:
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
-   hash = udp_ehashfn(net, daddr, hnum,
+   hash = udp_ehashfn(net,
+  inet_sk(sk)->inet_rcv_saddr,
+  hnum,
   saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2da1896..41ca493 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -290,7 +290,9 @@ begin:
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
-   hash = udp6_ehashfn(net, daddr, hnum,
+   hash = udp6_ehashfn(net,
+   &sk->sk_v6_rcv_saddr,
+   hnum,
saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
-- 
1.8.3.1
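
The reproduction above relies on two facts: both ports land in the same hash slot, and the flow hash changes when its local-address input changes. The following userspace sketch illustrates both under stated assumptions. The slot computation ignores the per-namespace hash mix (assumed to be 0), the first port is taken to be 44096 - UDP_HASH_SIZE = 40000, and `toy_ehashfn()` is a made-up stand-in, not the kernel's udp_ehashfn().

```c
#include <stdint.h>

#define TOY_UDP_HASH_SIZE 4096u	/* value quoted in the patch description */
#define TOY_INADDR_ANY    0u

/* Slot index for a local port, assuming a hash mix of 0. */
static unsigned int toy_port_slot(unsigned int port)
{
	return port & (TOY_UDP_HASH_SIZE - 1);
}

/*
 * Toy flow hash. The local address is multiplied by an odd constant, so
 * two calls that differ only in laddr always produce different values.
 */
static uint32_t toy_ehashfn(uint32_t laddr, uint16_t lport,
			    uint32_t faddr, uint16_t fport)
{
	return laddr * 2654435761u + faddr * 40503u +
	       (((uint32_t)lport << 16) | fport);
}

/* Before the patch (primary hash path): always hash the packet's daddr. */
static uint32_t toy_hash_before(uint32_t daddr, uint32_t rcv_saddr,
				uint16_t lport, uint32_t saddr, uint16_t sport)
{
	(void)rcv_saddr;
	return toy_ehashfn(daddr, lport, saddr, sport);
}

/* After the patch: hash the address the socket is actually bound to. */
static uint32_t toy_hash_after(uint32_t daddr, uint32_t rcv_saddr,
			       uint16_t lport, uint32_t saddr, uint16_t sport)
{
	(void)daddr;
	return toy_ehashfn(rcv_saddr, lport, saddr, sport);
}
```

For a socket bound to INADDR_ANY, `toy_hash_after()` gives the same value as hashing INADDR_ANY directly, which is what the secondary-hash path uses, while `toy_hash_before()` does not. For a socket bound to a specific address the two agree, because rcv_saddr equals daddr.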




Re: [alsa-devel] [PATCH v2 6/9] ASoC: mediatek: add mt2701 platform driver implementation.

2016-06-07 Thread Garlic Tseng
On Tue, 2016-06-07 at 17:31 +0100, Mark Brown wrote:
> On Fri, Jun 03, 2016 at 12:56:21PM +0800, Garlic Tseng wrote:
> 
> > +   if (val < 0 || val > MT2701_I2S_NUM) {
> > +   dev_err(afe->dev, "%s, num not available, num %d, val %d\n",
> > +   __func__, num, val);
> > +   return -1;
> 
> Real error codes please.

OK I'll fix it.

> 
> > +static const struct snd_kcontrol_new mt2701_afe_multi_ch_out_asrc3[] = {
> > +   SOC_DAPM_SINGLE_AUTODISABLE("Multi ch asrc out3", PWR2_TOP_CON, 7, 1,
> > +   1),
> > +};
> 
> On/off controls should end in Switch.

Do you mean that the name should end in Switch? Something like "Multi
ch asrc out3 Switch" (or maybe a shorter one)?

I'll fix it (if I haven't misunderstood the comment).

Thanks!
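
For the "real error codes" comment, a minimal sketch of what the range check might look like with an errno value instead of -1. `TOY_I2S_NUM` and `toy_check_i2s()` are hypothetical names, the limit of 4 is an assumption, and the comparison simply mirrors the one in the quoted hunk.

```c
#include <errno.h>

#define TOY_I2S_NUM 4	/* assumed value; the real MT2701_I2S_NUM is not shown */

/*
 * Validate an I2S selector. Returns 0 on success or a negative errno
 * value, so callers can propagate a meaningful error code.
 */
static int toy_check_i2s(int val)
{
	if (val < 0 || val > TOY_I2S_NUM)
		return -EINVAL;

	return 0;
}
```

Returning -EINVAL rather than a bare -1 keeps the result distinguishable from -EPERM, which is what -1 happens to equal on Linux.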





Re: [PATCH] dmaengine: xilinx_vdma: Use dma_pool_zalloc

2016-06-07 Thread Vinod Koul
On Wed, Jun 08, 2016 at 12:48:38AM +0530, Amitoj Kaur Chawla wrote:
> Dma_pool_zalloc combines dma_pool_alloc and memset 0.
> 
> The Coccinelle semantic patch used to make this change is as follows:
> @@
> type T;
> T *d;
> expression e;
> statement S;
> @@
> 
> d =
> -dma_pool_alloc
> +dma_pool_zalloc
>  (...);
> if (!d) S
> -   memset(d, 0, sizeof(T));

Thanks for your patch, but I have already applied a similar patch fixing
this.

-- 
~Vinod
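
As a userspace analogy for the transformation in the quoted semantic patch, the sketch below contrasts "allocate, then clear" with a zeroing allocator. It uses malloc/memset and calloc in place of dma_pool_alloc/memset and dma_pool_zalloc; `struct toy_desc` and both helpers are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor, standing in for a DMA descriptor type. */
struct toy_desc {
	unsigned int ctrl;
	unsigned int status;
};

/* Two-step pattern: allocate, check, then clear by hand. */
static struct toy_desc *toy_alloc_then_clear(void)
{
	struct toy_desc *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	memset(d, 0, sizeof(*d));
	return d;
}

/* One-step pattern: the allocator returns memory that is already zeroed. */
static struct toy_desc *toy_zalloc(void)
{
	return calloc(1, sizeof(struct toy_desc));
}
```

Both helpers hand back an all-zero descriptor; the second leaves no window in which the memory is allocated but not yet cleared, and there is no memset to forget.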


[PATCH 3/5] cputime: allow irq time accounting to be selected as an option

2016-06-07 Thread riel
From: Rik van Riel 

Allow CONFIG_IRQ_TIME_ACCOUNTING to be selected as an option, on top
of CONFIG_VIRT_CPU_ACCOUNTING_GEN (and potentially others?).

This allows for the irq time accounting code to be used with nohz_idle
CPUs, which is how several distributions ship their kernels. Using the
same code for several timer modes also allows us to drop duplicate code.

Signed-off-by: Rik van Riel 
---
 init/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..4c7ee4f136cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -375,9 +375,11 @@ config VIRT_CPU_ACCOUNTING_GEN
 
  If unsure, say N.
 
+endchoice
+
 config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
-   depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
+   depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
help
  Select this option to enable fine granularity task irq time
  accounting. This is done by reading a timestamp on each
@@ -386,8 +388,6 @@ config IRQ_TIME_ACCOUNTING
 
  If in doubt, say N here.
 
-endchoice
-
 config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
depends on MULTIUSER
-- 
2.5.5



[PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq

2016-06-07 Thread riel
From: Rik van Riel 

Drop local_irq_save/restore from irqtime_account_irq.
Instead, have softirq and hardirq track their time spent
independently, with the softirq code subtracting hardirq
time that happened during the duration of the softirq run.

The softirq code can be interrupted by hardirq code at
any point in time, but it can check whether it got a
consistent snapshot of the timekeeping variables it wants,
and loop around in the unlikely case that it did not.

Signed-off-by: Rik van Riel 
---
 kernel/sched/cputime.c | 54 --
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e009077aeab6..466aff107f73 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -26,7 +26,9 @@
 DEFINE_PER_CPU(u64, cpu_hardirq_time);
 DEFINE_PER_CPU(u64, cpu_softirq_time);
 
-static DEFINE_PER_CPU(u64, irq_start_time);
+static DEFINE_PER_CPU(u64, hardirq_start_time);
+static DEFINE_PER_CPU(u64, softirq_start_time);
+static DEFINE_PER_CPU(u64, prev_hardirq_time);
 static int sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
@@ -53,36 +55,66 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
  * softirq -> hardirq, hardirq -> softirq
  *
  * When exiting hardirq or softirq time, account the elapsed time.
+ *
+ * When exiting softirq time, subtract the amount of hardirq time that
+ * interrupted this softirq run, to avoid double accounting of that time.
  */
 void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
-   unsigned long flags;
-   s64 delta;
+   u64 prev_softirq_start;
+   u64 prev_hardirq;
+   u64 hardirq_time;
+   s64 delta = 0;
int cpu;
 
if (!sched_clock_irqtime)
return;
 
-   local_irq_save(flags);
-
cpu = smp_processor_id();
-   delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-   __this_cpu_add(irq_start_time, delta);
+   prev_hardirq = __this_cpu_read(prev_hardirq_time);
+   prev_softirq_start = __this_cpu_read(softirq_start_time);
+   /*
+* Softirq context may get interrupted by hardirq context,
+* on the same CPU. At softirq 
+*/
+   if (irqtype == HARDIRQ_OFFSET) {
+   delta = sched_clock_cpu(cpu) - __this_cpu_read(hardirq_start_time);
+   __this_cpu_add(hardirq_start_time, delta);
+   } else do {
+   hardirq_time = READ_ONCE(per_cpu(cpu_hardirq_time, cpu));
+   u64 now = sched_clock_cpu(cpu);
+
+   delta = now - prev_softirq_start;
+   if (in_serving_softirq()) {
+   /*
+* Leaving softirq context. Avoid double counting by
+* subtracting hardirq time from this interval.
+*/
+   delta -= hardirq_time - prev_hardirq;
+   } else {
+   /* Entering softirq context. Note start times. */
+   __this_cpu_write(softirq_start_time, now);
+   __this_cpu_write(prev_hardirq_time, hardirq_time);
+   }
+   /*
+* If a hardirq happened during this calculation, it may not
+* have gotten a consistent snapshot. Try again.
+*/
+   } while (hardirq_time != READ_ONCE(per_cpu(cpu_hardirq_time, cpu)));
 
-   irq_time_write_begin();
/*
 * We do not account for softirq time from ksoftirqd here.
 * We want to continue accounting softirq time to ksoftirqd thread
 * in that case, so as not to confuse scheduler with a special task
 * that do not consume any time, but still wants to run.
 */
-   if (hardirq_count())
+   if (irqtype == HARDIRQ_OFFSET && hardirq_count())
__this_cpu_add(cpu_hardirq_time, delta);
-   else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+   else if (irqtype == SOFTIRQ_OFFSET && in_serving_softirq() &&
+   curr != this_cpu_ksoftirqd())
__this_cpu_add(cpu_softirq_time, delta);
 
irq_time_write_end();
-   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-- 
2.5.5
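
The accounting rule described in this patch (on softirq exit, subtract the hardirq time that accumulated since softirq entry) can be illustrated with a single-threaded userspace model. This is only a sketch of the arithmetic: `struct toy_cpu` and the `toy_*` helpers are hypothetical, timestamps are passed in by hand, and the retry loop that guards against a concurrent hardirq is deliberately not modeled.

```c
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping, in nanoseconds. */
struct toy_cpu {
	uint64_t hardirq_time;	/* total time accounted to hardirq */
	uint64_t softirq_time;	/* total time accounted to softirq */
	uint64_t softirq_start;	/* timestamp taken at softirq entry */
	uint64_t prev_hardirq;	/* hardirq_time snapshot at softirq entry */
};

/* Entering softirq: note the start time and the current hardirq total. */
static void toy_softirq_enter(struct toy_cpu *c, uint64_t now)
{
	c->softirq_start = now;
	c->prev_hardirq = c->hardirq_time;
}

/* A hardirq of the given duration runs, possibly on top of a softirq. */
static void toy_hardirq(struct toy_cpu *c, uint64_t duration)
{
	c->hardirq_time += duration;
}

/*
 * Leaving softirq: the elapsed wall time includes any hardirqs that
 * interrupted the softirq, so subtract them to avoid double counting.
 */
static void toy_softirq_exit(struct toy_cpu *c, uint64_t now)
{
	uint64_t delta = now - c->softirq_start;

	delta -= c->hardirq_time - c->prev_hardirq;
	c->softirq_time += delta;
}
```

For example, a softirq that runs from t=100 to t=200 and is interrupted by 30 ns of hardirq work is accounted as 70 ns of softirq time and 30 ns of hardirq time, 100 ns in total.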



[PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq

2016-06-07 Thread riel
From: Rik van Riel 

Add an irq type parameter and documentation to irqtime_account_irq.
The parameter can be used to distinguish between transitioning from process
context to hardirq time, and from process context to softirq time.

This is necessary to be able to remove the local_irq_disable from
irqtime_account_irq.

Signed-off-by: Rik van Riel 
---
 include/linux/hardirq.h | 20 ++--
 include/linux/vtime.h   | 12 ++--
 kernel/sched/cputime.c  |  9 -
 kernel/softirq.c|  6 +++---
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..1ebb31f56285 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,11 +32,11 @@ extern void rcu_nmi_exit(void);
  * always balanced, so the interrupted value of ->hardirq_context
  * will always be restored.
  */
-#define __irq_enter()  \
-   do {\
-   account_irq_enter_time(current);\
-   preempt_count_add(HARDIRQ_OFFSET);  \
-   trace_hardirq_enter();  \
+#define __irq_enter()  \
+   do {\
+   account_irq_enter_time(current, HARDIRQ_OFFSET);\
+   preempt_count_add(HARDIRQ_OFFSET);  \
+   trace_hardirq_enter();  \
} while (0)
 
 /*
@@ -47,11 +47,11 @@ extern void irq_enter(void);
 /*
  * Exit irq context without processing softirqs:
  */
-#define __irq_exit()   \
-   do {\
-   trace_hardirq_exit();   \
-   account_irq_exit_time(current); \
-   preempt_count_sub(HARDIRQ_OFFSET);  \
+#define __irq_exit()   \
+   do {\
+   trace_hardirq_exit();   \
+   account_irq_exit_time(current, HARDIRQ_OFFSET); \
+   preempt_count_sub(HARDIRQ_OFFSET);  \
} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 3b384bf5ce1a..58f036f3ebea 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -112,21 +112,21 @@ static inline void vtime_account_irq_enter(struct task_struct *tsk)
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, int irqtype);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, int irqtype) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk, int irqtype)
 {
vtime_account_irq_enter(tsk);
-   irqtime_account_irq(tsk);
+   irqtime_account_irq(tsk, irqtype);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, int irqtype)
 {
vtime_account_irq_exit(tsk);
-   irqtime_account_irq(tsk);
+   irqtime_account_irq(tsk, irqtype);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 2f862dfdb520..e009077aeab6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,8 +46,15 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * There are six possible transitions:
+ * process -> softirq, softirq -> process
+ * process -> hardirq, hardirq -> process
+ * softirq -> hardirq, hardirq -> softirq
+ *
+ * When exiting hardirq or softirq time, account the elapsed time.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
unsigned long flags;
s64 delta;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 17caf4b63342..a311c9622c86 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -245,7 +245,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
current->flags &= ~PF_MEMALLOC;
 
pending = local_softirq_pending();
-   account_irq_enter_time(current);
+   account_irq_enter_time(current, SOFTIRQ_OFFSET);
 
__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
in_hardirq = lockdep_softirq_start();
@@ -295,7 +295,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
}
 
lockdep_softirq_end(in_hardirq);
-   account_irq_exit_time(current);
+   
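
The effect of the new irqtype argument can be illustrated with a small user-space sketch. It is not part of the patch: the toy_* names, the struct layout, and the timestamps are made up for illustration, and the condition only mirrors the shape of the check shown in the series (time is added to a bucket when the call is made for that irq type while the matching count is still set, i.e. on exit).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the kernel defines the real ones in its preempt headers. */
#define TOY_SOFTIRQ_OFFSET (1 << 8)
#define TOY_HARDIRQ_OFFSET (1 << 16)

/* Hypothetical per-CPU state, standing in for the kernel's per-cpu variables. */
struct toy_cpu {
	int preempt_count;      /* stand-in for the real preempt_count */
	int is_ksoftirqd;       /* nonzero if "current" is ksoftirqd */
	int64_t irq_start_time; /* timestamp of the previous accounting call */
	int64_t hardirq_time;
	int64_t softirq_time;
};

/* Account the elapsed time only when leaving the context named by irqtype. */
static void toy_irqtime_account_irq(struct toy_cpu *c, int64_t now, int irqtype)
{
	int64_t delta = now - c->irq_start_time;

	c->irq_start_time = now;
	if (irqtype == TOY_HARDIRQ_OFFSET &&
	    (c->preempt_count & TOY_HARDIRQ_OFFSET))
		c->hardirq_time += delta;
	else if (irqtype == TOY_SOFTIRQ_OFFSET &&
		 (c->preempt_count & TOY_SOFTIRQ_OFFSET) && !c->is_ksoftirqd)
		c->softirq_time += delta;
}

static void toy_irq_enter(struct toy_cpu *c, int64_t now, int irqtype)
{
	toy_irqtime_account_irq(c, now, irqtype); /* called before the count goes up */
	c->preempt_count += irqtype;
}

static void toy_irq_exit(struct toy_cpu *c, int64_t now, int irqtype)
{
	toy_irqtime_account_irq(c, now, irqtype); /* called before the count goes down */
	c->preempt_count -= irqtype;
}

/* process -> hardirq -> process -> softirq -> process */
static struct toy_cpu toy_run_scenario(void)
{
	struct toy_cpu c = { 0 };

	toy_irq_enter(&c, 100, TOY_HARDIRQ_OFFSET);
	toy_irq_exit(&c, 150, TOY_HARDIRQ_OFFSET);
	toy_irq_enter(&c, 200, TOY_SOFTIRQ_OFFSET);
	toy_irq_exit(&c, 230, TOY_SOFTIRQ_OFFSET);
	return c;
}
```

In this scenario the time spent in process context before each entry is not charged to either irq bucket; only the 50 units inside the hardirq and the 30 units inside the softirq are.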

Re: [PATCH] dmaengine: edma: Use early completion for intermediate paRAM set in slave_sg

2016-06-07 Thread Vinod Koul
On Tue, Jun 07, 2016 at 11:19:44AM +0300, Peter Ujfalusi wrote:
> The driver limits the physical number of paRAM slots to be used by channels.
> If the transfer needs more slots (more SGs) then the transfer is broken up
> to smaller chunks. When the chunk is finished the driver will rewrite the
> physical slots and continues the transfer. This set up time can take some
> time and we might miss DMA events. If the intermediate set completion is
> using early completion (the interrupt will happen when the last slot is
> issued to the TPTC and not when the transfer is finished by the TPTC) we
> will have a bit more time to update the paRAM slots and less likely to have
> missed events.

Applied, thanks

-- 
~Vinod


Re: [PATCH 4.6 000/121] 4.6.2-stable review

2016-06-07 Thread Guenter Roeck

On 06/07/2016 06:09 PM, Greg Kroah-Hartman wrote:
> On Tue, Jun 07, 2016 at 06:40:54AM -0700, Guenter Roeck wrote:
>> On 06/05/2016 02:42 PM, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 4.6.2 release.
>>> There are 121 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Tue Jun  7 21:43:16 UTC 2016.
>>> Anything received after that time might be too late.
>>
>> Build results:
>> 	total: 148 pass: 132 fail: 16
>> Failed builds:
>> 	mips:defconfig (binutils 2.22)
>> 	mips:allnoconfig
>> 	mips:defconfig (binutils 2.24)
>> 	mips:allmodconfig
>> 	mips:allnoconfig
>> 	mips:bcm47xx_defconfig (binutils 2.25)
>> 	mips:bcm63xx_defconfig
>> 	mips:nlm_xlp_defconfig
>> 	mips:ath79_defconfig
>> 	mips:ar7_defconfig
>> 	mips:e55_defconfig
>> 	mips:cavium_octeon_defconfig
>> 	mips:malta_defconfig
>> 	mips:defconfig
>> 	unicore32:defconfig
>> 	unicore32:allnoconfig
>>
>> Qemu test results:
>> 	total: 107 pass: 98 fail: 9
>> Failed tests:
>> 	mips:malta_defconfig:nosmp
>> 	mips:malta_defconfig:smp
>> 	mips64:malta_defconfig:nosmp
>> 	mips64:malta_defconfig:smp
>> 	mipsel:malta_defconfig:nosmp
>> 	mipsel:malta_defconfig:smp
>> 	mips64el:malta_defconfig:nosmp
>> 	mips64el:malta_defconfig:smp
>> 	mips64el:fuloong2e_defconfig:fulong2e
>>
>> Details are available at http://kerneltests.org/builders.
>
> Thanks for these, I think I've fixed them all now.

I did a quick test; looks like your fix for mips is working
for 4.4, 4.5, and 4.6. I'll let you know if the complete build
shows a problem.

Guenter



[PATCH v5 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug

2016-06-07 Thread Wanpeng Li
From: Wanpeng Li 

Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
set rq->prev_* to 0 after a CPU comes back from hotplug, in order to fix this scenario:

| steal is smaller than rq->prev_steal_time we end up with an insane large
| value which then gets added to rq->prev_steal_time, resulting in a permanent
| wreckage of the accounting.

However, it is still buggy.

rq->prev_steal_time = 0:

As Rik pointed out:

| setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from
| the host, could result in a very large value of steal_jiffies.

rq->prev_steal_time_rq = 0:

| steal = paravirt_steal_clock(cpu_of(rq));
| steal -= rq->prev_steal_time_rq;
|
| if (unlikely(steal > delta))
|steal = delta;
|
| rq->prev_steal_time_rq += steal;
| delta -= steal;
|
| rq->clock_task += delta;

steal is a giant value and rq->prev_steal_time_rq is 0, so rq->prev_steal_time_rq
grows in delta granularity. rq->clock_task can't ramp up until
rq->prev_steal_time_rq catches up with the steal clock, since delta will be 0
after the steal time is subtracted from the normal execution time. That's why
I observed that cpuhp/1-12 continues running until rq->prev_steal_time_rq
catches up with the steal clock timestamp.

I believe rq->prev_irq_time has a similar issue, so this patch fixes it by
reverting commit e9532e69b8d1.

Fixes: e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
Acked-by: Rik van Riel 
Cc: Ingo Molnar 
Cc: Peter Zijlstra (Intel) 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Frederic Weisbecker 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
v4 -> v5:
 * revert commit e9532e69b8d1 

 kernel/sched/core.c  |  1 -
 kernel/sched/sched.h | 13 -
 2 files changed, 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..7d45bb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7213,7 +7213,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
struct rq *rq = cpu_rq(cpu);
 
rq->calc_load_update = calc_load_update;
-   account_reset_rq(rq);
update_max_interval();
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..de607e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1809,16 +1809,3 @@ static inline void cpufreq_trigger_update(u64 time) {}
 #else /* arch_scale_freq_capacity */
 #define arch_scale_freq_invariant()(false)
 #endif
-
-static inline void account_reset_rq(struct rq *rq)
-{
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-   rq->prev_irq_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT
-   rq->prev_steal_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
-   rq->prev_steal_time_rq = 0;
-#endif
-}
-- 
1.9.1
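
The stall described in the commit message can be reproduced with a small user-space sketch of the quoted arithmetic. This is only an illustration: the toy_* names are hypothetical, the steal clock is held constant (assuming the host steals no further time), and the numbers are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rq fields involved. */
struct toy_rq {
	uint64_t prev_steal_time_rq;
	uint64_t clock_task;
};

/* Same arithmetic as the quoted snippet, with the steal clock passed in. */
static void toy_update_clock_task(struct toy_rq *rq, uint64_t steal_clock,
				  int64_t delta)
{
	int64_t steal = (int64_t)(steal_clock - rq->prev_steal_time_rq);

	if (steal > delta)
		steal = delta;

	rq->prev_steal_time_rq += (uint64_t)steal;
	delta -= steal;

	rq->clock_task += (uint64_t)delta;
}

/* Count the ticks during which clock_task fails to advance at all. */
static unsigned int toy_stalled_ticks(uint64_t prev0, uint64_t steal_clock,
				      int64_t tick_ns)
{
	struct toy_rq rq = { prev0, 0 };
	unsigned int stalled = 0;

	if (tick_ns <= 0)
		return 0; /* guard: the loop below needs forward progress */

	for (;;) {
		toy_update_clock_task(&rq, steal_clock, tick_ns);
		if (rq.clock_task != 0)
			return stalled;
		stalled++;
	}
}
```

With prev_steal_time_rq reset to 0 and a steal clock of 1000000, a 1000-unit tick is swallowed 1000 times before clock_task moves; with the previous value preserved it advances on the first tick.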



[PATCH v5 1/3] KVM: fix steal clock warp during guest cpu hotplug

2016-06-07 Thread Wanpeng Li
From: Wanpeng Li 

Sometimes, after CPU hotplug you can observe a spike in stolen time
(100%) followed by the CPU being marked as 100% idle when it's actually
busy with a CPU hog task.  The trace looks like the following:

cpuhp/1-12 [001] d.h1   167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
cpuhp/1-12 [001] d.h1   167.461659: account_process_tick: steal_jiffies = 1291
<idle>-0   [001] d.h1   167.462663: account_process_tick: steal = 18732255, prev_steal_time = 129100
<idle>-0   [001] d.h1   167.462664: account_process_tick: steal_jiffies = 18446744072437

The sudden decrease of "steal" causes steal_jiffies to underflow.
The root cause is kvm_steal_time being reset to 0 after hot-plugging
back in a CPU.  Instead, the preexisting value can be used, which is
what the core scheduler code expects.

John Stultz also reported a similar issue after guest S3.

Suggested-by: Paolo Bonzini 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Ingo Molnar 
Cc: Peter Zijlstra (Intel) 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Frederic Weisbecker 
Cc: John Stultz 
Signed-off-by: Wanpeng Li 
---
v4 -> v5:
 * improve commit message
v2 -> v3:
 * fix the root cause
v1 -> v2:
 * update patch subject, description and comments
 * deal with the case where steal time suddenly increases by a ludicrous amount

 arch/x86/kernel/kvm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f..1ef5e48 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
if (!has_steal_clock)
return;
 
-   memset(st, 0, sizeof(*st));
-
wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
pr_info("kvm-stealtime: cpu %d, msr %llx\n",
cpu, (unsigned long long) slow_virt_to_phys(st));
-- 
1.9.1
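
The underflow in the trace can be sketched as plain unsigned arithmetic. The snippet below is an illustration only: toy_steal_jiffies is a hypothetical helper, HZ=1000 is assumed for the nanosecond-to-jiffy conversion, and the second input assumes prev_steal_time had advanced by 1291 jiffies' worth of nanoseconds after the first sample.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_JIFFY 1000000ULL /* assumes HZ=1000 */

/*
 * Unsigned subtraction: if the steal clock goes backwards (for example
 * because the steal-time area was zeroed), the difference wraps around
 * to a value close to 2^64 instead of going negative.
 */
static uint64_t toy_steal_jiffies(uint64_t steal_ns, uint64_t prev_steal_ns)
{
	return (steal_ns - prev_steal_ns) / TOY_NSEC_PER_JIFFY;
}
```

Under those assumptions, the first sample yields 1291 jiffies and the second wraps to 18446744072437, the same figures that appear in the trace.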



[PATCH v5 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting

2016-06-07 Thread Wanpeng Li
From: Wanpeng Li 

This patch adds guest steal-time support to full dynticks CPU
time accounting. After the following commit:

ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")

... time sampling became jiffy-based, even though it is still triggered at
ring boundaries, so steal_account_process_tick() is reused to account for
how many 'ticks' are stolen time since the last accumulation.

Suggested-by: Rik van Riel 
Cc: Ingo Molnar 
Cc: Peter Zijlstra (Intel) 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Frederic Weisbecker 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
v4 -> v5:
 * apply same logic to account_idle_time, so change get_vtime_delta instead
v3 -> v4:
 * fix grammar errors, thanks Ingo
 * cleanup fragile codes, thanks Ingo
v2 -> v3:
 * convert steal time jiffies to cputime
v1 -> v2:
 * fix divide zero bug, thanks Rik

 kernel/sched/cputime.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..b62f9f8 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(void)
 {
 #ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
@@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
return steal_jiffies;
}
 #endif
-   return false;
+   return 0;
 }
 
 /*
@@ -681,12 +681,17 @@ static cputime_t vtime_delta(struct task_struct *tsk)
 static cputime_t get_vtime_delta(struct task_struct *tsk)
 {
unsigned long now = READ_ONCE(jiffies);
-   unsigned long delta = now - tsk->vtime_snap;
+   cputime_t delta_time, steal_time;
 
+   steal_time = jiffies_to_cputime(steal_account_process_tick());
+   delta_time = jiffies_to_cputime(now - tsk->vtime_snap);
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap = now;
 
-   return jiffies_to_cputime(delta);
+   if (steal_time < delta_time)
+   delta_time -= steal_time;
+
+   return delta_time;
 }
 
 static void __vtime_account_system(struct task_struct *tsk)
-- 
1.9.1
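
The adjustment made in get_vtime_delta() can be looked at in isolation. The sketch below is not kernel code: toy_cputime_t and toy_vtime_delta are hypothetical names, and both values are assumed to be already converted to the same unit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_cputime_t; /* hypothetical stand-in for cputime_t */

/*
 * Subtract stolen time from the elapsed time, but only when the stolen
 * time is strictly smaller, so the unsigned result can never wrap.
 */
static toy_cputime_t toy_vtime_delta(toy_cputime_t delta_time,
				     toy_cputime_t steal_time)
{
	if (steal_time < delta_time)
		delta_time -= steal_time;

	return delta_time;
}
```

Note that with this guard, a steal time equal to or larger than the elapsed time leaves the delta unchanged rather than reducing it to zero.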



RE: [PATCH v10 2/7] usb: mux: add generic code for dual role port mux

2016-06-07 Thread Jun Li
Hi,
> -----Original Message-----
> From: Felipe Balbi [mailto:felipe.ba...@linux.intel.com]
> Sent: Tuesday, June 07, 2016 11:05 PM
> To: Roger Quadros ; Lu Baolu ;
> Jun Li ; Peter Chen 
> Cc: Mathias Nyman ; Greg Kroah-Hartman
> ; Lee Jones ; Heikki
> Krogerus ; Liam Girdwood
> ; Mark Brown ; linux-
> u...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v10 2/7] usb: mux: add generic code for dual role port
> mux
> 
> 
> Hi,
> 
> Roger Quadros  writes:
> >> I might be able to find some time to implement a proof of concept
> >> which would allow your platforms to get dual-role with code we
> >> already have, but I need DWC3's OTG support which, I'm assuming, you
> >> already have :-)
> >>
> >> If you wanna try something offline, just ping me ;-) I'll be happy to
> >> help.
> >
> > What you are proposing is a dwc3 only solution. With the otg/dual-role
> > series we are trying to be generic as much as possible.
> 
> Well, if there is a need for that, sure. Take MUSB for instance. It makes
> use of nothing of the sorts, because it doesn't have to.
> 
> > Whether controller drivers want to use it or not is upto the driver
> > maintainers but we should at least ensure that user space ABI if any,
> > is consistent across different implementations.
> 
> Role decisions should not be exposed to userspace unless as debug feature
> (using e.g. DebugFS). That should be done either by the HW or within the
> kernel.

In many cases the role decision is made by userspace; this should also be
covered.
This patchset also exposes it to userspace, and I don't think that is only for debug:
/sys/bus/platform/devices/.../portmux.N/state

Li Jun
> 
> If we're discussing userspace ABI here, there's something very wrong with
> OTG/DRD layer design.
> 
> >>> How are you switching the port mux between host and peripheral? Only
> >>> by sysfs or do you have a GPIO for ID pin as well?
> >>
> >> depends. Some SoCs have GPIO-controller muxes while some just have
> >> mux's select signals (one for ID, one for VBUS) mapped on xHCI's
> >> address space.
> >>
> >>> What happens to the gadget controller when the port is muxed to the
> >>> host controller?  Is it stopped or it continues to run?
> >>
> >> it continues running, but that's pretty irrelevant for Intel's
> >> dual-role
> >
> > Isn't that unnecessary waste of power? Or you have firmware assisted
> > low power mode?
> 
> that's an implementation detail which brings nothing to this discussion,
> right? :-)
> 
> We can, certainly, put the other side to D3.
> 
> >> setup. We have an actual physical (inside the die, though) mux which
> >> muxes USB signals to XHCI (not DWC3's XHCI) or to a peripheral-only
> >> DWC3.
> >>
> >
> > Probably irrelevant for Intel's dual-role but many platforms that
> > share the port can't have device controller running when port is in
> > host mode and vice versa.
> 
> but that doesn't mean we need an entire new layer added to the kernel
> ;-)
> 
> DWC3 already gives us all the information necessary to make a decision on
> which role we should assume. Just consider your options. Here's how things
> would look like without any OTG/DRD layer:
> 
> -> DWC3 OTG IRQ
>  -> readl(OSTS);
>   -> if (OSTS & BIT(4))
>-> dwc3_host_exit(); __dwc3_gadget_start();
>   -> else
>-> __dwc3_gadget_stop(); dwc3_host_init();
> 
> Can you draw something similar for your proposed OTG/DRD layer?
> 
> I remember there were at least two schedule_work(). IIRC it looked
> something like below:
> 
> -> DWC3 OTG IRQ
>  -> readl(OSTS);
>   -> if (OSTS & BIT(4))
>-> otg_set_mode(PERIPHERAL);
> -> schedule_work();
>  -> otg_ops->stop_host();
>   -> usb_del_hcd();
>  -> otg_ops->start_peripheral();
>   -> usb_gadget_add_udc();
>   -> else
>-> otg_set_mode(HOST);
> -> schedule_work();
>  -> otg_ops->stop_peripheral();
>   -> usb_gadget_del_udc();
>  -> otg_ops->start_host();
>   -> usb_add_hcd();
> 
> I'm probably missing some steps there.
> 
> > So there has to be a central point of control where the respective
> > controllers are started/stopped.
> 
> some implementations might need this, yes. DWC3 and MUSB don't seem to be
> this type of system.
> 
> > That is the other point we are trying to address with the common
> > otg/dual-role code.
> >
> > Even in the TI dwc3 implementation we use dwc3's XHCI so I guess we
> > need to stop the host controller for device mode, right?
> 
> yes, see above. We already have that code.
> 
> > If so then who will deal with start/stop of the controllers then?
> 
> dwc3 itself.
> 
> --
> balbi
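
The layer-free flow drawn in the quoted mail can be written down as a toy decision function. This is a sketch only: the toy_* names are hypothetical, no real DWC3 calls are made, and the meaning of BIT(4) in OSTS is taken from the quoted diagram rather than from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

enum toy_role { TOY_ROLE_HOST, TOY_ROLE_PERIPHERAL };

#define TOY_OSTS_PERIPHERAL (1u << 4) /* BIT(4), as in the quoted diagram */

/* Map the status register value to the role the port should assume. */
static enum toy_role toy_role_from_osts(uint32_t osts)
{
	return (osts & TOY_OSTS_PERIPHERAL) ? TOY_ROLE_PERIPHERAL
					    : TOY_ROLE_HOST;
}

/*
 * Toy IRQ handler: returns 1 if a role switch was performed, 0 if the
 * port already had the requested role. A real driver would stop the old
 * side and start the new one where the comment is.
 */
static int toy_handle_otg_irq(enum toy_role *cur, uint32_t osts)
{
	enum toy_role next = toy_role_from_osts(osts);

	if (next == *cur)
		return 0;

	/* stop the current side, then start the other side */
	*cur = next;
	return 1;
}
```

The point of the sketch is that the decision needs no deferred work or extra layer in this simple case; whether that holds for controllers other than the ones discussed is exactly what the thread is debating.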


[PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle

2016-06-07 Thread riel
This patch series seems to make irq time accounting work with
nohz_idle, by having it re-use the same strategy used for steal
time accounting in Wanpeng Li's patch.

It applies on top of an earlier version of Wanpeng Li's patch.

It gets rid of some code duplication, but needs a little bit more
work. Specifically, selecting CONFIG_IRQ_TIME_ACCOUNTING at the
same time as CONFIG_TICK_BASED_ACCOUNTING probably breaks :)

I am posting this because it works, and because I would like to
know what other changes I need to make at the same time.


