Re: [PATCH v4 2/5] irqchip, gicv3: Workaround for Cavium ThunderX erratum 23154

2015-09-07 Thread Suzuki K. Poulose

On 14/08/15 19:28, Robert Richter wrote:

From: Robert Richter 

This patch implements Cavium ThunderX erratum 23154.

The gicv3 of ThunderX requires a modified sequence for reading the IAR
to ensure data synchronization. Since this is in the fast path and
called on each interrupt, jump label runtime patching is used for the
smallest overhead (a no-op when the workaround is disabled). This is
the same technique as used for tracepoints.

v4:
  * simplify code to only use cpus_have_cap() in gicv3_enable_quirks()

v3:
  * fix erratum to be dependent on midr
  * use arm64 errata framework

v2:
  * implement code in a single asm() to keep instruction sequence
  * added comment to the code that explains the erratum
  * apply workaround also if running as guest, thus check MIDR

Signed-off-by: Robert Richter 
---
  arch/arm64/Kconfig  | 11 ++
  arch/arm64/include/asm/cpufeature.h |  3 ++-
  arch/arm64/include/asm/cputype.h| 18 +---
  arch/arm64/kernel/cpu_errata.c  |  9 
  drivers/irqchip/irq-gic-v3.c| 42 -
  5 files changed, 74 insertions(+), 9 deletions(-)



...


  };
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c52f7ba205b4..4211c39b8744 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -107,7 +107,7 @@ static void gic_redist_wait_for_rwp(void)


...


+}
+
  static void __maybe_unused gic_write_pmr(u64 val)
  {
asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" (val));
@@ -766,6 +798,12 @@ static const struct irq_domain_ops gic_irq_domain_ops = {
.free = gic_irq_domain_free,
  };

+static void gicv3_enable_quirks(void)
+{
+   if (cpus_have_cap(ARM64_WORKAROUND_CAVIUM_23154))
+   static_key_slow_inc(&is_cavium_thunderx);


Maybe you could use the enable() method added to struct arm64_cpu_capability
to perform the above operation. It was added by James in:

commit 1c0763037f1e1caef739e36e09c6d41ed7b61b2d
Author: James Morse 
Date:   Tue Jul 21 13:23:28 2015 +0100

arm64: kernel: Add cpufeature 'enable' callback



+}
+
  static int __init gic_of_init(struct device_node *node, struct device_node 
*parent)
  {
void __iomem *dist_base;
@@ -825,6 +863,8 @@ static int __init gic_of_init(struct device_node *node, 
struct device_node *pare
gic_data.nr_redist_regions = nr_redist_regions;
gic_data.redist_stride = redist_stride;

+   gicv3_enable_quirks();
+


That would avoid adding a hook here, wouldn't it?
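
To illustrate the idea, here is a minimal userspace sketch, assuming a
capability table whose entries carry an optional enable() hook that runs once
the capability has been detected. All names in it (struct cpu_cap,
thunderx_key_enabled, enable_detected_caps) are hypothetical stand-ins and not
the arm64 kernel API; a plain counter stands in for the static key, and the
MIDR check is deliberately simplified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a static key: a plain counter. */
static int thunderx_key_enabled;

/* Hypothetical capability entry with an optional 'enable' hook. */
struct cpu_cap {
	const char *desc;
	bool (*matches)(unsigned int midr);
	void (*enable)(void);	/* optional, run when the entry matches */
};

/* Simplified assumption: only look at the implementer field, MIDR[31:24]. */
static bool is_cavium(unsigned int midr)
{
	return (midr >> 24) == 0x43;
}

/* Stands in for static_key_slow_inc(&is_cavium_thunderx). */
static void enable_iar_workaround(void)
{
	thunderx_key_enabled++;
}

static const struct cpu_cap caps[] = {
	{
		.desc = "Cavium erratum 23154",
		.matches = is_cavium,
		.enable = enable_iar_workaround,
	},
	{ NULL, NULL, NULL }	/* sentinel: desc == NULL ends the walk */
};

/* Walk the table and run the enable hook of every matching entry. */
static int enable_detected_caps(unsigned int midr)
{
	const struct cpu_cap *cap;
	int n = 0;

	for (cap = caps; cap->desc; cap++) {
		if (!cap->matches(midr))
			continue;
		if (cap->enable)
			cap->enable();
		n++;
	}
	return n;
}
```

With something along these lines the irqchip driver would not need its own
gicv3_enable_quirks() call in gic_of_init(); the errata framework would flip
the key when it detects the capability.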

Cheers
Suzuki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/8] Allow GFP_NOFS allocation to fail

2015-09-07 Thread Tetsuo Handa
Michal Hocko wrote:
> As the VM cannot do much about these requests we should face the reality
> and allow those allocations to fail. Johannes has already posted the
> patch which does that (http://marc.info/?l=linux-mm&m=142726428514236&w=2)
> but the discussion died pretty quickly.

Addition of __GFP_NOFAIL to some locations is accepted, but otherwise
this patchset seems to be stalled.

> With all the patches applied none of the 4 filesystems gets aborted
> transactions and RO remount (well xfs didn't need any special
> treatment). This is obviously not sufficient to claim that failing
> GFP_NOFS is OK now but I think it is a good start for the further
> discussion. I would be grateful if FS people could have a look at those
> patches.  I have simply used __GFP_NOFAIL in the critical paths. This
> might be not the best strategy but it sounds like a good first step.

I posted my comment at
https://osdn.jp/projects/tomoyo/lists/archive/users-en/2015-September/000630.html

> The third patch allows GFP_NOFS to fail and I believe it should see much
> more testing coverage. It would be really great if it could sit in the
> mmotm tree for few release cycles so that we can catch more fallouts.

Judging from the responses to this patchset, sitting in the mmotm tree is
unlikely to gain much testing coverage. Also, FS is not the only location that
needs to be tested. If you really want to push the "GFP_NOFS can fail" patch,
I think you need to put a lot of effort into encouraging kernel developers to
test using mandatory fault injection.

> Thoughts? Opinions?

To me, fixing callers (adding __GFP_NORETRY to callers) in a step-by-step
fashion after adding a proactive countermeasure sounds better than changing
the default behavior (implicitly applying __GFP_NORETRY inside).


[PATCH V1] audit: add warning that an old auditd may be starved out by a new auditd

2015-09-07 Thread Richard Guy Briggs
Nothing prevents a new auditd starting up and replacing a valid
audit_pid when an old auditd is still running, effectively starving out
the old auditd since audit_pid no longer points to the old valid auditd.

There isn't an easy way to detect if an old auditd is still running on
the existing audit_pid other than attempting to send a message to see if
it fails.  If no message to auditd has been attempted since auditd died
unnaturally or got killed, audit_pid will still indicate it is alive.

Signed-off-by: Richard Guy Briggs 
---
Note: Would it be too bold to actually block the registration of a new
auditd if the netlink_getsockbyportid() call succeeded?  Would other
checks be appropriate?

 kernel/audit.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 18cdfe2..1fa1e0d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -872,6 +872,11 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
if (s.mask & AUDIT_STATUS_PID) {
int new_pid = s.pid;
 
+   if (audit_pid && new_pid &&
+   !IS_ERR(netlink_getsockbyportid(audit_sock, 
audit_nlk_portid)))
+   pr_warn("auditd replaced by new auditd before 
normal shutdown: "
+   "(old)audit_pid=%d (by)pid=%d 
new_pid=%d",
+   audit_pid, pid, new_pid);
if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
return -EACCES;
if (audit_enabled != AUDIT_OFF)
-- 
1.7.1



Re: [PATCH v2] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144

2015-09-07 Thread Robert Richter
On 07.09.15 17:44:41, Marc Zyngier wrote:
> On 25/08/15 11:18, Ganapatrao Kulkarni wrote:
> > The patch below adds a workaround for gicv3 in a numa environment. It
> > is on top of Robert's recent gicv3 errata patch submission v4 and my
> > arm64 numa patches v5.
> > 
> > This implements a workaround for gicv3-its erratum 23144 on Cavium's
> > ThunderX dual-socket platforms, where an LPI cannot be routed to a
> > redistributor present on a foreign node.
> > 
> > v2:
> > updated as per Marc Zyngier's review comments.
> > 
> > Signed-off-by: Ganapatrao Kulkarni 
> > Signed-off-by: Robert Richter 
> > ---
> >  drivers/irqchip/irq-gic-v3-its.c | 53 
> > +---
> >  1 file changed, 44 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> > b/drivers/irqchip/irq-gic-v3-its.c
> > index 614a367..d3fe0a4 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -40,7 +40,8 @@
> >  #include "irqchip.h"
> >  
> >  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING  (1ULL << 0)
> > -#define ITS_FLAGS_CAVIUM_THUNDERX  (1ULL << 1)
> > +#define ITS_WORKAROUND_CAVIUM_22375(1ULL << 1)
> > +#define ITS_WORKAROUND_CAVIUM_23144(1ULL << 2)
> 
> Please move this to Robert's series, as it doesn't make much sense to
> add a quirk flag just to modify it in the next patch. This will help
> declutter this patch.

I will merge the bits in, then rebase and rework this one on top (we
will post this separately due to dependencies on other patch sets).

Thanks,

-Robert


Re: [PATCH v2] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144

2015-09-07 Thread Marc Zyngier
On 25/08/15 11:18, Ganapatrao Kulkarni wrote:
> The patch below adds a workaround for gicv3 in a numa environment. It
> is on top of Robert's recent gicv3 errata patch submission v4 and my
> arm64 numa patches v5.
> 
> This implements a workaround for gicv3-its erratum 23144 on Cavium's
> ThunderX dual-socket platforms, where an LPI cannot be routed to a
> redistributor present on a foreign node.
> 
> v2:
> updated as per Marc Zyngier's review comments.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> Signed-off-by: Robert Richter 
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 53 
> +---
>  1 file changed, 44 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 614a367..d3fe0a4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -40,7 +40,8 @@
>  #include "irqchip.h"
>  
>  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING  (1ULL << 0)
> -#define ITS_FLAGS_CAVIUM_THUNDERX  (1ULL << 1)
> +#define ITS_WORKAROUND_CAVIUM_22375(1ULL << 1)
> +#define ITS_WORKAROUND_CAVIUM_23144(1ULL << 2)

Please move this to Robert's series, as it doesn't make much sense to
add a quirk flag just to modify it in the next patch. This will help
declutter this patch.

>  
>  #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING  (1 << 0)
>  
> @@ -73,6 +74,7 @@ struct its_node {
>   struct list_headits_device_list;
>   u64 flags;
>   u32 ite_size;
> + int numa_node;
>  };
>  
>  #define ITS_ITT_ALIGNSZ_256
> @@ -607,11 +609,20 @@ static void its_eoi_irq(struct irq_data *d)
>  static int its_set_affinity(struct irq_data *d, const struct cpumask 
> *mask_val,
>   bool force)
>  {
> - unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
> + unsigned int cpu;
> + const struct cpumask *cpu_mask = cpu_online_mask;
>   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>   struct its_collection *target_col;
>   u32 id = its_get_event_id(d);
>  
> + /* lpi cannot be routed to a redistributor that is on a foreign node */
> + if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144) {
> + cpu_mask = cpumask_of_node(its_dev->its->numa_node);
> + if (!cpumask_intersects(mask_val, cpu_mask))
> + return -EINVAL;
> + }
> +
> + cpu = cpumask_any_and(mask_val, cpu_mask);
>   if (cpu >= nr_cpu_ids)
>   return -EINVAL;
>  
> @@ -1338,9 +1349,14 @@ static void its_irq_domain_activate(struct irq_domain 
> *domain,
>  {
>   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>   u32 event = its_get_event_id(d);
> + const struct cpumask *cpu_mask = cpu_online_mask;
> +
> + /* get the cpu_mask of local node */
> + if (IS_ENABLED(CONFIG_NUMA))
> + cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>  
>   /* Bind the LPI to the first possible CPU */
> - its_dev->event_map.col_map[event] = cpumask_first(cpu_online_mask);
> + its_dev->event_map.col_map[event] = cpumask_first(cpu_mask);
>  
>   /* Map the GIC IRQ and event to the device */
>   its_send_mapvi(its_dev, d->hwirq, event);
> @@ -1423,11 +1439,19 @@ static int its_force_quiescent(void __iomem *base)
>   }
>  }
>  
> -static void its_enable_cavium_thunderx(void *data)
> +static void its_enable_cavium_thunderx_22375(void *data)
>  {
> struct its_node *its = data;
>  
> -   its->flags |= ITS_FLAGS_CAVIUM_THUNDERX;
> + its->flags |= ITS_WORKAROUND_CAVIUM_22375;
> +}
> +
> +static void its_enable_cavium_thunderx_23144(void *data)
> +{
> + struct its_node *its = data;
> +
> + if (num_possible_nodes() > 1)
> + its->flags |= ITS_WORKAROUND_CAVIUM_23144;
>  }
>  
>  static const struct gic_capabilities its_errata[] = {
> @@ -1435,10 +1459,16 @@ static const struct gic_capabilities its_errata[] = {
> .desc   = "ITS: Cavium errata 22375, 24313",
> .iidr   = 0xa100034c,   /* ThunderX pass 1.x */
> .mask   = 0x0fff,
> -   .init   = its_enable_cavium_thunderx,
> -   },
> -   {
> -   }
> + .init   = its_enable_cavium_thunderx_22375,
> + },
> + {
> + .desc   = "ITS: Cavium errata 23144",
> + .iidr   = 0xa100034c,   /* ThunderX pass 1.x */
> + .mask   = 0x0fff,
> + .init   = its_enable_cavium_thunderx_23144,
> + },
> + {
> + }
>  };
>  
>  static void its_enable_quirks(struct its_node *its)
> @@ -1456,6 +1486,7 @@ static int its_probe(struct device_node *node, struct 
> irq_domain *parent)
>   u32 val;
>   u64 baser, tmp;
>   int err;
> + int numa_node;
>  
>   err = of_address_to_resource(node, 0, &res);
>   if (err) {
> @@ -1463,6 +14
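
The affinity restriction in the its_set_affinity() hunk quoted above can be
sketched in userspace as follows. This is only a model under stated
assumptions: CPU masks are plain 64-bit words rather than struct cpumask, and
the names mask_any_and() and pick_lpi_target() are hypothetical. As in the
quoted hunk, the node mask replaces the online mask when the workaround is
active.

```c
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 64

/* First CPU set in both masks, or NR_CPUS if the intersection is empty. */
static unsigned int mask_any_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Pick a target CPU for an LPI. When the workaround is active, only CPUs
 * on the ITS's own node are eligible; otherwise any online CPU is.
 * Returns the CPU number, or -EINVAL if no requested CPU is eligible.
 */
static int pick_lpi_target(uint64_t requested, uint64_t online,
			   uint64_t its_node_cpus, int workaround_23144)
{
	uint64_t allowed = workaround_23144 ? its_node_cpus : online;
	unsigned int cpu = mask_any_and(requested, allowed);

	if (cpu >= NR_CPUS)
		return -EINVAL;
	return (int)cpu;
}
```

For example, with CPUs 0-7 online and the ITS on a node holding CPUs 4-7, a
request for CPUs 0-3 is rejected when the workaround is active, while a
request for CPUs 2-5 lands on CPU 4.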

Re: [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-09-07 Thread Stefano Stabellini
On Mon, 7 Sep 2015, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity, although the hypercall interface is always based on 4K
> page granularity.
> 
> With 64K page granularity, a single page will be spread over multiple
> Xen frames.
> 
> To avoid splitting the page into 4K frames, take advantage of the
> extent_order field to directly allocate/free chunks of the Linux page
> size.
> 
> Note that PVMMU is only used for PV guests (which are x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ONs have been added to ensure
> that, because the code has not been modified.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 
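
As a worked example of the extent_order idea described in the commit message
above, the following userspace sketch computes the order for a given Linux
page size, assuming 4K Xen frames. The helper names (fls_u, xen_pfn_per_page,
extent_order) are made up for the sketch; fls_u() mimics the kernel's fls(),
which returns the 1-based position of the most significant set bit.

```c
/* Position of the most significant set bit, 1-based; 0 for x == 0. */
static int fls_u(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* The Xen hypercall interface always uses 4K frames. */
#define XEN_PAGE_SIZE_BYTES 4096u

/* Number of 4K Xen frames that make up one Linux page. */
static unsigned int xen_pfn_per_page(unsigned int linux_page_size)
{
	return linux_page_size / XEN_PAGE_SIZE_BYTES;
}

/* Extent order such that one extent covers exactly one Linux page. */
static int extent_order(unsigned int linux_page_size)
{
	return fls_u(xen_pfn_per_page(linux_page_size)) - 1;
}
```

With 4K Linux pages there is one Xen frame per page and the order is 0, which
matches the old behaviour; with 64K pages there are 16 frames per page and the
order is 4, so a single extent of 2^4 frames covers one Linux page.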


> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> Cc: Wei Liu 
> 
> Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) checks in the code
> built for the PV MMU are kept in order to have at least one even if we
> ever decide to drop one of the code sections.
> 
> Changes in v4:
> - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
> - Use the field lru in the page to get a list of pages when
> decreasing the memory reservation. This avoids using a static
> array to store the pages (see v3).
> - Update comment for EXTENT_ORDER.
> 
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - s/mfn/gfn/ based on the new naming
> - Rather than splitting the page into 4KB chunks, use the
> extent_order field to allocate directly a Linux page size. This
> avoids lots of code for no benefit.
> 
> Changes in v2:
> - Use xen_apply_to_page to split a page in 4K chunk
> - It's not necessary to have a smaller frame list. Re-use
> PAGE_SIZE
> - Convert reserve_additional_memory to use XEN_... macro
> ---
>  drivers/xen/balloon.c | 59 
> ++-
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index c79329f..3babf13 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -70,6 +70,11 @@
>  #include 
>  #include 
>  
> +/* Use one extent per PAGE_SIZE to avoid to break down the page into
> + * multiple frame.
> + */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
>  /*
>   * balloon_process() state:
>   *
> @@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>  
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>  /*
>   * add_memory() will build page tables for the new memory so
>   * the p2m must contain invalid entries so the correct
> @@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>   int rc;
> - unsigned long  pfn, i;
> + unsigned long i;
>   struct page   *page;
>   struct xen_memory_reservation reservation = {
>   .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
>   .domid= DOMID_SELF
>   };
>  
> @@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   nr_pages = i;
>   break;
>   }
> - frame_list[i] = page_to_pfn(page);
> +
> + /* XENMEM_populate_physmap requires a PFN based on Xen
> +  * granularity.
> +  */
> + frame_list[i] = page_to_xen_pfn(page);
>   page = balloon_next_page(page);
>   }
>  
> @@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   page = balloon_retrieve(false);
>   BUG_ON(page == NULL);
>  
> - pfn = page_to_pfn(page);
> -
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
>   set_phys_to_machine(pfn, frame_list[i]);
>  
>   /* Link back into the page tables if not highmem. */
> @@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  {
>   enum bp_state state = BP_DONE;
> - unsigned long  pfn, i;
> - struct page   *page;
> + unsigned long i;
> + struct page *page, *tmp;
>   int ret;
>   struct xen_memory_reserv

Fwd: Use-after-free in page_cache_async_readahead

2015-09-07 Thread Andrey Konovalov
On Thu, Sep 3, 2015 at 1:49 PM, Andrey Konovalov  wrote:
> On Wed, Sep 2, 2015 at 9:40 PM, Tejun Heo  wrote:
>> Hello, Andrey.
>
> Hello Tejun,
>
>> On Wed, Sep 02, 2015 at 01:08:52PM +0200, Andrey Konovalov wrote:
>>> While running KASAN on 4.2 with Trinity I got the following report:
>>>
>>> ==
>>> BUG: KASan: use after free in page_cache_async_readahead+0x2cb/0x3f0
>>> at addr 880034bf6690
>>> Read of size 8 by task sshd/2571
>>> =
>>> BUG kmalloc-16 (Tainted: GW  ): kasan: bad access detected
>>> -
>>>
>>> Disabling lock debugging due to kernel taint
>>> INFO: Allocated in bdi_init+0x168/0x960 age=554826 cpu=0 pid=6
>>
>> Can you please verify that the following patch fixes the issue?
>
> I've hit this bug only twice during 24 hours of fuzzing, so there's no
> fast way to verify this.
> I'll be testing with your patch now, and I'll let you know if I hit
> the bug again.

Hello Tejun,

I haven't seen any reports while testing with your patch for the last
few days, so I think it's safe to say that your patch fixes the issue.

Thanks!

>
> Thanks!


Re: [PATCH v4 0/5] irqchip, gicv3: Updates and Cavium ThunderX errata workarounds

2015-09-07 Thread Marc Zyngier
Hi Robert,

On 14/08/15 19:28, Robert Richter wrote:
> From: Robert Richter 
> 
> This patch series adds gicv3 updates and workarounds for HW errata in
> Cavium's ThunderX GICV3.
> 
> The first one is an unchanged resubmission of a patch from a gicv3
> series I sent a while ago.
> 
> The next patches implement the workarounds for ThunderX's gicv3. Patch
> #2 implements the cpu workaround for gicv3 on ThunderX. Patch #3 is a
> prerequisite for patch #5. Patch #4 adds generic code to parse the hw
> revision provided by an IIDR. This patch is used for the implementation
> of the actual gicv3-its workaround in #5.
> 
> All current review comments addressed so far with v4.

There has been a small number of comments on this series. Would you mind
respinning it so that it could make it into a 4.3-rc?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.

2015-09-07 Thread Jungseok Lee
On Sep 8, 2015, at 1:06 AM, James Morse wrote:
> On 07/09/15 16:48, Jungseok Lee wrote:
>> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
>> 
>> Hi James,
>> 
>>> Having to handle interrupts on top of an existing kernel stack means the
>>> kernel stack must be large enough to accommodate both the maximum kernel
>>> usage, and the maximum irq handler usage. Switching to a different stack
>>> when processing irqs allows us to make the stack size smaller.
>>> 
>>> Maximum kernel stack usage (running ltp and generating usb+ethernet
>>> interrupts) was 7256 bytes. With this patch, the same workload gives
>>> a maximum stack usage of 5816 bytes.
>> 
>> I'd like to know how to measure the max stack depth.
>> AFAIK, a stack tracer on ftrace does not work well. Did you dump a stack
>> region and find or track down an untouched region? 
> 
> I enabled the 'Trace max stack' option under menuconfig 'Kernel Hacking' ->
> 'Tracers', then looked in debugfs:/tracing/stack_max_size.
> 
> What problems did you encounter?
> (I may be missing something…)

When I enabled the feature, all entries had *0* size except the last entry.
It can be reproduced easily by looking in debugfs:/tracing/stack_trace.

You can track down my report and Akashi's changes with the following links:
- http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
- https://lkml.org/lkml/2015/7/13/29

Although it is impossible to measure an exact depth at this moment, the feature
could be utilized to check improvement.
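
For comparing runs, the value can also be read programmatically. The helper
below is a small sketch, not part of any patch in this thread; the name
read_ulong_file() is made up, and the debugfs mount point
(/sys/kernel/debug) is an assumption about the test system.

```c
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_ulong_file(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it with "/sys/kernel/debug/tracing/stack_max_size" before and after
applying the patch would give the two numbers to compare, provided the stack
tracer is built in and enabled.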

Cc'ing Akashi for additional comments if needed.

Best Regards
Jungseok Lee


Re: [RFC PATCH v1 2/4] irqchip: GICv3: set non-percpu irqs status with _IRQ_MOVE_PCNTXT

2015-09-07 Thread Jiang Liu
On 2015/9/7 22:56, Marc Zyngier wrote:
> Hi Thomas,
> 
> On 07/09/15 14:24, Thomas Gleixner wrote:
>> On Mon, 7 Sep 2015, Marc Zyngier wrote:
>>> On 06/09/15 06:56, Jiang Liu wrote:
>>>> On 2015/9/6 12:23, Yang Yingliang wrote:
>>>>> Use irq_settings_set_move_pcntxt() helper irqs status with
>>>>> _IRQ_MOVE_PCNTXT. So that it can do set affinity when calling
>>>>> irq_set_affinity_locked().
>>>> Hi Yingliang,
>>>> 	We can only set the _IRQ_MOVE_PCNTXT flag to enable migrating
>>>> an IRQ in process context if your hardware platform supports atomically
>>>> changing the IRQ configuration. Not sure whether that's true for GICv3.
>>>> If GICv3 doesn't support atomically changing the irq configuration, this
>>>> change may cause trouble.
>>>
>>> I think it boils down to what exactly "process context" means here. If
>>> this means "we do not need to mask the interrupt" while moving it, then
>>> it should be fine (the GIC architecture guarantees that a pending
>>> interrupt will be migrated).
>>>
>>> Is there any other requirement for this flag?
>>
>> The history of this flag is as follows:
>>
>> On x86 interrupts can only be safely migrated while the interrupt is
>> handled.
> 
> Woa! That's creative! :-) I suppose this doesn't work very well with CPU
> hotplug though...
X86 has special handling of this case when hot-removing a CPU.
Basically, it does:
1) mask an irq
2) migrate irq to other cpus with set_affinity
3) redirect(retrigger) irq to other CPUs if it's pending on the CPU to
be removed.
Thanks!
Gerry

> 
>> With the introduction of IRQ remapping this requirement
>> changed. Remapped interrupts can be migrated in any context.
>>
>> If you look at irq_set_affinity_locked()
>>
>>    if (irq_can_move_pcntxt(data)) {
>>   irq_do_set_affinity(data,...)
>> chip->irq_set_affinity(data,...);
>>} else {
>>   irqd_set_move_pending(data);
>>}
>>
>> So if IRQ_MOVE_PCNTXT is not set, we handle the migration of the
>> interrupt from the next interrupt. If it's set, set_affinity() is
>> called right away.
> 
> OK, that is now starting to make more sense.
> 
>> All architectures which do not select GENERIC_PENDING_IRQ are using
>> the direct method.
> 
> Right. On ARM, only the direct method makes sense so far (we have no
> constraint such as the one you describe above).
> 
> So I wonder why we bother introducing the IRQ_MOVE_PCNTXT flag on ARM at
> all. Is that just because migration.c is only compiled when
> GENERIC_PENDING_IRQ is set?
> 
> Thanks,
> 
>   M.
> 
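
The two paths Thomas describes above can be modelled in a few lines of
userspace C. This is only a sketch of the control flow, assuming a simplified
per-IRQ state; struct irq_model and both functions are hypothetical and do
not correspond to the genirq data structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-IRQ state. */
struct irq_model {
	bool can_move_pcntxt;	/* models the IRQ_MOVE_PCNTXT setting */
	bool move_pending;	/* migration deferred to the next interrupt */
	unsigned int target_cpu;
	unsigned int pending_cpu;
};

/* Request an affinity change from process context. */
static void model_set_affinity(struct irq_model *irq, unsigned int cpu)
{
	if (irq->can_move_pcntxt) {
		irq->target_cpu = cpu;		/* direct method: apply now */
	} else {
		irq->pending_cpu = cpu;		/* remember the request */
		irq->move_pending = true;	/* apply it from the next irq */
	}
}

/* Called while the interrupt is being handled: finish a deferred move. */
static void model_handle_irq(struct irq_model *irq)
{
	if (irq->move_pending) {
		irq->target_cpu = irq->pending_cpu;
		irq->move_pending = false;
	}
}
```

In this model an IRQ with can_move_pcntxt set changes its target immediately,
while one without it keeps its old target until the next interrupt arrives.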


Re: [PATCH v4 5/5] irqchip, gicv3-its: Workaround for Cavium ThunderX errata 22375, 24313

2015-09-07 Thread Marc Zyngier
On 14/08/15 19:28, Robert Richter wrote:
> From: Robert Richter 
> 
> This implements two gicv3-its errata workarounds for ThunderX. Both
> with small impact affecting only ITS table allocation.
> 
>  erratum 22375: only alloc 8MB table size
>  erratum 24313: ignore memory access type
> 
> The fixes are in ITS initialization and basically ignore memory access
> type and table size provided by the TYPER and BASER registers.
> 
> v3:
>  * fix erratum to be dependent on iidr
> 
> Signed-off-by: Robert Richter 
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 35 +++
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 697421e834ee..30459df2ee2c 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -39,7 +39,8 @@
>  #include "irq-gic-common.h"
>  #include "irqchip.h"
>  
> -#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING(1 << 0)
> +#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING(1ULL << 0)
> +#define ITS_FLAGS_CAVIUM_THUNDERX(1ULL << 1)

I think you might need something slightly more explicit, as I'd expect
some later revision of ThunderX to be eventually fixed...
ITS_FLAGS_THUNDERX_BOGUS_TYPER? Or something based on the errata numbers?

>  
>  #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING  (1 << 0)
>  
> @@ -803,9 +804,22 @@ static int its_alloc_tables(struct its_node *its)
>   int i;
>   int psz = SZ_64K;
>   u64 shr = GITS_BASER_InnerShareable;
> - u64 cache = GITS_BASER_WaWb;
> - u64 typer = readq_relaxed(its->base + GITS_TYPER);
> - u32 ids = GITS_TYPER_DEVBITS(typer);
> + u64 cache;
> + u64 typer;
> + u32 ids;
> +
> + if (its->flags & ITS_FLAGS_CAVIUM_THUNDERX) {
> + /*
> +  * erratum 22375: only alloc 8MB table size
> +  * erratum 24313: ignore memory access type
> +  */
> + cache   = 0;
> + ids = 0x13; /* 20 bits, 8MB */
> + } else {

You can move the typer definition here, as it is only used here.

> + cache   = GITS_BASER_WaWb;
> + typer   = readq_relaxed(its->base + GITS_TYPER);
> + ids = GITS_TYPER_DEVBITS(typer);
> + }
>  
>   for (i = 0; i < GITS_BASER_NR_REGS; i++) {
>   u64 val = readq_relaxed(its->base + GITS_BASER + i * 8);
> @@ -1391,8 +1405,21 @@ static int its_force_quiescent(void __iomem *base)
>   }
>  }
>  
> +static void its_enable_cavium_thunderx(void *data)
> +{
> + struct its_node *its = data;
> +
> + its->flags |= ITS_FLAGS_CAVIUM_THUNDERX;
> +}
> +
>  static const struct gic_capabilities its_errata[] = {
>   {
> + .desc   = "ITS: Cavium errata 22375, 24313",
> + .iidr   = 0xa100034c,   /* ThunderX pass 1.x */
> + .mask   = 0x0fff,
> + .init   = its_enable_cavium_thunderx,
> + },
> + {
>   }
>  };
>  
> 

Otherwise looks OK to me.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 2/2] ASoC: atmel-classd: DT binding for Class D audio amplifier driver

2015-09-07 Thread Mark Brown
On Sun, Sep 06, 2015 at 05:44:30PM +0800, Wu, Songjun wrote:
> On 9/3/2015 19:43, Mark Brown wrote:

> >Why is this a separate DT node?  It seems that this IP is entirely self
> >contained so I'm not clear why we need a separate node for the card, the
> >card is usually a separate node because it ties together multiple
> >different devices in the system but that's not the case here.

> The classD can finish the audio function without other devices.
> But I want to reuse the code in ASoC, leave many things (like creating PCM,
> DMA operations) to ASoC, then the driver can only focus on how to configure
> classD.
> The classD IP is divided into three parts logically: platform, CPU dai,
> and codec, and these parts are registered to ASoC.

> This separate DT node is needed in ASoC; it ties these three parts in ClassD.

Sure, there's no problem at all having that structure in software but it
should be possible to do this without having to represent this structure
in DT.  It should be possible to register the card at the same time as
the rest of the components rather than needing the separate device in
the DT.




Re: [PATCH v4 4/5] irqchip, gicv3-its: Add HW revision detection and configuration

2015-09-07 Thread Marc Zyngier
Hi Robert,

On 14/08/15 19:28, Robert Richter wrote:
> From: Robert Richter 
> 
> Some GIC revisions require an individual configuration to esp. add
> workarounds for HW bugs. This patch implements generic code to parse
> the hw revision provided by an IIDR register value and runs specific
> code if hw matches. There are functions that read the IIDR registers
> for GICV3 and ITS (GICD_IIDR/GITS_IIDR) and then go through a list of
> init functions to be called for specific versions.
> 
> A MIDR register value may also be used, this is especially useful for
> hw detection from a guest.

I don't think this sentence is relevant anymore.

> 
> The patch is needed to implement workarounds for HW errata in Cavium's
> ThunderX GICV3.
> 
> v4:
>  * only enable hw detection for its in its_enable_quirks()
>  * removed gicv3_check_capabilities()
> 
> v3:
>  * use arm64 errata framework for midr check
> 
> v2:
>  * adding MIDR check
> 
> Signed-off-by: Robert Richter 
> ---
>  drivers/irqchip/irq-gic-common.c | 11 +++
>  drivers/irqchip/irq-gic-common.h |  9 +
>  drivers/irqchip/irq-gic-v3-its.c | 15 +++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic-common.c 
> b/drivers/irqchip/irq-gic-common.c
> index 9448e391cb71..ee789b07f2d1 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -21,6 +21,17 @@
>  
>  #include "irq-gic-common.h"
>  
> +void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap,
> + void *data)

Let's call a duck a duck, and replace all occurrences of
capabilit{y,ies} with "quirk".

> +{
> + for (; cap->desc; cap++) {
> + if (cap->iidr != (cap->mask & iidr))
> + continue;
> + cap->init(data);
> + pr_info("%s\n", cap->desc);
> + }
> +}
> +
>  int gic_configure_irq(unsigned int irq, unsigned int type,
>  void __iomem *base, void (*sync_access)(void))
>  {
> diff --git a/drivers/irqchip/irq-gic-common.h 
> b/drivers/irqchip/irq-gic-common.h
> index 35a9884778bd..ca12635bbe3c 100644
> --- a/drivers/irqchip/irq-gic-common.h
> +++ b/drivers/irqchip/irq-gic-common.h
> @@ -20,10 +20,19 @@
>  #include 
>  #include 
>  
> +struct gic_capabilities {
> + const char *desc;
> + void (*init)(void *data);
> + u32 iidr;
> + u32 mask;
> +};
> +
>  int gic_configure_irq(unsigned int irq, unsigned int type,
> void __iomem *base, void (*sync_access)(void));
>  void gic_dist_config(void __iomem *base, int gic_irqs,
>void (*sync_access)(void));
>  void gic_cpu_config(void __iomem *base, void (*sync_access)(void));
> +void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap,
> + void *data);
>  
>  #endif /* _IRQ_GIC_COMMON_H */
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 06131db7a198..697421e834ee 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  
> +#include "irq-gic-common.h"
>  #include "irqchip.h"
>  
>  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING(1 << 0)
> @@ -1390,6 +1391,18 @@ static int its_force_quiescent(void __iomem *base)
>   }
>  }
>  
> +static const struct gic_capabilities its_errata[] = {
> + {
> + }
> +};
> +
> +static void its_enable_quirks(struct its_node *its)
> +{
> + u32 iidr = readl_relaxed(its->base + GITS_IIDR);
> +
> + gic_check_capabilities(iidr, its_errata, its);
> +}
> +
>  static int its_probe(struct device_node *node, struct irq_domain *parent)
>  {
>   struct resource res;
> @@ -1448,6 +1461,8 @@ static int its_probe(struct device_node *node, struct 
> irq_domain *parent)
>   }
>   its->cmd_write = its->cmd_base;
>  
> + its_enable_quirks(its);
> +
>   err = its_alloc_tables(its);
>   if (err)
>   goto out_free_cmd;
> 

Otherwise looks good to me.
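
For reference, the matching rule in the quoted helper can be tried outside the kernel. This is only a userspace sketch under assumptions: the names use the suggested "quirk" wording, the handler and the table entry (IIDR value and mask) are made up, and a return count is added purely so the behaviour can be observed. An entry applies when the masked IIDR equals its iidr field, and the table ends at the first entry with a NULL desc (which is what the empty initializer in its_errata[] provides).

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the quirk table; names are illustrative only. */
struct gic_quirk {
	const char *desc;
	void (*init)(void *data);
	uint32_t iidr;
	uint32_t mask;
};

/*
 * Call init() for every entry whose iidr equals the masked IIDR value.
 * Returns the number of quirks applied (an addition for this sketch).
 * The table is terminated by an entry with a NULL desc.
 */
static int gic_enable_quirks(uint32_t iidr, const struct gic_quirk *quirks,
			     void *data)
{
	int applied = 0;

	for (; quirks->desc; quirks++) {
		if (quirks->iidr != (quirks->mask & iidr))
			continue;
		quirks->init(data);
		applied++;
	}
	return applied;
}

/* Hypothetical handler: records that it ran by setting bit 0. */
static void example_quirk_init(void *data)
{
	*(unsigned int *)data |= 1u;
}

/* Hypothetical table: the IIDR value and mask below are made up. */
static const struct gic_quirk example_quirks[] = {
	{ "example: made-up erratum", example_quirk_init, 0x00000123, 0x00000fff },
	{ NULL, NULL, 0, 0 }	/* sentinel, same role as the empty entry */
};
```

With this table, any IIDR whose low 12 bits are 0x123 triggers the handler, whatever the upper bits are.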

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ARM: dts: omap3-igep: Move eth IRQ pinmux to IGEPv2 common dtsi

2015-09-07 Thread Javier Martinez Canillas
Only the IGEPv2 boards have a LAN9221i chip connected to the GPMC
so the pinmux configuration for the GPIO connected to the IRQ line
of the LAN chip should not be defined in the IGEP common dtsi but
in the one common to the IGEPv2 boards.

While there, use the OMAP3_CORE1_IOPAD() macro for the padconf reg.
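
As a sanity check on the macro conversion, the old raw offset and the new macro argument should describe the same pad. The sketch below assumes that OMAP3_CORE1_IOPAD() subtracts a base of 0x2030 from the low 16 bits of the physical address, which is how I recall the dt-bindings header; the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed base used by OMAP3_CORE1_IOPAD(); not taken from this patch. */
#define OMAP3_CORE1_PADCONF_BASE 0x2030u

/* Hypothetical helper: offset that pinctrl-single would see for a pad. */
static uint32_t omap3_core1_iopad_offset(uint32_t pa)
{
	return (pa & 0xffffu) - OMAP3_CORE1_PADCONF_BASE;
}
```

Under that assumption, 0x21d2 maps back to the 0x1a2 offset removed from the common dtsi, so the pad being muxed is unchanged.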

Suggested-by: Ladislav Michl 
Signed-off-by: Javier Martinez Canillas 

---

 arch/arm/boot/dts/omap3-igep.dtsi| 6 --
 arch/arm/boot/dts/omap3-igep0020-common.dtsi | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/boot/dts/omap3-igep.dtsi 
b/arch/arm/boot/dts/omap3-igep.dtsi
index d5e5cd449b16..2230e1c03320 100644
--- a/arch/arm/boot/dts/omap3-igep.dtsi
+++ b/arch/arm/boot/dts/omap3-igep.dtsi
@@ -78,12 +78,6 @@
>;
};
 
-   smsc9221_pins: pinmux_smsc9221_pins {
-   pinctrl-single,pins = <
-   0x1a2 (PIN_INPUT | MUX_MODE4)   /* 
mcspi1_cs2.gpio_176 */
-   >;
-   };
-
i2c1_pins: pinmux_i2c1_pins {
pinctrl-single,pins = <
0x18a (PIN_INPUT | MUX_MODE0)   /* i2c1_scl.i2c1_scl */
diff --git a/arch/arm/boot/dts/omap3-igep0020-common.dtsi 
b/arch/arm/boot/dts/omap3-igep0020-common.dtsi
index e458c2185e3c..5ad688c57a00 100644
--- a/arch/arm/boot/dts/omap3-igep0020-common.dtsi
+++ b/arch/arm/boot/dts/omap3-igep0020-common.dtsi
@@ -156,6 +156,12 @@
OMAP3_CORE1_IOPAD(0x217a, PIN_INPUT | MUX_MODE0)
/* uart2_rx.uart2_rx */
>;
};
+
+   smsc9221_pins: pinmux_smsc9221_pins {
+   pinctrl-single,pins = <
+   OMAP3_CORE1_IOPAD(0x21d2, PIN_INPUT | MUX_MODE4)
/* mcspi1_cs2.gpio_176 */
+   >;
+   };
 };
 
 &omap3_pmx_core2 {
-- 
2.4.3



[PATCH v3 2/2] leds: leds-ipaq-micro: Fix coding style issues

2015-09-07 Thread Muhammad Falak R Wani
Remove leading spaces and indent with tabs instead. Also wrap lines
that are longer than 80 characters.
Two warnings are left alone to preserve readability.

Signed-off-by: Muhammad Falak R Wani 
---
 drivers/leds/leds-ipaq-micro.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/leds/leds-ipaq-micro.c b/drivers/leds/leds-ipaq-micro.c
index 1206215..fa262b6 100644
--- a/drivers/leds/leds-ipaq-micro.c
+++ b/drivers/leds/leds-ipaq-micro.c
@@ -16,9 +16,9 @@
 #define LED_YELLOW 0x00
 #define LED_GREEN  0x01
 
-#define LED_EN  (1 << 4) /* LED ON/OFF 0:off, 1:on */
-#define LED_AUTOSTOP (1 << 5) /* LED ON/OFF auto stop set 0:disable, 1:enable */
-#define LED_ALWAYS  (1 << 6) /* LED Interrupt Mask 0:No mask, 1:mask */
+#define LED_EN   (1 << 4) /* LED ON/OFF 0:off, 1:on */
+#define LED_AUTOSTOP (1 << 5) /* LED ON/OFF auto stop set 0:disable, 1:enable */
+#define LED_ALWAYS   (1 << 6) /* LED Interrupt Mask 0:No mask, 1:mask */
 
 static void micro_leds_brightness_set(struct led_classdev *led_cdev,
  enum led_brightness value)
@@ -79,14 +79,14 @@ static int micro_leds_blink_set(struct led_classdev 
*led_cdev,
};
 
msg.tx_data[0] = LED_GREEN;
-if (*delay_on > IPAQ_LED_MAX_DUTY ||
+   if (*delay_on > IPAQ_LED_MAX_DUTY ||
*delay_off > IPAQ_LED_MAX_DUTY)
-return -EINVAL;
+   return -EINVAL;
 
-if (*delay_on == 0 && *delay_off == 0) {
-*delay_on = 100;
-*delay_off = 100;
-}
+   if (*delay_on == 0 && *delay_off == 0) {
+   *delay_on = 100;
+   *delay_off = 100;
+   }
 
msg.tx_data[1] = 0;
if (*delay_on >= IPAQ_LED_MAX_DUTY)
-- 
1.9.1



Re: [PATCH 1/2] ASoC: atmel-classd: add the Audio Class D Amplifier code

2015-09-07 Thread Mark Brown
On Sun, Sep 06, 2015 at 05:44:21PM +0800, Wu, Songjun wrote:
> On 9/3/2015 19:37, Mark Brown wrote:
> >On Tue, Sep 01, 2015 at 01:41:40PM +0800, Songjun Wu wrote:

> >>+static const char * const eqcfg_bass_text[] = {
> >>+   "-12 dB", "-6 dB", "0 dB", "+6 dB", "+12 dB"
> >>+};
> >
> >>+static const unsigned int eqcfg_bass_value[] = {
> >>+   CLASSD_INTPMR_EQCFG_B_CUT_12,
> >>+   CLASSD_INTPMR_EQCFG_B_CUT_6, CLASSD_INTPMR_EQCFG_FLAT,
> >>+   CLASSD_INTPMR_EQCFG_B_BOOST_6, CLASSD_INTPMR_EQCFG_B_BOOST_12
> >>+};

> >This should be a Volume control with TLV information, as should the
> >following few controls.

> The Volume control with TLV information is not suitable for this case.
> Bass, Medium, and treble are mutually exclusive.
> So I think the SOC_ENUM control is suitable for this case.
> The register layout is not very good,
> The register is defined as below.
> •  EQCFG: Equalization Selection
> Value  Name      Description
> 0      FLAT      Flat Response
> 1      BBOOST12  Bass boost +12 dB
> 2      BBOOST6   Bass boost +6 dB
> 3      BCUT12    Bass cut -12 dB
> 4      BCUT6     Bass cut -6 dB
> 5      MBOOST3   Medium boost +3 dB
> 6      MBOOST8   Medium boost +8 dB
> 7      MCUT3     Medium cut -3 dB
> 8      MCUT8     Medium cut -8 dB
> 9      TBOOST12  Treble boost +12 dB
> 10     TBOOST6   Treble boost +6 dB
> 11     TCUT12    Treble cut -12 dB
> 12     TCUT6     Treble cut -6 dB

OK, so that's not actually what the code was doing - it had separate
enums for bass, mid and treble.  If you make this a single enum with all
the above options in it that seems like the best way of handling things.
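
Roughly what I have in mind for the single enum, as a plain C sketch rather than driver code: one text table indexed directly by the EQCFG field value from the register description quoted above. The array and helper names are made up for illustration; in the driver the text array would presumably feed one enum control instead of three.

```c
#include <stddef.h>

/* Labels indexed by the EQCFG field value (0..12) from the quoted table. */
static const char * const eqcfg_text[] = {
	"Flat",
	"Bass boost +12 dB", "Bass boost +6 dB",
	"Bass cut -12 dB", "Bass cut -6 dB",
	"Medium boost +3 dB", "Medium boost +8 dB",
	"Medium cut -3 dB", "Medium cut -8 dB",
	"Treble boost +12 dB", "Treble boost +6 dB",
	"Treble cut -12 dB", "Treble cut -6 dB",
};

#define EQCFG_NUM_OPTIONS (sizeof(eqcfg_text) / sizeof(eqcfg_text[0]))

/* Hypothetical helper: label for a register value, NULL if out of range. */
static const char *eqcfg_name(unsigned int value)
{
	return value < EQCFG_NUM_OPTIONS ? eqcfg_text[value] : NULL;
}
```

Because the index is the register value, no separate value table is needed and the mutually exclusive settings cannot be combined by accident.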

> >>+static const struct snd_kcontrol_new atmel_classd_snd_controls[] = {
> >>+SOC_SINGLE_TLV("Left Volume", CLASSD_INTPMR,
> >>+   CLASSD_INTPMR_ATTL_SHIFT, 78, 1, classd_digital_tlv),
> >>+
> >>+SOC_SINGLE_TLV("Right Volume", CLASSD_INTPMR,
> >>+   CLASSD_INTPMR_ATTR_SHIFT, 78, 1, classd_digital_tlv),
> >
> >This should be a single stereo control rather than separate left and
> >right controls.

> Since the classD IP defines two register fields to control left volume and
> right volume respectively, I think it's better to provide two controls to
> user.

No, this is really common, we combine them in Linux to present a
consistent interface to userspace.

> >>+   dev_info(dev,
> >>+   "Atmel Class D Amplifier (CLASSD) device at 0x%p (irq %d)\n",
> >>+   io_base, dd->irq);

> >This is a bit noisy and not really based on interaction with the
> >hardware...  dev_dbg() seems better.

> This information will occur only once when linux kernel starts.
> It shows the classD is loaded to linux kernel.
> I think it's better to provide more information to user.

This stuff all adds up and since it'll go out on the console by default
it both makes things more noisy and slows down boot - printing on the
serial port isn't free.  If we want to have this sort of information
printed we should really do it in the driver core so it appears
consistently for all devices rather than having individual code in each
driver.




Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Vincent Guittot
On 7 September 2015 at 17:37, Dietmar Eggemann  wrote:
> On 04/09/15 00:51, Steve Muckle wrote:
>> Hi Morten, Dietmar,
>>
>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>> ...
>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus 
>>> the
>>> + * recent utilization of currently non-runnable tasks on a CPU. It 
>>> represents
>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>
>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>> __update_load_avg(). If there is now an assumption that util_avg may be
>> used directly as a capacity value, should it be changed to
>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>> always be or if they can be combined.
>
> You're referring to the code line
>
> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> in __update_load_avg()?
>
> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
> load related.

I agree with Steve that there is an issue from a unit point of view.

sa->util_sum and LOAD_AVG_MAX have the same unit, so sa->util_avg is a
load because of the "<< SCHED_LOAD_SHIFT".

Before this patch, the translation from load to capacity unit was
done in get_cpu_usage() with "* capacity) >> SCHED_LOAD_SHIFT".

So you still have to change the unit from load to capacity with a "/
SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere:

sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
                SCHED_CAPACITY_SCALE) / LOAD_AVG_MAX
             = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
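
To make the unit argument concrete, here is a small userspace sketch of the three forms. The constants are assumptions copied from my reading of the current sources (both shifts are 10 and LOAD_AVG_MAX is 47742), and the function names are made up. With equal shifts all three give the same number, which is why nothing breaks today; they would diverge if the load and capacity scales ever differed.

```c
#include <stdint.h>

/* Assumed values; not defined by this patch. */
#define SCHED_LOAD_SHIFT	10
#define SCHED_LOAD_SCALE	(1UL << SCHED_LOAD_SHIFT)
#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)
#define LOAD_AVG_MAX		47742UL

/* util_avg in load units, as computed in __update_load_avg() today. */
static uint64_t util_avg_load(uint64_t util_sum)
{
	return (util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
}

/* Explicit load -> capacity conversion, spelled out step by step. */
static uint64_t util_avg_converted(uint64_t util_sum)
{
	uint64_t scaled = (util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
			  SCHED_CAPACITY_SCALE;

	return scaled / LOAD_AVG_MAX;
}

/* Simplified form: shift by the capacity shift directly. */
static uint64_t util_avg_capacity(uint64_t util_sum)
{
	return (util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
}
```

A fully busy CPU (util_sum equal to LOAD_AVG_MAX) yields SCHED_CAPACITY_SCALE in the last two forms, which is the range cpu_util() is documented to return.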


Regards,
Vincent


>
> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
> scale_load_down() are also NOPs so this area is probably
> worth a separate clean-up.
> Beyond that, I'm not sure if the current functionality is
> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>
>>
>>> + * capacity_orig is the cpu_capacity available at * the highest frequency
>>
>> spurious *
>>
>> thanks,
>> Steve
>>
>
> Fixed.
>
> Thanks,
>
> -- Dietmar
>
> -- >8 --
>
> From: Dietmar Eggemann 
> Date: Fri, 14 Aug 2015 17:23:13 +0100
> Subject: [PATCH] sched/fair: Get rid of scaling utilization by capacity_orig
>
> Utilization is currently scaled by capacity_orig, but since we now have
> frequency and cpu invariant cfs_rq.avg.util_avg, frequency and cpu scaling
> now happens as part of the utilization tracking itself.
> So cfs_rq.avg.util_avg should no longer be scaled in cpu_util().
>
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Signed-off-by: Dietmar Eggemann 
> Signed-off-by: Morten Rasmussen 
> ---
>  kernel/sched/fair.c | 38 ++
>  1 file changed, 22 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2074d45a67c2..a73ece2372f5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4824,33 +4824,39 @@ static int select_idle_sibling(struct task_struct *p, 
> int target)
>  done:
> return target;
>  }
> +
>  /*
>   * cpu_util returns the amount of capacity of a CPU that is used by CFS
>   * tasks. The unit of the return value must be the one of capacity so we can
>   * compare the utilization with the capacity of the CPU that is available for
>   * CFS task (ie cpu_capacity).
> - * cfs.avg.util_avg is the sum of running time of runnable tasks on a
> - * CPU. It represents the amount of utilization of a CPU in the range
> - * [0..SCHED_LOAD_SCALE]. The utilization of a CPU can't be higher than the
> - * full capacity of the CPU because it's about the running time on this CPU.
> - * Nevertheless, cfs.avg.util_avg can be higher than SCHED_LOAD_SCALE
> - * because of unfortunate rounding in util_avg or just
> - * after migrating tasks until the average stabilizes with the new running
> - * time. So we need to check that the utilization stays into the range
> - * [0..cpu_capacity_orig] and cap if necessary.
> - * Without capping the utilization, a group could be seen as overloaded (CPU0
> - * utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
> - * available capacity.
> + *
> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
> + * recent utilization of currently non-runnable tasks on a CPU. It represents
> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
> + * capacity_orig is the cpu_capacity available at the highest frequency
> + * (arch_scale_freq_capacity()).
> + * The utilization of a CPU converges towards a sum equal to or less than the
> + * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
> + * the running time on this CPU scaled by capacity_curr.
> + *
> + * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
> + * higher than capacity_orig because of unfortunate rounding in
> + * cfs.avg.util_avg or just after migrating tasks and new 

[PATCH v2 1/9] [picked] powerpc: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
Allow it to be used from SPU, since it should not have unwanted
side-effects.

[ Picked-by: Michael Ellerman  ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h  | 1 +
 arch/powerpc/include/asm/unistd.h  | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 4d65499..126d0c4 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -369,3 +369,4 @@ SYSCALL_SPU(bpf)
 COMPAT_SYS(execveat)
 PPC64ONLY(switch_endian)
 SYSCALL_SPU(userfaultfd)
+SYSCALL_SPU(membarrier)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 4a055b6..13411be 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define __NR_syscalls  365
+#define __NR_syscalls  366
 
 #define __NR__exit __NR_exit
 #define NR_syscalls__NR_syscalls
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 6ad58d4..6337738 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -387,5 +387,6 @@
 #define __NR_execveat  362
 #define __NR_switch_endian 363
 #define __NR_userfaultfd   364
+#define __NR_membarrier365
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1



[PATCH v2 3/9] sparc/sparc64: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
Acked-by: "David S. Miller" 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: sparcli...@vger.kernel.org
---
 arch/sparc/include/uapi/asm/unistd.h | 3 ++-
 arch/sparc/kernel/systbls_32.S   | 2 +-
 arch/sparc/kernel/systbls_64.S   | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/uapi/asm/unistd.h 
b/arch/sparc/include/uapi/asm/unistd.h
index 6f35f4d..efe9479 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -416,8 +416,9 @@
 #define __NR_memfd_create  348
 #define __NR_bpf   349
 #define __NR_execveat  350
+#define __NR_membarrier351
 
-#define NR_syscalls351
+#define NR_syscalls352
 
 /* Bitmask values returned from kern_features system call.  */
 #define KERN_FEATURE_MIXED_MODE_STACK  0x0001
diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S
index e31a905..cc23b62 100644
--- a/arch/sparc/kernel/systbls_32.S
+++ b/arch/sparc/kernel/systbls_32.S
@@ -87,4 +87,4 @@ sys_call_table:
 /*335*/.long sys_syncfs, sys_sendmmsg, sys_setns, 
sys_process_vm_readv, sys_process_vm_writev
 /*340*/.long sys_ni_syscall, sys_kcmp, sys_finit_module, 
sys_sched_setattr, sys_sched_getattr
 /*345*/.long sys_renameat2, sys_seccomp, sys_getrandom, 
sys_memfd_create, sys_bpf
-/*350*/.long sys_execveat
+/*350*/.long sys_execveat, sys_membarrier
diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
index d72f76a..f229468 100644
--- a/arch/sparc/kernel/systbls_64.S
+++ b/arch/sparc/kernel/systbls_64.S
@@ -88,7 +88,7 @@ sys_call_table32:
.word sys_syncfs, compat_sys_sendmmsg, sys_setns, 
compat_sys_process_vm_readv, compat_sys_process_vm_writev
 /*340*/.word sys_kern_features, sys_kcmp, sys_finit_module, 
sys_sched_setattr, sys_sched_getattr
.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, 
sys_bpf
-/*350*/.word sys32_execveat
+/*350*/.word sys32_execveat, sys_membarrier
 
 #endif /* CONFIG_COMPAT */
 
@@ -168,4 +168,4 @@ sys_call_table:
.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, 
sys_process_vm_writev
 /*340*/.word sys_kern_features, sys_kcmp, sys_finit_module, 
sys_sched_setattr, sys_sched_getattr
.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, 
sys_bpf
-/*350*/.word sys64_execveat
+/*350*/.word sys64_execveat, sys_membarrier
-- 
1.9.1



[RFC PATCH v2 5/9] alpha: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
[ Untested on this architecture. To try it out: fetch linux-next/akpm,
  apply this patch, build/run a membarrier-enabled kernel, and do make
  kselftest. ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Richard Henderson 
CC: Ivan Kokshaysky 
CC: Matt Turner 
CC: linux-al...@vger.kernel.org
---
 arch/alpha/include/asm/unistd.h  | 2 +-
 arch/alpha/include/uapi/asm/unistd.h | 1 +
 arch/alpha/kernel/systbls.S  | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/asm/unistd.h b/arch/alpha/include/asm/unistd.h
index a56e608..07aa4ca 100644
--- a/arch/alpha/include/asm/unistd.h
+++ b/arch/alpha/include/asm/unistd.h
@@ -3,7 +3,7 @@
 
 #include 
 
-#define NR_SYSCALLS514
+#define NR_SYSCALLS515
 
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_STAT64
diff --git a/arch/alpha/include/uapi/asm/unistd.h 
b/arch/alpha/include/uapi/asm/unistd.h
index aa33bf5..7725619 100644
--- a/arch/alpha/include/uapi/asm/unistd.h
+++ b/arch/alpha/include/uapi/asm/unistd.h
@@ -475,5 +475,6 @@
 #define __NR_getrandom 511
 #define __NR_memfd_create  512
 #define __NR_execveat  513
+#define __NR_membarrier514
 
 #endif /* _UAPI_ALPHA_UNISTD_H */
diff --git a/arch/alpha/kernel/systbls.S b/arch/alpha/kernel/systbls.S
index 9b62e3f..1ea64f4 100644
--- a/arch/alpha/kernel/systbls.S
+++ b/arch/alpha/kernel/systbls.S
@@ -532,6 +532,7 @@ sys_call_table:
.quad sys_getrandom
.quad sys_memfd_create
.quad sys_execveat
+   .quad sys_membarrier
 
.size sys_call_table, . - sys_call_table
.type sys_call_table, @object
-- 
1.9.1



[PATCH v2 4/9] parisc: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
Tested-by: Helge Deller 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: "James E.J. Bottomley" 
CC: linux-par...@vger.kernel.org
---
 arch/parisc/include/uapi/asm/unistd.h | 3 ++-
 arch/parisc/kernel/syscall_table.S| 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/include/uapi/asm/unistd.h 
b/arch/parisc/include/uapi/asm/unistd.h
index 2e639d7..dadcada 100644
--- a/arch/parisc/include/uapi/asm/unistd.h
+++ b/arch/parisc/include/uapi/asm/unistd.h
@@ -358,8 +358,9 @@
 #define __NR_memfd_create  (__NR_Linux + 340)
 #define __NR_bpf   (__NR_Linux + 341)
 #define __NR_execveat  (__NR_Linux + 342)
+#define __NR_membarrier(__NR_Linux + 343)
 
-#define __NR_Linux_syscalls(__NR_execveat + 1)
+#define __NR_Linux_syscalls(__NR_membarrier + 1)
 
 
 #define __IGNORE_select/* newselect */
diff --git a/arch/parisc/kernel/syscall_table.S 
b/arch/parisc/kernel/syscall_table.S
index 8eefb12..4e77991 100644
--- a/arch/parisc/kernel/syscall_table.S
+++ b/arch/parisc/kernel/syscall_table.S
@@ -438,6 +438,7 @@
ENTRY_SAME(memfd_create)/* 340 */
ENTRY_SAME(bpf)
ENTRY_COMP(execveat)
+   ENTRY_SAME(membarrier)
 
 
 .ifne (. - 90b) - (__NR_Linux_syscalls * (91b - 90b))
-- 
1.9.1



[RFC PATCH v2 9/9] s390/s390x: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
[ Untested on this architecture. To try it out: fetch linux-next/akpm,
  apply this patch, build/run a membarrier-enabled kernel, and do make
  kselftest. ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Martin Schwidefsky 
CC: Heiko Carstens 
CC: linux-s...@vger.kernel.org
---
 arch/s390/include/uapi/asm/unistd.h | 3 ++-
 arch/s390/kernel/syscalls.S | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/uapi/asm/unistd.h 
b/arch/s390/include/uapi/asm/unistd.h
index 59d2bb4..2f1de70 100644
--- a/arch/s390/include/uapi/asm/unistd.h
+++ b/arch/s390/include/uapi/asm/unistd.h
@@ -290,7 +290,8 @@
 #define __NR_s390_pci_mmio_write   352
 #define __NR_s390_pci_mmio_read353
 #define __NR_execveat  354
-#define NR_syscalls 355
+#define __NR_membarrier355
+#define NR_syscalls 356
 
 /* 
  * There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index f3f4a13..914c098 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -363,3 +363,4 @@ SYSCALL(sys_bpf,compat_sys_bpf)
 SYSCALL(sys_s390_pci_mmio_write,compat_sys_s390_pci_mmio_write)
 SYSCALL(sys_s390_pci_mmio_read,compat_sys_s390_pci_mmio_read)
 SYSCALL(sys_execveat,compat_sys_execveat)
+SYSCALL(sys_membarrier,sys_membarrier)
-- 
1.9.1



[RFC PATCH v2 7/9] arm64: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
The sys_membarrier number is already wired up for arm64 through
asm-generic/unistd.h, but needs to be allocated separately for
the 32-bit compatibility layer of arm64.

[ Untested on this architecture. To try it out: fetch linux-next/akpm,
  apply this patch, build/run a membarrier-enabled kernel, and do make
  kselftest. ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Catalin Marinas 
CC: Will Deacon 
---
 arch/arm64/include/asm/unistd.h   | 2 +-
 arch/arm64/include/asm/unistd32.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3bc498c..e70f7e7 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
 #define __ARM_NR_compat_cacheflush (__ARM_NR_COMPAT_BASE+2)
 #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE+5)
 
-#define __NR_compat_syscalls   388
+#define __NR_compat_syscalls   389
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index cef934a..d97be80 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -797,3 +797,5 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 387
 __SYSCALL(__NR_execveat, compat_sys_execveat)
+#define __NR_membarrier 388
+__SYSCALL(__NR_membarrier, sys_membarrier)
-- 
1.9.1



[RFC PATCH v2 8/9] ia64: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
[ Untested on this architecture. To try it out: fetch linux-next/akpm,
  apply this patch, build/run a membarrier-enabled kernel, and do make
  kselftest. ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Tony Luck 
CC: Fenghua Yu 
CC: linux-i...@vger.kernel.org
---
 arch/ia64/include/asm/unistd.h  | 2 +-
 arch/ia64/include/uapi/asm/unistd.h | 1 +
 arch/ia64/kernel/entry.S| 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/unistd.h b/arch/ia64/include/asm/unistd.h
index 95c39b9..1d54e17 100644
--- a/arch/ia64/include/asm/unistd.h
+++ b/arch/ia64/include/asm/unistd.h
@@ -11,7 +11,7 @@
 
 
 
-#define NR_syscalls319 /* length of syscall table */
+#define NR_syscalls320 /* length of syscall table */
 
 /*
  * The following defines stop scripts/checksyscalls.sh from complaining about
diff --git a/arch/ia64/include/uapi/asm/unistd.h 
b/arch/ia64/include/uapi/asm/unistd.h
index 4610795..b7aae55 100644
--- a/arch/ia64/include/uapi/asm/unistd.h
+++ b/arch/ia64/include/uapi/asm/unistd.h
@@ -332,5 +332,6 @@
 #define __NR_memfd_create  1340
 #define __NR_bpf   1341
 #define __NR_execveat  1342
+#define __NR_membarrier1343
 
 #endif /* _UAPI_ASM_IA64_UNISTD_H */
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index ae0de7b..1ce01f9 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1768,5 +1768,6 @@ sys_call_table:
data8 sys_memfd_create  // 1340
data8 sys_bpf
data8 sys_execveat
+   data8 sys_membarrier
 
.org sys_call_table + 8*NR_syscalls // guard against failures to 
increase NR_syscalls
-- 
1.9.1



[RFC PATCH v2 6/9] arm: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
[ Untested on this architecture. To try it out: fetch linux-next/akpm,
  apply this patch, build/run a membarrier-enabled kernel, and do make
  kselftest. ]

Signed-off-by: Mathieu Desnoyers 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: Russell King 
---
 arch/arm/include/asm/unistd.h  | 2 +-
 arch/arm/include/uapi/asm/unistd.h | 1 +
 arch/arm/kernel/calls.S| 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 32640c4..d93876c 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -19,7 +19,7 @@
  * This may need to be greater than __NR_last_syscall+1 in order to
  * account for the padding in the syscall table
  */
-#define __NR_syscalls  (388)
+#define __NR_syscalls  (389)
 
 /*
  * *NOTE*: This is a ghost syscall private to the kernel.  Only the
diff --git a/arch/arm/include/uapi/asm/unistd.h 
b/arch/arm/include/uapi/asm/unistd.h
index 0c3f5a0..436bb32 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -414,6 +414,7 @@
 #define __NR_memfd_create  (__NR_SYSCALL_BASE+385)
 #define __NR_bpf   (__NR_SYSCALL_BASE+386)
 #define __NR_execveat  (__NR_SYSCALL_BASE+387)
+#define __NR_membarrier(__NR_SYSCALL_BASE+388)
 
 /*
  * The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 05745eb..310699c 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -397,6 +397,7 @@
 /* 385 */  CALL(sys_memfd_create)
CALL(sys_bpf)
CALL(sys_execveat)
+   CALL(sys_membarrier)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
-- 
1.9.1



[PATCH v2 0/9] Allocate sys_membarrier on main architectures

2015-09-07 Thread Mathieu Desnoyers
Following feedback from architecture maintainers, this is v2 of this
patchset. Status:

* Picked into maintainer's tree:
  - powerpc
* Ready to be picked into maintainer's tree (acked/tested):
  - mips, sparc/sparc64, parisc
* Awaiting feedback/testing:
  - arm, arm64, ia64, s390/s390x
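
For a quick smoke test on a freshly wired architecture, in addition to kselftest, something along these lines may help. It is a minimal sketch, assuming the libc headers already expose SYS_membarrier for the target; the query command value 0 matches MEMBARRIER_CMD_QUERY in the uapi header, and the local macro and function names are made up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value as MEMBARRIER_CMD_QUERY; defined locally for older headers. */
#define SKETCH_MEMBARRIER_CMD_QUERY 0

/*
 * Ask the kernel which membarrier commands it supports.
 * Returns a non-negative bitmask on success, or -1 with errno set
 * (ENOSYS when the syscall is not wired up or not known to libc).
 */
static long membarrier_query(void)
{
#ifdef SYS_membarrier
	return syscall(SYS_membarrier, SKETCH_MEMBARRIER_CMD_QUERY, 0);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

A return of -1 with errno set to ENOSYS on a patched kernel would suggest the table entry or the syscall count was not updated consistently.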

Thanks,

Mathieu

Mathieu Desnoyers (9):
  [picked] powerpc: allocate sys_membarrier system call number
  mips: allocate sys_membarrier system call number
  sparc/sparc64: allocate sys_membarrier system call number
  parisc: allocate sys_membarrier system call number
  alpha: allocate sys_membarrier system call number
  arm: allocate sys_membarrier system call number
  arm64: allocate sys_membarrier system call number
  ia64: allocate sys_membarrier system call number
  s390/s390x: allocate sys_membarrier system call number

 arch/alpha/include/asm/unistd.h|  2 +-
 arch/alpha/include/uapi/asm/unistd.h   |  1 +
 arch/alpha/kernel/systbls.S|  1 +
 arch/arm/include/asm/unistd.h  |  2 +-
 arch/arm/include/uapi/asm/unistd.h |  1 +
 arch/arm/kernel/calls.S|  1 +
 arch/arm64/include/asm/unistd.h|  2 +-
 arch/arm64/include/asm/unistd32.h  |  2 ++
 arch/ia64/include/asm/unistd.h |  2 +-
 arch/ia64/include/uapi/asm/unistd.h|  1 +
 arch/ia64/kernel/entry.S   |  1 +
 arch/mips/include/uapi/asm/unistd.h| 15 +--
 arch/mips/kernel/scall32-o32.S |  1 +
 arch/mips/kernel/scall64-64.S  |  1 +
 arch/mips/kernel/scall64-n32.S |  1 +
 arch/mips/kernel/scall64-o32.S |  1 +
 arch/parisc/include/uapi/asm/unistd.h  |  3 ++-
 arch/parisc/kernel/syscall_table.S |  1 +
 arch/powerpc/include/asm/systbl.h  |  1 +
 arch/powerpc/include/asm/unistd.h  |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h |  1 +
 arch/s390/include/uapi/asm/unistd.h|  3 ++-
 arch/s390/kernel/syscalls.S|  1 +
 arch/sparc/include/uapi/asm/unistd.h   |  3 ++-
 arch/sparc/kernel/systbls_32.S |  2 +-
 arch/sparc/kernel/systbls_64.S |  4 ++--
 26 files changed, 39 insertions(+), 17 deletions(-)

-- 
1.9.1



[PATCH v2 2/9] mips: allocate sys_membarrier system call number

2015-09-07 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
Acked-by: Ralf Baechle 
CC: Andrew Morton 
CC: linux-...@vger.kernel.org
CC: linux-m...@linux-mips.org
---
 arch/mips/include/uapi/asm/unistd.h | 15 +--
 arch/mips/kernel/scall32-o32.S  |  1 +
 arch/mips/kernel/scall64-64.S   |  1 +
 arch/mips/kernel/scall64-n32.S  |  1 +
 arch/mips/kernel/scall64-o32.S  |  1 +
 5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index d0bdfaa..b107983 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -378,16 +378,17 @@
 #define __NR_bpf        (__NR_Linux + 355)
 #define __NR_execveat   (__NR_Linux + 356)
 #define __NR_mlock2     (__NR_Linux + 357)
+#define __NR_membarrier (__NR_Linux + 358)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls 357
+#define __NR_Linux_syscalls 358
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux 4000
-#define __NR_O32_Linux_syscalls 357
+#define __NR_O32_Linux_syscalls 358
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -713,16 +714,17 @@
 #define __NR_bpf        (__NR_Linux + 315)
 #define __NR_execveat   (__NR_Linux + 316)
 #define __NR_mlock2     (__NR_Linux + 317)
+#define __NR_membarrier (__NR_Linux + 318)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls 317
+#define __NR_Linux_syscalls 318
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux  5000
-#define __NR_64_Linux_syscalls 317
+#define __NR_64_Linux_syscalls 318
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -1052,15 +1054,16 @@
 #define __NR_bpf        (__NR_Linux + 319)
 #define __NR_execveat   (__NR_Linux + 320)
 #define __NR_mlock2     (__NR_Linux + 321)
+#define __NR_membarrier (__NR_Linux + 322)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls 321
+#define __NR_Linux_syscalls 322
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux 6000
-#define __NR_N32_Linux_syscalls 321
+#define __NR_N32_Linux_syscalls 322
 
 #endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index b0b377a..9265542 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -600,3 +600,4 @@ EXPORT(sys_call_table)
        PTR     sys_bpf                 /* 4355 */
        PTR     sys_execveat
        PTR     sys_mlock2
+       PTR     sys_membarrier
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index f12eb03..79d4fb0 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -437,4 +437,5 @@ EXPORT(sys_call_table)
        PTR     sys_bpf                 /* 5315 */
        PTR     sys_execveat
        PTR     sys_mlock2
+       PTR     sys_membarrier
        .size   sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index ecdd65a..235892a 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -430,4 +430,5 @@ EXPORT(sysn32_call_table)
        PTR     sys_bpf
        PTR     compat_sys_execveat     /* 6320 */
        PTR     sys_mlock2
+       PTR     sys_membarrier
        .size   sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 7a8b2df..c051bd3 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -585,4 +585,5 @@ EXPORT(sys32_call_table)
        PTR     sys_bpf                 /* 4355 */
        PTR     compat_sys_execveat
        PTR     sys_mlock2
+       PTR     sys_membarrier
        .size   sys32_call_table,.-sys32_call_table
-- 
1.9.1
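
As a hedged illustration of the numbering used in the patch above: each MIPS ABI adds its own base (4000 for o32, 5000 for the 64-bit ABI, 6000 for n32, as in the `__NR_*_Linux` defines) to the per-ABI offset. The sketch below recomputes the absolute `membarrier` numbers as allocated in this version of the patch. The enum and helper names are hypothetical and do not exist in the kernel; only the bases and offsets are taken from the diff.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
enum mips_abi { MIPS_ABI_O32, MIPS_ABI_N64, MIPS_ABI_N32 };

/* Per-ABI base: __NR_O32_Linux, __NR_64_Linux, __NR_N32_Linux. */
static int mips_syscall_base(enum mips_abi abi)
{
	switch (abi) {
	case MIPS_ABI_O32: return 4000;
	case MIPS_ABI_N64: return 5000;
	case MIPS_ABI_N32: return 6000;
	}
	return -1;
}

/* Offset of membarrier in each ABI's table, as allocated by this patch. */
static int membarrier_offset(enum mips_abi abi)
{
	switch (abi) {
	case MIPS_ABI_O32: return 358;
	case MIPS_ABI_N64: return 318;
	case MIPS_ABI_N32: return 322;
	}
	return -1;
}

/* Absolute syscall number: base plus per-ABI offset. */
static int membarrier_nr(enum mips_abi abi)
{
	int base = mips_syscall_base(abi);
	int off = membarrier_offset(abi);

	assert(base >= 0 && off >= 0);
	return base + off;
}
```

Calling `membarrier_nr(MIPS_ABI_O32)` follows the same arithmetic as the `/* 4355 */` comment next to `sys_bpf` in the o32 table (4000 + 355), shifted to the new slot.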



Re: [PATCH] virtio-blk: use VIRTIO_BLK_F_WCE and VIRTIO_BLK_F_CONFIG_WCE in virtio1

2015-09-07 Thread Paolo Bonzini


On 22/08/2015 00:53, Paolo Bonzini wrote:
> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
> (up to 2x, though more realistically +30-40%) in latency-bound workloads.
> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.
> 
> It will be restored in the next revision of the virtio 1.0 standard, so
> do the same in Linux.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  drivers/block/virtio_blk.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index d4d05f064d39..ea2c17c66dfb 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -478,8 +478,7 @@ static int virtblk_get_cache_mode(struct virtio_device *vdev)
>  struct virtio_blk_config, wce,
>  &writeback);
>   if (err)
> - writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE) ||
> - virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
> + writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE);
>  
>   return writeback;
>  }
> @@ -840,7 +839,7 @@ static unsigned int features_legacy[] = {
>  static unsigned int features[] = {
>   VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
>   VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
> - VIRTIO_BLK_F_TOPOLOGY,
> + VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
>   VIRTIO_BLK_F_MQ,
>  };
>  
> 

Ping?

Paolo


Re: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.

2015-09-07 Thread James Morse
On 07/09/15 16:48, Jungseok Lee wrote:
> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
> 
> Hi James,
> 
>> Having to handle interrupts on top of an existing kernel stack means the
>> kernel stack must be large enough to accommodate both the maximum kernel
>> usage, and the maximum irq handler usage. Switching to a different stack
>> when processing irqs allows us to make the stack size smaller.
>>
>> Maximum kernel stack usage (running ltp and generating usb+ethernet
>> interrupts) was 7256 bytes. With this patch, the same workload gives
>> a maximum stack usage of 5816 bytes.
> 
> I'd like to know how to measure the max stack depth.
> AFAIK, a stack tracer on ftrace does not work well. Did you dump a stack
> region and find or track down an untouched region? 

I enabled the 'Trace max stack' option under menuconfig 'Kernel Hacking' ->
'Tracers', then looked in debugfs:/tracing/stack_max_size.

What problems did you encounter?
(I may be missing something...)


Thanks,

James


Re: [PATCH 2/3] selftests: add membarrier syscall test

2015-09-07 Thread Mathieu Desnoyers
- On Sep 3, 2015, at 11:36 PM, Michael Ellerman m...@ellerman.id.au wrote:

> On Thu, 2015-09-03 at 15:47 +, Mathieu Desnoyers wrote:
>> - On Sep 3, 2015, at 5:33 AM, Michael Ellerman m...@ellerman.id.au wrote:
>> 
>> > On Tue, 2015-09-01 at 11:32 -0700, Andy Lutomirski wrote:
>> >> On Tue, Sep 1, 2015 at 10:11 AM, Mathieu Desnoyers
>> >>  wrote:
>> >> > Just to make sure I understand: should we expect that
>> >> > everyone will issue "make headers_install" on their system
>> >> > before doing a make kselftest ?
>> >> >
>> >> > I see that a few selftests (e.g. memfd) are adding the
>> >> > source tree include paths to the compiler include paths,
>> >> > which I guess is to ensure that the kselftest will
>> >> > work even if the system headers are not up to date.
>> >> 
>> >> It would be really nice if there were a clean way for selftests to
>> >> include the kernel headers.
>> > 
>> > What's wrong with make headers_install?
>> > 
>> > Or do you mean when writing the tests? That we could fix by adding the
>> > ../../../../usr/include path to CFLAGS in lib.mk. And fixing all the tests that
>> > overwrite CFLAGS to append to CFLAGS.
>> > 
>> >> Perhaps make should build the exportable headers somewhere as a dependency of
>> >> kselftests.
>> > 
>> > Yeah the top-level kselftest target could do that I think.
>> > 
>> > Folks who don't want the headers installed can just run the selftests Makefile
>> > directly.
>> > 
>> > Does this work for you?
>> > 
>> > diff --git a/Makefile b/Makefile
>> > index c361593..c8841d3 100644
>> > --- a/Makefile
>> > +++ b/Makefile
>> > @@ -1080,7 +1080,7 @@ headers_check: headers_install
>> > # Kernel selftest
>> > 
>> > PHONY += kselftest
>> > -kselftest:
>> > +kselftest: headers_install
>> >$(Q)$(MAKE) -C tools/testing/selftests run_tests
>> 
>> My personal experience is that make headers_install does not necessarily play
>> well with the distribution header file hierarchy, which requires some tweaks
>> to be done by the users (e.g. asm vs x86_64-linux-gnu).
> 
> OK, I've never had issues. What exactly are you doing and how is it going wrong?

After some investigation, I noticed the following:

1) I first ran make headers_install as root, which installed the
headers within my build tree. I later tried it again as user, and
it failed due to permission issues (my bad). This is where I tried
to install it into my system rather than under my build directory,
which caused a mess.

2) Since make kselftest should be run as root (according to make
help), this means that all the output files generated by the build
are owned by root. It leads to permissions issues when trying to
rebuild the tests as user afterward. Perhaps we could introduce a
distinction between make kselftest_build and make kselftest_run ?
The former could be executed as user, and the latter as root.

> 
>> Also, headers_install typically expects a INSTALL_HDR_PATH.
> 
> You can specify it, but the default is just usr/, ie. in the kernel directory,
> that is what I was proposing. (Actually it's $(objtree)/usr).

OK, trying it out.

> 
>> It would be interesting if we could install the kernel headers into a
>> specific location that is then re-used by kselftest, so using it without too
>> much manual configuration does not require overwriting the distribution
>> header files to run tests.
> 
> I think we can do that now, ie:
> 
>  $ ls /usr/include/linux/membarrier.h
>  ls: cannot access /usr/include/linux/membarrier.h: No such file or directory
> 
>  $ cd linux-next
>  $ make mrproper
>  $ make headers_install
>  ...
>  $ ls usr/include/linux/membarrier.h
>  usr/include/linux/membarrier.h
>  $ make -C tools/testing/selftests TARGETS=membarrier
>  make: Entering directory '/home/michael/work/topics/selftests/linux-next/tools/testing/selftests'
>  for TARGET in membarrier; do \
>   make -C $TARGET; \
>  done;
>  make[1]: Entering directory '/home/michael/work/topics/selftests/linux-next/tools/testing/selftests/membarrier'
>  gcc -g -I../../../../usr/include/ membarrier_test.c -o membarrier_test
>  make[1]: Leaving directory '/home/michael/work/topics/selftests/linux-next/tools/testing/selftests/membarrier'
>  make: Leaving directory '/home/michael/work/topics/selftests/linux-next/tools/testing/selftests'
> 
>  $ ./tools/testing/selftests/membarrier/membarrier_test
>  membarrier MEMBARRIER_CMD_QUERY failed. Function not implemented.
>  $
> 
> 
> So that seems to be working for me. Are you doing some different work flow, or
> am I just missing something?

When doing make headers_install, it indeed installs
membarrier.h where we expect it under the build output
dir:

$ ls usr/include/linux/membarrier.h 
usr/include/linux/membarrier.h

However, if I issue 

$ make -C tools/testing/selftests TARGETS=membarrier
make: Entering directory `/home/efficios/git/linux-next/tools/testing/selftests'
for TARGET in membarrier; do \
make -C $TARGET; \
  

Re: [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers

2015-09-07 Thread Ezequiel Garcia
On 7 September 2015 at 11:54, Alex Smith  wrote:
> On 06/09/2015 21:38, Ezequiel Garcia wrote:
>> On 27 Jul 02:50 PM, Alex Smith wrote:
>>> Hi,
>>>
>>> This series adds support for the BCH controller and NAND devices on
>>> the Ingenic JZ4780 SoC.
>>>
>>> Tested on the MIPS Creator Ci20 board. All dependencies are now in
>>> mainline so it should be possible to compile test now.
>>>
>>> This version of the series has been rebased on 4.2-rc4, and also adds
>>> an additional patch to fix an issue that was encountered in the
>>> external Ci20 3.18 kernel branch.
>>>
>>> Review and feedback welcome.
>>>
>>
>> The NEMC driver seems to be upstream. Any chance you submit devicetree
>> changes as well for Ci20 (so we can actually test this)?
>
> Sure, can do. The pinctrl driver is not yet upstream (needs some work) which 
> is why I didn't add the DT changes initially, but at least if you boot the 
> board from the NAND then U-Boot should have left everything in a state usable 
> by the kernel.
>

Great, thanks! I definitely look forward to testing this.
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar


Re: [PATCH 1/5] acpi: Add basic device probing infrastructure

2015-09-07 Thread Lorenzo Pieralisi
[+M.Salter]

On Fri, Sep 04, 2015 at 06:06:48PM +0100, Marc Zyngier wrote:
> IRQ controllers and timers are the two types of device the kernel
> requires before being able to use the device driver model.
> 
> ACPI so far lacks a proper probing infrastructure similar to the one
> we have with DT, where we're able to declare IRQ chips and
> clocksources inside the driver code, and let the core code pick it up
> and call us back on a match. This leads to all kinds of really ugly
> hacks all over the arm64 code and even in the ACPI layer.
> 
> In order to allow some basic probing based on the ACPI tables,
> introduce "struct acpi_probe_entry" which contains just enough
> data and callbacks to match a table, an optional subtable, and
> call a probe function. A driver can, at build time, register itself
> and expect being called if the right entry exists in the ACPI
> table.
> 
> An acpi_probe_device_init() is provided, taking an ACPI table
> identifier, and iterating over the registered entries.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/acpi/scan.c   | 41 
>  include/asm-generic/vmlinux.lds.h | 11 
>  include/linux/acpi.h  | 56 +++
>  3 files changed, 108 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index ec25635..9e920ec 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -2793,3 +2793,44 @@ int __init acpi_scan_init(void)
>   mutex_unlock(&acpi_scan_lock);
>   return result;
>  }
> +
> +static const struct acpi_probe_entry device_acpi_probe_end
> + __used __section(__device_acpi_probe_table_end);
> +extern struct acpi_probe_entry __device_acpi_probe_table[];
> +static struct acpi_probe_entry *ape;
> +static int acpi_probe_count;
> +static DEFINE_SPINLOCK(acpi_probe_lock);
> +
> +static int __init acpi_match_madt(struct acpi_subtable_header *header,
> +   const unsigned long end)
> +{
> + if (!ape->validate_subtbl || ape->validate_subtbl(header, ape))
> + if (!ape->probe_subtbl(header, end))
> + acpi_probe_count++;
> +
> + return 0;
> +}
> +
> +int __init acpi_probe_device_table(const char *id)
> +{
> + int count = 0;
> +
> + if (acpi_disabled)
> + return 0;
> +
> + spin_lock(&acpi_probe_lock);
> + for (ape = __device_acpi_probe_table; ape->probe_table; ape++) {
> + if (!ACPI_COMPARE_NAME(id, ape->id))
> + continue;
> + if (ACPI_COMPARE_NAME(ACPI_SIG_MADT, ape->id)) {
> + acpi_probe_count = 0;
> + acpi_table_parse_madt(ape->type, acpi_match_madt, 0);
> + count += acpi_probe_count;
> + } else {
> + count = acpi_table_parse(ape->id, ape->probe_table);
> + }
> + }
> + spin_unlock(&acpi_probe_lock);
> +
> + return count;
> +}

We should add a mechanism to prevent re-parsing the same entries
multiple times (in case this function is called with the same
signature multiple times). We could create a separate table of device
entries, per-subsystem, that we want to parse (irqchip specific table,
timers, etc.) instead of adding all the devices to the same table (ie
linker section), you can do this already with the current patchset by
just choosing different table names as DT does.

We may also want to extend this set so that it can be used to parse the
same table, same subtype multiple times at different stages in the boot
path (but let's first see if it is a) really needed b) feasible).

Basically it is to avoid parsing the MADT multiple times:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/340267.html

Those can be extensions to the current patchset (because basically
they are not real issues at present), it is just a heads-up.

Thanks for putting it together !
Lorenzo
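
One possible way to avoid probing the same entries twice, sketched in plain userspace C under stated assumptions: `struct probe_entry`, its `probed` flag, `probe_table_once()` and the demo table are all hypothetical stand-ins, not the `struct acpi_probe_entry` from the patch and not necessarily the mechanism meant above (per-subsystem tables are another option). The sketch only shows a sentinel-terminated table walk that skips entries already handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct acpi_probe_entry. */
struct probe_entry {
	char id[5];          /* 4-character table signature, NUL-terminated */
	int (*probe)(void);  /* NULL marks the end of the table (sentinel) */
	bool probed;         /* set once the entry has been handled */
};

/*
 * Walk a sentinel-terminated table and call probe() for every entry whose
 * signature matches id, skipping entries that were already probed.
 * Returns the number of entries successfully probed by this call.
 */
static int probe_table_once(struct probe_entry *table, const char *id)
{
	int count = 0;

	assert(table && id);
	for (struct probe_entry *e = table; e->probe; e++) {
		if (strncmp(id, e->id, 4) != 0 || e->probed)
			continue;
		e->probed = true;
		if (e->probe() == 0)
			count++;
	}
	return count;
}

/* Demo data: two "APIC" entries and one "GTDT" entry, then the sentinel. */
static int demo_calls;

static int demo_probe(void)
{
	demo_calls++;
	return 0;
}

static struct probe_entry demo_table[] = {
	{ "APIC", demo_probe, false },
	{ "GTDT", demo_probe, false },
	{ "APIC", demo_probe, false },
	{ "", NULL, false },
};
```

With this layout, a second `probe_table_once(demo_table, "APIC")` call finds nothing left to do, which is the property asked for; whether a per-entry flag or separate per-subsystem tables is preferable is left open here.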

> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 8bd374d..875397a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -181,6 +181,16 @@
>  #define CPUIDLE_METHOD_OF_TABLES() OF_TABLE(CONFIG_CPU_IDLE, cpuidle_method)
>  #define EARLYCON_OF_TABLES() OF_TABLE(CONFIG_SERIAL_EARLYCON, earlycon)
>  
> +#ifdef CONFIG_ACPI
> +#define ACPI_PROBE_TABLE(name)                                  \
> + . = ALIGN(8);   \
> + VMLINUX_SYMBOL(__##name##_acpi_probe_table) = .;\
> + *(__##name##_acpi_probe_table)  \
> + *(__##name##_acpi_probe_table_end)
> +#else
> +#define ACPI_PROBE_TABLE(name)
> +#endif
> +
>  #define KERNEL_DTB() \
>   STRUCT_ALIGN(); \
>   VMLINUX_SYMBOL(__dtb_start) = .; 

[PATCH v2 1/1] Add Corsair Vengeance K90 driver

2015-09-07 Thread Clément Vuchener
This patch implements a HID driver for the Corsair Vengeance K90 keyboard. 

It fixes the behaviour of the keys using incorrect HID usage codes and exposes 
the macro playback mode and current profile to the user space through sysfs 
attributes. It also adds two LED class devices controlling the "record" LED and 
the backlight.

Signed-off-by: Clément Vuchener 
---
 Documentation/ABI/testing/sysfs-driver-hid-corsair |  15 +
 drivers/hid/Kconfig|  10 +
 drivers/hid/Makefile   |   1 +
 drivers/hid/hid-core.c |   1 +
 drivers/hid/hid-corsair.c  | 555 +
 drivers/hid/hid-ids.h  |   3 +
 6 files changed, 585 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-hid-corsair
 create mode 100644 drivers/hid/hid-corsair.c

diff --git a/Documentation/ABI/testing/sysfs-driver-hid-corsair b/Documentation/ABI/testing/sysfs-driver-hid-corsair
new file mode 100644
index 000..b8827f0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-hid-corsair
@@ -0,0 +1,15 @@
+What:  /sys/bus/drivers/corsair//macro_mode
+Date:  August 2015
+KernelVersion: 4.2
+Contact:   Clement Vuchener 
+Description:   Get/set the current playback mode. "SW" for software mode
+   where the G-keys trigger their regular key codes. "HW" for
+   hardware playback mode where the G-keys play their macro
+   from the on-board memory.
+
+
+What:  /sys/bus/drivers/corsair//current_profile
+Date:  August 2015
+KernelVersion: 4.2
+Contact:   Clement Vuchener 
+Description:   Get/set the current selected profile. Values are from 1 to 3.
diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index 6ab51ae..3fe9678 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -171,6 +171,16 @@ config HID_CHICONY
---help---
Support for Chicony Tactical pad.
 
+config HID_CORSAIR
+   tristate "Corsair devices"
+   depends on HID && USB && LEDS_CLASS
+   ---help---
+   Support for Corsair devices that are not fully compliant with the
+   HID standard.
+
+   Supported devices:
+   - Vengeance K90
+
 config HID_PRODIKEYS
tristate "Prodikeys PC-MIDI Keyboard support"
depends on HID && SND
diff --git a/drivers/hid/Makefile b/drivers/hid/Makefile
index e6441bc..edaa0f2 100644
--- a/drivers/hid/Makefile
+++ b/drivers/hid/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_HID_BELKIN)  += hid-belkin.o
 obj-$(CONFIG_HID_BETOP_FF) += hid-betopff.o
 obj-$(CONFIG_HID_CHERRY)   += hid-cherry.o
 obj-$(CONFIG_HID_CHICONY)  += hid-chicony.o
+obj-$(CONFIG_HID_CORSAIR)  += hid-corsair.o
 obj-$(CONFIG_HID_CP2112)   += hid-cp2112.o
 obj-$(CONFIG_HID_CYPRESS)  += hid-cypress.o
 obj-$(CONFIG_HID_DRAGONRISE)   += hid-dr.o
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index bcd914a..d5fc4d1 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1828,6 +1828,7 @@ static const struct hid_device_id hid_have_special_driver[] = {
        { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS2) },
        { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_AK1D) },
        { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_ACER_SWITCH12) },
+       { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K90) },
        { HID_USB_DEVICE(USB_VENDOR_ID_CREATIVELABS, USB_DEVICE_ID_PRODIKEYS_PCMIDI) },
        { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_CP2112) },
        { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_BARCODE_1) },
diff --git a/drivers/hid/hid-corsair.c b/drivers/hid/hid-corsair.c
new file mode 100644
index 000..580c214
--- /dev/null
+++ b/drivers/hid/hid-corsair.c
@@ -0,0 +1,555 @@
+/*
+ * HID driver for Corsair devices
+ *
+ * Supported devices:
+ *  - Vengeance K90 Keyboard
+ *
+ * Copyright (c) 2015 Clement Vuchener
+ */
+
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "hid-ids.h"
+
+struct k90_led {
+   struct led_classdev cdev;
+   int brightness;
+   struct work_struct work;
+   int removed;
+};
+
+struct k90_drvdata {
+   int current_profile;
+   int macro_mode;
+   int meta_locked;
+   struct k90_led backlight;
+   struct k90_led record_led;
+};
+
+#define K90_GKEY_COUNT 18
+
+static int k90_usage_to_gkey(unsigned int usage)
+{
+   /* G1 (0xd0) to G16 (0xdf) */
+   if (usage >= 0xd0 && usage <= 0xdf)
+   return usage - 0xd0 + 1;
+   /* G17 (0xe8) to G18 (0xe9) */
+   if (usage >= 0xe8 && usage <= 0

[PATCH v2 0/1] Corsair Vengeance K90 driver

2015-09-07 Thread Clément Vuchener
I removed the k90_profile class completely. I cannot write a good enough ABI 
with what I know of the keyboard so I am leaving that part out of the kernel. 
If I change my mind in the future, it will be done in another patch.

I also fixed a bug I had when unregistering the led device. Work was being 
scheduled after the led device was unregistered.

On the name change, I kept a lot of K90 references. As far as I know, the only 
similar keyboard is the K60 that shares the same firmware but does not have all 
the special keys and backlight, and for which the hid-generic driver should be 
enough. The more recent RGB keyboard series uses a different protocol from what 
I have seen from the unofficial userspace driver (CKB from MSC).

changes in v2:
 - Removed the k90_profile class and devices
 - Renamed driver for a more generic name ("corsair" driver in hid-corsair.c)
 - Fixed led devices clean up (hang when unplugging and led state reset)
 - Added dependency on USB and LEDS_CLASS in Kconfig

Clément Vuchener (1):
  Add Corsair Vengeance K90 driver

 Documentation/ABI/testing/sysfs-driver-hid-corsair |  15 +
 drivers/hid/Kconfig|  10 +
 drivers/hid/Makefile   |   1 +
 drivers/hid/hid-core.c |   1 +
 drivers/hid/hid-corsair.c  | 555 +
 drivers/hid/hid-ids.h  |   3 +
 6 files changed, 585 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-hid-corsair
 create mode 100644 drivers/hid/hid-corsair.c

-- 
2.4.3



[PATCH v1] usb: core: driver: Use kmalloc_array

2015-09-07 Thread Muhammad Falak R Wani
Use kmalloc_array instead of kmalloc to allocate memory for an array.
Also remove the dev_warn() emitted on allocation failure, which simplifies
the if check.

Signed-off-by: Muhammad Falak R Wani 
---
On suggestion by Joe Perches 

Changes since v0
-remove dev_warn for memory leak
-remove unnecessary parens for if
---
 drivers/usb/core/driver.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 818369a..e0636c1 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -416,12 +416,10 @@ static int usb_unbind_interface(struct device *dev)
if (ep->streams == 0)
continue;
if (j == 0) {
-   eps = kmalloc(USB_MAXENDPOINTS * sizeof(void *),
+   eps = kmalloc_array(USB_MAXENDPOINTS, sizeof(void *),
  GFP_KERNEL);
-   if (!eps) {
-   dev_warn(dev, "oom, leaking streams\n");
+   if (!eps)
break;
-   }
}
eps[j++] = ep;
}
-- 
1.9.1
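
For readers unfamiliar with the difference: kmalloc_array() checks the element-count times element-size multiplication for overflow and returns NULL instead of allocating a too-small buffer. A minimal userspace sketch of that check follows; `alloc_array()` is a hypothetical name built on malloc(), not a kernel API, and it only mirrors the idea rather than the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace stand-in (hypothetical) for the overflow check that
 * kmalloc_array() performs before allocating n elements of the given size.
 */
static void *alloc_array(size_t n, size_t size)
{
	/* Refuse the request if n * size would wrap around SIZE_MAX. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

In the patched code the element count is the constant USB_MAXENDPOINTS, so the overflow cannot actually trigger there; the change is mainly about using the idiomatic helper.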



[PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test

2015-09-07 Thread Denys Vlasenko
This new test checks that all x86 registers are preserved across
32-bit syscalls. It tests syscalls through VDSO (if available)
and through INT 0x80, normally and under ptrace.

If the kernel is 64-bit, the high registers (r8..r15) are poisoned
before the syscall is called and are checked afterwards.

They must be either preserved, or cleared to zero (but r11 is special);
r12..15 must be preserved for INT 0x80.

EFLAGS is checked for changes too, but a change there is not
considered to be a bug (paravirt kernels do not preserve
arithmetic flags).

Run-tested on 64-bit kernel:

$ ./test_syscall_vdso_32
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[OK]    Arguments are preserved across syscall
[NOTE]  R11 has changed:00200ed7 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
[RUN]   Running tests under ptrace
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data

On 32-bit paravirt kernel:

$ ./test_syscall_vdso_32
[NOTE]  Not a 64-bit kernel, won't test R8..R15 leaks
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[WARN]  Flags before=00200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]  Flags  after=00200246 id 0 00 i z 0 0 p 1
[WARN]  Flags change=0c91 0 00 o d s 0 a 0 0 c
[OK]    Arguments are preserved across syscall
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[RUN]   Running tests under ptrace
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[OK]    Arguments are preserved across syscall
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
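
The "Flags change" line in the paravirt output above is the XOR of the two EFLAGS snapshots. A small sketch of that arithmetic, assuming nothing beyond the architectural x86 flag bit positions; the macro and function names here are illustrative and are not taken from test_syscall_vdso.c.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 EFLAGS bit positions (illustrative macro names). */
#define FLAG_CF (1u << 0)   /* carry */
#define FLAG_PF (1u << 2)   /* parity */
#define FLAG_AF (1u << 4)   /* auxiliary carry */
#define FLAG_ZF (1u << 6)   /* zero */
#define FLAG_SF (1u << 7)   /* sign */
#define FLAG_IF (1u << 9)   /* interrupt enable */
#define FLAG_DF (1u << 10)  /* direction */
#define FLAG_OF (1u << 11)  /* overflow */

/* Bits that differ between the snapshots taken before and after the syscall. */
static uint32_t eflags_changed(uint32_t before, uint32_t after)
{
	return before ^ after;
}
```

For the values shown, `eflags_changed(0x00200ed7, 0x00200246)` yields `0x0c91`, that is OF, DF, SF, AF and CF, which matches the "o d s a c" letters printed on the change line.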

Signed-off-by: Denys Vlasenko 
CC: Linus Torvalds 
CC: Steven Rostedt 
CC: Ingo Molnar 
CC: Borislav Petkov 
CC: "H. Peter Anvin" 
CC: Andy Lutomirski 
CC: Oleg Nesterov 
CC: Frederic Weisbecker 
CC: Alexei Starovoitov 
CC: Will Drewry 
CC: Kees Cook 
CC: x...@kernel.org
CC: linux-kernel@vger.kernel.org
---

Changes in v2:
 does not fail if VDSO can't be found;
 tests INT 80 syscall method;
 tests syscalls under ptrace;
 switched to /* */ comments

Changes in v3:
 added checking for r8..r15 info leaks

Changes in v4:
 re-added Makefile change

 tools/testing/selftests/x86/Makefile|   2 +-
 tools/testing/selftests/x86/test_syscall_vdso.c | 401 
 tools/testing/selftests/x86/thunks_32.S |  55 
 3 files changed, 457 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/test_syscall_vdso.c
 create mode 100644 tools/testing/selftests/x86/thunks_32.S

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index caa60d5..84effa6 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -5,7 +5,7 @@ include ../lib.mk
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs ldt_gdt syscall_nt
-TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn
+TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall_vdso
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
@@ -60,3 +60,4 @@ endif
 
 # Some tests have additional dependencies.
 sysret_ss_attrs_64: thunks.S
+test_syscall_vdso_32: thunks_32.S
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
new file mode 100644
index 000..0792aef
--- /dev/null
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -0,0 +1,401 @@
+/*
+ * 32-bit syscall ABI conformance test.
+ *
+ * Copyright (c) 2015 Denys Vlasenko
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+/*
+ * Can be built statically:
+ * gcc -Os -Wall -static -m32 test_syscall_vdso.c thunks_32.S
+ */
+#undef _GNU_SOURCE
+#define _GNU_SOURCE 1
+#undef __USE_GNU
+#define __USE_GNU 1
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#if !defined(__i386__)
+int main(int argc, char **argv, char **envp)
+{
+   printf("[SKIP]\tNot a 32-bit x86 userspace\n");
+   return 0;
+}
+#else
+
+long syscal

Re: [RFC PATCH 1/3] arm64: entry: Remove unnecessary calculation for S_SP in EL1h

2015-09-07 Thread Jungseok Lee
On Sep 7, 2015, at 11:56 PM, Mark Rutland wrote:

Hi Mark,

> On Fri, Sep 04, 2015 at 03:23:05PM +0100, Jungseok Lee wrote:
>> Under EL1h, S_SP data is not seen in kernel_exit. Thus, x21 calculation
>> is not needed in kernel_entry. Currently, S_SP information is valid only
>> when sp_el0 is used.
> 
> I don't think this is true. The generic BUG implementation will grab the
> saved SP from the pt_regs, and with this change we'll report whatever
> happened to be in x21 instead.
> 
>> Signed-off-by: Jungseok Lee 
>> ---
>> arch/arm64/kernel/entry.S | 2 --
>> 1 file changed, 2 deletions(-)
>> 
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index e163518..d23ca0d 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -91,8 +91,6 @@
>>  get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
>>  ldr x19, [tsk, #TI_FLAGS]   // since we can unmask debug
>>  disable_step_tsk x19, x20   // exceptions when scheduling.
>> -.else
>> -add x21, sp, #S_FRAME_SIZE
>>  .endif
>>  mrs x22, elr_el1
>>  mrs x23, spsr_el1
> 
> Immediately after this we do:
> 
>   stp lr, x21, [sp, #S_LR]
> 
> To store the LR and SP to the pt_regs which bug_handler would use.
> 
> Am I missing something?

No, you're right. As James mentioned, x21 is used in do_sp_pc_abort.

Thanks for the comment.

Best Regards
Jungseok Lee


Re: [PATCH] ARM: fix bug which lowmem size is limited to 760MB

2015-09-07 Thread Nicolas Pitre
On Mon, 7 Sep 2015, Arnd Bergmann wrote:

> On Monday 07 September 2015 11:34:36 Nicolas Pitre wrote:
> > 
> > That shifts the risk to user space though.  But if there is a regression 
> > there, it will manifest itself on all systems and not only with some 
> > particular hardware.
> 
> I'd consider that a good thing, as it makes it easier to test when
> you see the same behavior on systems with any memory size.

Sure, that was my point, although I admittedly didn't say it clearly.


Nicolas


Re: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.

2015-09-07 Thread Jungseok Lee
On Sep 7, 2015, at 11:36 PM, James Morse wrote:

Hi James,

> Having to handle interrupts on top of an existing kernel stack means the
> kernel stack must be large enough to accommodate both the maximum kernel
> usage, and the maximum irq handler usage. Switching to a different stack
> when processing irqs allows us to make the stack size smaller.
> 
> Maximum kernel stack usage (running ltp and generating usb+ethernet
> interrupts) was 7256 bytes. With this patch, the same workload gives
> a maximum stack usage of 5816 bytes.

I'd like to know how you measured the max stack depth.
AFAIK, the ftrace stack tracer does not work well. Did you dump the stack
region and look for the part that was never touched?

I will leave comments after reading and playing with this change carefully.
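
For reference, the "untouched region" approach asked about above can be
sketched in user space as follows. This is only a minimal illustration
under assumptions, not how the numbers in the patch were obtained: the
fill byte STACK_POISON and both helper names are hypothetical, and the
stack is assumed to grow down from the top of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_POISON 0xAA	/* hypothetical fill byte chosen for this sketch */

/* Fill the whole region with the poison byte before it is used as a stack. */
static void stack_poison(uint8_t *base, size_t size)
{
	memset(base, STACK_POISON, size);
}

/*
 * The stack grows down from base + size towards base, so the untouched
 * part is the run of poison bytes that starts at base. Everything above
 * that run is counted as used.
 */
static size_t stack_max_usage(const uint8_t *base, size_t size)
{
	size_t untouched = 0;

	assert(base != NULL);
	while (untouched < size && base[untouched] == STACK_POISON)
		untouched++;

	return size - untouched;
}
```

The result is a lower bound: if the deepest bytes written happen to equal
the poison value, the scan under-reports the usage slightly.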

Best Regards
Jungseok Lee

> Signed-off-by: James Morse 
> ---
> arch/arm64/include/asm/irq.h | 12 +
> arch/arm64/include/asm/thread_info.h |  8 --
> arch/arm64/kernel/entry.S| 33 ---
> arch/arm64/kernel/irq.c  | 52 
> arch/arm64/kernel/smp.c  |  4 +++
> arch/arm64/kernel/stacktrace.c   |  4 ++-
> 6 files changed, 107 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index bbb251b14746..050d4196c736 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -2,14 +2,20 @@
> #define __ASM_IRQ_H
> 
> #include 
> +#include 
> 
> #include 
> +#include 
> +
> +DECLARE_PER_CPU(unsigned long, irq_sp);
> 
> struct pt_regs;
> 
> extern void migrate_irqs(void);
> extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
> 
> +extern int alloc_irq_stack(unsigned int cpu);
> +
> static inline void acpi_irq_init(void)
> {
>   /*
> @@ -21,4 +27,10 @@ static inline void acpi_irq_init(void)
> }
> #define acpi_irq_init acpi_irq_init
> 
> +static inline bool is_irq_stack(unsigned long sp)
> +{
> + struct thread_info *ti = get_thread_info(sp);
> + return (get_thread_info(per_cpu(irq_sp, ti->cpu)) == ti);
> +}
> +
> #endif
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index dcd06d18a42a..b906254fc400 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -69,12 +69,16 @@ register unsigned long current_stack_pointer asm ("sp");
> /*
>  * how to get the thread information struct from C
>  */
> +static inline struct thread_info *get_thread_info(unsigned long sp)
> +{
> + return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
> +}
> +
> static inline struct thread_info *current_thread_info(void) __attribute_const__;
> 
> static inline struct thread_info *current_thread_info(void)
> {
> - return (struct thread_info *)
> - (current_stack_pointer & ~(THREAD_SIZE - 1));
> + return get_thread_info(current_stack_pointer);
> }
> 
> #define thread_saved_pc(tsk)  \
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index e16351819fed..d42371f3f5a1 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -190,10 +190,37 @@ tsk .req x28 // current thread_info
>  * Interrupt handling.
>  */
>   .macro  irq_handler
> - adrp x1, handle_arch_irq
> - ldr x1, [x1, #:lo12:handle_arch_irq]
> - mov x0, sp
> + mrs x21, tpidr_el1
> + adr_l   x20, irq_sp
> + add x20, x20, x21
> +
> + ldr x21, [x20]
> + mov x20, sp
> +
> + mov x0, x21
> + mov x1, x20
> + bl  irq_copy_thread_info
> +
> + /* test for recursive use of irq_sp */
> + cbz w0, 1f
> + mrs x30, elr_el1
> + mov sp, x21
> +
> + /*
> +  * Create a fake stack frame to bump unwind_frame() onto the original
> +  * stack. This relies on x29 not being clobbered by kernel_entry().
> +  */
> + push x29, x30
> +
> +1:   ldr_l   x1, handle_arch_irq
> + mov x0, x20
>   blr x1
> +
> + mov x0, x20
> + mov x1, x21
> + bl  irq_copy_thread_info
> + mov sp, x20
> +
>   .endm
> 
>   .text
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index 463fa2e7e34c..10b57a006da8 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -26,11 +26,14 @@
> #include 
> #include 
> #include 
> +#include 
> #include 
> #include 
> 
> unsigned long irq_err_count;
> 
> +DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
> +
> int arch_show_interrupts(struct seq_file *p, int prec)
> {
> #ifdef CONFIG_SMP
> @@ -55,6 +58,10 @@ void __init init_IRQ(void)
>   irqchip_init();
>   if (!handle_arch_irq)
>   panic("No interrupt controller found.");
> +
> + /* Allocate an irq stack for the boot cpu */
> + if (alloc_irq_stack(smp_processor_id()))
> + panic("Failed to allocate irq stack for boot cpu.");
> }
> 

[PATCH v4 4/4] ARM: dts: add suspend opp to exynos4412

2015-09-07 Thread Bartlomiej Zolnierkiewicz
Mark the 800MHz OPP as the suspend opp for Exynos4412-based
boards, so that the cpufreq-dt driver's behavior w.r.t.
suspend frequency matches what the old exynos-cpufreq
driver has been doing.

This patch fixes suspend/resume support on the Exynos4412-based
Trats2 board and a reboot hang on the Exynos4412-based Odroid U3
board.

Cc: Thomas Abraham 
Cc: Javier Martinez Canillas 
Cc: Krzysztof Kozlowski 
Cc: Marek Szyprowski 
Cc: Tobias Jakobi 
Acked-by: Viresh Kumar 
Signed-off-by: Bartlomiej Zolnierkiewicz 
---
 arch/arm/boot/dts/exynos4412.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/exynos4412.dtsi b/arch/arm/boot/dts/exynos4412.dtsi
index ca0e3c1..294cfe4 100644
--- a/arch/arm/boot/dts/exynos4412.dtsi
+++ b/arch/arm/boot/dts/exynos4412.dtsi
@@ -98,6 +98,7 @@
opp-hz = /bits/ 64 <8>;
opp-microvolt = <100>;
clock-latency-ns = <20>;
+   opp-suspend;
};
opp07 {
opp-hz = /bits/ 64 <9>;
-- 
1.9.1



Re: [PATCH] ARM: fix bug which lowmem size is limited to 760MB

2015-09-07 Thread Arnd Bergmann
On Monday 07 September 2015 11:34:36 Nicolas Pitre wrote:
> 
> That shifts the risk to user space though.  But if there is a regression 
> there, it will manifest itself on all systems and not only with some 
> particular hardware.

I'd consider that a good thing, as it makes it easier to test when
you see the same behavior on systems with any memory size.

Arnd


[PATCH v4 3/4] cpufreq-dt: add suspend frequency support

2015-09-07 Thread Bartlomiej Zolnierkiewicz
Add suspend frequency support: if a suspend opp is defined (it is
optional and can be described using the opp-v2 bindings), set the
suspend frequency to the frequency obtained from that opp.
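
The unit conversion involved can be sketched as below. This is a
stand-alone model, not the driver code: struct opp_model, struct
policy_model and set_suspend_freq() are hypothetical stand-ins, assuming
the opp rate is in Hz and the policy frequency is in kHz, as the
"/ 1000" in the diff suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dev_pm_opp and struct cpufreq_policy. */
struct opp_model {
	unsigned long rate_hz;		/* opp frequency in Hz */
};

struct policy_model {
	unsigned int suspend_freq;	/* suspend frequency in kHz, 0 if unset */
};

/* Leave suspend_freq untouched when no suspend opp is defined. */
static void set_suspend_freq(struct policy_model *policy,
			     const struct opp_model *suspend_opp)
{
	assert(policy != NULL);
	if (suspend_opp)
		policy->suspend_freq = suspend_opp->rate_hz / 1000;
}
```

With an 800MHz suspend opp this yields a suspend frequency of 800000 kHz.
Without one, the field keeps its initial value of zero.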

Cc: Viresh Kumar 
Cc: Thomas Abraham 
Cc: Javier Martinez Canillas 
Cc: Krzysztof Kozlowski 
Cc: Marek Szyprowski 
Cc: Tobias Jakobi 
Signed-off-by: Bartlomiej Zolnierkiewicz 
---
 drivers/cpufreq/cpufreq-dt.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index c3583cd..e08ae40 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -196,6 +196,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
struct device *cpu_dev;
struct regulator *cpu_reg;
struct clk *cpu_clk;
+   struct dev_pm_opp *suspend_opp;
unsigned long min_uV = ~0, max_uV = 0;
unsigned int transition_latency;
bool need_update = false;
@@ -329,6 +330,13 @@ static int cpufreq_init(struct cpufreq_policy *policy)
policy->driver_data = priv;
 
policy->clk = cpu_clk;
+
+   rcu_read_lock();
+   suspend_opp = dev_pm_opp_get_suspend_opp(cpu_dev);
+   if (suspend_opp)
+   policy->suspend_freq = dev_pm_opp_get_freq(suspend_opp) / 1000;
+   rcu_read_unlock();
+
ret = cpufreq_table_validate_and_show(policy, freq_table);
if (ret) {
dev_err(cpu_dev, "%s: invalid frequency table: %d\n", __func__,
@@ -419,6 +427,9 @@ static struct cpufreq_driver dt_cpufreq_driver = {
.ready = cpufreq_ready,
.name = "cpufreq-dt",
.attr = cpufreq_dt_attr,
+#ifdef CONFIG_PM
+   .suspend = cpufreq_generic_suspend,
+#endif
 };
 
 static int dt_cpufreq_probe(struct platform_device *pdev)
-- 
1.9.1



[PATCH v4 1/4] PM / OPP: add dev_pm_opp_get_suspend_opp() helper

2015-09-07 Thread Bartlomiej Zolnierkiewicz
Add dev_pm_opp_get_suspend_opp() helper to obtain suspend opp.

Cc: Viresh Kumar 
Cc: Thomas Abraham 
Cc: Javier Martinez Canillas 
Cc: Krzysztof Kozlowski 
Cc: Marek Szyprowski 
Cc: Tobias Jakobi 
Signed-off-by: Bartlomiej Zolnierkiewicz 
---
 drivers/base/power/opp.c | 30 ++
 include/linux/pm_opp.h   |  6 ++
 2 files changed, 36 insertions(+)

diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index eb25449..3d948ea 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -341,6 +341,36 @@ unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev)
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_max_clock_latency);
 
 /**
+ * dev_pm_opp_get_suspend_opp() - Get suspend opp
+ * @dev:   device for which we do this operation
+ *
+ * Return: This function returns pointer to the suspend opp if it is
+ * defined, otherwise it returns NULL.
+ *
+ * Locking: This function must be called under rcu_read_lock(). opp is a rcu
+ * protected pointer. The reason for the same is that the opp pointer which is
+ * returned will remain valid for use with opp_get_{voltage, freq} only while
+ * under the locked area. The pointer returned must be used prior to unlocking
+ * with rcu_read_unlock() to maintain the integrity of the pointer.
+ */
+struct dev_pm_opp *dev_pm_opp_get_suspend_opp(struct device *dev)
+{
+   struct device_opp *dev_opp;
+   struct dev_pm_opp *opp;
+
+   opp_rcu_lockdep_assert();
+
+   dev_opp = _find_device_opp(dev);
+   if (IS_ERR(dev_opp))
+   opp = NULL;
+   else
+   opp = dev_opp->suspend_opp;
+
+   return opp;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_suspend_opp);
+
+/**
  * dev_pm_opp_get_opp_count() - Get number of opps available in the opp list
  * @dev:   device for which we do this operation
  *
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index cab7ba5..e817722 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -34,6 +34,7 @@ bool dev_pm_opp_is_turbo(struct dev_pm_opp *opp);
 
 int dev_pm_opp_get_opp_count(struct device *dev);
 unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev);
+struct dev_pm_opp *dev_pm_opp_get_suspend_opp(struct device *dev);
 
 struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
  unsigned long freq,
@@ -80,6 +81,11 @@ static inline unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev)
return 0;
 }
 
+static inline struct dev_pm_opp *dev_pm_opp_get_suspend_opp(struct device *dev)
+{
+   return NULL;
+}
+
 static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
unsigned long freq, bool available)
 {
-- 
1.9.1



[PATCH v4 2/4] cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency

2015-09-07 Thread Bartlomiej Zolnierkiewicz
Some cpufreq drivers may set a suspend frequency only for
selected setups but would still like to use the generic
suspend handler.  Thus don't treat the !policy->suspend_freq
condition as an error.
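
The resulting behavior can be modelled as below. This is a sketch under
assumptions rather than the kernel function itself: the names
policy_model, set_target_model() and generic_suspend_model() are
hypothetical, and the real handler programs the frequency through the
cpufreq core instead of a global variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct cpufreq_policy. */
struct policy_model {
	unsigned int suspend_freq;	/* kHz, 0 means "not defined" */
};

static unsigned int last_target_khz;	/* what the sketch last "programmed" */

static int set_target_model(unsigned int khz)
{
	last_target_khz = khz;
	return 0;
}

/* A missing suspend frequency is a no-op that reports success. */
static int generic_suspend_model(const struct policy_model *policy)
{
	assert(policy != NULL);
	if (!policy->suspend_freq)
		return 0;

	return set_target_model(policy->suspend_freq);
}
```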

Cc: Viresh Kumar 
Cc: Thomas Abraham 
Cc: Javier Martinez Canillas 
Cc: Krzysztof Kozlowski 
Cc: Marek Szyprowski 
Cc: Tobias Jakobi 
Signed-off-by: Bartlomiej Zolnierkiewicz 
---
 drivers/cpufreq/cpufreq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b3d9368..a634fcb 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1626,8 +1626,8 @@ int cpufreq_generic_suspend(struct cpufreq_policy *policy)
int ret;
 
if (!policy->suspend_freq) {
-   pr_err("%s: suspend_freq can't be zero\n", __func__);
-   return -EINVAL;
+   pr_debug("%s: suspend_freq not defined\n", __func__);
+   return 0;
}
 
pr_debug("%s: Setting suspend-freq: %u\n", __func__,
-- 
1.9.1



Re: [RFC PATCH 0/3] Implement IRQ stack on ARM64

2015-09-07 Thread Jungseok Lee
On Sep 7, 2015, at 11:33 PM, James Morse wrote:
> On 04/09/15 15:23, Jungseok Lee wrote:
>> The ARM64 kernel allocates a 16KB kernel stack when creating a process. On
>> low-memory platforms with heavy userland workloads, this order-2
>> allocation request leads to memory pressure and performance degradation
>> simultaneously, since the VM page allocator frequently falls into the
>> slowpath, which triggers page reclaim and compaction.
>> 
>> I believe that one of the best solutions is to reduce kernel stack size.
>> According to the following data from stack tracer with some fixes, [1],
>> a separate IRQ stack would greatly help to decrease a kernel stack depth.
>> 
> 
> Hi Jungseok Lee,

Hi James Morse,

> I was working on a similar patch for irq stack, (patch as a follow up email).
> 
> I suggest we work together on a single implementation. I think the only
> major difference is that you're using sp_el0 as a temporary register to
> store a copy of the stack-pointer to find struct thread_info, whereas I was
> copying it between stacks (ends up as 2x ldp/stps), which keeps the change
> restricted to irq_stack setup code.
> 
> We should get some feedback as to which approach is preferred.

Great idea!
I'd really like to figure out the most ideal implementation of this feature.
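
As background to the difference discussed above: thread_info is found by
masking the stack pointer, so a stack pointer on a separate IRQ stack no
longer masks to the task's thread_info. A minimal user-space sketch of
the masking, assuming the 16KB stack size from the cover letter
(stack_base() and on_same_stack() are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 16384UL	/* 16KB kernel stack, as in the cover letter */

/*
 * Same masking as get_thread_info() in James' patch: the structure sits
 * at the lowest address of a THREAD_SIZE-aligned stack.
 */
static uintptr_t stack_base(uintptr_t sp)
{
	return sp & ~(THREAD_SIZE - 1);
}

/* Two stack pointers are on the same stack iff they mask to the same base. */
static int on_same_stack(uintptr_t sp_a, uintptr_t sp_b)
{
	assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);	/* power of two */
	return stack_base(sp_a) == stack_base(sp_b);
}
```

Once sp points into the IRQ stack, masking yields the IRQ stack's base,
which is why either a copy of thread_info or a saved stack pointer is
needed.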

Best Regards
Jungseok Lee


[PATCH v4 0/4] cpufreq-dt: add suspend frequency support

2015-09-07 Thread Bartlomiej Zolnierkiewicz
Hi,

This patch series adds suspend frequency support (using opp-v2
bindings and suspend-opp functionality) to cpufreq-dt driver and
then adds suspend opp for Exynos4412 based boards.

This patch series fixes suspend/resume support on Exynos4412
based Trats2 board and reboot hang on Exynos4412 based Odroid
U3 board.

Changes since v3:
- fixed dev_pm_opp_get_suspend_opp() locking
- shortened variable name in dev_pm_opp_get_suspend_opp()
- adjusted cpufreq_generic_suspend() to work with cpufreq-dt
- removed no longer needed cpufreq_dt_suspend()
- added Acked-by tag from Viresh to patch #4

Changes since v2:
- rewrote to use suspend-opp functionality

Changes since v1:
- removed superfluous ";"

Depends on:
- next-20150902 branch of linux-next kernel tree

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


Bartlomiej Zolnierkiewicz (4):
  PM / OPP: add dev_pm_opp_get_suspend_opp() helper
  cpufreq: allow cpufreq_generic_suspend() to work without suspend
frequency
  cpufreq-dt: add suspend frequency support
  ARM: dts: add suspend opp to exynos4412

 arch/arm/boot/dts/exynos4412.dtsi |  1 +
 drivers/base/power/opp.c  | 30 ++
 drivers/cpufreq/cpufreq-dt.c  | 11 +++
 drivers/cpufreq/cpufreq.c |  4 ++--
 include/linux/pm_opp.h|  6 ++
 5 files changed, 50 insertions(+), 2 deletions(-)

-- 
1.9.1



[PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to work as a
network backend on an unmodified Xen.

It's only necessary to adapt the ring size and break the skb data into
small 4KB chunks. The rest of the code relies on the grant table code.
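
The chunking step can be sketched in user space as below. This only
illustrates the idea and is not the grant table helper itself:
foreach_xen_chunk(), chunk_fn, struct chunk_stats and count_chunk() are
hypothetical names, and the callback signature is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define XEN_PAGE_SIZE 4096u	/* grant granularity used by the PV protocol */

typedef void (*chunk_fn)(unsigned int offset, unsigned int len, void *data);

/* Walk [offset, offset + len) and call fn once per 4KB Xen page touched. */
static void foreach_xen_chunk(unsigned int offset, unsigned int len,
			      chunk_fn fn, void *data)
{
	assert(fn != NULL);
	while (len) {
		unsigned int in_page = offset & (XEN_PAGE_SIZE - 1);
		unsigned int chunk = XEN_PAGE_SIZE - in_page;

		if (chunk > len)
			chunk = len;
		fn(offset, chunk, data);
		offset += chunk;
		len -= chunk;
	}
}

/* Example callback: count the chunks and the total bytes seen. */
struct chunk_stats {
	unsigned int chunks;
	unsigned int bytes;
};

static void count_chunk(unsigned int offset, unsigned int len, void *data)
{
	struct chunk_stats *stats = data;

	(void)offset;
	stats->chunks++;
	stats->bytes += len;
}
```

Only the first chunk can start at a non-zero offset within its 4KB page.
Every later chunk starts on a 4KB boundary.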

Signed-off-by: Julien Grall 

---
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: net...@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we are required to run a
Linux using 64KB pages on an unmodified Xen.

Note that I haven't added a comment explaining why the offset is 0 after
the first iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

Changes in v4:
- Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- The grant callback no longer allows using less data. A
helper has been added in netback to handle this.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnttab_foreach_grant to split a Linux page into grants
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 --
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy {
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct gop_frag_copy *info)
+{
+   struct gnttab_copy *copy_gop;
+   struct xen_page_foreign *foreign;
+   /* Convenient aliases */
+   struct xenvif_queue *queue = info->queue;
+   struct netrx_pending_operations *npo = info->npo;
+   struct page *page = info->page;
+
+   BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   info->meta = get_next_rx_buffer(queue, npo);
+
+   if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+   *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+   copy_gop = npo->copy + npo->copy_prod++;
+   copy_gop->flags = GNTCOPY_de

[PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to use a network
device on an unmodified Xen.

It's only necessary to adapt the ring size and break the skb data into
small 4KB chunks. The rest of the code relies on the grant table code.
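
To see why the ring size has to be derived from the Xen page size, the
calculation can be sketched as below. This is a simplified model, not the
real ring macro: ring_entries() and round_down_pow2() are hypothetical
names, and the 64-byte ring header and 12-byte entry used in the usage
note are assumptions made for illustration.

```c
#include <assert.h>

/* Largest power of two that is <= x. */
static unsigned int round_down_pow2(unsigned int x)
{
	unsigned int p = 1;

	assert(x > 0);
	while (p <= x / 2)
		p *= 2;

	return p;
}

/*
 * Number of entries in a shared ring: whatever fits after the header,
 * rounded down to a power of two.
 */
static unsigned int ring_entries(unsigned int ring_bytes,
				 unsigned int header_bytes,
				 unsigned int entry_bytes)
{
	assert(entry_bytes > 0 && ring_bytes > header_bytes);
	return round_down_pow2((ring_bytes - header_bytes) / entry_bytes);
}
```

Under those assumptions, a 4096-byte ring holds 256 entries, while sizing
the ring from a 64KB Linux page would give 4096 entries, which the other
end of the ring would not expect.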

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: net...@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we are required to run a Linux
using 64KB pages on an unmodified Xen.

Tested with workloads such as ping, ssh, wget, git... I would be happy
if someone could give details on how to test all the paths.

Changes in v4:
- s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- xennet_tx_setup_grant was calling itself, resulting in a
guest stall when using iperf.
- The grant callback no longer allows changing the len
(wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_foreach_grant_foreign_ref_one

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grants
- Fix count slots
---
 drivers/net/xen-netfront.c | 122 -
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..17b1013 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long gfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   gfn = xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   gfn,
-   0);
-
+   gnttab_page_grant_foreign_access_ref_one(ref,
+   queue->info->xbdev->otherend_id,
+   page,
+   0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_

[PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to use a block
device on an unmodified Xen.

The block API uses segments, which should be at least the size of a
Linux page. Therefore, the driver has to break the page into 4KB
chunks before giving the page to the backend.

When breaking a 64KB segment into 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always requires data in a
chunk, we have to count the number of Xen pages which will be in use and
avoid sending empty chunks.

Note that a pre-defined number of grants is reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants are
reserved per segment with 64KB page granularity).
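
The gap between reserved and used grants can be illustrated with the
sketch below. It is a user-space model under assumptions, not the driver
code: grants_in_use() is a hypothetical helper, LINUX_PAGE_SIZE is fixed
at 64KB for the example, and the segment is described only by its offset
and length within one Linux page.

```c
#include <assert.h>

#define XEN_PAGE_SIZE   4096u
#define LINUX_PAGE_SIZE 65536u	/* assumption: 64KB page granularity */
#define GRANTS_PER_PSEG (LINUX_PAGE_SIZE / XEN_PAGE_SIZE)

/*
 * Number of 4KB Xen pages that actually hold data for a segment that
 * starts at offset and is len bytes long. Empty chunks are not counted.
 */
static unsigned int grants_in_use(unsigned int offset, unsigned int len)
{
	unsigned int first_off = offset & (XEN_PAGE_SIZE - 1);

	assert(len > 0);
	assert(offset < LINUX_PAGE_SIZE && len <= LINUX_PAGE_SIZE - offset);

	/* Round up: a partially filled 4KB chunk still needs a whole grant. */
	return (first_off + len + XEN_PAGE_SIZE - 1) / XEN_PAGE_SIZE;
}
```

A 512-byte segment needs a single grant although GRANTS_PER_PSEG (16) are
reserved for it up front.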

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will be effectively
in use. This could be improved by sharing the page with multiple grants.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we are required to run a Linux using
64KB pages on an unmodified Xen.

Changes in v4:
- Rebase after d50babbe300eedf33ea5b00a12c5df3a05bd96c7
"xen-blkfront: introduce blkfront_gather_backend_features()"
- Fix typos
- Add Roger's acked-by

Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_GREFS
- Split lines over 80 characters whenever it's possible
- s/mfn/gfn/ based on the new naming
- The grant callback no longer allows changing the len
(wasn't used here).
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- Use gnttab_count_grant to get the number of grants in a sg
- Do some renaming to use the correct variable every time

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grants
---
 drivers/block/xen-blkfront.c | 324 ---
 1 file changed, 213 insertions(+), 111 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4232cbd..f2cdc73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
struct grant **grants_used;
struct grant **indirect_grants;
struct scatterlist *sg;
+   unsigned int num_sg;
 };
 
 struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info)\
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE  \
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int feature_persistent:1;
+   /* Number of 4KB segments handled */
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME   "xvd"   /* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME   \
+   (GRANTS_PER_INDIRECT_FRAME / GRANTS_PER_PSEG)
+
+#define INDIRECT_GREFS(_grants)\
+   DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs)  ((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
@@ -466,14 +485,100 @@ static int blkif_queue_discard_req(struct request *req)
return 0;
 }
 
+struct setu

[PATCH v4 20/20] arm/xen: Add support for 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface always uses 4KB page granularity. This
requires using the Xen page definition macros when we deal with hypercalls.

Note that pfn_to_gfn works with a Xen pfn (i.e. 4KB). We may want to
rename pfn_gfn to make this explicit.

We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now as it
helps to have the pointer 4KB aligned (XENMEM_add_to_physmap takes a
Xen PFN).
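
The difference between the two frame numberings can be sketched as
follows. This is a stand-alone illustration, not the kernel helpers:
the three function names are hypothetical, and a 64KB Linux page size is
assumed for the contrast.

```c
#include <assert.h>

#define XEN_PAGE_SHIFT   12			/* Xen frames are always 4KB */
#define XEN_PAGE_SIZE    (1UL << XEN_PAGE_SHIFT)
#define LINUX_PAGE_SHIFT 16			/* assumption: 64KB Linux pages */

/* Frame number in the unit the hypercall interface expects. */
static unsigned long phys_to_xen_pfn(unsigned long phys)
{
	return phys >> XEN_PAGE_SHIFT;
}

/* Offset within the 4KB Xen frame, the idea behind xen_offset_in_page(). */
static unsigned long phys_to_xen_offset(unsigned long phys)
{
	return phys & (XEN_PAGE_SIZE - 1);
}

/* Frame number in Linux page units, shown only for contrast. */
static unsigned long phys_to_linux_pfn(unsigned long phys)
{
	assert(LINUX_PAGE_SHIFT >= XEN_PAGE_SHIFT);
	return phys >> LINUX_PAGE_SHIFT;
}
```

For the physical address 0x12345 the Xen frame is 0x12 with offset 0x345,
while the Linux frame is 0x1, so passing a Linux pfn to a hypercall would
name the wrong 4KB frame.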

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Stefano, I've dropped your reviewed-by given that I've updated the doc and
made changes to avoid usage of XEN_PAGE_SHIFT

Changes in v4:
- Add Stefano's Reviewed-by

Changes in v3:
- s/MFN/GFN/ based on the new naming
- Use virt_to_gfn to avoid using XEN_PAGE_SHIFT
- Drop Stefano's reviewed-by
- Add some docs in arch/arm/asm/xen/page.h

Changes in v2
- Add Stefano's reviewed-by
---
 arch/arm/include/asm/xen/page.h | 15 +--
 arch/arm/xen/enlighten.c|  6 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 98c9fc3..e3d94cf 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -28,6 +28,17 @@ typedef struct xpaddr {
 
 #define INVALID_P2M_ENTRY  (~0UL)
 
+/*
+ * The pseudo-physical frame (pfn) used in all the helpers is always based
+ * on Xen page granularity (i.e 4KB).
+ *
+ * A Linux page may be split across multiple non-contiguous Xen pages so we
+ * have to keep track of frames based on 4KB page granularity.
+ *
+ * PV drivers should never make a direct usage of those helpers (particularly
+ * pfn_to_gfn and gfn_to_pfn).
+ */
+
 unsigned long __pfn_to_mfn(unsigned long pfn);
 extern struct rb_root phys_to_mach;
 
@@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
 #define bfn_to_local_pfn(bfn)  bfn_to_pfn(bfn)
 
 /* VIRT <-> GUEST conversion */
-#define virt_to_gfn(v) (pfn_to_gfn(virt_to_pfn(v)))
-#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_gfn(v) (pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
 
 /* Only used in PV code. But ARM guests are always HVM. */
 static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index eeeab07..50b4769 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,8 +89,8 @@ static void xen_percpu_init(void)
pr_info("Xen: initializing cpu%d\n", cpu);
vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
 
-   info.mfn = __pa(vcpup) >> PAGE_SHIFT;
-   info.offset = offset_in_page(vcpup);
+   info.mfn = virt_to_gfn(vcpup);
+   info.offset = xen_offset_in_page(vcpup);
 
err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
BUG_ON(err);
@@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = virt_to_gfn(shared_info_page);
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-09-07 Thread Julien Grall
For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. However, the hypercall interface is always based on 4K
page granularity.

With 64K page granularity, a single page will be spread over multiple
Xen frames.

To avoid splitting the page into 4K frames, take advantage of the
extent_order field to directly allocate/free chunks of the Linux page
size.
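
The extent order computation can be sketched in user space as follows.
This is only an illustration under stated assumptions: sketch_fls() is a
hand-written equivalent of the kernel's fls(), and the other names are
hypothetical, chosen to mirror EXTENT_ORDER = fls(XEN_PFN_PER_PAGE) - 1
from the patch.

```c
#include <assert.h>

#define SKETCH_XEN_PAGE_SIZE 4096u

/*
 * Portable equivalent of the kernel's fls(): 1-based position of the
 * highest set bit, or 0 when x is 0.
 */
static int sketch_fls(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Order passed to Xen so that one extent covers one whole Linux page. */
static int sketch_extent_order(unsigned int linux_page_size)
{
	unsigned int xen_pfn_per_page;

	/* The Linux page size must be a non-zero multiple of the Xen page size. */
	assert(linux_page_size >= SKETCH_XEN_PAGE_SIZE);
	assert(linux_page_size % SKETCH_XEN_PAGE_SIZE == 0);

	xen_pfn_per_page = linux_page_size / SKETCH_XEN_PAGE_SIZE;
	return sketch_fls(xen_pfn_per_page) - 1;
}
```

With 4KB Linux pages the order is 0, so behaviour is unchanged; with 64KB
pages the order is 4, i.e. each extent spans 16 Xen frames.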

Note that PVMMU is only used for PV guests (which are x86) and the page
granularity is always 4KB. Some BUILD_BUG_ONs have been added to ensure
this because the code has not been modified.

Signed-off-by: Julien Grall 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 

Note that the two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) in code built
for the PV MMU are kept in order to have at least one even if we
ever decide to drop one of the code sections.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Use the field lru in the page to get a list of pages when
decreasing the memory reservation. This avoids using a static
array to store the pages (see v3).
- Update comment for EXTENT_ORDER.

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- Rather than splitting the page into 4KB chunks, use the
extent_order field to directly allocate a Linux page size. This
avoids a lot of code for no benefit.

Changes in v2:
- Use xen_apply_to_page to split a page into 4K chunks
- It's not necessary to have a smaller frame list. Re-use
PAGE_SIZE
- Convert reserve_additional_memory to use XEN_... macro
---
 drivers/xen/balloon.c | 59 ++-
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index c79329f..3babf13 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -70,6 +70,11 @@
 #include 
 #include 
 
+/* Use one extent per PAGE_SIZE to avoid breaking down the page into
+ * multiple frames.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
 /*
  * balloon_process() state:
  *
@@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long credit)
nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen are using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
 /*
  * add_memory() will build page tables for the new memory so
  * the p2m must contain invalid entries so the correct
@@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long credit)
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
int rc;
-   unsigned long  pfn, i;
+   unsigned long i;
struct page   *page;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
 
@@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
nr_pages = i;
break;
}
-   frame_list[i] = page_to_pfn(page);
+
+   /* XENMEM_populate_physmap requires a PFN based on Xen
+* granularity.
+*/
+   frame_list[i] = page_to_xen_pfn(page);
page = balloon_next_page(page);
}
 
@@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
 
-   pfn = page_to_pfn(page);
-
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen are using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long pfn = page_to_pfn(page);
+
set_phys_to_machine(pfn, frame_list[i]);
 
/* Link back into the page tables if not highmem. */
@@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 {
enum bp_state state = BP_DONE;
-   unsigned long  pfn, i;
-   struct page   *page;
+   unsigned long i;
+   struct page *page, *tmp;
int ret;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
+   LIST_HEAD(pages);
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
 

[PATCH v4 16/20] block/xen-blkback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to act as a
block backend on an unmodified Xen.

It's only necessary to adapt the ring size and the number of requests per
indirect frame. The rest of the code relies on the grant table
code.

Note that the grant table code allocates a Linux page per grant,
which wastes 60KB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page between
multiple grants.
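
The arithmetic behind the indirect frame sizing and the 60KB figure can be
sketched as below. This is a user-space illustration only:
struct sketch_request_segment is a hypothetical stand-in for
struct blkif_request_segment and is assumed to be 8 bytes, and the helper
names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_XEN_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct blkif_request_segment (assumed 8 bytes). */
struct sketch_request_segment {
	uint32_t gref;
	uint8_t first_sect;
	uint8_t last_sect;
};

/* Segment descriptors that fit in one 4KB indirect frame. */
static unsigned int sketch_grants_per_indirect_frame(void)
{
	return (unsigned int)(SKETCH_XEN_PAGE_SIZE /
			      sizeof(struct sketch_request_segment));
}

/* Linux-page-sized segments described by one indirect frame. */
static unsigned int sketch_segs_per_indirect_frame(unsigned int linux_page_size)
{
	unsigned int xen_pages_per_segment;

	assert(linux_page_size >= SKETCH_XEN_PAGE_SIZE);
	xen_pages_per_segment = linux_page_size / SKETCH_XEN_PAGE_SIZE;
	return sketch_grants_per_indirect_frame() / xen_pages_per_segment;
}

/* Bytes left unused when a whole Linux page backs a single 4KB grant. */
static unsigned int sketch_wasted_bytes_per_grant(unsigned int linux_page_size)
{
	assert(linux_page_size >= SKETCH_XEN_PAGE_SIZE);
	return linux_page_size - SKETCH_XEN_PAGE_SIZE;
}
```

Under these assumptions an indirect frame holds 512 descriptors, which
covers 512 segments with 4KB pages but only 32 with 64KB pages, and each
grant leaves 61440 bytes (60KB) of its Linux page unused.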

Signed-off-by: Julien Grall 
Acked-by: "Roger Pau Monné" 
---

Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on an unmodified Xen.

This has been tested only with a loop device. I plan to test passing
a hard drive partition but I haven't yet converted the swiotlb code.

Changes in v4:
- Add Roger's acked-by

Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_PAGES to avoid a line over 80
characters
---
 drivers/block/xen-blkback/blkback.c |  5 +++--
 drivers/block/xen-blkback/common.h  | 17 +
 drivers/block/xen-blkback/xenbus.c  |  9 ++---
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 954c002..802319a 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
seg[n].nsec = segments[i].last_sect -
segments[i].first_sect + 1;
seg[n].offset = (segments[i].first_sect << 9);
-   if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
+   if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(segments[i].last_sect < segments[i].first_sect)) {
rc = -EINVAL;
goto unmap;
@@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
req_operation = req->operation == BLKIF_OP_INDIRECT ?
req->u.indirect.indirect_op : req->operation;
+
if ((req->operation == BLKIF_OP_INDIRECT) &&
(req_operation != BLKIF_OP_READ) &&
(req_operation != BLKIF_OP_WRITE)) {
@@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
seg[i].nsec = req->u.rw.seg[i].last_sect -
req->u.rw.seg[i].first_sect + 1;
seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
-   if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
+   if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(req->u.rw.seg[i].last_sect <
 req->u.rw.seg[i].first_sect))
goto fail_response;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..68e87a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,12 +52,20 @@ extern unsigned int xen_blkif_max_ring_order;
  */
 #define MAX_INDIRECT_SEGMENTS 256
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
+/*
+ * Xen uses 4K pages. The guest may use a different page size (4K or 64K)
+ * Number of Xen pages per segment
+ */
+#define XEN_PAGES_PER_SEGMENT   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+#define SEGS_PER_INDIRECT_FRAME\
+   (XEN_PAGES_PER_INDIRECT_FRAME / XEN_PAGES_PER_SEGMENT)
+
 #define MAX_INDIRECT_PAGES \
((MAX_INDIRECT_SEGMENTS + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
-#define INDIRECT_PAGES(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+#define INDIRECT_PAGES(_segs) DIV_ROUND_UP(_segs, XEN_PAGES_PER_INDIRECT_FRAME)
 
 /* Not a real protocol.  Used to generate ring structs which contain
  * the elements common to all protocols only.  This way we get a
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..edd27e4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,21 +176,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
-   BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+   BACK_RING_INIT(&blkif->blk_rings.native, sring,
+ 

[PATCH v4 13/20] xen/events: fifo: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
Only use the first 4KB of the page to store the event channel info. It
means that we will waste 60KB every time we allocate a page for:
 * control block: a page is allocated per CPU
 * event array: a page is allocated every time we need to expand it

I think we can reduce the memory waste for the 2 areas by:

* control block: sharing between multiple vCPUs. However, it will
require some bookkeeping in order to not free the page when the CPU
goes offline and the other CPUs sharing the page are still there

* event array: always extend the event array by 64K (i.e. 16 4K
chunks). That would require more care when we fail to expand the
event channel.
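
To make the sizing concrete, here is a small user-space sketch of the
EVENT_WORDS_PER_PAGE and MAX_EVENT_ARRAY_PAGES arithmetic. The 32-bit
event word and the 2^17 channel limit are assumptions written into the
sketch rather than values taken from this patch, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: a 32-bit event word and 2^17 FIFO event channels. */
typedef uint32_t sketch_event_word_t;
#define SKETCH_NR_CHANNELS (1u << 17)

/* Event words that fit in a page of the given size. */
static unsigned int sketch_event_words_per_page(unsigned int page_size)
{
	return (unsigned int)(page_size / sizeof(sketch_event_word_t));
}

/* Event array pages needed to cover every channel. */
static unsigned int sketch_max_event_array_pages(unsigned int page_size)
{
	unsigned int words = sketch_event_words_per_page(page_size);

	assert(words != 0);
	return SKETCH_NR_CHANNELS / words;
}
```

Sized with the 4KB Xen page this gives 1024 words per page and 128 array
pages. Sizing it with a 64KB Linux page would give 8 pages of 16384
words, which would not match the 4KB array pages that Xen expects.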

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Note I haven't updated the suggestion to reduce the memory waste
after David's email [1]. I can do it if necessary.

Changes in v3:
- Add David and Stefano's reviewed-by

[1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg04596.html
---
 drivers/xen/events/events_base.c | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c49bb7a..00dd923 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
 #include 
 #include 
 #include 
-#include 
 #endif
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 1d4baf5..e3e9e3d 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
 
 #include "events_internal.h"
 
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
 #define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
 
 struct evtchn_fifo_queue {
-- 
2.1.4



[PATCH v4 10/20] xen/xenbus: Use Xen page definition

2015-09-07 Thread Julien Grall
All the rings (xenstore and PV rings) are always based on the page
granularity of Xen.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/MFN/GFN/ based on the new naming
- Add David and Stefano's reviewed-by

Changes in v2:
- Also update the ring mapping function
---
 drivers/xen/xenbus/xenbus_client.c | 6 +++---
 drivers/xen/xenbus/xenbus_probe.c  | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 2ba09c1..359e654 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -388,7 +388,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
}
grefs[i] = err;
 
-   vaddr = vaddr + PAGE_SIZE;
+   vaddr = vaddr + XEN_PAGE_SIZE;
}
 
return 0;
@@ -555,7 +555,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
if (!node)
return -ENOMEM;
 
-   area = alloc_vm_area(PAGE_SIZE * nr_grefs, ptes);
+   area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, ptes);
if (!area) {
kfree(node);
return -ENOMEM;
@@ -750,7 +750,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
unsigned long addr;
 
memset(&unmap[i], 0, sizeof(unmap[i]));
-   addr = (unsigned long)vaddr + (PAGE_SIZE * i);
+   addr = (unsigned long)vaddr + (XEN_PAGE_SIZE * i);
unmap[i].host_addr = arbitrary_virt_to_machine(
lookup_address(addr, &level)).maddr;
unmap[i].dev_bus_addr = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3cbe055..33a31cf 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -802,7 +802,8 @@ static int __init xenbus_init(void)
goto out_error;
xen_store_gfn = (unsigned long)v;
xen_store_interface =
-   xen_remap(xen_store_gfn << PAGE_SHIFT, PAGE_SIZE);
+   xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
+ XEN_PAGE_SIZE);
break;
default:
pr_warn("Xenstore state unknown\n");
-- 
2.1.4



[PATCH v4 14/20] xen/grant-table: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
The Xen interface uses 4KB page granularity. This means that each
grant is 4KB.

The current implementation allocates a Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.

We could decrease the memory wasted by sharing the page between multiple
grants. It will require some care with the {Set,Clear}ForeignPage macros.

Note that no changes have been made in the x86 code because both Linux
and Xen will only use 4KB page granularity.
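
The two quantities this patch switches to Xen granularity can be sketched
in user space as follows. struct sketch_grant_entry_v1 is a hypothetical
stand-in for struct grant_entry_v1 with an assumed 8-byte layout, and the
helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_XEN_PAGE_SHIFT 12
#define SKETCH_XEN_PAGE_SIZE  (1u << SKETCH_XEN_PAGE_SHIFT)

/* Hypothetical stand-in for struct grant_entry_v1; 8 bytes assumed. */
struct sketch_grant_entry_v1 {
	uint16_t flags;
	uint16_t domid;
	uint32_t frame;
};

/* Grant entries that fit in one 4KB grant frame. */
static unsigned int sketch_grefs_per_grant_frame(void)
{
	return (unsigned int)(SKETCH_XEN_PAGE_SIZE /
			      sizeof(struct sketch_grant_entry_v1));
}

/*
 * Xen frame number of the i-th grant frame of a region starting at
 * physical address addr, mirroring XEN_PFN_DOWN(addr) + i.
 */
static uint64_t sketch_grant_frame_pfn(uint64_t addr, unsigned int i)
{
	/* The grant-table region is expected to start on a 4KB boundary. */
	assert((addr & (SKETCH_XEN_PAGE_SIZE - 1)) == 0);
	return (addr >> SKETCH_XEN_PAGE_SHIFT) + i;
}
```

Consecutive grant frames are therefore 4KB apart in the frame list, even
when the Linux page behind them is 64KB.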

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Stefano Stabellini 
Cc: Russell King 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Add Stefano's reviewed-by

Changes in v2
- Add David's reviewed-by
---
 arch/arm/xen/p2m.c| 6 +++---
 drivers/xen/grant-table.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
for (i = 0; i < count; i++) {
if (map_ops[i].status)
continue;
-   set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
-   map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+   set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+   map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
}
 
return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
int i;
 
for (i = 0; i < count; i++) {
-   set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+   set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7b4e1cf..99ed9c2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -642,7 +642,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
if (xen_auto_xlat_grant_frames.count)
return -EINVAL;
 
-   vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+   vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
if (vaddr == NULL) {
pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
&addr);
@@ -654,7 +654,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
return -ENOMEM;
}
for (i = 0; i < max_nr_gframes; i++)
-   pfn[i] = PFN_DOWN(addr) + i;
+   pfn[i] = XEN_PFN_DOWN(addr) + i;
 
xen_auto_xlat_grant_frames.vaddr = vaddr;
xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
 {
/* Only version 1 is used, which will always be available. */
grant_table_version = 1;
-   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+   grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
 
pr_info("Grant tables using version %d layout\n", grant_table_version);
-- 
2.1.4



[PATCH v4 11/20] tty/hvc: xen: Use xen page definition

2015-09-07 Thread Julien Grall
The console ring is always based on the page granularity of Xen.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: David Vrabel 
Cc: Boris Ostrovsky 
Cc: linuxppc-...@lists.ozlabs.org

Changes in v4:
- The ring is always 4K (i.e XEN_PAGE_SIZE), so no need to
map with PAGE_SIZE. This was correctly done in v2 but lost with
the rebase to the "s/mfn/gfn/" series

Changes in v3:
- Some changes have been moved to the series "Use correctly the
Xen memory terminologies in Linux".
- Add Stefano's reviewed-by
---
 drivers/tty/hvc/hvc_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 10beb15..fa816b7 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
if (r < 0 || v == 0)
goto err;
gfn = v;
-   info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
+   info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
if (info->intf == NULL)
goto err;
info->vtermno = HVC_COOKIE;
@@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
struct xencons_info *info = dev_get_drvdata(&dev->dev);
 
xencons_disconnect_backend(info);
-   memset(info->intf, 0, PAGE_SIZE);
+   memset(info->intf, 0, XEN_PAGE_SIZE);
return xencons_connect_backend(dev, info);
 }
 
-- 
2.1.4



[PATCH v4 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface (as well as the toolstack) always uses 4KB
page granularity. When the toolstack asks to map a series of
guest PFNs in a batch, it expects the pages to be mapped contiguously in
its virtual memory.

When Linux is using 64KB page granularity, the privcmd driver will have
to map multiple Xen PFNs in a single Linux page.

Note that this solution works for any page granularity which is a
multiple of 4KB.
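
The packing of Xen PFNs into Linux pages can be sketched as below. This
is a user-space illustration with hypothetical names; it mirrors the
DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE) computation from the diff and the
per-gfn indexing used when walking the pages.

```c
#include <assert.h>

#define SKETCH_XEN_PAGE_SIZE 4096u
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Linux pages needed to hold num_gfns Xen frames. */
static unsigned int sketch_nr_linux_pages(unsigned int num_gfns,
					  unsigned int xen_pfn_per_page)
{
	assert(xen_pfn_per_page != 0);
	return SKETCH_DIV_ROUND_UP(num_gfns, xen_pfn_per_page);
}

/* Index of the Linux page that holds the gfn_idx-th Xen frame. */
static unsigned int sketch_page_index(unsigned int gfn_idx,
				      unsigned int xen_pfn_per_page)
{
	assert(xen_pfn_per_page != 0);
	return gfn_idx / xen_pfn_per_page;
}

/* Byte offset of that Xen frame inside its Linux page. */
static unsigned int sketch_offset_in_page(unsigned int gfn_idx,
					  unsigned int xen_pfn_per_page)
{
	assert(xen_pfn_per_page != 0);
	return (gfn_idx % xen_pfn_per_page) * SKETCH_XEN_PAGE_SIZE;
}
```

With 16 Xen frames per Linux page, a batch of 17 GFNs needs 2 Linux pages,
and the last GFN lands at the start of the second page.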

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

I kept the hypercall arguments in remap_data to avoid allocating them on
the stack every time that remap_pte_fn is called.
I will keep it like that unless someone strongly disagrees.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- The function to split a Linux page into multiple Xen pages has
been moved internally. It was the only use (not used anymore in
the balloon) and it's not quite clear what the common
interface should be. Defer the question until someone needs to use it.
- s/nr_pfn/numgfns/ to make clear that we are dealing with GFN
- Use DIV_ROUND_UP rather than round_up and fix the usage in
xen_xlate_unmap_gfn_range

Changes in v2:
- Use xen_apply_to_page
---
 drivers/xen/privcmd.c   |   8 ++--
 drivers/xen/xlate_mmu.c | 124 
 2 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c6deb87..c8798ee 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
return -EINVAL;
}
 
-   nr_pages = m.num;
+   nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
 
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
goto out_unlock;
}
if (xen_feature(XENFEAT_auto_translated_physmap)) {
-   ret = alloc_empty_pages(vma, m.num);
+   ret = alloc_empty_pages(vma, nr_pages);
if (ret < 0)
goto out_unlock;
} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
state.global_error  = 0;
state.version   = version;
 
+   BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
/* mmap_batch_fn guarantees ret == 0 */
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
 {
struct page **pages = vma->vm_private_data;
int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+   int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
int rc;
 
if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
return;
 
-   rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
+   rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
if (rc == 0)
free_xenballooned_pages(numpgs, pages);
else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index cff2387..5063c5e 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,28 @@
 #include 
 #include 
 
-/* map fgfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
-   unsigned int domid)
-{
-   int rc;
-   struct xen_add_to_physmap_range xatp = {
-   .domid = DOMID_SELF,
-   .foreign_domid = domid,
-   .size = 1,
-   .space = XENMAPSPACE_gmfn_foreign,
-   };
-   xen_ulong_t idx = fgfn;
-   xen_pfn_t gpfn = lpfn;
-   int err = 0;
+typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
 
-   set_xen_guest_handle(xatp.idxs, &idx);
-   set_xen_guest_handle(xatp.gpfns, &gpfn);
-   set_xen_guest_handle(xatp.errs, &err);
+/* Break down the pages into 4KB chunks and call fn for each gfn */
+static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
+xen_gfn_fn_t fn, void *data)
+{
+   unsigned long xen_pfn = 0;
+   struct page *page;
+   int i;
 
-   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   return rc < 0 ? rc : err;
+   for (i = 0; i < nr_gfn; i++) {
+   if ((i % XEN_PFN_PER_PAGE) == 0) {
+   page = pages[i / XEN_PFN_PER_PAGE];
+   xen_pfn = page_to_xen_pfn(page);
+

[PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux

2015-09-07 Thread Julien Grall
Hi all,

ARM64 Linux supports both 4KB and 64KB page granularity. However, the Xen
hypercall interface and PV protocol are always based on 4KB page granularity.

Any attempt to boot a Linux guest with 64KB pages enabled will result in a
guest crash.

This series is a first attempt to allow such Linux guests to run with the
current hypercall interface and PV protocol.

This solution has been chosen because we want to run Linux 64KB on released
Xen ARM versions and/or platforms using an old version of Linux DOM0.

There is room for improvement, such as support for 64KB grants, modification
of the PV protocol to support different page sizes... They will be explored
in a separate patch series later.

TODO list:
- Convert swiotlb to 64KB
- Convert xenfb to 64KB
- Support for multi-page rings
- Support for 64KB in gntdev
- Support of non-indirect grant with 64KB frontend
- It may be possible to move some common defines between
netback/netfront and blkfront/blkback into a header

I've got most of the patches for the TODO items. I'm planning to send them as
a follow-up as they are not a requirement for basic guests.

All patches have been build-tested for ARM32, ARM64 and x86. But I haven't
tested running them on x86 as I don't have a box with Xen x86 running. I would
be happy if someone could give it a try and check for possible regressions on
x86.

I know that Konrad has a test suite for x86. Konrad, would it be possible to
run it against this series?

A branch based on the latest xentip/for-linus-4.3 can be found here:

git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

Comments and suggestions are welcome.

Sincerely yours,

Cc: david.vra...@citrix.com
Cc: konrad.w...@oracle.com
Cc: boris.ostrov...@oracle.com
Cc: wei.l...@citrix.com
Cc: roger@citrix.com

Status of each patch:

A: Reviewed-by - Acked-by
M: Patch modified in this series
m: Minor changes in this series (i.e. renaming due to previous patches, typos)
L: Missing Acked-by from a Linux maintainer (Boris, David, Konrad)
N: Missing Acked-by from a Netback maintainer (Ian or Wei)

Julien Grall (20):
A   net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
A   arm/xen: Drop pte_mfn and mfn_pte
A M L   xen: Add Xen specific page definition
A M xen/grant: Introduce helpers to split a page into grant
A   xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
A   block/xen-blkfront: Split blkif_queue_request in 2
A m block/xen-blkfront: Store a page rather a pfn in the grant structure
A   block/xen-blkfront: split get_grant in 2
A m L   xen/biomerge: Don't allow biovec's to be merged when Linux is not
  using 4KB pages
A   xen/xenbus: Use Xen page definition
A m L   tty/hvc: xen: Use xen page definition
  M L   xen/balloon: Don't rely on the page granularity is the same for Xen
  and Linux
A   xen/events: fifo: Make it running on 64KB granularity
A   xen/grant-table: Make it running on 64KB granularity
A m block/xen-blkfront: Make it running on 64KB page granularity
A   block/xen-blkback: Make it running on 64KB page granularity
A m net/xen-netfront: Make it running on 64KB page granularity
  m  N  net/xen-netback: Make it running on 64KB page granularity
A m xen/privcmd: Add support for Linux 64KB page granularity
A   arm/xen: Add support for 64KB page granularity

 arch/arm/include/asm/xen/page.h |  18 +-
 arch/arm/xen/enlighten.c|   6 +-
 arch/arm/xen/p2m.c  |   6 +-
 arch/x86/include/asm/xen/page.h |   2 +-
 drivers/block/xen-blkback/blkback.c |   5 +-
 drivers/block/xen-blkback/common.h  |  17 +-
 drivers/block/xen-blkback/xenbus.c  |   9 +-
 drivers/block/xen-blkfront.c| 552 +++-
 drivers/net/xen-netback/common.h|  18 +-
 drivers/net/xen-netback/netback.c   | 163 +++
 drivers/net/xen-netfront.c  | 122 +---
 drivers/tty/hvc/hvc_xen.c   |   4 +-
 drivers/xen/balloon.c   |  59 +++-
 drivers/xen/biomerge.c  |   8 +
 drivers/xen/events/events_base.c|   2 +-
 drivers/xen/events/events_fifo.c|   2 +-
 drivers/xen/grant-table.c   |  32 ++-
 drivers/xen/privcmd.c   |   8 +-
 drivers/xen/xenbus/xenbus_client.c  |   6 +-
 drivers/xen/xenbus/xenbus_probe.c   |   3 +-
 drivers/xen/xlate_mmu.c | 124 +---
 include/xen/grant_table.h   |  51 
 include/xen/page.h  |  27 +-
 23 files changed, 855 insertions(+), 389 deletions(-)

-- 
2.1.4



[PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure

2015-09-07 Thread Julien Grall
All uses of the pfn field follow the same idiom:

pfn_to_page(grant->pfn)

This always returns the same page. Store the page directly in the
grant to clean up the code.
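
The cleanup can be modelled in user space as below. This is only a toy
model under stated assumptions: a flat array stands in for the kernel's
memory map, and every sketch_* name is hypothetical. It shows that a grant
holding the page pointer resolves to the same page as one holding the pfn,
without the conversion on each use.

```c
#include <assert.h>

/* Toy page descriptor and flat memory map (assumption for this sketch). */
struct sketch_page {
	int dummy;
};

#define SKETCH_NR_PAGES 8
static struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];

static struct sketch_page *sketch_pfn_to_page(unsigned long pfn)
{
	assert(pfn < SKETCH_NR_PAGES);
	return &sketch_mem_map[pfn];
}

static unsigned long sketch_page_to_pfn(const struct sketch_page *page)
{
	return (unsigned long)(page - sketch_mem_map);
}

/* Before: the grant remembers a pfn and converts it on every use. */
struct sketch_grant_old {
	unsigned long pfn;
};

/* After: the grant remembers the page itself. */
struct sketch_grant_new {
	struct sketch_page *page;
};

static struct sketch_page *sketch_old_page(const struct sketch_grant_old *g)
{
	return sketch_pfn_to_page(g->pfn);
}

static struct sketch_page *sketch_new_page(const struct sketch_grant_new *g)
{
	return g->page;
}

/* Returns 1 when both representations resolve to the same page. */
static int sketch_same_page(unsigned long pfn)
{
	struct sketch_grant_old old_grant = { .pfn = pfn };
	struct sketch_grant_new new_grant = { .page = sketch_pfn_to_page(pfn) };

	return sketch_old_page(&old_grant) == sketch_new_page(&new_grant);
}
```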

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, Stefano, I kept your Acked-by/Reviewed-by because the rebase was
minor. Let me know if you disagree.

Changes in v4:
- rebase after 7adf12b87f45a77d364464018fb8e9e1ac875152
"xen-blkfront: don't add indirect pages to list when
!feature_persistent"

Changes in v3:
- Use the correct indentation in get_grant. The current
indentation (i.e. without this patch) was wrong because it was
using spaces rather than tabs.
- Add Roger's acked-by and Stefano's reviewed-by
- s/mfn/gfn based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b11f084..556475d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
 
 struct grant {
grant_ref_t gref;
-   unsigned long pfn;
+   struct page *page;
struct list_head node;
 };
 
@@ -222,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
kfree(gnt_list_entry);
goto out_of_memory;
}
-   gnt_list_entry->pfn = page_to_pfn(granted_page);
+   gnt_list_entry->page = granted_page;
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -237,7 +237,7 @@ out_of_memory:
 &info->grants, node) {
list_del(&gnt_list_entry->node);
if (info->feature_persistent)
-   __free_page(pfn_to_page(gnt_list_entry->pfn));
+   __free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
i--;
}
@@ -246,8 +246,8 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-   unsigned long pfn,
-   struct blkfront_info *info)
+  struct page *page,
+  struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
unsigned long buffer_gfn;
@@ -266,10 +266,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!pfn);
-   gnt_list_entry->pfn = pfn;
+   BUG_ON(!page);
+   gnt_list_entry->page = page;
}
-   buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
info->xbdev->otherend_id,
buffer_gfn, 0);
@@ -525,7 +525,7 @@ static int blkif_queue_rw_req(struct request *req)
 
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
-   unsigned long uninitialized_var(pfn);
+   struct page *uninitialized_var(page);
 
if (segments)
kunmap_atomic(segments);
@@ -542,15 +542,15 @@ static int blkif_queue_rw_req(struct request *req)
indirect_page = 
list_first_entry(&info->indirect_pages,
 struct page, 
lru);
list_del(&indirect_page->lru);
-   pfn = page_to_pfn(indirect_page);
+   page = indirect_page;
}
-   gnt_list_entry = get_grant(&gref_head, pfn, info);
+   gnt_list_entry = get_grant(&gref_head, page, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
-   segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+   segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = 
gnt_list_entry->gref;
}
 
-   gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+   gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
ref = gnt_list_entry->gref;
 
info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -561,7 +5

[PATCH v4 06/20] block/xen-blkfront: Split blkif_queue_request in 2

2015-09-07 Thread Julien Grall
Currently, blkif_queue_request has 2 distinct execution paths:
- Send a discard request
- Send a read/write request

The function also allocates grants to use for generating the
request, although these are only used for read/write requests.

Rather than having one function with 2 distinct execution paths, split
it in 2. This will also remove one level of indentation.

Signed-off-by: Julien Grall 
Reviewed-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, if you really want, I can drop the else clause in
blkif_queue_request; IMHO it's clearer here. I've kept
your Reviewed-by anyway. Let me know if that's not fine.

Changes in v3:
- Fix errors reported by checkpatch.pl
- Add Roger's Reviewed-by

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 277 ---
 1 file changed, 153 insertions(+), 124 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 432e105..b11f084 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -395,13 +395,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t 
mode,
return 0;
 }
 
-/*
- * Generate a Xen blkfront IO request from a blk layer request.  Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+   struct blkfront_info *info = req->rq_disk->private_data;
+   struct blkif_request *ring_req;
+   unsigned long id;
+
+   /* Fill out a communications ring structure. */
+   ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+   id = get_id_from_freelist(info);
+   info->shadow[id].request = req;
+
+   ring_req->operation = BLKIF_OP_DISCARD;
+   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+   ring_req->u.discard.id = id;
+   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+   else
+   ring_req->u.discard.flag = 0;
+
+   info->ring.req_prod_pvt++;
+
+   /* Keep a private copy so we can reissue requests when recovering. */
+   info->shadow[id].req = *ring_req;
+
+   return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
 {
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
@@ -421,9 +443,6 @@ static int blkif_queue_request(struct request *req)
struct scatterlist *sg;
int nseg, max_grefs;
 
-   if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
-   return 1;
-
max_grefs = req->nr_phys_segments;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
@@ -453,139 +472,131 @@ static int blkif_queue_request(struct request *req)
id = get_id_from_freelist(info);
info->shadow[id].request = req;
 
-   if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
-   ring_req->operation = BLKIF_OP_DISCARD;
-   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
-   ring_req->u.discard.id = id;
-   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
-   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
-   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
-   else
-   ring_req->u.discard.flag = 0;
+   BUG_ON(info->max_indirect_segments == 0 &&
+  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+   BUG_ON(info->max_indirect_segments &&
+  req->nr_phys_segments > info->max_indirect_segments);
+   nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+   ring_req->u.rw.id = id;
+   if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+   /*
+* The indirect operation can only be a BLKIF_OP_READ or
+* BLKIF_OP_WRITE
+*/
+   BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+   ring_req->operation = BLKIF_OP_INDIRECT;
+   ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+   BLKIF_OP_WRITE : BLKIF_OP_READ;
+   ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   ring_req->u.indirect.handle = info->handle;
+   ring_req->u.indirect.nr_segments = nseg;
} else {
-   BUG_ON(info->max_indirect_segments == 0 &&
-  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-   BUG_ON(info->max_indirect_segments &&
-  req->nr_phys_segments > info->max_indirect_segments);
-   nseg = blk_r

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Dietmar Eggemann
On 04/09/15 00:51, Steve Muckle wrote:
> Hi Morten, Dietmar,
> 
> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
> ...
>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
> 
> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
> __update_load_avg(). If there is now an assumption that util_avg may be
> used directly as a capacity value, should it be changed to
> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
> always be or if they can be combined.

You're referring to the code line

2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

in __update_load_avg()?

Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
load related.

LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
scale_load_down() are also NOPs so this area is probably 
worth a separate clean-up.
Beyond that, I'm not sure if the current functionality is
broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
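
For illustration only, here is a userspace sketch of that scaling step. The
constant values (SCHED_LOAD_SHIFT = 10, LOAD_AVG_MAX = 47742) are what I
believe the scheduler used at the time and should be treated as assumptions;
util_avg_from_sum() is a made-up name, not a kernel function.

```c
#include <stdint.h>

/* Assumed values from the scheduler of that era; check your own tree. */
#define SCHED_LOAD_SHIFT     10
#define SCHED_LOAD_SCALE     (1UL << SCHED_LOAD_SHIFT)
#define SCHED_CAPACITY_SHIFT 10
#define LOAD_AVG_MAX         47742

/*
 * Hypothetical userspace model of the util_avg line in __update_load_avg():
 * util_sum is normalised by LOAD_AVG_MAX and scaled to SCHED_LOAD_SCALE.
 */
static unsigned long util_avg_from_sum(uint32_t util_sum)
{
	return (unsigned long)(((uint64_t)util_sum << SCHED_LOAD_SHIFT) /
			       LOAD_AVG_MAX);
}
```

With both shifts equal to 10, a fully saturated util_sum maps to 1024, which
is also the capacity scale. That is why util_avg can be compared against a
capacity value directly today.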

> 
>> + * capacity_orig is the cpu_capacity available at * the highest frequency
> 
> spurious *
> 
> thanks,
> Steve
> 

Fixed.

Thanks, 

-- Dietmar

-- >8 --

From: Dietmar Eggemann 
Date: Fri, 14 Aug 2015 17:23:13 +0100
Subject: [PATCH] sched/fair: Get rid of scaling utilization by capacity_orig

Utilization is currently scaled by capacity_orig, but since we now have
frequency and cpu invariant cfs_rq.avg.util_avg, frequency and cpu scaling
now happens as part of the utilization tracking itself.
So cfs_rq.avg.util_avg should no longer be scaled in cpu_util().
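
As a rough sketch of the difference (not the patch itself), the two
behaviours can be modelled in userspace as below. cpu_util_old() mirrors the
removed lines of the diff; cpu_util_new() is an assumption about the
replacement, i.e. a plain clamp to capacity_orig. Both function names and the
SCHED_LOAD_SHIFT value of 10 are illustrative.

```c
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/* Old behaviour: util is in [0..SCHED_LOAD_SCALE] and scaled by capacity. */
static unsigned long cpu_util_old(unsigned long util,
				  unsigned long capacity_orig)
{
	if (util >= SCHED_LOAD_SCALE)
		return capacity_orig;

	return (util * capacity_orig) >> SCHED_LOAD_SHIFT;
}

/*
 * Assumed new behaviour: util is already frequency and cpu invariant, so it
 * is only capped at capacity_orig and never rescaled.
 */
static unsigned long cpu_util_new(unsigned long util,
				  unsigned long capacity_orig)
{
	return (util >= capacity_orig) ? capacity_orig : util;
}
```

For example, with util = 512 and capacity_orig = 430, the old code yields 215
while the clamp-only version yields 430.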

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Dietmar Eggemann 
Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2074d45a67c2..a73ece2372f5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4824,33 +4824,39 @@ static int select_idle_sibling(struct task_struct *p, 
int target)
 done:
return target;
 }
+
 /*
  * cpu_util returns the amount of capacity of a CPU that is used by CFS
  * tasks. The unit of the return value must be the one of capacity so we can
  * compare the utilization with the capacity of the CPU that is available for
  * CFS task (ie cpu_capacity).
- * cfs.avg.util_avg is the sum of running time of runnable tasks on a
- * CPU. It represents the amount of utilization of a CPU in the range
- * [0..SCHED_LOAD_SCALE]. The utilization of a CPU can't be higher than the
- * full capacity of the CPU because it's about the running time on this CPU.
- * Nevertheless, cfs.avg.util_avg can be higher than SCHED_LOAD_SCALE
- * because of unfortunate rounding in util_avg or just
- * after migrating tasks until the average stabilizes with the new running
- * time. So we need to check that the utilization stays into the range
- * [0..cpu_capacity_orig] and cap if necessary.
- * Without capping the utilization, a group could be seen as overloaded (CPU0
- * utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
- * available capacity.
+ *
+ * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on a CPU. It represents
+ * the amount of utilization of a CPU in the range [0..capacity_orig] where
+ * capacity_orig is the cpu_capacity available at the highest frequency
+ * (arch_scale_freq_capacity()).
+ * The utilization of a CPU converges towards a sum equal to or less than the
+ * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
+ * the running time on this CPU scaled by capacity_curr.
+ *
+ * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
+ * higher than capacity_orig because of unfortunate rounding in
+ * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
+ * the average stabilizes with the new running time. We need to check that the
+ * utilization stays within the range of [0..capacity_orig] and cap it if
+ * necessary. Without utilization capping, a group could be seen as overloaded
+ * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
+ * available capacity. We allow utilization to overshoot capacity_curr (but not
+ * capacity_orig) as it is useful for predicting the capacity required after task
+ * migrations (scheduler-driven DVFS).
  */
 static int cpu_util(int cpu)
 {
unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
unsigned long capacity = capacity_orig_of(cpu);
 
-   if (util >= SCHED_LOAD_SCALE)
-   return capacity;
-
-   return (util * capacity) >> SCHED_LOAD_SHIFT;
+   return (util >=

[PATCH v4 04/20] xen/grant: Introduce helpers to split a page into grant

2015-09-07 Thread Julien Grall
Currently, a grant is always based on the Xen page granularity (i.e.
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers are in charge of splitting the Linux page into grants
and calling a caller-supplied function on each grant.

Also provide a helper to count the number of grants within a given
contiguous region.
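
To make the splitting concrete, here is a minimal userspace sketch of the
same loop, assuming a 64KB Linux page and 4KB grants. All names
(MODEL_PAGE_SIZE, grant_fn_t, foreach_grant_in_range, count_chunk) are
hypothetical stand-ins for the kernel helpers, and the frame number is passed
in directly instead of being derived from a struct page.

```c
#define MODEL_PAGE_SIZE      65536u	/* assumed 64KB Linux page */
#define MODEL_XEN_PAGE_SHIFT 12
#define MODEL_XEN_PAGE_SIZE  (1u << MODEL_XEN_PAGE_SHIFT)

typedef void (*grant_fn_t)(unsigned long xen_pfn, unsigned int offset,
			   unsigned int len, void *data);

/* Call fn once per 4KB chunk covered by [offset, offset + len). */
static void foreach_grant_in_range(unsigned long first_xen_pfn,
				   unsigned int offset, unsigned int len,
				   grant_fn_t fn, void *data)
{
	unsigned int goffset, glen;
	unsigned long xen_pfn;

	/* Never walk past the end of the Linux page. */
	if (len > MODEL_PAGE_SIZE - offset)
		len = MODEL_PAGE_SIZE - offset;

	goffset = offset & (MODEL_XEN_PAGE_SIZE - 1);
	xen_pfn = first_xen_pfn + (offset >> MODEL_XEN_PAGE_SHIFT);

	while (len) {
		glen = MODEL_XEN_PAGE_SIZE - goffset;
		if (glen > len)
			glen = len;
		fn(xen_pfn, goffset, glen, data);

		/* Only the first chunk can start mid-grant. */
		goffset = 0;
		xen_pfn++;
		len -= glen;
	}
}

/* Example callback: count how many grants the range needs. */
static void count_chunk(unsigned long xen_pfn, unsigned int offset,
			unsigned int len, void *data)
{
	(void)xen_pfn;
	(void)offset;
	(void)len;
	(*(unsigned int *)data)++;
}
```

A full 64KB page needs 16 grants in this model, while a 200-byte buffer
starting at offset 4000 straddles a 4KB boundary and needs 2.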

Note that x86/include/asm/xen/page.h now includes
xen/interface/grant_table.h rather than xen/grant_table.h. This is
necessary because xen/grant_table.h depends on asm/xen/page.h and would
break the compilation. Furthermore, only the definitions in
interface/grant_table.h are required.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org

Changes in v4:
- Typos
- Rename gnttab_one_grant into gnttab_for_one_grant
- Add Stefano and David's reviewed-by
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming

Changes in v3:
- Fix error reported by checkpatch.pl
- Typos
- s/pfn/xen_pfn/ in gnttab_foreach_grant
- Drop the possibility to use less data. The complexity is moved
in netback which is the only user
- Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
- s/offset/start/ in gnttab_count_grant and update the
description of the parameter
- s/mfn/gfn based on the new terminology
- Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
- Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
- Fix compilation on x86.

Changes in v2:
- Patch added
---
 arch/x86/include/asm/xen/page.h |  2 +-
 drivers/xen/grant-table.c   | 26 +
 include/xen/grant_table.h   | 42 +
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 0b762f6..501479e 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
 #include 
 
 #include 
-#include 
+#include 
 #include 
 
 /* Xen machine address */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..7b4e1cf 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned 
count)
 }
 EXPORT_SYMBOL_GPL(gnttab_batch_copy);
 
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data)
+{
+   unsigned int goffset;
+   unsigned int glen;
+   unsigned long xen_pfn;
+
+   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   goffset = xen_offset_in_page(offset);
+
+   xen_pfn = page_to_xen_pfn(page) + XEN_PFN_DOWN(offset);
+
+   while (len) {
+   glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+   fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
+
+   goffset = 0;
+   xen_pfn++;
+   len -= glen;
+   }
+}
+EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..05b5b08 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -224,4 +226,44 @@ static inline struct xen_page_foreign 
*xen_page_foreign(struct page *page)
 #endif
 }
 
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ * gfn: guest frame number
+ * offset: offset in the grant
+ * len: length of the data in the grant.
+ * data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
+  unsigned int len, void *data);
+
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_for_one_grant(struct page *page, unsigned int offset,
+   unsigned len, xen_grant_fn_t fn,
+   void *data)
+{
+   /* The first request is 

[PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one

2015-09-07 Thread Julien Grall
Many PV drivers contain the idiom:

pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it by a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.

[1] Interdependency between asm/page.h and xen/grant_table.h, which results
in page_mfn not being defined when necessary.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Rename gnttab_page_grant_foreign_access_ref into
gnttab_page_grant_foreign_access_ref_one
- Fix typo in the commit message
- s/mfn/gfn based on the new naming
- Add David and Stefano's reviewed-by

Changes in v2:
- Patch added
---
 include/xen/grant_table.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 05b5b08..e17a4b3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct 
gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+   grant_ref_t ref, domid_t domid,
+   struct page *page, int readonly)
+{
+   gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+   readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
   unsigned long pfn);
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 09/20] xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages

2015-09-07 Thread Julien Grall
On ARM, not all DMA-capable devices on a given platform may be protected
by an IOMMU. DMA requests have to use the BFN (i.e. MFN on ARM) in
order to use the device correctly.

While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mapping will break this contiguous mapping.

When Linux is using 64KB page granularity, a page may be split
across multiple non-contiguous MFNs (Xen is using 4KB page
granularity). Therefore a DMA request will likely fail.

Checking that a 64KB page is using contiguous MFNs is tedious. For
now, always say that biovecs are not mergeable.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

There are some ideas for checking whether two biovecs could be merged
(see [1]), but it's not critical and can be considered a performance
improvement.

Changes in v4:
- Fix typos in the subject
- Add Stefano's reviewed-by

Changes in v3:
- Update commit message
- s/mfn/bfn/ based on the new naming
- Update TODO

Changes in v2:
- Remove the workaround and check if the Linux page granularity
is the same as Xen or not

[1] https://lkml.org/lkml/2015/7/17/418
---
 drivers/xen/biomerge.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 8ae2fc90..4da69db 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,18 @@
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
   const struct bio_vec *vec2)
 {
+#if XEN_PAGE_SIZE == PAGE_SIZE
unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
 
return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((bfn1 == bfn2) || ((bfn1+1) == bfn2));
+#else
+   /*
+* XXX: Add support for merging bio_vec when using different page
+* size in Xen and Linux.
+*/
+   return 0;
+#endif
 }
 EXPORT_SYMBOL(xen_biovec_phys_mergeable);
-- 
2.1.4



[PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-09-07 Thread Julien Grall
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Acked-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: net...@vger.kernel.org

Changes in v4:
- Add Wei's acked

Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7c64c74..d4c1bc7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4



[PATCH v4 03/20] xen: Add Xen specific page definition

2015-09-07 Thread Julien Grall
The Xen hypercall interface always uses 4K page granularity on the ARM
and x86 architectures.

With the upcoming support for 64K page granularity in ARM64 guests, it
won't be possible to re-use the Linux page definitions in Xen drivers.

Introduce Xen page definition helpers based on the Linux page
definitions. They have exactly the same names, but with a
XEN_/xen_ prefix.

Also modify xen_page_to_gfn to use new Xen page definition.
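
As a rough illustration of the arithmetic behind these helpers, the sketch
below models them in userspace for a 64KB Linux page. The MODEL_/model_ names
are hypothetical and only mirror the macros added by the diff; PAGE_SHIFT = 16
is an assumption chosen for the example.

```c
#define MODEL_PAGE_SHIFT     16	/* assumed: 64KB Linux pages */
#define MODEL_PAGE_SIZE      (1UL << MODEL_PAGE_SHIFT)
#define MODEL_XEN_PAGE_SHIFT 12	/* the hypercall interface is 4KB */
#define MODEL_XEN_PAGE_SIZE  (1UL << MODEL_XEN_PAGE_SHIFT)
#define MODEL_XEN_PAGE_MASK  (~(MODEL_XEN_PAGE_SIZE - 1))

#define model_xen_offset_in_page(p) \
	((unsigned long)(p) & ~MODEL_XEN_PAGE_MASK)
#define MODEL_XEN_PFN_DOWN(x)  ((x) >> MODEL_XEN_PAGE_SHIFT)
#define MODEL_XEN_PFN_UP(x) \
	(((x) + MODEL_XEN_PAGE_SIZE - 1) >> MODEL_XEN_PAGE_SHIFT)
#define MODEL_XEN_PFN_PER_PAGE (MODEL_PAGE_SIZE / MODEL_XEN_PAGE_SIZE)

/* Linux PFN -> first Xen PFN inside that Linux page. */
static unsigned long model_pfn_to_xen_pfn(unsigned long pfn)
{
	return (pfn << MODEL_PAGE_SHIFT) >> MODEL_XEN_PAGE_SHIFT;
}

/* Xen PFN -> Linux PFN of the page containing it. */
static unsigned long model_xen_pfn_to_pfn(unsigned long xen_pfn)
{
	return (xen_pfn << MODEL_XEN_PAGE_SHIFT) >> MODEL_PAGE_SHIFT;
}
```

In this model each Linux page holds 16 Xen frames, so Linux PFN 3 starts at
Xen PFN 48, and Xen PFN 50 maps back to Linux PFN 3.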

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Typos
- Rename xen_page_to_pfn to page_to_xen_pfn

Changes in v3:
- Fix errors reported by checkpatch.pl
- Rename pfn to xen_pfn in xen_pfn_to_page
- Add a comment that we assume PAGE_SIZE to be a multiple of
XEN_PAGE_SIZE
- s/MFN/GFN/ according to new naming
- Add Stefano's reviewed-by

Changes in v2:
- Add XEN_PFN_UP
- Add a comment describing the behavior of page_to_pfn
---
 include/xen/page.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/xen/page.h b/include/xen/page.h
index 1daae48..96294ac 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,36 @@
 #ifndef _XEN_PAGE_H
 #define _XEN_PAGE_H
 
+#include 
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE  (_AC(1, UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK  (~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p)  ((unsigned long)(p) & ~XEN_PAGE_MASK)
+
+/*
+ * We assume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
+ * XXX: Add a BUILD_BUG_ON?
+ */
+
+#define xen_pfn_to_page(xen_pfn)   \
+   ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> PAGE_SHIFT)))
+#define page_to_xen_pfn(page)  \
+   (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x)((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x)  (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x)((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
 #include 
 
+/* Return the GFN associated to the first 4KB of the page */
 static inline unsigned long xen_page_to_gfn(struct page *page)
 {
-   return pfn_to_gfn(page_to_pfn(page));
+   return pfn_to_gfn(page_to_xen_pfn(page));
 }
 
 struct xen_memory_region {
-- 
2.1.4



[PATCH v4 08/20] block/xen-blkfront: split get_grant in 2

2015-09-07 Thread Julien Grall
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grants are used, each page of a bio request
may be split into multiple grants.

Furthermore, the page field of the grant structure is only used to copy
data from a persistent grant or an indirect grant. Avoid setting it for
other use cases, as it has no meaning given that the page will be split
into multiple grants.

Provide 2 functions: one to set up an indirect grant, the other for a
bio page.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Add Roger's acked-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- gnttab_page_grant_foreign_access_ref has been renamed to
gnttab_page_grant_foreign_access_ref_one
- Fix compilation by using get_indirect_grant rather than
get_grant (the change was in a later patch...).
- Make grant_foreign_access static inline
- s/mfn/gfn/ based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 88 +---
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 556475d..4232cbd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -245,34 +245,77 @@ out_of_memory:
return -ENOMEM;
 }
 
-static struct grant *get_grant(grant_ref_t *gref_head,
-  struct page *page,
-  struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
-   unsigned long buffer_gfn;
 
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
- node);
+ node);
list_del(&gnt_list_entry->node);
 
-   if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+
+   return gnt_list_entry;
+}
+
+static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+   const struct blkfront_info *info)
+{
+   gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+info->xbdev->otherend_id,
+gnt_list_entry->page,
+0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+  unsigned long gfn,
+  struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
+
+   /* Assign a gref to this page */
+   gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+   BUG_ON(gnt_list_entry->gref == -ENOSPC);
+   if (info->feature_persistent)
+   grant_foreign_access(gnt_list_entry, info);
+   else {
+   /* Grant access to the GFN passed by the caller */
+   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+   info->xbdev->otherend_id,
+   gfn, 0);
}
 
+   return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+   struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
+   return gnt_list_entry;
+
/* Assign a gref to this page */
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!page);
-   gnt_list_entry->page = page;
+   struct page *indirect_page;
+
+   /* Fetch a pre-allocated page to use for indirect grefs */
+   BUG_ON(list_empty(&info->indirect_pages));
+   indirect_page = list_first_entry(&info->indirect_pages,
+struct page, lru);
+   list_del(&indirect_page->lru);
+   gnt_list_entry->page = indirect_page;
}
-   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
-   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-   info->xbdev->otherend_id,
-   buffer_gfn, 0);
+   grant_foreign_access(gnt_list_entry, info);
+
return gnt_list_entry;
 }
 
@@ -525,32 +568,19 @@ static 

[PATCH v4 02/20] arm/xen: Drop pte_mfn and mfn_pte

2015-09-07 Thread Julien Grall
They are not used in common code except in one place in balloon.c, which is
only compiled when Linux is using PV MMU. That's not the case on ARM.

Rather than worrying how to handle the 64KB case, drop them.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Changes in v4:
- Add Stefano's reviewed

Changes in v3:
- Patch added
---
 arch/arm/include/asm/xen/page.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1279563..98c9fc3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -13,9 +13,6 @@
 
 #define phys_to_machine_mapping_valid(pfn) (1)
 
-#define pte_mfn pte_pfn
-#define mfn_pte pfn_pte
-
 /* Xen machine address */
 typedef struct xmaddr {
phys_addr_t maddr;
-- 
2.1.4



Re: [PATCH] ARM: fix bug which lowmem size is limited to 760MB

2015-09-07 Thread Nicolas Pitre
On Mon, 7 Sep 2015, Arnd Bergmann wrote:

> Given how much more common 1GB hardware configurations are compared to 768MB
> configuration, we could however think about adding a VMSPLIT_3G_OPT option
> that x86 has (also VMSPLIT_2_75G on ARCH_TILE), to allow using the entire
> 1GB of lowmem without going all the way to VMSPLIT_2G. That option would
> also let us use the entire 768MB on the machines that Yongtaek Lee is
> interested in.

That's easy enough:

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0d1b717e1e..a63970f211 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1470,6 +1470,8 @@ choice
 
config VMSPLIT_3G
bool "3G/1G user/kernel split"
+   config VMSPLIT_3G_OPT
+   bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
bool "2G/2G user/kernel split"
config VMSPLIT_1G
@@ -1481,6 +1483,7 @@ config PAGE_OFFSET
default PHYS_OFFSET if !MMU
default 0x4000 if VMSPLIT_1G
default 0x8000 if VMSPLIT_2G
+   default 0xAF00 if VMSPLIT_3G_OPT
default 0xC000
 
 config NR_CPUS


That shifts the risk to user space though.  But if there is a regression 
there, it will manifest itself on all systems and not only with some 
particular hardware.


Nicolas


Re: [PATCH v2 1/5] ACPI: add in a bad_madt_entry() function to eventually replace the macro

2015-09-07 Thread Sudeep Holla

Hi Al,

On 19/08/15 23:07, Al Stone wrote:

I finally got a chance to try this series on Juno. Well it exposed a 
firmware bug in MADT table :)


[..]


 acpi_tbl_entry_handler handler,
@@ -245,6 +484,8 @@ acpi_parse_entries(char *id, unsigned long table_size,
table_end) {
 if (entry->type == entry_id
 && (!max_entries || count < max_entries)) {
+   if (bad_madt_entry(table_header, entry))
+   return -EINVAL;


Not sure if we can have the above check here unconditionally.
Currently I can see there are 2 other users of acpi_parse_entries, i.e.
PCC and NUMA. So maybe it can be made conditional, or bad_madt_entry
could return success for non-MADT tables?

Other than that, you can add for ARM64 specific parts:
Reviewed-and-tested-by: Sudeep Holla 

Regards,
Sudeep


[GIT PULL] regmap updates for v4.3

2015-09-07 Thread Mark Brown
The following changes since commit 64291f7db5bd8150a74ad2036f1037e6a0428df2:

  Linux 4.2 (2015-08-30 11:34:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git tags/regmap-v4.3

for you to fetch changes up to 072502a67c9164625288cca17704808e6c06273f:

  Merge remote-tracking branches 'regmap/topic/lockdep' and 'regmap/topic/seq-delay' into regmap-next (2015-09-04 17:22:10 +0100)


regmap: Changes for v4.3

This has been a busy release for regmap.  By far the biggest set of
changes here are those from Markus Pargmann which implement support for
block transfers in smbus devices.  This required quite a bit of
refactoring but leaves us better able to handle odd restrictions that
controllers may have and with better performance on smbus.

Other new features include:

 - Fix interactions with lockdep for nested regmaps (eg, when a device
   using regmap is connected to a bus where the bus controller has a
   separate regmap).  Lockdep's default class identification is too
   crude to work without help.
 - Support for must-write bitfield operations, useful for operations
   which require writing a bit to trigger them, from Kuninori Morimoto.
 - Support for delaying during register patch application from Nariman
   Poushin.
 - Support for overriding cache state via the debugfs implementation
   from Richard Fitzgerald.


Axel Lin (1):
  regmap: debugfs: Fix misuse of IS_ENABLED

Kuninori Morimoto (3):
  regmap: add force_write option on _regmap_update_bits()
  regmap: add regmap_write_bits()
  regmap: add regmap_fields_force_write()

Lars-Peter Clausen (1):
  regmap: Add better support for devices without readback support

Mark Brown (9):
  regmap: Silence warning on invalid zero length read
  Merge branches 'fix/raw', 'topic/core', 'topic/i2c', 'topic/raw' and 'topic/doc' of git://git.kernel.org/.../broonie/regmap into regmap-smbus-block
  regmap: Support bulk reads for devices without raw formatting
  Merge branch 'topic/smbus-block' of git://git.kernel.org/.../broonie/regmap into regmap-core
  Merge remote-tracking branch 'regmap/fix/core' into regmap-linus
  Merge remote-tracking branch 'regmap/fix/raw' into regmap-linus
  Merge remote-tracking branch 'regmap/topic/core' into regmap-next
  Merge remote-tracking branches 'regmap/topic/debugfs' and 'regmap/topic/force-update' into regmap-next
  Merge remote-tracking branches 'regmap/topic/lockdep' and 'regmap/topic/seq-delay' into regmap-next

Markus Pargmann (11):
  regmap: Fix integertypes for register address and value
  regmap: Fix regmap_can_raw_write check
  regmap: regmap_raw_read return error on !bus->read
  regmap: Fix regmap_bulk_write for bus writes
  regmap: Split use_single_rw internally into use_single_read/write
  regmap: No multi_write support if bus->write does not exist
  regmap: Add missing comments about struct regmap_bus
  regmap: Introduce max_raw_read/write for regmap_bulk_read/write
  regmap: regmap max_raw_read/write getter functions
  regmap: Add raw_write/read checks for max_raw_write/read sizes
  regmap-i2c: Add smbus i2c block support

Nariman Poushin (2):
  regmap: Use reg_sequence for multi_reg_write / register_patch
  regmap: Apply optional delay in multi_reg_write/register_patch

Nicolas Boichat (4):
  mfd: vexpress: Add parentheses around bridge->ops->regmap_init call
  thermal: sti: Add parentheses around bridge->ops->regmap_init call
  regmap: Use different lockdep class for each regmap init call
  regmap: Move documentation to regmap.h

Richard Fitzgerald (2):
  debugfs: Export bool read/write functions
  regmap: debugfs: Allow writes to cache state settings

Sergey SENOZHATSKY (1):
  regmap: fix a NULL pointer dereference in __regmap_init

Stephen Boyd (1):
  regulator: core: Print at debug level on debugfs creation failure

Xiubo Li (1):
  regmap: fix typos in regmap.c

 drivers/base/regmap/internal.h   |  12 +-
 drivers/base/regmap/regcache.c   |   2 +-
 drivers/base/regmap/regmap-ac97.c|  41 ++--
 drivers/base/regmap/regmap-debugfs.c |  99 -
 drivers/base/regmap/regmap-i2c.c |  90 +---
 drivers/base/regmap/regmap-irq.c |   4 +-
 drivers/base/regmap/regmap-mmio.c|  52 ++---
 drivers/base/regmap/regmap-spi.c |  41 ++--
 drivers/base/regmap/regmap-spmi.c|  78 +++
 drivers/base/regmap/regmap.c | 368 +
 drivers/bus/vexpress-config.c|   2 +-
 drivers/gpu/drm/i2c/adv7511.c|   2 +-
 drivers/input/misc/drv260x.c |   6 +-
 drivers/input/misc/drv2665.c |   2 +-
 drivers/input/misc/drv2667.c |   4 +-
 drivers/mfd/arizona-core.c   |   2 +-
 drivers/mfd/twl6040.c   

Re: [PATCH 2/2] rcu: Fix up timeouts for forcing the quiescent state

2015-09-07 Thread Petr Mladek
On Fri 2015-09-04 16:49:46, Paul E. McKenney wrote:
> On Fri, Sep 04, 2015 at 02:11:30PM +0200, Petr Mladek wrote:
> > The deadline to force the quiescent state (jiffies_force_qs) is currently
> > updated only when the previous timeout passed. But the timeout used for
> > wait_event() is always the entire original timeout. This is strange.
> 
> They tell me that kthreads aren't supposed to ever catch signals,
> hence the WARN_ON() in the early-exit stray-signal case.

Yup, I have investigated this recently. All signals are really blocked
for kthreads by default. There are a few threads that use signals, but
they explicitly enable them with allow_signal().


> In the case where we were awakened with an explicit force-quiescent-state
> request, we do the scan, and then wait the full time for the next scan.
> So the point of the delay is to space out the scans, not to fit a
> pre-determined schedule.
> 
> The reason we get awakened with an explicit force-quiescent-state
> request is that a given CPU just got inundated with RCU callbacks
> or that rcutorture wants to hammer this code path.
> 
> So I am not seeing this as anything in need of fixing.
> 
> Am I missing something subtle here?

There is the commit 88d6df612cc3c99f5 ("rcu: Prevent spurious-wakeup
DoS attack on rcu_gp_kthread()"). It suggests that the spurious
wakeups are possible.

I would consider this patch a fix/cleanup of this DoS attack fix.
Huh, I forgot to mention it in the commit message.
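
The effect on the sleep length can be illustrated with a small sketch. It uses a hypothetical helper name and is not the code from the patch: after a spurious wakeup the thread would sleep only for what is left until the deadline, instead of restarting the full original timeout.

```c
/*
 * Hypothetical helper: how long to keep sleeping after a wakeup, given
 * an absolute deadline. Jiffies-style counters wrap around, so the
 * comparison goes through a signed difference instead of using '<' on
 * the raw values.
 */
static unsigned long remaining_timeout(unsigned long deadline,
				       unsigned long now)
{
	long left = (long)(deadline - now);

	return left > 0 ? (unsigned long)left : 0;
}
```

A wait loop built on this would pass remaining_timeout(jiffies_force_qs, jiffies) to the timed wait on every iteration and start the scan once it returns 0.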

To be honest, I personally do not know how to trigger the spurious
wakeup in the current state of the code. I am trying to convert
the kthread into the kthread worker API and there I got the spurious
wakeups but this is another story.

Thanks a lot for reviewing.

Best Regards,
Petr


Re: [PATCH v8 2/2] ARM: imx: support suspend states on imx7D

2015-09-07 Thread Shawn Guo
On Fri, Jul 31, 2015 at 04:33:59PM -0500, Shenwei Wang wrote:
> IMX7D contains a new version of GPC IP block (GPCv2). It has two
> major functions: power management and wakeup source management.
> 
> GPCv2 provides low power mode control for Cortex-A7 and Cortex-M4
> domains. And it can support WAIT, STOP, and DSM(Deep Sleep Mode) modes.
> After configuring the GPCv2 module, the platform can enter into a
> selected mode either automatically triggered by ARM WFI instruction or
> manually by software. The system will exit the low power states
> by the predefined wakeup sources which are managed by the gpcv2
> irqchip driver.
> 
> This patch adds a new suspend driver to manage the power states on IMX7D.
> It currently supports "SUSPEND_STANDBY" and "SUSPEND_MEM" states.
> 
> Signed-off-by: Shenwei Wang 
> Signed-off-by: Anson Huang 

Please stop sending patches to my Linaro mailbox, and use
shawn...@kernel.org instead.  You should already get that if you ever
run ./scripts/get_maintainer.pl on the patch.  Also please always copy
ker...@pengutronix.de for i.MX platform patches like this.

> ---
>  arch/arm/mach-imx/Kconfig|   1 +
>  arch/arm/mach-imx/Makefile   |   2 +
>  arch/arm/mach-imx/common.h   |   4 +
>  arch/arm/mach-imx/pm-imx7.c  | 917 +++
>  arch/arm/mach-imx/suspend-imx7.S | 529 ++
>  5 files changed, 1453 insertions(+)

That is 1453 lines added to the kernel just for i.MX7D suspend support.  Yes,
this is the way we support suspend on i.MX6, but that's enough, and
we have to stop this somewhere.  I would ask you to take Sudeep's
comment [1] and adopt PSCI for i.MX7D power management.

Shawn

[1] https://lkml.org/lkml/2015/8/26/554


similar files amd vs radeon

2015-09-07 Thread Peter Senna Tschudin
I executed a clone detection tool* on drivers source code and I found
that there are similar files between drivers/gpu/drm/amd/ and
drivers/gpu/drm/radeon, but also inside each of these folders.

Some examples:
drivers/gpu/drm/amd/amdgpu/dce_v11_0.c,drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
drivers/gpu/drm/amd/amdgpu/ci_dpm.c,drivers/gpu/drm/radeon/ci_dpm.c
drivers/gpu/drm/radeon/kv_dpm.c,drivers/gpu/drm/amd/amdgpu/kv_dpm.c

I use meld for seeing the differences and similarities. More results
from the tool at: http://pastebin.com/iX3fhifG (The number on the
first field is the number of probable cloned lines of code).

Should these files be consolidated? And if so how?

Thank you,

Peter

* https://github.com/petersenna/ccfinderx-core

-- 
Peter


[PATCH v2] 9p: trans_fd, bail out if recv fcall if missing

2015-09-07 Thread Dominique Martinet
req->rc is pre-allocated early on with p9_tag_alloc and shouldn't be missing

Signed-off-by: Dominique Martinet 
---
 net/9p/trans_fd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

Feel free to adapt error code/message if you can think of something better.

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a270dcc..a6d89c0 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -356,13 +356,12 @@ static void p9_read_work(struct work_struct *work)
}
 
if (m->req->rc == NULL) {
-   m->req->rc = kmalloc(sizeof(struct p9_fcall) +
-   m->client->msize, GFP_NOFS);
-   if (!m->req->rc) {
-   m->req = NULL;
-   err = -ENOMEM;
-   goto error;
-   }
+   p9_debug(P9_DEBUG_ERROR,
+"No recv fcall for tag %d (req %p), disconnecting!\n",
+m->rc.tag, m->req);
+   m->req = NULL;
+   err = -EIO;
+   goto error;
}
m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
-- 
1.8.3.1



RE: [PATCH] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-07 Thread David Laight
From: Christophe Leroy
> Sent: 07 September 2015 15:25
...
> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> index 2ef50c6..05b3096 100644
> --- a/arch/powerpc/lib/copy_32.S
> +++ b/arch/powerpc/lib/copy_32.S
> @@ -172,7 +172,16 @@ _GLOBAL(memcpy)
>   mtctr   r0
>   beq 63f
>  53:
> - dcbz r11,r6
> + /*
> +  * During early init, cache might not be active yet, so dcbz cannot be
> +  * used. We put dcbt instead of dcbz. If cache is not active, it's just
> +  * like a not. If cache is active, at least it prefetchs the line to be
^^^ nop ??

David

> +  * overwritten.
> +  * Will be replaced by dcbz in machine_init()
> +  */
> +_GLOBAL(ppc32_memcpy_dcbz)
> + dcbt r11,r6
> +
>   COPY_16_BYTES
>  #if L1_CACHE_BYTES >= 32
>   COPY_16_BYTES
> --
> 2.1.0



Re: [PATCH v2 1/2] leds: leds-ipaq-micro: Use devm_led_classdev_register

2015-09-07 Thread Jacek Anaszewski

On 09/07/2015 04:13 PM, Muhammad Falak R Wani wrote:

Use of resource-managed function devm_led_classdev_register
instead of led_classdev_register is preferred, consequently
remove redundant function micro_leds_remove.

Signed-off-by: Muhammad Falak R Wani 
---
  drivers/leds/leds-ipaq-micro.c | 9 +
  1 file changed, 1 insertion(+), 8 deletions(-)


Merged, thanks.

--
Best Regards,
Jacek Anaszewski


Re: [RFC PATCH v1 2/4] irqchip: GICv3: set non-percpu irqs status with _IRQ_MOVE_PCNTXT

2015-09-07 Thread Thomas Gleixner
On Mon, 7 Sep 2015, Marc Zyngier wrote:
> On 07/09/15 14:24, Thomas Gleixner wrote:
> > The history of this flag is as follows:
> > 
> > On x86 interrupts can only be safely migrated while the interrupt is
> > handled.
> 
> Woa! That's creative! :-) I suppose this doesn't work very well with CPU
> hotplug though...

Go figure 
 
> So I wonder why we bother introducing the IRQ_MOVE_PCNTXT flag on ARM at
> all. Is that just because migration.c is only compiled when
> GENERIC_PENDING_IRQ is set?

Looks like it. We can disentangle that, if this code needs to be reusable.

Thanks,

tglx


Re: [PATCH 1/2] rcu: Show the real fqs_state

2015-09-07 Thread Petr Mladek
On Fri 2015-09-04 16:24:22, Paul E. McKenney wrote:
> On Fri, Sep 04, 2015 at 02:11:29PM +0200, Petr Mladek wrote:
> > The value of "fqs_state" in struct rcu_state is always RCU_GP_IDLE.
> > 
> > The real state is stored in a local variable in rcu_gp_kthread().
> > It is modified by rcu_gp_fqs() via parameter and return value.
> > But the actual value is never stored to rsp->fqs_state.
> > 
> > The result is that print_one_rcu_state() does not show the real
> > state.
> > 
> > This code has been added 3 years ago by the commit 4cdfc175c25c89ee
> > ("rcu: Move quiescent-state forcing into kthread"). I guess that it
> > was an oversight or an optimization.
> > 
> > Anyway, the value seems to be manipulated only by the thread, except
> > for showing the status. I do not see any risk in updating it directly
> > in the struct.
> > 
> > Signed-off-by: Petr Mladek 
> 
> Good catch, but how about the following fix instead?
> 
>   Thanx, Paul
> 
> 
> 
> rcu: Finish folding ->fqs_state into ->gp_state
> 
> Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
> into kthread") started the process of folding the old ->fqs_state
> into ->gp_state, but did not complete it.  This situation does not
> cause any malfunction, but can result in extremely confusing trace
> output.  This commit completes this task of eliminating ->fqs_state
> in favor of ->gp_state.

It makes sense but it breaks dynticks handling in rcu_gp_fqs(), see
below.

> 
> Reported-by: Petr Mladek 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 69ab7ce2cf7b..04234936d897 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1949,16 +1949,15 @@ static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
>  /*
>   * Do one round of quiescent-state forcing.
>   */
> -static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> +static void rcu_gp_fqs(struct rcu_state *rsp)
>  {
> - int fqs_state = fqs_state_in;
>   bool isidle = false;
>   unsigned long maxj;
>   struct rcu_node *rnp = rcu_get_root(rsp);
>  
>   WRITE_ONCE(rsp->gp_activity, jiffies);
>   rsp->n_force_qs++;
> - if (fqs_state == RCU_SAVE_DYNTICK) {
> + if (rsp->gp_state == RCU_SAVE_DYNTICK) {

This will never happen because rcu_gp_kthread() modifies rsp->gp_state
many times. The last value before calling rcu_gp_fqs() is
RCU_GP_DOING_FQS.

I am thinking about passing this information via a separate bool.
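
A toy model of that idea, in plain C with hypothetical names and a cut-down state struct (not the kernel code): the "first round" information travels in a flag owned by the kthread loop, because ->gp_state already holds RCU_GP_DOING_FQS by the time the forcing function runs.

```c
#include <stdbool.h>

/* gp_state values as listed later in this mail; only two are needed. */
enum { RCU_GP_WAIT_FQS = 3, RCU_GP_DOING_FQS = 4 };

/* Hypothetical, cut-down model of struct rcu_state. */
struct rcu_state_model {
	int gp_state;
	int dyntick_snapshots;	/* times the dyntick state was saved */
	int forced_scans;	/* times quiescent states were forced */
};

/*
 * One round of quiescent-state forcing. Whether this is the first
 * round of the grace period is passed in explicitly instead of being
 * inferred from gp_state.
 */
static void gp_fqs_model(struct rcu_state_model *rsp, bool first_time)
{
	if (first_time)
		rsp->dyntick_snapshots++;	/* collect dyntick state */
	else
		rsp->forced_scans++;		/* force the remaining CPUs */
}

/* Model of the kthread loop body for a given number of rounds. */
static void gp_kthread_model(struct rcu_state_model *rsp, int rounds)
{
	bool first_time = true;
	int i;

	for (i = 0; i < rounds; i++) {
		rsp->gp_state = RCU_GP_WAIT_FQS;	/* sleeping */
		rsp->gp_state = RCU_GP_DOING_FQS;	/* about to scan */
		gp_fqs_model(rsp, first_time);
		first_time = false;
	}
}
```

In this model three rounds give one dyntick snapshot followed by two forced scans, which is the behaviour the RCU_SAVE_DYNTICK/RCU_FORCE_QS pair used to encode.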

[...]

> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d5f58e717c8b..9faad70a8246 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -417,12 +417,11 @@ struct rcu_data {
>   struct rcu_state *rsp;
>  };
>  
> -/* Values for fqs_state field in struct rcu_state. */
> +/* Values for gp_state field in struct rcu_state. */
>  #define RCU_GP_IDLE  0   /* No grace period in progress. */

This value seems to be used instead of the new RCU_GP_WAIT_INIT.

>  #define RCU_GP_INIT  1   /* Grace period being initialized. */

This value is unused.

>  #define RCU_SAVE_DYNTICK 2   /* Need to scan dyntick state. */

This one is no longer preserved when merged with the other state.

>  #define RCU_FORCE_QS 3   /* Need to force quiescent state. */

The meaning of this one is strange. If I get it correctly,
it is set after the state was forced. But the comment suggests
that it is before.

In other words, these states seem to be obsoleted by

/* Values for rcu_state structure's gp_flags field. */
#define RCU_GP_WAIT_INIT 0  /* Initial state. */
#define RCU_GP_WAIT_GPS  1  /* Wait for grace-period start. */
#define RCU_GP_DONE_GPS  2  /* Wait done for grace-period start. */
#define RCU_GP_WAIT_FQS  3  /* Wait for force-quiescent-state time. */
#define RCU_GP_DOING_FQS 4  /* Wait done for force-quiescent-state time. */
#define RCU_GP_CLEANUP   5  /* Grace-period cleanup started. */
#define RCU_GP_CLEANED   6  /* Grace-period cleanup complete. */


Please, find below your commit updated with my ideas:

+ used bool save_dyntick instead of RCU_SAVE_DYNTICK
  and RCU_FORCE_QS states
+ rename RCU_GP_WAIT_INIT -> RCU_GP_IDLE
+ remove all the obsolete states

I am sorry if I handled the "Signed-off-by" tags the wrong way. It is
basically your patch with a few small updates from me. I am not sure
what the right process is in this case. Feel free to use Reviewed-by
instead of Signed-off-by with my name.

Well, I guess that this is not the final state ;-)


From 61a1bf6659f4f4c0c4021f185bc156f8c83f9ea5 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" 
Date: Fri, 4 Sep 2015 16:24:22 -0700
Subject: [PATCH] rcu: Finish folding ->fqs_state into ->gp_state

Commit commit 4cdfc175c25c89ee ("rcu: Move quiescent-sta

Re: [RFC PATCH 1/3] arm64: entry: Remove unnecessary calculation for S_SP in EL1h

2015-09-07 Thread Mark Rutland
On Fri, Sep 04, 2015 at 03:23:05PM +0100, Jungseok Lee wrote:
> Under EL1h, S_SP data is not seen in kernel_exit. Thus, x21 calculation
> is not needed in kernel_entry. Currently, S_SP information is valid only
> when sp_el0 is used.

I don't think this is true. The generic BUG implementation will grab the
saved SP from the pt_regs, and with this change we'll report whatever
happened to be in x21 instead.

> Signed-off-by: Jungseok Lee 
> ---
>  arch/arm64/kernel/entry.S | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index e163518..d23ca0d 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -91,8 +91,6 @@
>   get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
>   ldr x19, [tsk, #TI_FLAGS]   // since we can unmask debug
>   disable_step_tsk x19, x20   // exceptions when scheduling.
> - .else
> - add x21, sp, #S_FRAME_SIZE
>   .endif
>   mrs x22, elr_el1
>   mrs x23, spsr_el1

Immediately after this we do:

stp lr, x21, [sp, #S_LR]

To store the LR and SP to the pt_regs which bug_handler would use.

Am I missing something?

Thanks,
Mark.


Re: [RFC PATCH v1 2/4] irqchip: GICv3: set non-percpu irqs status with _IRQ_MOVE_PCNTXT

2015-09-07 Thread Marc Zyngier
Hi Thomas,

On 07/09/15 14:24, Thomas Gleixner wrote:
> On Mon, 7 Sep 2015, Marc Zyngier wrote:
>> On 06/09/15 06:56, Jiang Liu wrote:
>>> On 2015/9/6 12:23, Yang Yingliang wrote:
>>>> Use irq_settings_set_move_pcntxt() helper irqs status with
>>>> _IRQ_MOVE_PCNTXT. So that it can do set affinity when calling
>>>> irq_set_affinity_locked().
>>> Hi Yingliang,
>>> We could only set the _IRQ_MOVE_PCNTXT flag to enable migrating
>>> an IRQ in process context if your hardware platform supports changing
>>> the IRQ configuration atomically. Not sure whether that's true for GICv3.
>>> If GICv3 doesn't support changing the irq configuration atomically, this
>>> change may cause trouble.
>>
>> I think it boils down to what exactly "process context" means here. If
>> this means "we do not need to mask the interrupt" while moving it, then
>> it should be fine (the GIC architecture guarantees that a pending
>> interrupt will be migrated).
>>
>> Is there any other requirement for this flag?
> 
> The history of this flag is as follows:
> 
> On x86 interrupts can only be safely migrated while the interrupt is
> handled.

Woa! That's creative! :-) I suppose this doesn't work very well with CPU
hotplug though...

> With the introduction of IRQ remapping this requirement
> changed. Remapped interrupts can be migrated in any context.
> 
> If you look at irq_set_affinity_locked()
> 
>if (irq_can_move_pcntxt(data)) {
>   irq_do_set_affinity(data,...)
> chip->irq_set_affinity(data,...);
>} else {
>   irqd_set_move_pending(data);
>}
> 
> So if IRQ_MOVE_PCNTXT is not set, we handle the migration of the
> interrupt from the next interrupt. If it's set, set_affinity() is
> called right away.

OK, that is now starting to make more sense.
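
For reference, the two paths described in the quoted text can be modelled as below. This is a sketch in plain C with a hypothetical cut-down struct and hypothetical function names; it is not the genirq implementation, only the decision it makes.

```c
#include <stdbool.h>

/* Hypothetical, cut-down model of the per-interrupt data. */
struct irq_model {
	bool can_move_pcntxt;	/* IRQ_MOVE_PCNTXT set for this interrupt */
	bool move_pending;	/* migration deferred to the next interrupt */
	int target_cpu;		/* CPU the hardware currently targets */
	int pending_cpu;	/* CPU requested while the move is deferred */
};

/* Process-context request to change the affinity. */
static void set_affinity_model(struct irq_model *irq, int cpu)
{
	if (irq->can_move_pcntxt) {
		irq->target_cpu = cpu;		/* direct method */
	} else {
		irq->pending_cpu = cpu;		/* remember the request */
		irq->move_pending = true;
	}
}

/* Called from the interrupt handling path of the model. */
static void handle_irq_model(struct irq_model *irq)
{
	if (irq->move_pending) {
		irq->target_cpu = irq->pending_cpu;
		irq->move_pending = false;
	}
}
```

In the deferred case the target only changes once handle_irq_model() runs, which corresponds to the "migrate while the interrupt is handled" constraint described for x86.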

> All architectures which do not select GENERIC_PENDING_IRQ are using
> the direct method.

Right. On ARM, only the direct method makes sense so far (we have no
constraint such as the one you describe above).

So I wonder why we bother introducing the IRQ_MOVE_PCNTXT flag on ARM at
all. Is that just because migration.c is only compiled when
GENERIC_PENDING_IRQ is set?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-07 Thread Michal Sojka
On Mon, Sep 07 2015, Christophe Leroy wrote:
> memcpy() uses instruction dcbz to speed up copy by not wasting time
> loading cache line with data that will be overwritten.
> Some platforms like mpc52xx do not have cache active at startup and
> can therefore not use memcpy(). Although no part of the code
> explicitly uses memcpy(), GCC makes calls to it.
>
> This patch modifies memcpy() such that at startup, the 'dcbz'
> instruction is replaced by 'dcbt' which is harmless if cache is not
> enabled, and which helps a bit (although not as much as dcbz) if
> cache is already enabled.
>
> Once the initial MMU is setup, in machine_init() we patch memcpy()
> by replacing the temporary 'dcbt' by 'dcbz'
>
> Reported-by: Michal Sojka 
> Signed-off-by: Christophe Leroy 
> ---
> @Michal, can you please test it ?

Yes, it works.

Tested-by: Michal Sojka 

-Michal
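
For readers following along, the patching step from the quoted commit message can be sketched in portable C as below. The function names are hypothetical and this is not the code from the patch; the two encodings are the standard PowerPC X-form ones (primary opcode 31, extended opcodes 278 for dcbt and 1014 for dcbz).

```c
#include <stdint.h>

/* PowerPC X-form encodings: primary opcode 31, RA and RB fields. */
#define PPC_INST_DCBT	0x7c00022cu	/* extended opcode 278 */
#define PPC_INST_DCBZ	0x7c0007ecu	/* extended opcode 1014 */
#define PPC_RA(r)	((uint32_t)(r) << 16)
#define PPC_RB(r)	((uint32_t)(r) << 11)

/*
 * Hypothetical stand-in for the kernel's instruction patching helper:
 * overwrite one instruction word. A real implementation also has to
 * flush the data cache and invalidate the instruction cache for it.
 */
static void patch_word(uint32_t *site, uint32_t insn)
{
	*site = insn;
}

/* Swap the placeholder dcbt at site for a dcbz with the same operands. */
static int enable_dcbz(uint32_t *site, unsigned int ra, unsigned int rb)
{
	uint32_t dcbt = PPC_INST_DCBT | PPC_RA(ra) | PPC_RB(rb);

	if (*site != dcbt)
		return -1;	/* not the instruction we expected */
	patch_word(site, PPC_INST_DCBZ | PPC_RA(ra) | PPC_RB(rb));
	return 0;
}
```

For the "dcbt r11,r6" placeholder in the patch this turns the word 0x7c0b322c into 0x7c0b37ec.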


Re: [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers

2015-09-07 Thread Alex Smith
On 06/09/2015 21:38, Ezequiel Garcia wrote:
> On 27 Jul 02:50 PM, Alex Smith wrote:
>> Hi,
>>
>> This series adds support for the BCH controller and NAND devices on
>> the Ingenic JZ4780 SoC.
>>
>> Tested on the MIPS Creator Ci20 board. All dependencies are now in
>> mainline so it should be possible to compile test now.
>>
>> This version of the series has been rebased on 4.2-rc4, and also adds
>> an additional patch to fix an issue that was encountered in the
>> external Ci20 3.18 kernel branch.
>>
>> Review and feedback welcome.
>>
> 
> The NEMC driver seems to be upstream. Any chance you submit devicetree
> changes as well for Ci20 (so we can actually test this)?

Sure, can do. The pinctrl driver is not yet upstream (needs some work) which is 
why I didn't add the DT changes initially, but at least if you boot the board 
from the NAND then U-Boot should have left everything in a state usable by the 
kernel.

Thanks,
Alex

> 
>> Thanks,
>> Alex
>>
>> Alex Smith (3):
>>   mtd: nand: increase ready wait timeout and report timeouts
>>   dt-bindings: binding for jz4780-{nand,bch}
>>   mtd: nand: jz4780: driver for NAND devices on JZ4780 SoCs
>>
>>  .../bindings/mtd/ingenic,jz4780-nand.txt   |  57 
>>  drivers/mtd/nand/Kconfig   |   7 +
>>  drivers/mtd/nand/Makefile  |   1 +
>>  drivers/mtd/nand/jz4780_bch.c  | 354 +++
>>  drivers/mtd/nand/jz4780_bch.h  |  42 +++
>>  drivers/mtd/nand/jz4780_nand.c | 376 +
>>  drivers/mtd/nand/nand_base.c   |  15 +-
>>  7 files changed, 849 insertions(+), 3 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt
>>  create mode 100644 drivers/mtd/nand/jz4780_bch.c
>>  create mode 100644 drivers/mtd/nand/jz4780_bch.h
>>  create mode 100644 drivers/mtd/nand/jz4780_nand.c
>>
>> -- 
>> 2.4.6
>>
>>
>> __
>> Linux MTD discussion mailing list
>> http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 


Re: [PATCH v2 2/2] leds: leds-ipaq-micro: Fix coding style issues

2015-09-07 Thread Jacek Anaszewski

Hi Muhammad,

On 09/07/2015 04:13 PM, Muhammad Falak R Wani wrote:

Spaces at the starting of a line are removed, indentation
using tab, instead of space. Also, warnings related to
line width of more than 80 characters is also taken care of.
Two warnings have been left alone to aid better readability.

Signed-off-by: Muhammad Falak R Wani 
---
  drivers/leds/leds-ipaq-micro.c | 38 +++---
  1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/leds/leds-ipaq-micro.c b/drivers/leds/leds-ipaq-micro.c
index 1206215..86716ea 100644
--- a/drivers/leds/leds-ipaq-micro.c
+++ b/drivers/leds/leds-ipaq-micro.c
@@ -16,9 +16,9 @@
  #define LED_YELLOW0x00
  #define LED_GREEN 0x01

-#define LED_EN  (1 << 4) /* LED ON/OFF 0:off, 1:on */
-#define LED_AUTOSTOP (1 << 5) /* LED ON/OFF auto stop set 0:disable, 1:enable */
-#define LED_ALWAYS  (1 << 6) /* LED Interrupt Mask 0:No mask, 1:mask */
+#define LED_EN   (1 << 4) /* LED ON/OFF 0:off, 1:on*/
+#define LED_AUTOSTOP  (1 << 5) /* LED ON/OFF auto stop set 0:disable,1:enable*/
+#define LED_ALWAYS(1 << 6) /* LED Interrupt Mask 0:No mask, 1:mask*/


Please keep comments ending in the same column.



  static void micro_leds_brightness_set(struct led_classdev *led_cdev,
  enum led_brightness value)
@@ -27,14 +27,14 @@ static void micro_leds_brightness_set(struct led_classdev *led_cdev,
/*
 * In this message:
 * Byte 0 = LED color: 0 = yellow, 1 = green
-*  yellow LED is always ~30 blinks per minute
+*yellow LED is always ~30 blinks per minute
 * Byte 1 = duration (flags?) appears to be ignored
 * Byte 2 = green ontime in 1/10 sec (deciseconds)
-*  1 = 1/10 second
-*  0 = 256/10 second
+*1 = 1/10 second
+*0 = 256/10 second
 * Byte 3 = green offtime in 1/10 sec (deciseconds)
-*  1 = 1/10 second
-*  0 = 256/10 seconds
+*1 = 1/10 second
+*0 = 256/10 seconds
 */
struct ipaq_micro_msg msg = {
.id = MSG_NOTIFY_LED,
@@ -64,14 +64,14 @@ static int micro_leds_blink_set(struct led_classdev *led_cdev,
/*
 * In this message:
 * Byte 0 = LED color: 0 = yellow, 1 = green
-*  yellow LED is always ~30 blinks per minute
+*yellow LED is always ~30 blinks per minute
 * Byte 1 = duration (flags?) appears to be ignored
 * Byte 2 = green ontime in 1/10 sec (deciseconds)
-*  1 = 1/10 second
-*  0 = 256/10 second
+*1 = 1/10 second
+*0 = 256/10 second
 * Byte 3 = green offtime in 1/10 sec (deciseconds)
-*  1 = 1/10 second
-*  0 = 256/10 seconds
+*1 = 1/10 second
+*0 = 256/10 seconds
 */


This looks worse after applying the patch. Why actually did you change
it? AFAICS checkpatch.pl doesn't complain here.


struct ipaq_micro_msg msg = {
.id = MSG_NOTIFY_LED,
@@ -79,14 +79,14 @@ static int micro_leds_blink_set(struct led_classdev *led_cdev,
};

msg.tx_data[0] = LED_GREEN;
-if (*delay_on > IPAQ_LED_MAX_DUTY ||
+   if (*delay_on > IPAQ_LED_MAX_DUTY ||
*delay_off > IPAQ_LED_MAX_DUTY)
-return -EINVAL;
+   return -EINVAL;

-if (*delay_on == 0 && *delay_off == 0) {
-*delay_on = 100;
-*delay_off = 100;
-}
+   if (*delay_on == 0 && *delay_off == 0) {
+   *delay_on = 100;
+   *delay_off = 100;
+   }

msg.tx_data[1] = 0;
if (*delay_on >= IPAQ_LED_MAX_DUTY)




--
Best Regards,
Jacek Anaszewski


Re: [PATCH v4 3/3] mtd: nand: jz4780: driver for NAND devices on JZ4780 SoCs

2015-09-07 Thread Alex Smith
Hi,

On 06/09/2015 22:21, Ezequiel Garcia wrote:
> On 27 Jul 03:21 PM, Alex Smith wrote:
>> Add a driver for NAND devices connected to the NEMC on JZ4780 SoCs, as
>> well as the hardware BCH controller. DMA is not currently implemented.
>>
>> While older 47xx SoCs also have a BCH controller, they are incompatible
>> with the one in the 4780 due to differing register/bit positions, which
>> would make implementing a common driver for them quite messy.
>>
> 
> If the difference is only in register/bit positions, a common driver
> might be fairly simple. See drivers/i2c/busses/i2c-mv64xxx.c,
> which supports two different register layouts.

I've just gone back and looked at the older SoCs and it doesn't seem as though 
this commit message really applies to the JZ4740, which is the only other 
Ingenic SoC currently supported upstream. The 4740 doesn't have a BCH 
controller at all and the NAND interface is fairly different. I think this 
driver could potentially be reused if support for the JZ4770 makes it upstream, 
for now though a separate driver is certainly needed for the 4780.


>> +return 0;
>> +}
>> +
>> +static const struct of_device_id jz4780_bch_dt_match[] = {
>> +{ .compatible = "ingenic,jz4780-bch" },
>> +{},
>> +};
>> +MODULE_DEVICE_TABLE(of, jz4780_bch_dt_match);
>> +
>> +static struct platform_driver jz4780_bch_driver = {
>> +.probe  = jz4780_bch_probe,
> 
> Why no remove?

Is it needed? Everything should be cleaned up due to the use of devm functions.


>> +static int jz4780_nand_init_chips(struct jz4780_nand *nand, struct device *dev)
>> +{
>> +struct jz4780_nand_chip *chip;
>> +const __be32 *prop;
>> +u64 addr, size;
>> +int i = 0;
>> +
>> +/*
>> + * Iterate over each bank assigned to this device and request resources.
>> + * The bank numbers may not be consecutive, but nand_scan_ident()
>> + * expects chip numbers to be, so fill out a consecutive array of chips
>> + * which map chip number to actual bank number.
>> + */
>> +while ((prop = of_get_address(dev->of_node, i, &size, NULL))) {
>> +chip = &nand->chips[i];
>> +chip->bank = of_read_number(prop, 1);
>> +
>> +jz4780_nemc_set_type(nand->dev, chip->bank,
>> + JZ4780_NEMC_BANK_NAND);
>> +
>> +addr = of_translate_address(dev->of_node, prop);
> 
> Are you sure you must translate the address yourself?
> Isn't this handled by the OF magic behind the ranges property
> in the NEMC DT node?

I think the reasoning behind doing this was because I already have to get the 
address property here in order to get the bank number out of it.

You're right though that I can just do "platform_get_resource(pdev,
IORESOURCE_MEM, i)" and avoid doing the translation again, so I have changed
it to do that.

I've fixed the rest of your comments as well.

Thanks,
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 1/3] arm64: entry: Remove unnecessary calculation for S_SP in EL1h

2015-09-07 Thread James Morse
On 04/09/15 15:23, Jungseok Lee wrote:
> Under EL1h, S_SP data is not seen in kernel_exit. Thus, x21 calculation
> is not needed in kernel_entry. Currently, S_SP information is valid only
> when sp_el0 is used.
> 
> Signed-off-by: Jungseok Lee 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index e163518..d23ca0d 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -91,8 +91,6 @@
>   get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
>   ldr x19, [tsk, #TI_FLAGS]   // since we can unmask debug
>   disable_step_tsk x19, x20   // exceptions when scheduling.
> - .else
> - add x21, sp, #S_FRAME_SIZE
>   .endif
>   mrs x22, elr_el1
>   mrs x23, spsr_el1
> 

This sp value gets written to the struct pt_regs that is built on the
stack, and passed to the fault handlers, see 'el1_sp_pc' in kernel/entry.S,
which goes on to call do_sp_pc_abort() which prints this value out. (Other
fault handlers may make decisions based on this value).
It should be present and correct.


Thanks,

James


Re: [RFC PATCH 2/3] arm64: Introduce IRQ stack

2015-09-07 Thread James Morse
On 04/09/15 15:23, Jungseok Lee wrote:
> Currently, kernel context and interrupts are handled using a single
> kernel stack navigated by sp_el1. This forces many systems to use
> 16KB stack, not 8KB one. Low memory platforms naturally suffer from
> both memory pressure and performance degradation simultaneously as
> VM page allocator falls into slowpath frequently.
> 
> This patch thus solves the problem by introducing a separate percpu
> IRQ stack to handle both hard and soft interrupts with two ground rules:
> 
>   - Utilize sp_el0 in EL1 context, which is not used currently
>   - Do *not* complicate current_thread_info calculation
> 
> struct thread_info can be tracked easily using sp_el0, not sp_el1 when
> this feature is enabled.
> 
> Signed-off-by: Jungseok Lee 
> ---
>  arch/arm64/Kconfig.debug | 10 ++
>  arch/arm64/include/asm/irq.h |  8 ++
>  arch/arm64/include/asm/thread_info.h | 11 ++
>  arch/arm64/kernel/asm-offsets.c  |  8 ++
>  arch/arm64/kernel/entry.S| 83 +++-
>  arch/arm64/kernel/head.S |  7 ++
>  arch/arm64/kernel/irq.c  | 18 
>  7 files changed, 142 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index d6285ef..e16d91f 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -18,6 +18,16 @@ config ARM64_PTDUMP
> kernel.
> If in doubt, say "N"
>  
> +config IRQ_STACK
> + bool "Use separate kernel stack when handling interrupts"
> + depends on ARM64_4K_PAGES
> + help
> +   Say Y here if you want to use separate kernel stack to handle both
> +   hard and soft interrupts. By reducing the memory footprint of the
> +   kernel stack, it benefits low memory platforms.
> +
> +   If in doubt, say N.
> +

I don't think it is necessary to have a debug-only Kconfig option for this.
Reducing memory use is good for everyone!

This would let you get rid of all the #ifdefs


>  config STRICT_DEVMEM
>   bool "Filter access to /dev/mem"
>   depends on MMU
> diff --git a/arch/arm64/include/asm/thread_info.h 
> b/arch/arm64/include/asm/thread_info.h
> index dcd06d1..5345a67 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -71,11 +71,22 @@ register unsigned long current_stack_pointer asm ("sp");
>   */
>  static inline struct thread_info *current_thread_info(void) 
> __attribute_const__;
>  
> +#ifndef CONFIG_IRQ_STACK
>  static inline struct thread_info *current_thread_info(void)
>  {
>   return (struct thread_info *)
>   (current_stack_pointer & ~(THREAD_SIZE - 1));
>  }
> +#else
> +static inline struct thread_info *current_thread_info(void)
> +{
> + unsigned long sp_el0;
> +
> + asm volatile("mrs %0, sp_el0" : "=r" (sp_el0));
> +
> + return (struct thread_info *)(sp_el0 & ~(THREAD_SIZE - 1));
> +}
> +#endif

Because sp_el0 is only used as a stack value to find struct thread_info,
you could just store the struct thread_info pointer in sp_el0, and save the
masking on each read of the value.
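
To make the difference concrete, here is a small user-space sketch of the two lookups. THREAD_SIZE is assumed to be 16KB here and the function names are made up for illustration; addresses are modelled as plain integers rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 16384UL /* assumed for this sketch; must be a power of two */

/*
 * Scheme in the patch: sp_el0 holds an address somewhere inside the
 * thread stack, so every lookup has to mask it down to the stack base,
 * which is where struct thread_info lives.
 */
static uintptr_t thread_info_by_mask(uintptr_t sp_el0_value)
{
	return sp_el0_value & ~(uintptr_t)(THREAD_SIZE - 1);
}

/*
 * Suggested scheme: sp_el0 already holds the thread_info address,
 * so a lookup is just the register read.
 */
static uintptr_t thread_info_direct(uintptr_t sp_el0_value)
{
	return sp_el0_value;
}
```

The masking would then be done once, when the value is written to sp_el0, instead of on every current_thread_info() call.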


>  
>  #define thread_saved_pc(tsk) \
>   ((unsigned long)(tsk->thread.cpu_context.pc))
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index d23ca0d..f1fdfa9 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -88,7 +88,11 @@
>  
>   .if \el == 0
>   mrs x21, sp_el0
> +#ifndef CONFIG_IRQ_STACK
>   get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
> +#else
> + get_thread_info \el, tsk
> +#endif
>   ldr x19, [tsk, #TI_FLAGS]   // since we can unmask debug
>   disable_step_tsk x19, x20   // exceptions when scheduling.
>   .endif
> @@ -168,11 +172,56 @@
>   eret                // return to kernel
>   .endm
>  
> +#ifndef CONFIG_IRQ_STACK
>   .macro  get_thread_info, rd
>   mov \rd, sp
> - and \rd, \rd, #~(THREAD_SIZE - 1)   // top of stack
> + and \rd, \rd, #~(THREAD_SIZE - 1)   // bottom of stack
> + .endm
> +#else
> + .macro  get_thread_info, el, rd
> + .if \el == 0
> + mov \rd, sp
> + .else
> + mrs \rd, sp_el0
> + .endif
> + and \rd, \rd, #~(THREAD_SIZE - 1)   // bottom of thread stack
> + .endm
> +
> + .macro  get_irq_stack
> + get_thread_info 1, tsk
> + ldr w22, [tsk, #TI_CPU]
> + adr_l   x21, irq_stacks
> + mov x23, #IRQ_STACK_SIZE
> + madd x21, x22, x23, x21
>   .endm

Using per_cpu variables would save the multiply here.
You then wouldn't need IRQ_STACK_SIZE.
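
A rough user-space sketch of the two address calculations, with addresses modelled as integers. NR_CPUS, the IRQ_STACK_SIZE value and the function names are assumptions made for the example, not values from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS        4   /* assumed for this sketch */
#define IRQ_STACK_SIZE 256 /* hypothetical element size */

/* Scheme in the patch: base + cpu * IRQ_STACK_SIZE (the madd above). */
static uintptr_t irq_stack_by_index(uintptr_t base, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return base + (uintptr_t)cpu * IRQ_STACK_SIZE;
}

/*
 * Per-cpu style: each CPU already has a precomputed offset to its own
 * copy of every per-cpu variable, so the lookup is a single add of
 * that offset to the variable's address.
 */
static uintptr_t irq_stack_by_offset(uintptr_t symbol, uintptr_t this_cpu_offset)
{
	return symbol + this_cpu_offset;
}
```

With the second form the element size never appears in the entry code, which is why the IRQ_STACK_SIZE constant would no longer be needed there.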


>  
> + .macro  irq_stack_entry
> + get_irq_stack
> + ldr w23, [x21, #IRQ_COUNT]
> + cbnz w23, 1f
> + mov x23, sp
> + str x23, [x21, #IRQ_THREAD_SP]
> + ldr x23, [x21, #IRQ_STACK]
> + mov sp, x23
> + mov x23, xzr
> +1:   add w23, w2

similar files: fusbh200-hcd.c and fotg210-hcd.c

2015-09-07 Thread Peter Senna Tschudin
I executed a clone detection tool* on drivers source code and I found
that the files

drivers/usb/host/fusbh200-hcd.c

and

drivers/usb/host/fotg210-hcd.c

are very similar. The main differences between the two files are the
replacement of the string 'USBH20' by 'OTG21' and some white space fixes.
Some changes are being applied to only one of the files, such as the
commit f848a88d223cafa43cb318839a1171b498cf5ec8 that changes
fotg210-hcd.c but not fusbh200-hcd.c.

Should these files be consolidated? And if so how?

Thank you,

Peter

* https://github.com/petersenna/ccfinderx-core

-- 
Peter


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-07 Thread Dietmar Eggemann
On 07/09/15 13:42, Peter Zijlstra wrote:
> On Mon, Aug 31, 2015 at 11:24:49AM +0200, Peter Zijlstra wrote:
> 
>> A quick run here gives:
>>
>> IVB-EP (2*20*2):
> 
> As noted by someone; that should be 2*10*2, for a total of 40 cpus in
> this machine.
> 
>>
>> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>>
>> Before:                      After:
>> 5.484170711 ( +-  0.74% )    5.590001145 ( +-  0.45% )
>>
>> Which is an almost 2% slowdown :/
>>
>> I've yet to look at what happens.
> 
> OK, so it appears this is link order nonsense. When I compared profiles
> between the series, the one function that had significant change was
> skb_release_data(), which doesn't make much sense.
> 
> If I do a 'make clean' in front of each build, I get a repeatable
> improvement with this patch set (although how much of that is due to the
> patches themselves or just because of code movement is as yet undetermined).
> 
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.
> 

-- >8 --

From: Dietmar Eggemann 
Date: Mon, 7 Sep 2015 14:57:22 +0100
Subject: [PATCH] sched/fair: Defer calling scaling functions

Do not call the scaling functions in case time goes backwards or the
last update of the sched_avg structure has happened less than 1024ns
ago.

Signed-off-by: Dietmar Eggemann 
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d6ca8d987a63..3445d2fb38f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2552,8 +2552,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
u64 delta, scaled_delta, periods;
u32 contrib;
unsigned int delta_w, scaled_delta_w, decayed = 0;
-   unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
-   unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+   unsigned long scale_freq, scale_cpu;
 
delta = now - sa->last_update_time;
/*
@@ -2574,6 +2573,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
 
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;
if (delta + delta_w >= 1024) {
-- 
1.9.1
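
The effect of the reordering can be sketched in user space as follows. This is a simplified model of the early-exit checks at the top of __update_load_avg(), not the scheduler code itself; avg_state, fake_scale_capacity() and scale_calls are hypothetical names used only to count how often the scaling lookup would run.

```c
#include <assert.h>
#include <stdint.h>

struct avg_state {
	uint64_t last_update_time; /* in ns */
};

static unsigned int scale_calls; /* how often the scaling lookup ran */

/* Stand-in for the arch_scale_*_capacity() calls. */
static unsigned long fake_scale_capacity(void)
{
	scale_calls++;
	return 1024;
}

/* Returns 1 if an update was performed, 0 if it was skipped. */
static int update_avg(uint64_t now, struct avg_state *sa)
{
	uint64_t delta = now - sa->last_update_time;
	unsigned long scale;

	if ((int64_t)delta < 0) { /* time went backwards */
		sa->last_update_time = now;
		return 0;
	}

	delta >>= 10; /* work in units of 1024 ns */
	if (!delta)
		return 0; /* less than one unit elapsed */
	sa->last_update_time = now;

	/* Only now pay for the scaling lookup. */
	scale = fake_scale_capacity();
	(void)scale;
	return 1;
}
```

In this model, neither of the two early returns touches the scaling function, which is the behaviour the patch is after.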



[PATCH v5 2/9] Input: goodix - use actual config length for each device type

2015-09-07 Thread Irina Tirdea
Each of the Goodix devices supported by this driver has a fixed size for
the configuration information registers. The size varies depending on the
device and is specified in the datasheet.

Use the proper configuration length as specified in the datasheet for
each device model, so we do not read more than the actual size of the
configuration registers.

Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 6ae28c5..7be6eab 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -36,6 +36,7 @@ struct goodix_ts_data {
unsigned int max_touch_num;
unsigned int int_trigger_type;
bool rotated_screen;
+   int cfg_len;
 };
 
 #define GOODIX_MAX_HEIGHT  4096
@@ -45,6 +46,8 @@ struct goodix_ts_data {
 #define GOODIX_MAX_CONTACTS        10
 
 #define GOODIX_CONFIG_MAX_LENGTH   240
+#define GOODIX_CONFIG_911_LENGTH   186
+#define GOODIX_CONFIG_967_LENGTH   228
 
 /* Register defines */
 #define GOODIX_READ_COOR_ADDR  0x814E
@@ -115,6 +118,23 @@ static int goodix_i2c_read(struct i2c_client *client,
return ret < 0 ? ret : (ret != ARRAY_SIZE(msgs) ? -EIO : 0);
 }
 
+static int goodix_get_cfg_len(u16 id)
+{
+   switch (id) {
+   case 911:
+   case 9271:
+   case 9110:
+   case 927:
+   case 928:
+   return GOODIX_CONFIG_911_LENGTH;
+   case 912:
+   case 967:
+   return GOODIX_CONFIG_967_LENGTH;
+   default:
+   return GOODIX_CONFIG_MAX_LENGTH;
+   }
+}
+
 static int goodix_ts_read_input_report(struct goodix_ts_data *ts, u8 *data)
 {
int touch_num;
@@ -230,8 +250,7 @@ static void goodix_read_config(struct goodix_ts_data *ts)
int error;
 
error = goodix_i2c_read(ts->client, GOODIX_REG_CONFIG_DATA,
-   config,
-   GOODIX_CONFIG_MAX_LENGTH);
+   config, ts->cfg_len);
if (error) {
dev_warn(&ts->client->dev,
 "Error reading config (%d), using defaults\n",
@@ -398,6 +417,8 @@ static int goodix_ts_probe(struct i2c_client *client,
return error;
}
 
+   ts->cfg_len = goodix_get_cfg_len(id_info);
+
goodix_read_config(ts);
 
error = goodix_request_input_dev(ts, version_info, id_info);
-- 
1.9.1



[PATCH v5 4/9] Input: goodix - write configuration data to device

2015-09-07 Thread Irina Tirdea
Goodix devices can be configured by writing custom data to the device at
init. The configuration data is read with request_firmware from
"goodix_<id>_cfg.bin", where <id> is the product id read from the device
(e.g.: goodix_911_cfg.bin for Goodix GT911, goodix_9271_cfg.bin for
GT9271).

The configuration information has a specific format described in the Goodix
datasheet. It includes X/Y resolution, maximum supported touch points,
interrupt flags, various sensitivity factors and settings for advanced
features (like gesture recognition).

Before writing the firmware, it is necessary to reset the device. If
the device ACPI/DT information does not declare gpio pins (needed for
reset), writing the firmware will not be available for these devices.

This is based on Goodix datasheets for GT911 and GT9271 and on Goodix
driver gt9xx.c for Android (publicly available in Android kernel
trees for various devices).

Signed-off-by: Octavian Purdila 
Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 225 +++--
 1 file changed, 192 insertions(+), 33 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 8edfc06..9cf16ff7 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,6 +41,9 @@ struct goodix_ts_data {
int cfg_len;
struct gpio_desc *gpiod_int;
struct gpio_desc *gpiod_rst;
+   u16 id;
+   u16 version;
+   char *cfg_name;
 };
 
 #define GOODIX_MAX_HEIGHT  4096
@@ -145,6 +149,39 @@ static int goodix_i2c_read(struct i2c_client *client,
return ret < 0 ? ret : (ret != ARRAY_SIZE(msgs) ? -EIO : 0);
 }
 
+/**
+ * goodix_i2c_write - write data to a register of the i2c slave device.
+ *
+ * @client: i2c device.
+ * @reg: the register to write to.
+ * @buf: raw data buffer to write.
+ * @len: length of the buffer to write
+ */
+static int goodix_i2c_write(struct i2c_client *client, u16 reg, const u8 *buf,
+   unsigned len)
+{
+   u8 *addr_buf;
+   struct i2c_msg msg;
+   int ret;
+
+   addr_buf = kmalloc(len + 2, GFP_KERNEL);
+   if (!addr_buf)
+   return -ENOMEM;
+
+   addr_buf[0] = reg >> 8;
+   addr_buf[1] = reg & 0xFF;
+   memcpy(&addr_buf[2], buf, len);
+
+   msg.flags = 0;
+   msg.addr = client->addr;
+   msg.buf = addr_buf;
+   msg.len = len + 2;
+
+   ret = i2c_transfer(client->adapter, &msg, 1);
+   kfree(addr_buf);
+   return ret < 0 ? ret : (ret != 1 ? -EIO : 0);
+}
+
 static int goodix_get_cfg_len(u16 id)
 {
switch (id) {
@@ -264,6 +301,73 @@ static irqreturn_t goodix_ts_irq_handler(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
+/**
+ * goodix_check_cfg - Checks if config fw is valid
+ *
+ * @ts: goodix_ts_data pointer
+ * @cfg: firmware config data
+ */
+static int goodix_check_cfg(struct goodix_ts_data *ts,
+   const struct firmware *cfg)
+{
+   int i, raw_cfg_len;
+   u8 check_sum = 0;
+
+   if (cfg->size > GOODIX_CONFIG_MAX_LENGTH) {
+   dev_err(&ts->client->dev,
+   "The length of the config fw is not correct");
+   return -EINVAL;
+   }
+
+   raw_cfg_len = cfg->size - 2;
+   for (i = 0; i < raw_cfg_len; i++)
+   check_sum += cfg->data[i];
+   check_sum = (~check_sum) + 1;
+   if (check_sum != cfg->data[raw_cfg_len]) {
+   dev_err(&ts->client->dev,
+   "The checksum of the config fw is not correct");
+   return -EINVAL;
+   }
+
+   if (cfg->data[raw_cfg_len + 1] != 1) {
+   dev_err(&ts->client->dev,
+   "Config fw must have Config_Fresh register set");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+/**
+ * goodix_send_cfg - Write fw config to device
+ *
+ * @ts: goodix_ts_data pointer
+ * @cfg: config firmware to write to device
+ */
+static int goodix_send_cfg(struct goodix_ts_data *ts,
+  const struct firmware *cfg)
+{
+   int error;
+
+   error = goodix_check_cfg(ts, cfg);
+   if (error)
+   return error;
+
+   error = goodix_i2c_write(ts->client, GOODIX_REG_CONFIG_DATA, cfg->data,
+cfg->size);
+   if (error) {
+   dev_err(&ts->client->dev, "Failed to write config data: %d",
+   error);
+   return error;
+   }
+   dev_dbg(&ts->client->dev, "Config sent successfully.");
+
+   /* Let the firmware reconfigure itself, so sleep for 10ms */
+   usleep_range(10000, 11000);
+
+   return 0;
+}
+
 static int goodix_int_sync(struct goodix_ts_data *ts)
 {
int error;
@@ -406,30 +510,29 @@ static void goodix_read_config(struct goodix_ts_data *ts)
 /**
  * 
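
For anyone preparing a config file, the trailer that goodix_check_cfg() expects can be sketched in user space like this: the second-to-last byte is the two's complement of the byte sum of everything before it, and the last byte is the Config_Fresh flag set to 1. The cfg_* helper names are made up for this example and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's complement of the byte sum over the first len bytes. */
static uint8_t cfg_checksum(const uint8_t *data, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += data[i];
	return (uint8_t)(~sum + 1);
}

/* Fill in the two trailer bytes: checksum, then Config_Fresh = 1. */
static void cfg_finalize(uint8_t *cfg, size_t size)
{
	assert(size >= 2);
	cfg[size - 2] = cfg_checksum(cfg, size - 2);
	cfg[size - 1] = 1;
}

/* Same trailer checks as goodix_check_cfg(), without the length limit. */
static int cfg_is_valid(const uint8_t *cfg, size_t size)
{
	if (size < 2)
		return 0;
	return cfg[size - 2] == cfg_checksum(cfg, size - 2) &&
	       cfg[size - 1] == 1;
}
```

A consequence of this scheme is that the payload bytes plus the checksum byte always sum to zero modulo 256, which gives a quick sanity check on a dumped file.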

[PATCH v5 6/9] Input: goodix - use goodix_i2c_write_u8 instead of i2c_master_send

2015-09-07 Thread Irina Tirdea
Use goodix_i2c_write_u8 instead of i2c_master_send to simplify code.

Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 3d4a004..03f3968 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -295,16 +295,11 @@ static void goodix_process_events(struct goodix_ts_data 
*ts)
  */
 static irqreturn_t goodix_ts_irq_handler(int irq, void *dev_id)
 {
-   static const u8 end_cmd[] = {
-   GOODIX_READ_COOR_ADDR >> 8,
-   GOODIX_READ_COOR_ADDR & 0xff,
-   0
-   };
struct goodix_ts_data *ts = dev_id;
 
goodix_process_events(ts);
 
-   if (i2c_master_send(ts->client, end_cmd, sizeof(end_cmd)) < 0)
+   if (goodix_i2c_write_u8(ts->client, GOODIX_READ_COOR_ADDR, 0) < 0)
dev_err(&ts->client->dev, "I2C write end_cmd error\n");
 
return IRQ_HANDLED;
-- 
1.9.1
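
The wire format is unchanged by this patch. As a user-space sketch, assuming the u8 helper frames its data the same way goodix_i2c_write() does (16-bit register address, high byte first, then the payload), the frame for a single zero byte comes out identical to the removed end_cmd[] array. goodix_frame() is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GOODIX_READ_COOR_ADDR 0x814E

/*
 * Build an I2C write frame: register address (big endian) followed by
 * the payload. Returns the frame length, or 0 if out is too small.
 */
static size_t goodix_frame(uint8_t *out, size_t out_len, uint16_t reg,
			   const uint8_t *payload, size_t len)
{
	assert(out != NULL && payload != NULL);
	if (out_len < len + 2)
		return 0;
	out[0] = reg >> 8;
	out[1] = reg & 0xff;
	memcpy(&out[2], payload, len);
	return len + 2;
}
```

Framing one zero byte for GOODIX_READ_COOR_ADDR yields 0x81 0x4e 0x00, the same three bytes the old static array contained.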



[PATCH v5 8/9] Input: goodix - add sysfs interface to dump config

2015-09-07 Thread Irina Tirdea
Goodix devices have a configuration information register area that
specifies various parameters for the device. The configuration information
has a specific format described in the Goodix datasheet. It includes X/Y
resolution, maximum supported touch points, interrupt flags, various
sensitivity factors and settings for advanced features (like gesture
recognition).

Export a sysfs interface that would allow reading the configuration
information. The default device configuration can be used as a starting
point for creating a valid configuration firmware used by the device at
init time to update its configuration.

This sysfs interface will be exported only if the gpio pins are properly
initialized from ACPI/DT.

Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 33a7b81..3179767 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -530,12 +530,35 @@ static ssize_t goodix_esd_timeout_store(struct device 
*dev,
return count;
 }
 
+static ssize_t goodix_dump_config_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct goodix_ts_data *ts = dev_get_drvdata(dev);
+   u8 config[GOODIX_CONFIG_MAX_LENGTH];
+   int error, count = 0, i;
+
+   error = goodix_i2c_read(ts->client, GOODIX_REG_CONFIG_DATA,
+   config, ts->cfg_len);
+   if (error) {
+   dev_warn(&ts->client->dev,
+"Error reading config (%d)\n",  error);
+   return error;
+   }
+
+   for (i = 0; i < ts->cfg_len; i++)
+   count += scnprintf(buf + count, PAGE_SIZE - count, "%02x ",
+  config[i]);
+   return count;
+}
+
 /* ESD timeout in ms. Default disabled (0). Recommended 2000 ms. */
 static DEVICE_ATTR(esd_timeout, S_IRUGO | S_IWUSR, goodix_esd_timeout_show,
   goodix_esd_timeout_store);
+static DEVICE_ATTR(dump_config, S_IRUGO, goodix_dump_config_show, NULL);
 
 static struct attribute *goodix_attrs[] = {
&dev_attr_esd_timeout.attr,
+   &dev_attr_dump_config.attr,
NULL
 };
 
-- 
1.9.1
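
The output format of dump_config is a single line of space-separated two-digit hex bytes. A user-space approximation of the bounded formatting loop is sketched below. put_hex_byte() only mimics the property of the kernel's scnprintf() that matters here, namely returning the number of characters actually stored; both helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Append "xx " for one byte. Unlike snprintf(), the return value is the
 * number of characters actually stored (excluding the terminating NUL),
 * so the caller's running count can never pass the end of the buffer.
 */
static size_t put_hex_byte(char *buf, size_t size, uint8_t byte)
{
	int n;

	if (size == 0)
		return 0;
	n = snprintf(buf, size, "%02x ", byte);
	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Hex-dump len bytes into buf, truncating safely if buf is too small. */
static size_t dump_hex(char *buf, size_t size, const uint8_t *data, size_t len)
{
	size_t count = 0, i;

	for (i = 0; i < len; i++)
		count += put_hex_byte(buf + count, size - count, data[i]);
	return count;
}
```

Each byte takes three characters, so even the largest config in this driver (240 bytes) needs 720 characters of output.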



[PATCH] arm64: kernel: Use a separate stack for irq interrupts.

2015-09-07 Thread James Morse
Having to handle interrupts on top of an existing kernel stack means the
kernel stack must be large enough to accomodate both the maximum kernel
usage, and the maximum irq handler usage. Switching to a different stack
when processing irqs allows us to make the stack size smaller.

Maximum kernel stack usage (running ltp and generating usb+ethernet
interrupts) was 7256 bytes. With this patch, the same workload gives
a maximum stack usage of 5816 bytes.

Signed-off-by: James Morse 
---
 arch/arm64/include/asm/irq.h | 12 +
 arch/arm64/include/asm/thread_info.h |  8 --
 arch/arm64/kernel/entry.S| 33 ---
 arch/arm64/kernel/irq.c  | 52 
 arch/arm64/kernel/smp.c  |  4 +++
 arch/arm64/kernel/stacktrace.c   |  4 ++-
 6 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index bbb251b14746..050d4196c736 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -2,14 +2,20 @@
 #define __ASM_IRQ_H
 
 #include 
+#include 
 
 #include 
+#include 
+
+DECLARE_PER_CPU(unsigned long, irq_sp);
 
 struct pt_regs;
 
 extern void migrate_irqs(void);
 extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
 
+extern int alloc_irq_stack(unsigned int cpu);
+
 static inline void acpi_irq_init(void)
 {
/*
@@ -21,4 +27,10 @@ static inline void acpi_irq_init(void)
 }
 #define acpi_irq_init acpi_irq_init
 
+static inline bool is_irq_stack(unsigned long sp)
+{
+   struct thread_info *ti = get_thread_info(sp);
+   return (get_thread_info(per_cpu(irq_sp, ti->cpu)) == ti);
+}
+
 #endif
diff --git a/arch/arm64/include/asm/thread_info.h 
b/arch/arm64/include/asm/thread_info.h
index dcd06d18a42a..b906254fc400 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -69,12 +69,16 @@ register unsigned long current_stack_pointer asm ("sp");
 /*
  * how to get the thread information struct from C
  */
+static inline struct thread_info *get_thread_info(unsigned long sp)
+{
+   return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
+}
+
 static inline struct thread_info *current_thread_info(void) 
__attribute_const__;
 
 static inline struct thread_info *current_thread_info(void)
 {
-   return (struct thread_info *)
-   (current_stack_pointer & ~(THREAD_SIZE - 1));
+   return get_thread_info(current_stack_pointer);
 }
 
 #define thread_saved_pc(tsk)   \
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e16351819fed..d42371f3f5a1 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -190,10 +190,37 @@ tsk   .req x28 // current thread_info
  * Interrupt handling.
  */
.macro  irq_handler
-   adrp x1, handle_arch_irq
-   ldr x1, [x1, #:lo12:handle_arch_irq]
-   mov x0, sp
+   mrs x21, tpidr_el1
+   adr_l   x20, irq_sp
+   add x20, x20, x21
+
+   ldr x21, [x20]
+   mov x20, sp
+
+   mov x0, x21
+   mov x1, x20
+   bl  irq_copy_thread_info
+
+   /* test for recursive use of irq_sp */
+   cbz w0, 1f
+   mrs x30, elr_el1
+   mov sp, x21
+
+   /*
+* Create a fake stack frame to bump unwind_frame() onto the original
+* stack. This relies on x29 not being clobbered by kernel_entry().
+*/
+   push x29, x30
+
+1: ldr_l   x1, handle_arch_irq
+   mov x0, x20
blr x1
+
+   mov x0, x20
+   mov x1, x21
+   bl  irq_copy_thread_info
+   mov sp, x20
+
.endm
 
.text
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 463fa2e7e34c..10b57a006da8 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -26,11 +26,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 unsigned long irq_err_count;
 
+DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
+
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
 #ifdef CONFIG_SMP
@@ -55,6 +58,10 @@ void __init init_IRQ(void)
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
+
+   /* Allocate an irq stack for the boot cpu */
+   if (alloc_irq_stack(smp_processor_id()))
+   panic("Failed to allocate irq stack for boot cpu.");
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -117,3 +124,48 @@ void migrate_irqs(void)
local_irq_restore(flags);
 }
 #endif /* CONFIG_HOTPLUG_CPU */
+
+/* Allocate an irq_stack for a cpu that is about to be brought up. */
+int alloc_irq_stack(unsigned int cpu)
+{
+   struct page *irq_stack_page;
+   union thread_union *irq_stack;
+
+   /* reuse stack allocated previously */
+   if (per_cpu(irq_sp, cpu))
+   return 0;
+
+   irq_stack_page = alloc_kmem_pages(THREADINFO_GFP, THREAD_S

[PATCH v5 9/9] Input: goodix - add runtime power management support

2015-09-07 Thread Irina Tirdea
Add support for runtime power management so that the device is
turned off when not used (when the userspace holds no open
handles of the input device). The device uses autosuspend with a
default delay of 2 seconds, so the device will suspend if no
handles to it are open for 2 seconds.

The runtime management support is only available if the gpio pins
are properly initialized from ACPI/DT.

Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 57 +++---
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 3179767..34c0183 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -75,6 +76,8 @@ struct goodix_ts_data {
 #define MAX_CONTACTS_LOC   5
 #define TRIGGER_LOC        6
 
+#define GOODIX_AUTOSUSPEND_DELAY_MS 2000
+
 static const unsigned long goodix_irq_flags[] = {
IRQ_TYPE_EDGE_RISING,
IRQ_TYPE_EDGE_FALLING,
@@ -566,6 +569,27 @@ static const struct attribute_group goodix_attr_group = {
.attrs = goodix_attrs,
 };
 
+static int goodix_open(struct input_dev *input_dev)
+{
+   struct goodix_ts_data *ts = input_get_drvdata(input_dev);
+   int error;
+
+   error = pm_runtime_get_sync(&ts->client->dev);
+   if (error < 0) {
+   pm_runtime_put_noidle(&ts->client->dev);
+   return error;
+   }
+   return 0;
+}
+
+static void goodix_close(struct input_dev *input_dev)
+{
+   struct goodix_ts_data *ts = input_get_drvdata(input_dev);
+
+   pm_runtime_mark_last_busy(&ts->client->dev);
+   pm_runtime_put_autosuspend(&ts->client->dev);
+}
+
 /**
  * goodix_get_gpio_config - Get GPIO config from ACPI/DT
  *
@@ -751,6 +775,9 @@ static int goodix_request_input_dev(struct goodix_ts_data 
*ts)
ts->input_dev->id.vendor = 0x0416;
ts->input_dev->id.product = ts->id;
ts->input_dev->id.version = ts->version;
+   ts->input_dev->open = goodix_open;
+   ts->input_dev->close = goodix_close;
+   input_set_drvdata(ts->input_dev, ts);
 
error = input_register_device(ts->input_dev);
if (error) {
@@ -798,7 +825,8 @@ static int goodix_configure_dev(struct goodix_ts_data *ts)
  * @ts: our goodix_ts_data pointer
  *
  * request_firmware_wait callback that finishes
- * initialization of the device.
+ * initialization of the device. This will only be called
+ * when ts->gpiod_int and ts->gpiod_rst are properly initialized.
  */
 static void goodix_config_cb(const struct firmware *cfg, void *ctx)
 {
@@ -811,7 +839,21 @@ static void goodix_config_cb(const struct firmware *cfg, 
void *ctx)
if (error)
goto err_release_cfg;
}
-   goodix_configure_dev(ts);
+   error = goodix_configure_dev(ts);
+   if (error)
+   goto err_release_cfg;
+
+   error = pm_runtime_set_active(&ts->client->dev);
+   if (error) {
+   dev_err(&ts->client->dev, "failed to set active: %d\n", error);
+   goto err_release_cfg;
+   }
+   /* input_dev is a child of client->dev, ignore it for runtime pm */
+   pm_suspend_ignore_children(&ts->client->dev, true);
+   pm_runtime_enable(&ts->client->dev);
+   pm_runtime_set_autosuspend_delay(&ts->client->dev,
+GOODIX_AUTOSUSPEND_DELAY_MS);
+   pm_runtime_use_autosuspend(&ts->client->dev);
 
 err_release_cfg:
release_firmware(cfg);
@@ -915,8 +957,12 @@ static int goodix_ts_remove(struct i2c_client *client)
 {
struct goodix_ts_data *ts = i2c_get_clientdata(client);
 
-   if (ts->gpiod_int && ts->gpiod_rst)
+   if (ts->gpiod_int && ts->gpiod_rst) {
+   pm_runtime_disable(&client->dev);
+   pm_runtime_set_suspended(&client->dev);
+   pm_runtime_put_noidle(&client->dev);
sysfs_remove_group(&client->dev.kobj, &goodix_attr_group);
+   }
goodix_disable_esd(ts);
kfree(ts->cfg_name);
return 0;
@@ -990,7 +1036,10 @@ static int __maybe_unused goodix_resume(struct device 
*dev)
return goodix_enable_esd(ts);
 }
 
-static SIMPLE_DEV_PM_OPS(goodix_pm_ops, goodix_suspend, goodix_resume);
+static const struct dev_pm_ops goodix_pm_ops = {
+   SET_SYSTEM_SLEEP_PM_OPS(goodix_suspend, goodix_resume)
+   SET_RUNTIME_PM_OPS(goodix_suspend, goodix_resume, NULL)
+};
 
 static const struct i2c_device_id goodix_ts_id[] = {
{ "GDIX1001:00", 0 },
-- 
1.9.1



[PATCH v5 7/9] Input: goodix - add support for ESD

2015-09-07 Thread Irina Tirdea
Add ESD (Electrostatic Discharge) protection mechanism.

The driver enables ESD protection in HW and checks a register
to determine if ESD occurred. If ESD is signalled by the HW,
the driver will reset the device.

The ESD poll time (in ms) can be set through the sysfs property
esd_timeout. If it is set to 0, ESD protection is disabled.
Recommended value is 2000 ms. The initial value for ESD timeout
can be set through esd-recovery-timeout-ms ACPI/DT property.
If there is no such property defined, ESD protection is disabled.
For ACPI 5.1, the property can be specified using _DSD properties:
 Device (STAC)
 {
 Name (_HID, "GDIX1001")
 ...

 Name (_DSD,  Package ()
 {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package ()
 {
 Package (2) { "esd-recovery-timeout-ms", Package(1) { 2000 }},
 ...
 }
 })
 }

The ESD protection mechanism is only available if the gpio pins
are properly initialized from ACPI/DT.

This is based on Goodix datasheets for GT911 and GT9271 and on Goodix
driver gt9xx.c for Android (publicly available in Android kernel
trees for various devices).

Signed-off-by: Irina Tirdea 
---
 .../bindings/input/touchscreen/goodix.txt  |   6 +
 drivers/input/touchscreen/goodix.c | 174 -
 2 files changed, 173 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt 
b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
index c0715f8..5891ad1 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
@@ -14,6 +14,12 @@ Required properties:
  - interrupts  : Interrupt to which the chip is connected
  - gpios   : GPIOS the chip is connected to: first one is the
  interrupt gpio and second one the reset gpio.
+Optional properties:
+
+ - esd-recovery-timeout-ms : ESD poll time (in milliseconds) for the driver to
+check if ESD occurred and in that case reset the
+device. ESD is disabled if this property is not set
+or is set to 0.
 
 Example:
 
diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c
index 03f3968..33a7b81 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -45,8 +45,12 @@ struct goodix_ts_data {
u16 version;
char *cfg_name;
unsigned long irq_flags;
+   atomic_t esd_timeout;
+   struct delayed_work esd_work;
 };
 
+#define GOODIX_DEVICE_ESD_TIMEOUT_PROPERTY "esd-recovery-timeout-ms"
+
 #define GOODIX_MAX_HEIGHT  4096
 #define GOODIX_MAX_WIDTH   4096
 #define GOODIX_INT_TRIGGER 1
@@ -60,6 +64,8 @@ struct goodix_ts_data {
 /* Register defines */
 #define GOODIX_REG_COMMAND 0x8040
 #define GOODIX_CMD_SCREEN_OFF  0x05
+#define GOODIX_CMD_ESD_ENABLED 0xAA
+#define GOODIX_REG_ESD_CHECK   0x8041
 
 #define GOODIX_READ_COOR_ADDR  0x814E
 #define GOODIX_REG_CONFIG_DATA 0x8047
@@ -426,6 +432,117 @@ static int goodix_reset(struct goodix_ts_data *ts)
return goodix_int_sync(ts);
 }
 
+static void goodix_disable_esd(struct goodix_ts_data *ts)
+{
+   if (!atomic_read(&ts->esd_timeout))
+   return;
+   cancel_delayed_work_sync(&ts->esd_work);
+}
+
+static int goodix_enable_esd(struct goodix_ts_data *ts)
+{
+   int error, esd_timeout;
+
+   esd_timeout = atomic_read(&ts->esd_timeout);
+   if (!esd_timeout)
+   return 0;
+
+   error = goodix_i2c_write_u8(ts->client, GOODIX_REG_ESD_CHECK,
+   GOODIX_CMD_ESD_ENABLED);
+   if (error) {
+   dev_err(&ts->client->dev, "Failed to enable ESD: %d\n", error);
+   return error;
+   }
+
+   schedule_delayed_work(&ts->esd_work, round_jiffies_relative(
+ msecs_to_jiffies(esd_timeout)));
+   return 0;
+}
+
+static void goodix_esd_work(struct work_struct *work)
+{
+   struct goodix_ts_data *ts = container_of(work, struct goodix_ts_data,
+esd_work.work);
+   int retries = 3, error;
+   u8 esd_data[2];
+   const struct firmware *cfg = NULL;
+
+   while (--retries) {
+   error = goodix_i2c_read(ts->client, GOODIX_REG_COMMAND,
+   esd_data, sizeof(esd_data));
+   if (error)
+   continue;
+   if (esd_data[0] != GOODIX_CMD_ESD_ENABLED &&
+   esd_data[1] == GOODIX_CMD_ESD_ENABLED) {
+   /* feed the watchdog */
+   goodix_i2c_write_u8(ts->client,
+   GOODIX_REG_COMMAND,
+  

Re: [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts

2015-09-07 Thread Alex Smith
Hi Ezequiel,

Thanks for reviewing the series.

On 06/09/2015 21:37, Ezequiel Garcia wrote:
> On 27 Jul 02:50 PM, Alex Smith wrote:
>> If nand_wait_ready() times out, this is silently ignored, and its
>> caller will then proceed to read from/write to the chip before it is
>> ready. This can potentially result in corruption with no indication as
>> to why.
>>
>> While a 20ms timeout seems like it should be plenty enough, certain
>> behaviour can cause it to timeout much earlier than expected. The
>> situation which prompted this change was that CPU 0, which is
>> responsible for updating jiffies, was holding interrupts disabled
>> for a fairly long time while writing to the console during a printk,
>> causing several jiffies updates to be delayed. If CPU 1 happens to
>> enter the timeout loop in nand_wait_ready() just before CPU 0 re-
>> enables interrupts and updates jiffies, CPU 1 will immediately time
>> out when the delayed jiffies updates are made. The result of this is
>> that nand_wait_ready() actually waits less time than the NAND chip
>> would normally take to be ready, and then read_page() proceeds to
>> read out bad data from the chip.
>>
>> The situation described above may seem unlikely, but in fact it can be
>> reproduced almost every boot on the MIPS Creator Ci20.
>>
> 
> Not only unlikely but scary :) BTW, can't find SMP patches for Ci20,
> are you sure this behavior will apply once SMP is upstreamed?

Certainly made for fun debugging ;)

SMP support only exists in our 3.18 branch [1] at the moment, which was where 
this problem was encountered. Support should be upstreamed at some point, and I 
would guess that this behaviour could still happen then (even though it's a 
really obscure edge case that we were somehow managing to almost always hit on 
boot).

[1] https://github.com/MIPS/CI20_linux
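
To make the failure mode concrete, here is a small userspace model of the timeout loop. It is only a sketch under assumptions: tick_before() imitates the kernel's wraparound-safe time_before() comparison, wait_times_out() is a hypothetical helper, and the ticks[] array stands for the jiffies values that successive loop iterations would observe. Assuming HZ=100, a 20ms timeout is only 2 ticks, so a single delayed batch of jiffies updates is enough to expire it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a is before b", the same idea as time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * ticks[0] is the counter value when the deadline is computed;
 * ticks[1..n-1] are the values seen by each later loop iteration.
 * Returns true if the deadline expires within the given samples.
 */
static bool wait_times_out(const unsigned long *ticks, size_t n,
			   unsigned long timeout_ticks)
{
	unsigned long deadline;
	size_t i;

	assert(ticks != NULL && n > 0);
	deadline = ticks[0] + timeout_ticks;

	for (i = 1; i < n; i++) {
		/* the ready pin would be polled here */
		if (!tick_before(ticks[i], deadline))
			return true;
	}
	return false;
}
```

For the sample sequence {100, 100, 100, 105} (counter stalled, then five updates applied at once), a 2-tick timeout expires on the first update even though almost no real time has passed in the loop, while a 20-tick timeout survives the jump.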

> 
>> Debugging this was made more difficult by the misleading comment above
>> nand_wait_ready() stating "The timeout is caught later" - no timeout
>> was ever reported, leading me away from the real source of the problem.
>>
>> Therefore, this patch increases the timeout to 200ms. This should be
>> enough to cover cases where jiffies updates get delayed. Additionally,
>> add a pr_warn() when a timeout does occur so that it is easier to
>> pinpoint any problems in future caused by the chip not becoming ready.
>>
>> Signed-off-by: Alex Smith 
>> Cc: Zubair Lutfullah Kakakhel 
>> Cc: David Woodhouse 
>> Cc: Brian Norris 
>> Cc: linux-...@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> v3 -> v4:
>>  - New patch to fix issue encountered in external Ci20 3.18 kernel
>>branch which also applies upstream.
>> ---
>>  drivers/mtd/nand/nand_base.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca8277a..a0dab3414f16 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
>>  }
>>  }
>>  
>> -/* Wait for the ready pin, after a command. The timeout is caught later. */
>> +/**
>> + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
>> + * @mtd: MTD device structure
>> + *
>> + * Wait for the ready pin after a command, and warn if a timeout occurs.
>> + */
>>  void nand_wait_ready(struct mtd_info *mtd)
>>  {
>>  struct nand_chip *chip = mtd->priv;
>> -unsigned long timeo = jiffies + msecs_to_jiffies(20);
>> +unsigned long timeo = jiffies + msecs_to_jiffies(200);
>>  
>>  /* 400ms timeout */
>>  if (in_interrupt() || oops_in_progress)
>>  return panic_nand_wait_ready(mtd, 400);
>>  
>>  led_trigger_event(nand_led_trigger, LED_FULL);
>> +
> 
> Spurious change here.

Removed.

> 
>>  /* Wait until command is processed or timeout occurs */
>>  do {
>>  if (chip->dev_ready(mtd))
>> -break;
>> +goto out;
>>  touch_softlockup_watchdog();
>>  } while (time_before(jiffies, timeo));
>> +
>> +pr_warn("timeout while waiting for chip to become ready\n");
>> +out:
>>  led_trigger_event(nand_led_trigger, LED_OFF);
>>  }
> 
> This change looks reasonable, a timeout value should be large enough
> to be confident the operation has _really_ timed out. On non-error
> path, this change shouldn't make any difference.
> 
> And the warning is probably helpful too, so:
> 
> Reviewed-by: Ezequiel Garcia 

Great, thanks.

Alex


[PATCH v5 5/9] Input: goodix - add power management support

2015-09-07 Thread Irina Tirdea
Implement suspend/resume for goodix driver.

The suspend and resume process uses the gpio pins.
If the device ACPI/DT information does not declare gpio pins,
suspend/resume will not be available for these devices.

This is based on Goodix datasheets for GT911 and GT9271
and on Goodix driver gt9xx.c for Android (publicly available
in Android kernel trees for various devices).

Signed-off-by: Octavian Purdila 
Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 94 --
 1 file changed, 89 insertions(+), 5 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c
index 9cf16ff7..3d4a004 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -44,6 +44,7 @@ struct goodix_ts_data {
u16 id;
u16 version;
char *cfg_name;
+   unsigned long irq_flags;
 };
 
 #define GOODIX_MAX_HEIGHT  4096
@@ -57,6 +58,9 @@ struct goodix_ts_data {
 #define GOODIX_CONFIG_967_LENGTH   228
 
 /* Register defines */
+#define GOODIX_REG_COMMAND 0x8040
+#define GOODIX_CMD_SCREEN_OFF  0x05
+
 #define GOODIX_READ_COOR_ADDR  0x814E
 #define GOODIX_REG_CONFIG_DATA 0x8047
 #define GOODIX_REG_ID  0x8140
@@ -182,6 +186,11 @@ static int goodix_i2c_write(struct i2c_client *client, u16 reg, const u8 *buf,
return ret < 0 ? ret : (ret != 1 ? -EIO : 0);
 }
 
+static int goodix_i2c_write_u8(struct i2c_client *client, u16 reg, u8 value)
+{
+   return goodix_i2c_write(client, reg, &value, sizeof(value));
+}
+
 static int goodix_get_cfg_len(u16 id)
 {
switch (id) {
@@ -301,6 +310,18 @@ static irqreturn_t goodix_ts_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static void goodix_free_irq(struct goodix_ts_data *ts)
+{
+   devm_free_irq(&ts->client->dev, ts->client->irq, ts);
+}
+
+static int goodix_request_irq(struct goodix_ts_data *ts)
+{
+   return devm_request_threaded_irq(&ts->client->dev, ts->client->irq,
+NULL, goodix_ts_irq_handler,
+ts->irq_flags, ts->client->name, ts);
+}
+
 /**
  * goodix_check_cfg - Checks if config fw is valid
  *
@@ -617,7 +638,6 @@ static int goodix_request_input_dev(struct goodix_ts_data *ts)
 static int goodix_configure_dev(struct goodix_ts_data *ts)
 {
int error;
-   unsigned long irq_flags;
 
goodix_read_config(ts);
 
@@ -625,10 +645,8 @@ static int goodix_configure_dev(struct goodix_ts_data *ts)
if (error)
return error;
 
-   irq_flags = goodix_irq_flags[ts->int_trigger_type] | IRQF_ONESHOT;
-   error = devm_request_threaded_irq(&ts->client->dev, ts->client->irq,
- NULL, goodix_ts_irq_handler,
- irq_flags, ts->client->name, ts);
+   ts->irq_flags = goodix_irq_flags[ts->int_trigger_type] | IRQF_ONESHOT;
+   error = goodix_request_irq(ts);
if (error) {
dev_err(&ts->client->dev, "request IRQ failed: %d\n", error);
return error;
@@ -732,6 +750,71 @@ static int goodix_ts_probe(struct i2c_client *client,
return goodix_configure_dev(ts);
 }
 
+static int __maybe_unused goodix_suspend(struct device *dev)
+{
+   struct i2c_client *client = to_i2c_client(dev);
+   struct goodix_ts_data *ts = i2c_get_clientdata(client);
+   int error;
+
+   /* We need gpio pins to suspend/resume */
+   if (!ts->gpiod_int || !ts->gpiod_rst)
+   return 0;
+
+   /* Free IRQ as IRQ pin is used as output in the suspend sequence */
+   goodix_free_irq(ts);
+   /* Output LOW on the INT pin for 5 ms */
+   error = gpiod_direction_output(ts->gpiod_int, 0);
+   if (error) {
+   goodix_request_irq(ts);
+   return error;
+   }
+   usleep_range(5000, 6000);
+
+   error = goodix_i2c_write_u8(ts->client, GOODIX_REG_COMMAND,
+   GOODIX_CMD_SCREEN_OFF);
+   if (error) {
+   dev_err(&ts->client->dev, "Screen off command failed\n");
+   gpiod_direction_input(ts->gpiod_int);
+   goodix_request_irq(ts);
+   return -EAGAIN;
+   }
+
+   /*
+* The datasheet specifies that the interval between sending screen-off
+* command and wake-up should be longer than 58 ms. To avoid waking up
+* sooner, delay 58ms here.
+*/
+   msleep(58);
+   return 0;
+}
+
+static int __maybe_unused goodix_resume(struct device *dev)
+{
+   struct i2c_client *client = to_i2c_client(dev);
+   struct goodix_ts_data *ts = i2c_get_clientdata(client);
+   int error;
+
+   if (!ts->gpiod_int || !ts->gpiod_rst)
+   return 0;
+
+   /*
+* Exit sleep mode by outputting HIGH level to INT pin
+* for 2ms~5ms.
+ 

[PATCH v5 1/9] Input: goodix - sort includes alphabetically

2015-09-07 Thread Irina Tirdea
Signed-off-by: Irina Tirdea 
---
 drivers/input/touchscreen/goodix.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c
index e36162b..6ae28c5 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -14,18 +14,18 @@
  * Software Foundation; version 2 of the License.
  */
 
-#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 
 
 struct goodix_ts_data {
-- 
1.9.1



[PATCH v5 3/9] Input: goodix - reset device at init

2015-09-07 Thread Irina Tirdea
After power on, it is recommended that the driver resets the device.
The reset procedure timing is described in the datasheet and is used
at device init (before writing device configuration) and
for power management. It is a sequence of setting the interrupt
and reset pins high/low at specific timing intervals. This procedure
also includes setting the slave address to the one specified in the
ACPI/device tree.

This is based on Goodix datasheets for GT911 and GT9271 and on Goodix
driver gt9xx.c for Android (publicly available in Android kernel
trees for various devices).

For reset the driver needs to control the interrupt and
reset gpio pins (configured through ACPI/device tree). For devices
that do not have the gpio pins declared, the functionality depending
on these pins will not be available, but the device can still be used
with basic functionality.
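
The reset procedure described above can be written down as a table of pin operations. The sketch below is a userspace model, not the driver: struct reset_step and build_reset_sequence() are hypothetical names, and no GPIO is touched. The order of steps, the sleep values and the rule that INT is driven high only for address 0x14 are taken from goodix_reset() and goodix_int_sync() in the diff below.

```c
#include <assert.h>
#include <stddef.h>

enum pin { PIN_RST, PIN_INT };
enum action { DRIVE_LOW, DRIVE_HIGH, SET_INPUT };

/* One pin operation followed by the minimum sleep used in the patch. */
struct reset_step {
	enum pin pin;
	enum action action;
	unsigned int sleep_us;
};

#define RESET_STEP_COUNT 6

/* Fills out[] with the reset sequence for the given I2C address. */
static size_t build_reset_sequence(unsigned int i2c_addr,
				   struct reset_step out[RESET_STEP_COUNT])
{
	const struct reset_step steps[RESET_STEP_COUNT] = {
		{ PIN_RST, DRIVE_LOW, 20000 },   /* T2: > 10ms */
		/* INT level during reset selects the slave address */
		{ PIN_INT, i2c_addr == 0x14 ? DRIVE_HIGH : DRIVE_LOW, 100 },
		{ PIN_RST, DRIVE_HIGH, 6000 },   /* T4: > 5ms */
		{ PIN_RST, SET_INPUT, 0 },       /* end of address selection */
		{ PIN_INT, DRIVE_LOW, 50000 },   /* T5: 50ms */
		{ PIN_INT, SET_INPUT, 0 },       /* INT becomes an input again */
	};
	size_t i;

	assert(out != NULL);
	for (i = 0; i < RESET_STEP_COUNT; i++)
		out[i] = steps[i];
	return RESET_STEP_COUNT;
}
```

Building the sequence for 0x14 and for 0x5d differs only in the second step, which is the point of the address-selection phase.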

Signed-off-by: Octavian Purdila 
Signed-off-by: Irina Tirdea 
---
 .../bindings/input/touchscreen/goodix.txt  |   5 +
 drivers/input/touchscreen/goodix.c | 136 +
 2 files changed, 141 insertions(+)

diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
index 8ba98ee..c0715f8 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
@@ -12,6 +12,8 @@ Required properties:
  - reg : I2C address of the chip. Should be 0x5d or 0x14
  - interrupt-parent: Interrupt controller to which the chip is connected
  - interrupts  : Interrupt to which the chip is connected
+ - gpios   : GPIOS the chip is connected to: first one is the
+ interrupt gpio and second one the reset gpio.
 
 Example:
 
@@ -23,6 +25,9 @@ Example:
reg = <0x5d>;
interrupt-parent = <&gpio>;
interrupts = <0 0>;
+
+   gpios = <&gpio1 0 0>, /* INT */
+   <&gpio1 1 0>; /* RST */
};
 
/* ... */
diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c
index 7be6eab..8edfc06 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -37,6 +38,8 @@ struct goodix_ts_data {
unsigned int int_trigger_type;
bool rotated_screen;
int cfg_len;
+   struct gpio_desc *gpiod_int;
+   struct gpio_desc *gpiod_rst;
 };
 
 #define GOODIX_MAX_HEIGHT  4096
@@ -89,6 +92,30 @@ static const struct dmi_system_id rotated_screen[] = {
{}
 };
 
+/*
+ * ACPI table specifies gpio pins in this order: first rst pin and
+ * then interrupt pin.
+ */
+static const struct dmi_system_id goodix_rst_pin_first[] = {
+#if defined(CONFIG_DMI) && defined(CONFIG_X86)
+   {
+   .ident = "WinBook TW100",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "TW100")
+   }
+   },
+   {
+   .ident = "WinBook TW700",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "TW700")
+   },
+   },
+#endif
+   {}
+};
+
 /**
  * goodix_i2c_read - read data from a register of the i2c slave device.
  *
@@ -237,6 +264,102 @@ static irqreturn_t goodix_ts_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static int goodix_int_sync(struct goodix_ts_data *ts)
+{
+   int error;
+
+   error = gpiod_direction_output(ts->gpiod_int, 0);
+   if (error)
+   return error;
+   msleep(50); /* T5: 50ms */
+
+   return gpiod_direction_input(ts->gpiod_int);
+}
+
+/**
+ * goodix_reset - Reset device during power on
+ *
+ * @ts: goodix_ts_data pointer
+ */
+static int goodix_reset(struct goodix_ts_data *ts)
+{
+   int error;
+
+   /* begin select I2C slave addr */
+   error = gpiod_direction_output(ts->gpiod_rst, 0);
+   if (error)
+   return error;
+   msleep(20); /* T2: > 10ms */
+   /* HIGH: 0x28/0x29, LOW: 0xBA/0xBB */
+   error = gpiod_direction_output(ts->gpiod_int, ts->client->addr == 0x14);
+   if (error)
+   return error;
+   usleep_range(100, 2000);/* T3: > 100us */
+   error = gpiod_direction_output(ts->gpiod_rst, 1);
+   if (error)
+   return error;
+   usleep_range(6000, 10000);  /* T4: > 5ms */
+   /* end select I2C slave addr */
+   error = gpiod_direction_input(ts->gpiod_rst);
+   if (error)
+   return error;
+   return goodix_int_sync(ts);
+}
+
+/
