Re: [PATCH v3] kvm: mmu: lazy collapse small sptes into large sptes

2015-04-13 Thread Wanpeng Li
On Mon, Apr 13, 2015 at 11:06:25PM -0700, Andres Lagar-Cavilla wrote:
>On Mon, Apr 13, 2015 at 10:25 PM, Wanpeng Li  
>wrote:
>> Hi Andres,
>> On Fri, Apr 10, 2015 at 11:05:26AM -0700, Andres Lagar-Cavilla wrote:
>> [...]
 +   if (sp->role.direct &&
 +   !kvm_is_reserved_pfn(pfn) &&
 +   PageTransCompound(pfn_to_page(pfn))) {
>>>
>>>Not your fault, but PageTransCompound is an unfortunate name, as it
>>>also yields true for PageHuge. Suggestion: document that this check covers
>>>static hugetlbfs, or switch to a PageCompound() check.
>>>
>>>A slightly bolder approach would be to refactor and reuse the nearly
>>>identical check done in transparent_hugepage_adjust, instead of
>>>open-coding here. In essence this code is asking for the same check,
>>>plus the out-of-band check for static hugepages.
>>
>> The PageCompound() check still returns true for both transparent huge pages
>> and hugetlbfs pages; a !PageHuge(page) && PageTransHuge(page) check is
>> guaranteed to catch only transparent huge pages, just as in my old commit
>> e76d30e20be5fc ("mm/hwpoison: fix test for a transparent huge page").
>> I will send a patch to fix this.
>>
>Why would you want to "fix" it that way? Aren't static hugepages supported?
>
>(PageAnon is an inline check and much cheaper than !PageHuge(), which
>is an actual function call.)
>
>Please consider my suggestion about refactoring the similar checks in
>transparent_hugepage_adjust.

Ok, will do. :)

Regards,
Wanpeng Li 

>
>Thanks a ton
>Andres
>>>
>>>
 +   drop_spte(kvm, sptep);
 +   sptep = rmap_get_first(*rmapp, &iter);
 +   need_tlb_flush = 1;
 +   } else
 +   sptep = rmap_get_next(&iter);
 +   }
 +
 +   return need_tlb_flush;
 +}
 +
 +void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 +   struct kvm_memory_slot *memslot)
 +{
 +   bool flush = false;
 +   unsigned long *rmapp;
 +   unsigned long last_index, index;
 +   gfn_t gfn_start, gfn_end;
 +
 +   spin_lock(&kvm->mmu_lock);
 +
 +   gfn_start = memslot->base_gfn;
 +   gfn_end = memslot->base_gfn + memslot->npages - 1;
 +
 +   if (gfn_start >= gfn_end)
 +   goto out;
>>>
>>>I don't understand the value of this check here. Are we looking for a
>>>broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
>>>about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
>>>2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
>>>would be trouble in many other ways.
>>>
>>>All this to say: please remove the above 5 lines and make the code simpler.
>>
>> I will send a patch to clean it up. Thanks for your review. :)
>>
>> Regards,
>> Wanpeng Li
>>
>
>
>
>-- 
>Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
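Andres's overflow argument can be checked with a little arithmetic. Below is a minimal user-space model, assuming gfn_t is a 64-bit unsigned integer and taking the 2^31-page cap quoted above for KVM_MEM_MAX_NR_PAGES; the macro and helper names are hypothetical, not KVM code. It computes the inclusive last gfn the way the patch does, reports when base_gfn + npages - 1 would wrap, and shows that the questioned gfn_start >= gfn_end test also fires for a valid one-page slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: gfn_t is a 64-bit unsigned integer. */
typedef uint64_t gfn_t;

/* Hypothetical name for the cap quoted in the review: 2^31 pages. */
#define MODEL_MAX_NR_PAGES (1ULL << 31)

/* Inclusive last gfn of a slot, computed as in the patch. */
static gfn_t slot_last_gfn(gfn_t base_gfn, uint64_t npages)
{
	return base_gfn + npages - 1;
}

/*
 * True if base_gfn + npages - 1 wraps past UINT64_MAX.  With
 * 1 <= npages <= MODEL_MAX_NR_PAGES this requires base_gfn to lie
 * within 2^31 of the top of the 64-bit range.
 */
static bool slot_range_wraps(gfn_t base_gfn, uint64_t npages)
{
	return npages > 0 && base_gfn > UINT64_MAX - (npages - 1);
}

/* The check questioned in the review, evaluated for a given slot. */
static bool patch_check_skips_slot(gfn_t base_gfn, uint64_t npages)
{
	gfn_t gfn_start = base_gfn;
	gfn_t gfn_end = slot_last_gfn(base_gfn, npages);

	/* For npages == 1, gfn_end == gfn_start, so the slot is skipped. */
	return gfn_start >= gfn_end;
}
```

With npages capped at 2^31, slot_range_wraps() can only return true for a base_gfn within 2^31 of UINT64_MAX, which matches the point that only a gigantic gfn_start could overflow.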
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Use -mcount-record for dynamic ftrace

2015-04-13 Thread Kalle Valo
Steven Rostedt  writes:

> I wonder who's responsible for
> https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/
>
> If they can add a i386/x86_64 build for gcc5 I'll be happy to download
> it and test this patch.

Found this:

"For any questions don't hesitate to contact me at tony (at) bake your
noodle . com"

https://www.kernel.org/pub/tools/crosstool/

-- 
Kalle Valo


Re: [PATCH 2/5] iommu/mediatek: Add mt8173 IOMMU driver

2015-04-13 Thread Yong Wu
Hi Robin,
  Thanks very much for confirming.
  I have some questions below about v3 of the DMA-mapping series.

On Fri, 2015-03-20 at 19:14 +, Robin Murphy wrote:
> On 18/03/15 11:22, Yong Wu wrote:
> > Hi Tomasz,
> > Thanks very much for your review. Please help check the points below.
> > The others I will fix in the next version.
> >
> > Hi Robin,
> > There are some places where I would like you to have a look and give me
> > some suggestions.
> >
> > On Wed, 2015-03-11 at 19:53 +0900, Tomasz Figa wrote:
> >> Hi,
> >>
> >> Please find next part of my comments inline.
> >>
> >>> +/*
> >>> + * pimudev is a global var for dma_alloc_coherent.
> > >>> + * It is not acceptable, we will delete it if "domain_alloc" is enabled
> >>
> >> It looks like we indeed need to use dma_alloc_coherent() and we don't
> >> have a good way to pass the device pointer to domain_init callback.
> >>
> >> If you don't expect SoCs in the nearest future to have multiple M4U
> >> blocks, then I guess this global variable could stay, after changing
> >> the comment into an explanation why it's correct. Also it should be
> >> moved to the top of the file, below #include directives, as this is
> >> where usually global variables are located.
> > @Robin,
> >   We have merged this patch [0] in order to delete the global var, but
> > it seems that your patch "arm64:IOMMU" isn't based on it right now,
> > so the build will fail.
> 
> Yeah, I've not yet managed to try pulling in that series (much as I 
> approve of it), partly as I know doing so is going to lean towards a 
> not-insignificant rework and I'd rather avoid picking up more unmerged 
> dependencies to block getting _something_ in for arm64 (which we can 
> then improve).
> 
> >
> > [0]:http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011939.html
> >
[snip]
> 
> Calling arch_setup_dma_ops() from the driver looks plain wrong, 
> especially given that you apparently attach the IOMMU to itself - if you 
> want your own domain you should use iommu_dma_create_domain(). I admit 
> that still leaves you having to dance around a bit in order to tear down 
> the automatic domains for now, but hopefully we'll get the core code 
> sorted out sooner rather than later.
> >>> +
> >>> +   mtk_iommu_config_port(piommu, portid);
> >>> +
> >>> +   if (i == 0)
> >>> +   dev->archdata.dma_ops =
> >>> +   piommu->dev->archdata.dma_ops;
> >>
> >> Shouldn't this be set automatically by IOMMU or DMA mapping core?
> > @Robin,
> >   The original arm_iommu_attach_device() in arm/mm calls
> > set_dma_ops() to install iommu_ops for each device attached to the IOMMU.
> > But iommu_dma_attach_device() doesn't do this, so I have to add it here.
> > Could this be improved?
> 
> If you implemented a simple of_xlate callback so that the core code 
> handles the dma_ops as intended, I think the simplest cheat would be to 
> check the client device's domain, either on attachment or when they 
> start mapping/unmapping, and move them to your own domain if necessary. 
> I'm putting together a v3 of the DMA mapping series, so I'll have a look 
> to see if I can squeeze in a way to make that a bit less painful until 
> we solve it properly.
> 
> 
> Robin.
> 
  I have implemented a simple of_xlate, but I can't access the standard
struct dma_map_ops "iommu_dma_ops" to assign it to the client device.
Will v3 of the DMA mapping series improve this?

  Also, is v3 of the DMA mapping series based on 4.0-rc1? We hope it
will include Will's io-pagetable code.

  And when will v3 be ready?
> >>
> >>> +   }
> >>> +   i++;
> >>> +   }
> >>> +
> >>> +   spin_unlock_irqrestore(&priv->portlock, flags);
> >>> +
> >>> +imudev:
> >>> +   return 0;
> >>> +}
> >>> +
> >>> +static void mtk_iommu_detach_device(struct iommu_domain *domain,
> >>> +   struct device *dev)
> >>> +{
> >>
> >> No hardware (de)configuration or clean-up necessary?
> > I will add it. Actually, our design is that once a device has been
> > attached to an IOMMU domain, it never detaches from it.
> >>
> >>> +}
> >>> +
> > [snip]




Re: [PATCH 0/3] dm-crypt: Adds support for wiping key when doing suspend/hibernation

2015-04-13 Thread Pavel Machek
Hi!

> > > > > So the proper way is to wipe the LUKS crypto keys *after* userspace
> > > > > processes are frozen.
> > > > 
> > > > I know you believe that I'm just not accepting that at face value.
> > > 
> > > If disks are synced before any DM suspend operation, then we have a higher
> > > chance of preventing data corruption.
> > 
> > disks are already synced as part of the DM suspend operation!
> > 
> 
> Yes, but part of hibernate operation is also sync call.

Yes. Maybe that was a mistake.

> > > I still think that the only correct order is:
> > > 
> > > * freeze processes (those doing continuous I/O)
> > > * fs & disk sync
> > > * DM suspend
> > > * wipe crypto keys
> > > * enter hibernate
> > 
> > I just don't think that extreme is _required_ to have a hibernate/resume
> > that incorporates dm-crypt key wiping.
> 
> OK, and what do other developers think?

If someone can fix the freezer to work with LUKS stopped, that would be a 
good thing. Can you do it, Mike? Then we can see whether it works well
enough for Pali.

But that might be too hard / impossible. And at that point, I think
Pali's patch is the right thing to do.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] hso: fix refcnt leak in recent patch.

2015-04-13 Thread Olivier Sobrie
Hello Neil,

On Tue, Apr 14, 2015 at 11:03:03AM +1000, NeilBrown wrote:
> On Tue, 14 Apr 2015 09:36:34 +1000 NeilBrown  wrote:
> 
> > 
> > 
> > Prior to
> > commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> > hso: fix crash when device disappears while serial port is open
> > 
> > hso_serial_open would always kref_get(&serial->parent->ref) before
> > returning zero.
> > Since that commit, it only calls kref_get when returning 0 if
> > serial->port.count was zero.
> > 
> > This results in calls to
> >kref_put(&serial->parent->ref, hso_serial_ref_free);
> > 
> > after hso_serial_ref_free has been called, which dereferences a freed
> > pointer.
> > 
> > This patch adds the missing kref_get().
> > 
> > Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> > Cc: sta...@vger.kernel.org (v4.0)
> > Cc: Olivier Sobrie 
> > Signed-off-by: NeilBrown 
> > 
> > diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
> > index 75befc1bd816..6848fc903340 100644
> > --- a/drivers/net/usb/hso.c
> > +++ b/drivers/net/usb/hso.c
> > @@ -1299,6 +1299,7 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp)
> > }
> > } else {
> > D1("Port was already open");
> > +   kref_get(&serial->parent->ref);
> > }
> >  
> > usb_autopm_put_interface(serial->parent->interface);
> 
> 
> Sorry - that was wrong.
> I'm getting crashes which strongly suggest the kref_put is being called extra
> times, but I misunderstood the code and was hasty.
> 
> Maybe this instead?

Indeed, if I understand the code in tty_io.c correctly, the cleanup()
method is also called when the open fails, while kref_get is only done
if the open succeeds. Sorry for that mess.
I assume you get that crash when hso_start_serial_device() returns
an error?

At first sight, the patch below looks good to me.
I'll test it in the next days.

Thank you,

Olivier

> 
> Thanks,
> NeilBrown
> 
> From: NeilBrown 
> Date: Tue, 14 Apr 2015 09:33:03 +1000
> Subject: [PATCH] hso: fix refcnt leak in recent patch.
> 
> Prior to
> commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> hso: fix crash when device disappears while serial port is open
> 
> a kref_get on serial->parent->ref would be taken on each open,
> and it would be kref_put on each close.
> 
> Now the kref_put happens when the tty_struct is finally put (via
> the 'cleanup') provided tty->driver_data has been set.
> So the kref_get must be called exactly once when tty->driver_data is
> set.
> 
> With the current code, if the first open fails the kref_get() is never
> called, but the kref_put() is called, leading to a crash.
> 
> So change the kref_get call to happen exactly when ->driver_data is
> changed from NULL to non-NULL.
> 
> Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> Cc: sta...@vger.kernel.org (v4.0)
> Cc: Olivier Sobrie 
> Signed-off-by: NeilBrown 
> 
> diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
> index 75befc1bd816..17fd3820263a 100644
> --- a/drivers/net/usb/hso.c
> +++ b/drivers/net/usb/hso.c
> @@ -1278,6 +1278,8 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp)
>   D1("Opening %d", serial->minor);
>  
>   /* setup */
> + if (tty->driver_data == NULL)
> + kref_get(&serial->parent->ref);
>   tty->driver_data = serial;
>   tty_port_tty_set(&serial->port, tty);
>  
> @@ -1294,8 +1296,6 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp)
>   if (result) {
>   hso_stop_serial_device(serial->parent);
>   serial->port.count--;
> - } else {
> - kref_get(&serial->parent->ref);
>   }
>   } else {
>   D1("Port was already open");
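The accounting NeilBrown describes (one kref_get when tty->driver_data goes from NULL to non-NULL, matched by the single kref_put in the tty cleanup path) can be checked with a small user-space model. This is a sketch under stated assumptions: a plain int stands in for struct kref, and the model_* types and functions are hypothetical stand-ins, not hso.c or the tty core.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for serial->parent and its refcount. */
struct model_parent {
	int ref;
};

/* Hypothetical stand-in for the tty_struct. */
struct model_tty {
	void *driver_data;
};

static int model_open(struct model_tty *tty, struct model_parent *parent,
		      bool start_fails)
{
	/* Take the reference only when driver_data goes NULL -> non-NULL. */
	if (tty->driver_data == NULL)
		parent->ref++;			/* kref_get() */
	tty->driver_data = parent;		/* the driver stores serial here */

	/* A failing start keeps the reference; cleanup will drop it. */
	return start_fails ? -1 : 0;
}

/* Final put of the tty: drop the reference iff driver_data was set. */
static void model_cleanup(struct model_tty *tty, struct model_parent *parent)
{
	if (tty->driver_data != NULL) {
		parent->ref--;			/* kref_put() */
		tty->driver_data = NULL;	/* model only: the tty is going away */
	}
}
```

A failed first open and repeated opens of the same tty both leave the count where it started once cleanup runs, which is the balance the second patch aims to restore.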





Re: [PATCH] spi: bcm2835: Add GPIOLIB dependency

2015-04-13 Thread Martin Sperl

> On 14.04.2015, at 06:24, Guenter Roeck  wrote:
> 
> by adding the now mandatory GPIOLIB dependency.
> 
Note this shows up during automated randconfig testing.

> Fixes: a30a555d7435 ("spi: bcm2835: transform native-cs to gpio-cs
>   on first spi_setup")
> Cc: Martin Sperl 
> Signed-off-by: Guenter Roeck 

I have had an identical patch in the pipeline as well, but
I was still testing it overnight.

Signed-off-by: Martin Sperl 



Re: [PATCH v6 2/2] memory: pl353: Add driver for arm pl353 static memory controller

2015-04-13 Thread punnaiah choudary kalluri
Hi Paul Bolle

On Tue, Apr 14, 2015 at 12:19 AM, Paul Bolle  wrote:
> On Mon, 2015-04-13 at 21:41 +0530, Punnaiah Choudary Kalluri wrote:
>> --- a/drivers/memory/Kconfig
>> +++ b/drivers/memory/Kconfig
>
>> +config PL353_SMC
>> + bool "ARM PL353 Static Memory Controller (SMC) driver"
>> + depends on ARM
>> + help
>> +   This driver is for the ARM PL353 Static Memory Controller (SMC)
>> +   module.
>
> This adds a bool symbol.
>
>> --- a/drivers/memory/Makefile
>> +++ b/drivers/memory/Makefile
>
>> +obj-$(CONFIG_PL353_SMC)  += pl353-smc.o
>
> Which means pl353-smc.o can never be part of a module, right?
>
> (If that's not right you can stop reading here.)
>
>> --- /dev/null
>> +++ b/drivers/memory/pl353-smc.c
>
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>
> This states the license is GPL v2 or later.
>
>> +#include 
>
> I wonder whether this include is needed, since this is built-in only
> code.
>
>> +MODULE_DEVICE_TABLE(of, pl353_smc_of_match);
>
> According to include/linux/module.h this will be preprocessed away for
> built-in code.
>
>> +static struct platform_driver pl353_smc_driver = {
>> + .probe  = pl353_smc_probe,
>> + .remove = pl353_smc_remove,
>> + .driver = {
>> + .name   = "pl353-smc",
>> + .owner  = THIS_MODULE,
>
> THIS_MODULE will be equivalent to NULL for built-in code, according to
> include/linux/export.h.
>
>> + .pm = &pl353_smc_dev_pm_ops,
>> + .of_match_table = pl353_smc_of_match,
>> + },
>> +};
>
>> +module_platform_driver(pl353_smc_driver);
>
> Speaking from memory: for built-in only code this is equivalent to
> having a wrapper that only does
> platform_driver_register(&pl353_smc_driver);
>
> and marking that wrapper with device_initcall().
>
>> +MODULE_AUTHOR("Xilinx, Inc.");
>> +MODULE_DESCRIPTION("ARM PL353 SMC Driver");
>> +MODULE_LICENSE("GPL v2");
>
> For built-in only code these macros will be effectively preprocessed
> away.
>
> (Would you make PL353_SMC a tristate symbol then you should note that
> according to include/linux/module.h "GPL" is the license ident that
> matches the license stated in the comment at the top of this file.)

OK. I will make PL353_SMC a tristate symbol, and I will also change the
license ident to "GPL".

Thanks for the review.

I will wait for some time for further functional comments on this driver
before sending the next version of the patches.

Thanks,
Punnaiah
>
> Thanks,
>
>
> Paul Bolle
>


Re: [PATCH RESEND v6 4/6] regulator: axp20x: add support for AXP22X regulators

2015-04-13 Thread Lee Jones
On Fri, 10 Apr 2015, Mark Brown wrote:

> On Fri, Apr 10, 2015 at 12:09:04PM +0800, Chen-Yu Tsai wrote:
> 
> > This patch depends on the previous patch "regulator: axp20x: prepare
> > support for multiple AXP chip families" and the mfd header from the
> > first patch "mfd: axp20x: add AXP22x PMIC support".
> 
> > Could we merge both regulator patches through the mfd tree, with the
> > other patches in the series? There are no other external dependencies.
> 
> Yes, of course we can - that's the whole point of me sending a
> Reviewed-by!  Half of what I'm trying to do with that is to cut down on
> the number of reposts of MFD series I get sent :(

Welcome to my world!

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


[PATCH v9 06/11] powerpc/perf: Implement get_cpu_str()

2015-04-13 Thread Sukadev Bhattiprolu
With a file ~/.cache/pmu-events/004d0100-core.json describing Power8
PMU events, we would need to run:

perf stat \
--events-file ~/.cache/pmu-events/004d0100-core.json \
-e pm_cyc sleep 1

With this get_cpu_str(), on PowerPC we can skip the --events-file option
and run:

perf stat -e pm_cyc sleep 1

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0.

Changelog[v3]:
[Tobias Klauser]: Fix some changelog damage to patch.

Changelog[v2]:
[Michael Ellerman]: Use PVR instead of AUXV variables
---
 tools/perf/arch/powerpc/util/header.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index 6c1b8a7..306bf35 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -6,6 +6,7 @@
 
 #include "../../util/header.h"
 #include "../../util/util.h"
+#include "../../util/jevents.h"
 
 #define mfspr(rn)   ({unsigned long rval; \
 asm volatile("mfspr %0," __stringify(rn) \
@@ -32,3 +33,14 @@ get_cpuid(char *buffer, size_t sz)
}
return -1;
 }
+
+char *
+get_cpu_str(void)
+{
+   char *bufp;
+
+   if (asprintf(&bufp, "%.8lx-core", mfspr(SPRN_PVR)) < 0)
+   bufp = NULL;
+
+   return bufp;
+}
-- 
1.7.9.5
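The string get_cpu_str() produces can be checked off-target. In the sketch below, the hypothetical pvr_to_cpu_str() takes the PVR as a parameter instead of reading it with mfspr(SPRN_PVR), and uses snprintf() into a caller buffer rather than asprintf(); only the "%.8lx-core" format is taken from the patch. Feeding it 0x004d0100, the value in the example file name above, yields "004d0100-core", the base name that is then looked up under ~/.cache/pmu-events/.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a PVR value as get_cpu_str() does.
 * "%.8lx" prints at least 8 hex digits, padding with leading zeros.
 * Returns 0 on success, -1 on truncation or encoding error.
 */
static int pvr_to_cpu_str(unsigned long pvr, char *buf, size_t size)
{
	int n = snprintf(buf, size, "%.8lx-core", pvr);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```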



[GIT PULL] s390 patches for the 4.1 merge window

2015-04-13 Thread Martin Schwidefsky
Hi Linus,

please pull from the 'for-linus' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus

to receive the following updates:

The major change in this merge is the removal of support for
31-bit kernels. Naturally, 31-bit user space will continue to work
via the compat layer.

Apart from that, there are some cleanups, improvements and bug fixes.

Heiko Carstens (17):
  s390: remove 31 bit support
  s390: remove "64" suffix from a couple of files
  s390: remove 31 bit syscalls
  s390/cmpxchg: simplify cmpxchg_double
  s390: remove test_facility(2) (== z/Architecture mode active) checks
  s390/traps: panic() instead of die() on translation exception
  s390/maccess: remove potentially broken probe_kernel_write()
  s390/maccess: improve s390_kernel_write()
  s390: make couple of functions and variables static
  s390: add missing arch_release_task_struct() declaration
  s390/uprobes: fix address space annotation
  s390: remove "64" suffix from mem64.S and swsusp_asm64.S
  s390/irq: enforce correct irqclass_sub_desc array size
  s390/syscalls: simplify syscall_get_arch()
  s390/cacheinfo: add missing facility check
  s390/hibernate: fix save and restore of kernel text section
  s390/smp: wait until secondaries are active & online

Sebastian Ott (3):
  s390/ipl: cleanup bin attr usage
  s390/ipl: cleanup shutdown_action attributes
  s390/ipl: cleanup macro usage

Stefan Haberland (1):
  s390/dasd: remove setting of scheduler from driver

Xu Wang (2):
  s390/watchdog: enable KEEPALIVE for /dev/watchdog
  s390/watchdog: support for KVM hypervisors and delete pr_info messages

 arch/s390/Kbuild   |1 -
 arch/s390/Kconfig  |   79 +-
 arch/s390/Makefile |   16 +-
 arch/s390/boot/compressed/Makefile |   12 +-
 arch/s390/boot/compressed/{head64.S => head.S} |0
 arch/s390/boot/compressed/head31.S |   51 -
 arch/s390/boot/compressed/vmlinux.lds.S|5 -
 arch/s390/crypto/crypt_s390.h  |8 +-
 arch/s390/hypfs/hypfs_diag0c.c |4 -
 arch/s390/include/asm/appldata.h   |   24 -
 arch/s390/include/asm/atomic.h |   95 -
 arch/s390/include/asm/bitops.h |   28 -
 arch/s390/include/asm/cmpxchg.h|7 +-
 arch/s390/include/asm/cputime.h|   26 -
 arch/s390/include/asm/ctl_reg.h|   14 +-
 arch/s390/include/asm/elf.h|4 -
 arch/s390/include/asm/idals.h  |   16 -
 arch/s390/include/asm/jump_label.h |   12 +-
 arch/s390/include/asm/lowcore.h|  159 --
 arch/s390/include/asm/mman.h   |2 +-
 arch/s390/include/asm/mmu_context.h|4 -
 arch/s390/include/asm/percpu.h |4 -
 arch/s390/include/asm/perf_event.h |3 -
 arch/s390/include/asm/pgalloc.h|   24 -
 arch/s390/include/asm/pgtable.h|  125 +-
 arch/s390/include/asm/processor.h  |   66 +-
 arch/s390/include/asm/ptrace.h |4 -
 arch/s390/include/asm/qdio.h   |   10 -
 arch/s390/include/asm/runtime_instr.h  |   10 +-
 arch/s390/include/asm/rwsem.h  |   81 -
 arch/s390/include/asm/setup.h  |   35 -
 arch/s390/include/asm/sfp-util.h   |   10 -
 arch/s390/include/asm/sparsemem.h  |9 -
 arch/s390/include/asm/switch_to.h  |   21 +-
 arch/s390/include/asm/syscall.h|2 +-
 arch/s390/include/asm/thread_info.h|   11 +-
 arch/s390/include/asm/tlb.h|4 -
 arch/s390/include/asm/tlbflush.h   |7 -
 arch/s390/include/asm/types.h  |   17 -
 arch/s390/include/asm/uaccess.h|1 +
 arch/s390/include/asm/unistd.h |8 -
 arch/s390/include/asm/vdso.h   |2 -
 arch/s390/kernel/Makefile  |   24 +-
 arch/s390/kernel/asm-offsets.c |4 -
 arch/s390/kernel/base.S|   76 -
 arch/s390/kernel/cache.c   |4 +
 arch/s390/kernel/cpcmd.c   |   10 -
 arch/s390/kernel/diag.c|   15 -
 arch/s390/kernel/dis.c |   48 +-
 arch/s390/kernel/dumpstack.c   |   26 +-
 arch/s390/kernel/early.c   |   69 -
 arch/s390/kernel/entry.S   | 1005 ++-
 arch/s390/kernel/entry64.S | 1059 ---
 arch/s390/kernel/ftrace.c  |   12 +-
 arch/s390/kernel/head.S|   49 -
 arch/s390/kernel/head31.S  |  106 --
 arch/s390/kernel/head_kdump.S  |  

[PATCH v9 02/11] perf, tools: Add support for text descriptions of events and alias add

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Change pmu.c to allow descriptions of events and add interfaces
to add aliases at runtime from another file. To be used by jevents in
a followon patch

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0
v2: Move perf list changes to other patch.
---
 tools/perf/util/json.c |1 +
 tools/perf/util/pmu.c  |   48 +---
 tools/perf/util/pmu.h  |1 +
 3 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/json.c b/tools/perf/util/json.c
index e20001f..2219844 100644
--- a/tools/perf/util/json.c
+++ b/tools/perf/util/json.c
@@ -38,6 +38,7 @@
 #include "jsmn.h"
 #include "json.h"
 #include 
+#include "debug.h"
 
 static char *mapfile(const char *fn, size_t *size)
 {
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 4841167..527da74 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -198,17 +198,12 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
return 0;
 }
 
-static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FILE *file)
+static int __perf_pmu__new_alias(struct list_head *list, char *name, char *dir,
+   char *desc, char *val)
 {
struct perf_pmu_alias *alias;
-   char buf[256];
int ret;
 
-   ret = fread(buf, 1, sizeof(buf), file);
-   if (ret == 0)
-   return -EINVAL;
-   buf[ret] = 0;
-
alias = malloc(sizeof(*alias));
if (!alias)
return -ENOMEM;
@@ -218,26 +213,49 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI
alias->unit[0] = '\0';
alias->per_pkg = false;
 
-   ret = parse_events_terms(&alias->terms, buf);
+   ret = parse_events_terms(&alias->terms, val);
if (ret) {
+   pr_err("Cannot parse alias %s: %d\n", val, ret);
free(alias);
return ret;
}
 
alias->name = strdup(name);
-   /*
-* load unit name and scale if available
-*/
-   perf_pmu__parse_unit(alias, dir, name);
-   perf_pmu__parse_scale(alias, dir, name);
-   perf_pmu__parse_per_pkg(alias, dir, name);
-   perf_pmu__parse_snapshot(alias, dir, name);
+
+   if (dir) {
+   /*
+* load unit name and scale if available
+*/
+   perf_pmu__parse_unit(alias, dir, name);
+   perf_pmu__parse_scale(alias, dir, name);
+   perf_pmu__parse_per_pkg(alias, dir, name);
+   perf_pmu__parse_snapshot(alias, dir, name);
+   }
+
+   alias->desc = desc ? strdup(desc) : NULL;
 
list_add_tail(&alias->list, list);
 
return 0;
 }
 
+static int perf_pmu__new_alias(struct list_head *list,
+   char *dir,
+   char *name,
+   FILE *file)
+{
+   char buf[256];
+   int ret;
+
+   ret = fread(buf, 1, sizeof(buf), file);
+   if (ret == 0)
+   return -EINVAL;
+   buf[ret] = 0;
+
+   return __perf_pmu__new_alias(list, name, dir, NULL, buf);
+}
+
+
 static inline bool pmu_alias_info_file(char *name)
 {
size_t len;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 6b1249f..d06496d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -37,6 +37,7 @@ struct perf_pmu_info {
 
 struct perf_pmu_alias {
char *name;
+   char *desc;
struct list_head terms; /* HEAD struct parse_events_term -> list */
struct list_head list;  /* ELEM */
char unit[UNIT_MAX_LEN+1];
-- 
1.7.9.5
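After this refactor, perf_pmu__new_alias() only has to read the alias file into a NUL-terminated string before handing it to __perf_pmu__new_alias(). A minimal sketch of that step follows; read_alias_value() is a hypothetical name. One detail worth noting: the patch reads up to sizeof(buf) bytes and then stores buf[ret] = 0, which writes one byte past the 256-byte array if the file fills it, so the sketch reads at most size - 1 bytes to keep room for the terminator.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read an alias definition such as "event=0x3c"
 * from a sysfs-style file into buf as a NUL-terminated string.
 * Returns 0 on success, -EINVAL for an empty file or zero-size buffer.
 */
static int read_alias_value(FILE *file, char *buf, size_t size)
{
	size_t ret;

	if (size == 0)
		return -EINVAL;

	/* Read at most size - 1 bytes so the terminator always fits. */
	ret = fread(buf, 1, size - 1, file);
	if (ret == 0)
		return -EINVAL;		/* empty file, as in the patch */
	buf[ret] = '\0';
	return 0;
}
```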



[PATCH v9 05/11] perf, tools: Automatically look for event file name for cpu

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

When no JSON event file is specified, automatically look
for a suitable file in ~/.cache/pmu-events.

The event file format is per architecture, but can be
extended for other architectures.
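The lookup order implemented by json_default_name() in this patch can be sketched in plain C. This is an illustration, not the perf code: the environment values are passed in as parameters so the logic is testable, xformat() is a hypothetical portable stand-in for asprintf(), and the result is a malloc'ed string rather than a path built with perf's mkpath(). As in the patch, an EVENTMAP that names a readable file is used as-is, an unreadable one is treated as a CPU id with "-core" appended, and the file is looked up as <cache>/pmu-events/<id>.json, with XDG_CACHE_HOME taking precedence over $HOME/.cache.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: printf into a freshly malloc'ed string. */
static char *xformat(const char *fmt, ...)
{
	va_list ap, ap2;
	int len;
	char *s;

	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure first */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	s = malloc((size_t)len + 1);
	if (s)
		vsnprintf(s, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return s;
}

/*
 * Sketch of the default event file lookup.  eventmap, xdg_cache and
 * home stand for $EVENTMAP, $XDG_CACHE_HOME and $HOME; cpu_str stands
 * for the result of get_cpu_str().  Returns a malloc'ed path or NULL.
 */
static char *default_event_file(const char *eventmap, const char *xdg_cache,
				const char *home, const char *cpu_str)
{
	char *id = NULL, *cache = NULL, *res = NULL;

	if (eventmap) {
		/* A readable EVENTMAP is taken as the event file itself. */
		if (access(eventmap, R_OK) == 0)
			return xformat("%s", eventmap);
		/* Otherwise it names a CPU id. */
		id = xformat("%s-core", eventmap);
	} else if (cpu_str) {
		id = xformat("%s", cpu_str);
	}
	if (!id)
		return NULL;

	if (xdg_cache)
		cache = xformat("%s", xdg_cache);
	else if (home)
		cache = xformat("%s/.cache", home);

	if (cache)
		res = xformat("%s/pmu-events/%s.json", cache, id);

	free(cache);
	free(id);
	return res;
}
```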

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix.
v2: Supports XDG_CACHE_HOME and defaults to ~/.cache/pmu-events
v3: Minor updates and handle EVENTMAP.
v4: Unify with header.c. Now uses CPUID directly.
---
 tools/perf/arch/x86/util/header.c |   19 +++---
 tools/perf/util/jevents.c |   40 +
 tools/perf/util/jevents.h |1 +
 tools/perf/util/pmu.c |2 +-
 4 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
index 146d12a..76e0ece 100644
--- a/tools/perf/arch/x86/util/header.c
+++ b/tools/perf/arch/x86/util/header.c
@@ -5,6 +5,7 @@
 #include 
 
 #include "../../util/header.h"
+#include "../../util/jevents.h"
 
 static inline void
 cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c,
@@ -19,8 +20,8 @@ cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c,
: "a" (op));
 }
 
-int
-get_cpuid(char *buffer, size_t sz)
+static int
+__get_cpuid(char *buffer, size_t sz, const char *fmt)
 {
unsigned int a, b, c, d, lvl;
int family = -1, model = -1, step = -1;
@@ -48,7 +49,7 @@ get_cpuid(char *buffer, size_t sz)
if (family >= 0x6)
model += ((a >> 16) & 0xf) << 4;
}
-   nb = scnprintf(buffer, sz, "%s,%u,%u,%u$", vendor, family, model, step);
+   nb = scnprintf(buffer, sz, fmt, vendor, family, model, step);
 
/* look for end marker to ensure the entire data fit */
if (strchr(buffer, '$')) {
@@ -57,3 +58,15 @@ get_cpuid(char *buffer, size_t sz)
}
return -1;
 }
+
+int get_cpuid(char *buffer, size_t sz)
+{
+   return __get_cpuid(buffer, sz, "%s,%u,%u,%u$");
+}
+
+char *get_cpu_str(void)
+{
+   char *buf = malloc(128);
+   __get_cpuid(buf, 128, "%s-%d-%X-core");
+   return buf;
+}
diff --git a/tools/perf/util/jevents.c b/tools/perf/util/jevents.c
index 023757c..ef4c047 100644
--- a/tools/perf/util/jevents.c
+++ b/tools/perf/util/jevents.c
@@ -39,6 +39,44 @@
 #include "json.h"
 #include "jevents.h"
 
+__attribute__((weak)) char *get_cpu_str(void)
+{
+   return NULL;
+}
+
+static const char *json_default_name(void)
+{
+   char *cache;
+   char *idstr = get_cpu_str();
+   char *res = NULL;
+   char *home = NULL;
+   char *emap;
+
+   emap = getenv("EVENTMAP");
+   if (emap) {
+   if (access(emap, R_OK) == 0)
+   return emap;
+   if (asprintf(&idstr, "%s-core", emap) < 0)
+   return NULL;
+   }
+
+   cache = getenv("XDG_CACHE_HOME");
+   if (!cache) {
+   home = getenv("HOME");
+   if (!home || asprintf(&cache, "%s/.cache", home) < 0)
+   goto out;
+   }
+   if (cache && idstr)
+   res = mkpath("%s/pmu-events/%s.json",
+cache,
+idstr);
+   if (home)
+   free(cache);
+out:
+   free(idstr);
+   return res;
+}
+
 static void addfield(char *map, char **dst, const char *sep,
 const char *a, jsmntok_t *bt)
 {
@@ -171,6 +209,8 @@ int json_events(const char *fn,
int i, j, len;
char *map;
 
+   if (!fn)
+   fn = json_default_name();
tokens = parse_json(fn, &map, &size, &len);
if (!tokens)
return -EIO;
diff --git a/tools/perf/util/jevents.h b/tools/perf/util/jevents.h
index fbc4549..86a94dd 100644
--- a/tools/perf/util/jevents.h
+++ b/tools/perf/util/jevents.h
@@ -4,5 +4,6 @@
 int json_events(const char *fn,
int (*func)(void *data, char *name, char *event, char *desc),
void *data);
+char *get_cpu_str(void);
 
 #endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index d7e5e1b..274aa18 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -482,7 +482,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
if (pmu_aliases(name, &aliases))
return NULL;
 
-   if (!strcmp(name, "cpu") && json_file)
+   if (!strcmp(name, "cpu"))
json_events(json_file, add_alias, &aliases);
 
if (pmu_type(name, &type))
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 09/11] perf, tools, test: Add test case for alias and JSON parsing

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add a simple test case to perf test that parses all the available
events, including json events.

This requires adding an iterator over all events to pmu.c.

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix conflicts in:
tools/perf/tests/builtin-test.c
tools/perf/tests/tests.h
Changelog:
v2: Rename identifiers
v3: Only iterate cpu pmu to avoid bogus errors.
Move pmu iterator to extra patch
v4: Include aliases.c again
v5: Include util/debug.h
---
 tools/perf/Makefile.perf|1 +
 tools/perf/tests/aliases.c  |   59 +++
 tools/perf/tests/builtin-test.c |4 +++
 tools/perf/tests/tests.h|1 +
 4 files changed, 65 insertions(+)
 create mode 100644 tools/perf/tests/aliases.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d9c1a4c..d9c03c4 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -456,6 +456,7 @@ endif
 LIB_OBJS += $(OUTPUT)tests/code-reading.o
 LIB_OBJS += $(OUTPUT)tests/sample-parsing.o
 LIB_OBJS += $(OUTPUT)tests/parse-no-sample-id-all.o
+LIB_OBJS += $(OUTPUT)tests/aliases.o
 ifndef NO_DWARF_UNWIND
 ifeq ($(ARCH),$(filter $(ARCH),x86 arm))
 LIB_OBJS += $(OUTPUT)tests/dwarf-unwind.o
diff --git a/tools/perf/tests/aliases.c b/tools/perf/tests/aliases.c
new file mode 100644
index 000..4209e51
--- /dev/null
+++ b/tools/perf/tests/aliases.c
@@ -0,0 +1,59 @@
+/* Check if we can set up all aliases and can read JSON files */
+#include 
+#include "tests.h"
+#include "pmu.h"
+#include "evlist.h"
+#include "parse-events.h"
+#include "util/debug.h"
+
+static struct perf_evlist *evlist;
+
+static int num_events;
+static int failed;
+
+static int test__event(const char *pmu, const char *name)
+{
+   int ret;
+
+   /* Not supported for now */
+   if (strcmp(pmu, "cpu"))
+   return 0;
+
+   ret = parse_events(evlist, name);
+
+   if (ret) {
+   /*
+* We only print on failure because common perf setups
+* have events that cannot be parsed.
+*/
+   fprintf(stderr, "invalid or unsupported event: '%s'\n", name);
+   ret = 0;
+   failed++;
+   } else
+   num_events++;
+   return ret;
+}
+
+int test__aliases(void)
+{
+   int err;
+
+   /* Download JSON files */
+   /* XXX assumes perf is installed */
+   /* For now user must manually download */
+   if (0 && system("perf download > /dev/null") < 0) {
+   /* Don't error out for this for now */
+   fprintf(stderr, "perf download failed\n");
+   }
+
+   evlist = perf_evlist__new();
+   if (evlist == NULL)
+   return -ENOMEM;
+
+   err = pmu_iterate_events(test__event);
+   fprintf(stderr, " Parsed %d events :", num_events);
+   if (failed > 0)
+   pr_debug(" %d events failed", failed);
+   perf_evlist__delete(evlist);
+   return err;
+}
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 4b7d9ab..2324b1c 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -167,6 +167,10 @@ static struct test {
.func = test__fdarray__add,
},
{
+   .desc = "Test parsing JSON aliases",
+   .func = test__aliases,
+   },
+   {
.func = NULL,
},
 };
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 00e776a..ddce231 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -51,6 +51,7 @@ int test__hists_cumulate(void);
 int test__switch_tracking(void);
 int test__fdarray__filter(void);
 int test__fdarray__add(void);
+int test__aliases(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-- 
1.7.9.5



[PATCH v9 10/11] perf, tools: Add a --no-desc flag to perf list

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add a --no-desc flag to perf list to suppress the event descriptions
that were added earlier for JSON events. This may be useful to
get a less crowded listing.

Descriptions are still printed by default, as that is the more
useful default for most users.

Before:

% perf list
...
  baclears.any               [Counts the total number when the front end is
                              resteered, mainly when the BPU cannot provide a
                              correct prediction and this is corrected by other
                              branch handling mechanisms at the front end]
  br_inst_exec.all_branches  [Speculative and retired branches]

After:

% perf list --no-desc
...
  baclears.any               [Kernel PMU event]
  br_inst_exec.all_branches  [Kernel PMU event]
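A minimal sketch of the flag semantics, assuming a hand-rolled parser instead of perf's parse-options (the patch uses OPT_BOOLEAN, and perf's git-derived parse-options accepts the --no- prefix for boolean options). The helper parse_desc_flag() is hypothetical: descriptions default to on, --no-desc turns them off, and the last occurrence wins.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for perf's parse-options: only "--desc" and
 * "--no-desc" are recognized; every other argument is ignored.
 */
static bool parse_desc_flag(int argc, const char **argv)
{
	bool desc_flag = true;	/* descriptions are printed by default */
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "--no-desc") == 0)
			desc_flag = false;
		else if (strcmp(argv[i], "--desc") == 0)
			desc_flag = true;
	}
	return desc_flag;	/* the last occurrence wins */
}
```

The patch then hands !desc_flag down to print_events() and print_pmu_events().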

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix conflicts in
tools/perf/builtin-list.c
v2: Rename --quiet to --no-desc. Add option to man page.
---
 tools/perf/Documentation/perf-list.txt |5 -
 tools/perf/builtin-list.c  |   16 +++-
 tools/perf/util/parse-events.c |4 ++--
 tools/perf/util/parse-events.h |2 +-
 tools/perf/util/pmu.c  |4 ++--
 tools/perf/util/pmu.h  |2 +-
 6 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 205ac40..7479efe 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
 SYNOPSIS
 
 [verse]
-'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
+'perf list' [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 ---
@@ -23,6 +23,9 @@ automatically downloaded with perf download.
 The JSON event file can be also specified with the EVENTMAP environment
 variable.
 
+--no-desc::
+Don't print descriptions.
+
 
 [[EVENT_MODIFIERS]] EVENT MODIFIERS
 ---
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index fd07cc1..76dc23b 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -16,6 +16,8 @@
 #include "util/pmu.h"
 #include "util/parse-options.h"
 
+static bool desc_flag = true;
+
 int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 {
int i;
@@ -24,10 +26,12 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_BOOLEAN(0, "raw-dump", &raw_dump, "Dump raw events"),
OPT_STRING(0, "events-file", &json_file, "json file",
   "Read event json file"),
+   OPT_BOOLEAN('d', "desc", &desc_flag,
+   "Print extra event descriptions. --no-desc to not print."),
OPT_END()
};
const char * const list_usage[] = {
-   "perf list [hw|sw|cache|tracepoint|pmu|event_glob]",
+   "perf list [--events-file FILE] [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]",
NULL
};
 
@@ -39,12 +43,12 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
setup_pager();
 
if (raw_dump) {
-   print_events(NULL, true);
+   print_events(NULL, true, !desc_flag);
return 0;
}
 
if (argc == 0) {
-   print_events(NULL, false);
+   print_events(NULL, false, !desc_flag);
return 0;
}
 
@@ -63,13 +67,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 strcmp(argv[i], "hwcache") == 0)
print_hwcache_events(NULL, false);
else if (strcmp(argv[i], "pmu") == 0)
-   print_pmu_events(NULL, false);
+   print_pmu_events(NULL, false, !desc_flag);
+   else if (strcmp(argv[i], "--raw-dump") == 0)
+   print_events(NULL, true, !desc_flag);
else {
char *sep = strchr(argv[i], ':'), *s;
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i], false);
+   print_events(argv[i], false, !desc_flag);
continue;
}
sep_idx = sep - argv[i];
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 7f8ec6c..039ba78 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1336,7 +1336,7 @@ static void print_symbol_events(cons

[PATCH v9 04/11] perf, tools: Add support for reading JSON event files

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add a parser for Intel-style JSON event files. This allows
an Intel event list to be used directly with perf. The Intel
event lists can be quite large and are too big to store
in unswappable kernel memory.

The parser code knows how to convert the JSON fields
to perf fields. The conversion code is straightforward.
It contains very little Intel-specific knowledge and can easily be
extended to handle fields for other CPUs.

The parser code is partially shared with an independent parsing
library, which is 2-clause BSD licensed. To avoid any conflicts I marked
those files as BSD licensed too. As part of perf they become GPLv2.

The events are handled using the existing alias machinery.

We output the BriefDescription in perf list.

Right now the JSON file can be specified as an argument
to perf stat/record/list. Follow-on patches will automate this.

JSON files look like this:

[
  {
"EventCode": "0x00",
"UMask": "0x01",
"EventName": "INST_RETIRED.ANY",
"BriefDescription": "Instructions retired from execution.",
"PublicDescription": "Instructions retired from execution.",
"Counter": "Fixed counter 1",
"CounterHTOff": "Fixed counter 1",
"SampleAfterValue": "203",
"MSRIndex": "0",
"MSRValue": "0",
"TakenAlone": "0",
"CounterMask": "0",
"Invert": "0",
"AnyThread": "0",
"EdgeDetect": "0",
"PEBS": "0",
"PRECISE_STORE": "0",
"Errata": "null",
"Offcore": "0"
  }
]
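As a rough sketch of the JSON-to-perf field conversion described above: the table is an assumption modelled on the x86 format names exported under /sys/bus/event_source/devices/cpu/format (event, umask, cmask, inv, edge, any), not the patch's actual table, and add_json_field() is a hypothetical helper. Keys it does not know (PEBS, Counter, ...) and values of "0" are skipped.

```c
#include <stdio.h>
#include <string.h>

/* Assumed mapping from Intel JSON keys to x86 perf format terms. */
static const struct {
	const char *json;
	const char *perf;
} field_map[] = {
	{ "EventCode",   "event" },
	{ "UMask",       "umask" },
	{ "CounterMask", "cmask" },
	{ "Invert",      "inv" },
	{ "EdgeDetect",  "edge" },
	{ "AnyThread",   "any" },
};

/*
 * Append "term=value" for one JSON key/value pair to the NUL-terminated,
 * comma-separated term string in buf. Returns 1 if a term was appended,
 * 0 if the key is unmapped or the value is "0", and -1 if buf is too small
 * (buf then holds a truncated string).
 */
static int add_json_field(char *buf, size_t len, const char *key,
			  const char *val)
{
	size_t i, used = strlen(buf);
	int n;

	for (i = 0; i < sizeof(field_map) / sizeof(field_map[0]); i++) {
		if (strcmp(field_map[i].json, key) != 0)
			continue;
		if (strcmp(val, "0") == 0)
			return 0;	/* default value: nothing to encode */
		n = snprintf(buf + used, len - used, "%s%s=%s",
			     used ? "," : "", field_map[i].perf, val);
		return (n < 0 || (size_t)n >= len - used) ? -1 : 1;
	}
	return 0;	/* key not handled by this sketch */
}
```

Feeding the INST_RETIRED.ANY entry above through it yields event=0x00,umask=0x01. Keeping the mapping in a table means supporting another CPU's fields is mostly a matter of adding rows.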

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix conflicts in:
tools/perf/Documentation/perf-record.txt
tools/perf/builtin-list.c
tools/perf/builtin-record.c
tools/perf/util/pmu.h
v2: Address review feedback. Rename option to --event-files
v3: Add JSON example
v4: Update manpages.
v5: Don't remove dot in fixname. Fix compile error. Add include
protection. Comment realloc.
v6: Include debug/util.h
---
 tools/perf/Documentation/perf-list.txt   |   12 +-
 tools/perf/Documentation/perf-record.txt |9 +-
 tools/perf/Documentation/perf-stat.txt   |8 +-
 tools/perf/Makefile.perf |2 +
 tools/perf/builtin-list.c|2 +
 tools/perf/builtin-record.c  |3 +
 tools/perf/builtin-stat.c|2 +
 tools/perf/util/jevents.c|  247 ++
 tools/perf/util/jevents.h|8 +
 tools/perf/util/json.c   |1 +
 tools/perf/util/pmu.c|   14 ++
 tools/perf/util/pmu.h|1 +
 12 files changed, 305 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/util/jevents.c
 create mode 100644 tools/perf/util/jevents.h

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 3e2aec9..205ac40 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -15,8 +15,16 @@ DESCRIPTION
 This command displays the symbolic event types which can be selected in the
 various perf commands with the -e option.
 
-[[EVENT_MODIFIERS]]
-EVENT MODIFIERS
+OPTIONS
+---
+--events-file=::
+Specify JSON event list file to use for parsing events. Files can be
+automatically downloaded with perf download.
+The JSON event file can be also specified with the EVENTMAP environment
+variable.
+
+
+[[EVENT_MODIFIERS]] EVENT MODIFIERS
 ---
 
 Events can optionally have a modifier by appending a colon and one or
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 31e9774..13f34b0 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -235,6 +235,13 @@ Capture machine state (registers) at interrupt, i.e., on counter overflows for
 each sample. List of captured registers depends on the architecture. This option
 is off by default.
 
+--events-file=::
+Specify JSON event list file to use for parsing events. Must be specified
+before the -e option. Files can be automatically downloaded with perf download.
+The JSON event file can be also specified with the EVENTMAP environment
+variable.
+
+
 SEE ALSO
 
-linkperf:perf-stat[1], linkperf:perf-list[1]
+linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-download[1]
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 04e150d..3853245 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -154,6 +154,12 @@ filter out the startup phase of the program, which is often very different.
 
 Print statistics of transactional execution if supported.
 
+--events-file=::
+Specify JSON event list file to use for parsing events. Must be specified before
+the -e option. Files can be automatically downloaded with perf download.
+The J

Re: [PATCH v5 10/10] module: Rework module_addr_{min,max}

2015-04-13 Thread Peter Zijlstra
On Tue, Apr 14, 2015 at 12:25:45PM +0930, Rusty Russell wrote:
> Ingo Molnar  writes:
> > * Peter Zijlstra  wrote:
> >
> >> __module_address() does an initial bound check before doing the 
> >> {list/tree} iteration to find the actual module. The bound variables 
> >> are nowhere near the mod_tree cacheline, in fact they're nowhere 
> >> near one another.
> >> 
> >> module_addr_min lives in .data while module_addr_max lives in .bss 
> >> (smarty pants GCC thinks the explicit 0 assignment is a mistake).
> >> 
> >> Rectify this by moving the two variables into a structure together 
> >> with the latch_tree_root to guarantee they all share the same 
> >> cacheline and avoid hitting two extra cachelines for the lookup.
> >> 
> >> While reworking the bounds code, move the bound update from 
> >> allocation to insertion time, this avoids updating the bounds for a 
> >> few error paths.
> >
> >> +static struct mod_tree_root {
> >> +  struct latch_tree_root root;
> >> +  unsigned long addr_min;
> >> +  unsigned long addr_max;
> >> +} mod_tree __cacheline_aligned = {
> >> +  .addr_min = -1UL,
> >> +};
> >> +
> >> +#define module_addr_min mod_tree.addr_min
> >> +#define module_addr_max mod_tree.addr_max
> 
> Nice catch.
> 
> Does the min/max comparison still win us anything?  (I'm guessing yes...)

Yep, while a tree iteration is much faster than the linear thing it is
still quite a bit slower than two simple compares.
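To illustrate the bound-check argument, here is a userspace model (all names hypothetical, not kernel code). The root pointer and the two bounds share one 64-byte-aligned structure that is 32 bytes on LP64, standing in for __cacheline_aligned, and the bounds are updated at insertion time as the patch does. A linear scan stands in for the latched RB-tree walk; most misses are rejected by the two compares before any walk starts. The aligned attribute is GCC/Clang syntax.

```c
#include <stddef.h>

#define MAX_MODS 8

struct mod_range {
	unsigned long start, end;	/* [start, end) */
};

static struct mod_range mod_storage[MAX_MODS];

/* Root plus bounds: 32 bytes on LP64, so the fast path touches one line. */
static struct mod_tree_model {
	struct mod_range *mods;		/* stand-in for struct latch_tree_root */
	size_t nr;
	unsigned long addr_min;
	unsigned long addr_max;
} mod_tree __attribute__((aligned(64))) = {
	.mods = mod_storage,
	.addr_min = -1UL,	/* empty set: every address fails the bounds */
};

/* Bounds are updated at insertion time, mirroring the patch. */
static int mod_insert(unsigned long start, unsigned long end)
{
	if (mod_tree.nr == MAX_MODS)
		return -1;
	mod_tree.mods[mod_tree.nr].start = start;
	mod_tree.mods[mod_tree.nr].end = end;
	mod_tree.nr++;
	if (start < mod_tree.addr_min)
		mod_tree.addr_min = start;
	if (end > mod_tree.addr_max)
		mod_tree.addr_max = end;
	return 0;
}

static const struct mod_range *mod_lookup(unsigned long addr)
{
	size_t i;

	/* Two compares reject most addresses before any search. */
	if (addr < mod_tree.addr_min || addr >= mod_tree.addr_max)
		return NULL;
	/* Linear scan stands in for the latched RB-tree walk. */
	for (i = 0; i < mod_tree.nr; i++)
		if (addr >= mod_tree.mods[i].start && addr < mod_tree.mods[i].end)
			return &mod_tree.mods[i];
	return NULL;
}
```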

> In general, I'm happy with this series.  Assume you want another
> go-round for Ingo's tweaks, then I'll take them for 4.2.

Thanks!


Re: [PATCH v5 00/10] latched RB-trees and __module_address()

2015-04-13 Thread Peter Zijlstra
On Tue, Apr 14, 2015 at 12:27:05PM +0930, Rusty Russell wrote:

> I was tempted to sneak in those module rcu fixes for 4.1, but seeing
> Ingo's comments I'll wait for 4.2.

I can get you a new version of that if you want. See below. The fixups
are unmodified from the posting (patches 2,3).

---
Subject: module: Sanitize RCU usage and locking
From: Peter Zijlstra 
Date: Sat Feb 28 19:17:04 CET 2015

Currently the RCU usage in module is an inconsistent mess of RCU and
RCU-sched. This is broken for CONFIG_PREEMPT, where synchronize_rcu()
does not imply synchronize_sched().

Most usage sites use preempt_{dis,en}able(), which is RCU-sched, but
(most of) the modification sites use synchronize_rcu(), with the
exception of the module bug list, which actually uses RCU.

Convert everything over to RCU-sched.

Furthermore, add lockdep asserts to all sites, because it is not at all
clear to me that the required locking is observed, especially in exported
functions.

Cc: Rusty Russell 
Acked-by: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/module.h |   12 ++--
 kernel/module.c|   40 
 lib/bug.c  |7 +--
 3 files changed, 47 insertions(+), 12 deletions(-)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -419,14 +419,22 @@ struct symsearch {
bool unused;
 };
 
-/* Search for an exported symbol by name. */
+/*
+ * Search for an exported symbol by name.
+ *
+ * Must be called with module_mutex held or preemption disabled.
+ */
 const struct kernel_symbol *find_symbol(const char *name,
struct module **owner,
const unsigned long **crc,
bool gplok,
bool warn);
 
-/* Walk the exported symbol table */
+/*
+ * Walk the exported symbol table
+ *
+ * Must be called with module_mutex held or preemption disabled.
+ */
 bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
struct module *owner,
void *data), void *data);
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -105,6 +105,22 @@ static LIST_HEAD(modules);
 struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
 #endif /* CONFIG_KGDB_KDB */
 
+static void module_assert_mutex(void)
+{
+   lockdep_assert_held(&module_mutex);
+}
+
+static void module_assert_mutex_or_preempt(void)
+{
+#ifdef CONFIG_LOCKDEP
+   if (unlikely(!debug_locks))
+   return;
+
+   WARN_ON(!rcu_read_lock_sched_held() &&
+   !lockdep_is_held(&module_mutex));
+#endif
+}
+
 #ifdef CONFIG_MODULE_SIG
 #ifdef CONFIG_MODULE_SIG_FORCE
 static bool sig_enforce = true;
@@ -318,6 +334,8 @@ bool each_symbol_section(bool (*fn)(cons
 #endif
};
 
+   module_assert_mutex_or_preempt();
+
if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
return true;
 
@@ -457,6 +475,8 @@ static struct module *find_module_all(co
 {
struct module *mod;
 
+   module_assert_mutex();
+
list_for_each_entry(mod, &modules, list) {
if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
continue;
@@ -1854,8 +1874,8 @@ static void free_module(struct module *m
list_del_rcu(&mod->list);
/* Remove this module from bug list, this uses list_del_rcu */
module_bug_cleanup(mod);
-   /* Wait for RCU synchronizing before releasing mod->list and buglist. */
-   synchronize_rcu();
+   /* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */
+   synchronize_sched();
mutex_unlock(&module_mutex);
 
/* This may be NULL, but that's OK */
@@ -3106,11 +3126,11 @@ static noinline int do_init_module(struc
mod->init_text_size = 0;
/*
 * We want to free module_init, but be aware that kallsyms may be
-* walking this with preempt disabled.  In all the failure paths,
-* we call synchronize_rcu/synchronize_sched, but we don't want
-* to slow down the success path, so use actual RCU here.
+* walking this with preempt disabled.  In all the failure paths, we
+* call synchronize_sched(), but we don't want to slow down the success
+* path, so use actual RCU here.
 */
-   call_rcu(&freeinit->rcu, do_free_init);
+   call_rcu_sched(&freeinit->rcu, do_free_init);
mutex_unlock(&module_mutex);
wake_up_all(&module_wq);
 
@@ -3368,8 +3388,8 @@ static int load_module(struct load_info
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
wake_up_all(&module_wq);
-   /* Wait for RCU synchronizing before releasing mod->list. */
-   synchronize_rcu();
+   /* Wait for RCU-sched synchronizing before releasing mod->list. */
+   synchro

Re: [Linux-nvdimm] [GIT PULL] PMEM driver for v4.1

2015-04-13 Thread Boaz Harrosh
On 04/13/2015 08:19 PM, Christoph Hellwig wrote:
> On Mon, Apr 13, 2015 at 02:11:56PM +0300, Yigal Korman wrote:
>> mlock()
> 
> DAX files always are in-memory so this just sounds like an oversight.
> method.

Yes, mlock on DAX can just return true, but mlock implies MAP_POPULATE.

Which means "I would like to page-fault the whole mmap range at mmap time
so that at access time I'm guaranteed not to sleep". This is usually done
for latency-sensitive applications.

But the current code fails on MAP_POPULATE for DAX because it is only
implemented for pages, and therefore mlock fails as well.

One thing I do not understand: does mlock also protect against
truncate?

Thanks
Boaz



[PATCH v9 11/11] perf-download: Download the events json file

2015-04-13 Thread Sukadev Bhattiprolu
Add a downloader to automatically download the right files from a
download site.

This is implemented as a script calling curl, similar to perf archive.
The perf driver automatically calls the right binary. The downloader is
extensible, but currently only implements Intel and Powerpc event
downloads. It would be straightforward to add support for other architectures.

For now, individual architectures may organize the JSON files slightly
differently. For example, Powerpc maps its PVR to a CPU family (e.g. the
power8 events file), while x86 uses the CPU vendor, family and model to
locate the specific file to download.
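A sketch of the x86 naming scheme, assuming the "%s-%d-%X-core" format string that appears earlier in this series (vendor, decimal family, upper-case hex model). format_cpu_id() and pick_event_file() are hypothetical helpers; vendor, family and model are taken as inputs rather than read with CPUID, and the file list in the example is made up.

```c
#include <stdio.h>
#include <string.h>

/* Build "<vendor>-<family>-<MODEL>-core"; returns -1 if buf is too small. */
static int format_cpu_id(char *buf, size_t len, const char *vendor,
			 int family, unsigned int model)
{
	int n = snprintf(buf, len, "%s-%d-%X-core", vendor, family, model);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return the entry of files[] named exactly "<id>.json", or NULL. */
static const char *pick_event_file(const char *const *files, size_t n,
				   const char *id)
{
	size_t i, idlen = strlen(id);

	for (i = 0; i < n; i++)
		if (strncmp(files[i], id, idlen) == 0 &&
		    strcmp(files[i] + idlen, ".json") == 0)
			return files[i];
	return NULL;
}
```

For family 6, model 0x3c on a GenuineIntel part this produces GenuineIntel-6-3C-core, so the matching file would be GenuineIntel-6-3C-core.json.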

The downloaded event files are put into ~/.cache/pmu-events, where the
builtin event parser in util/* can find them automatically.
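A sketch of the lookup order implemented by json_default_name() earlier in this series, with the environment values passed in so the logic can be tested: $XDG_CACHE_HOME/pmu-events/<id>.json if XDG_CACHE_HOME is set, otherwise $HOME/.cache/pmu-events/<id>.json. The EVENTMAP override is omitted, the helper name is hypothetical, and rejecting ids that contain '/' is an addition of this sketch, not something the patch does.

```c
#include <stdio.h>
#include <string.h>

static int event_file_path(char *buf, size_t len, const char *xdg_cache,
			   const char *home, const char *id)
{
	int n;

	/* Sketch-only guard: keep the result inside pmu-events/. */
	if (strchr(id, '/'))
		return -1;
	if (xdg_cache)
		n = snprintf(buf, len, "%s/pmu-events/%s.json", xdg_cache, id);
	else if (home)
		n = snprintf(buf, len, "%s/.cache/pmu-events/%s.json", home, id);
	else
		return -1;	/* neither variable set: no default location */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass getenv("XDG_CACHE_HOME") and getenv("HOME") as the two directory arguments.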

Signed-off-by: Andi Kleen 
Link: http://lkml.kernel.org/n/1405123165-22666-8-git-send-email-a...@firstfloor.org
Signed-off-by: Jiri Olsa 
Signed-off-by: Sukadev Bhattiprolu 

Changelog[v9] (by Sukadev Bhattiprolu)
Add the perf-download script back into patchset. Set default
download location to the tools/perf/pmu-events/ directory in
Linus's tree.
Include code to parse/download powerpc JSON files.
Remove Acked-by: Namhyung Kim since this patch has major changes
---
 tools/perf/Documentation/perf-download.txt |   31 +
 tools/perf/Documentation/perf-list.txt |   12 +-
 tools/perf/Makefile.perf   |5 +-
 tools/perf/perf-download.sh|  171 
 4 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-download.txt
 create mode 100755 tools/perf/perf-download.sh

diff --git a/tools/perf/Documentation/perf-download.txt b/tools/perf/Documentation/perf-download.txt
new file mode 100644
index 000..9e5b28e
--- /dev/null
+++ b/tools/perf/Documentation/perf-download.txt
@@ -0,0 +1,31 @@
+perf-download(1)
+===
+
+NAME
+
+perf-download - Download event files for current CPU.
+
+SYNOPSIS
+
+[verse]
+'perf download' [vendor-family-model]
+
+DESCRIPTION
+---
+This command automatically downloads the event list for the current CPU and
+stores them in $XDG_CACHE_HOME/pmu-events (or $HOME/.cache/pmu-events).
+The other tools automatically look for them there. The CPU can be also
+specified at the command line.
+
+The downloading is done using http through wget, which needs
+to be installed. When behind a firewall the proxies
+may also need to be set up using "export https_proxy="
+
+The user should regularly call this to download updated event lists
+for the current CPU.
+
+Note the downloaded files are stored per user, so if perf is
+used as both normal user and with sudo the event files may
+also need to be moved to root's home directory with
+sudo mkdir /root/.cache ; sudo cp -r ~/.cache/pmu-events /root/.cache
+after downloading.
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 7479efe..98637e8 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -66,6 +66,16 @@ Sampling). Examples to use IBS:
  perf record -a -e r076:p ...  # same as -e cpu-cycles:p
  perf record -a -e r0C1:p ...  # use ibs op counting micro-ops
 
+PER CPU EVENT LISTS
+---
+
+For some CPUs (particularly modern Intel CPUs) "perf download" can
+download additional CPU specific event definitions, which then
+become visible in perf list and available in the other perf tools.
+
+This obsoletes the raw event description method described below
+for most cases.
+
 RAW HARDWARE EVENT DESCRIPTOR
 -
 Even when an event is not available in a symbolic form within perf right now,
@@ -141,6 +151,6 @@ types specified.
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-top[1],
-linkperf:perf-record[1],
+linkperf:perf-record[1], linkperf:perf-download[1],
 http://www.intel.com/Assets/PDF/manual/253669.pdf[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide],
 http://support.amd.com/us/Processor_TechDocs/24593_APM_v2.pdf[AMD64 Architecture Programmer’s Manual Volume 2: System Programming]
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d9c03c4..9f955b1 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -136,6 +136,7 @@ SCRIPT_SH =
 
 SCRIPT_SH += perf-archive.sh
 SCRIPT_SH += perf-with-kcore.sh
+SCRIPT_SH += perf-download.sh
 
 grep-libs = $(filter -l%,$(1))
 strip-libs = $(filter-out -l%,$(1))
@@ -946,6 +947,8 @@ endif
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-with-kcore) \
$(INSTALL) $(OUTPUT)perf-with-kcore -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
+   $(call QUIET_INSTALL, perf-download) \
+   $(INSTALL) $(OUTPUT)perf-download -t '$(DESTDIR_SQ)$(perfexec_in

[PATCH v9 07/11] perf, tools: Query terminal width and use in perf list

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Automatically adapt the now wider and word wrapped perf list
output to wider terminals. This requires querying the terminal
before the auto pager takes over, and exporting this
information from the pager subsystem.
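A minimal sketch of that policy, split so the arithmetic is testable. query_tty_columns() reads the width with TIOCGWINSZ (this has to happen before the pager replaces stdout with a pipe), and effective_columns() mirrors pager_get_columns() in the patch: a COLUMNS value wins as-is, otherwise the queried width (or 80 if unknown) minus 2. Both helper names are hypothetical.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Terminal width of fd, or 0 if fd is not a terminal or the query fails. */
static int query_tty_columns(int fd)
{
	struct winsize ws;

	if (isatty(fd) && ioctl(fd, TIOCGWINSZ, &ws) == 0)
		return ws.ws_col;
	return 0;
}

/* COLUMNS overrides; otherwise use the tty width (80 if unknown) minus 2. */
static int effective_columns(const char *columns_env, int tty_columns)
{
	if (columns_env)
		return atoi(columns_env);
	return (tty_columns ? tty_columns : 80) - 2;
}
```

In perf the first call would run inside setup_pager() and the second whenever perf list formats its output.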

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 

---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0.
---
 tools/perf/util/cache.h |1 +
 tools/perf/util/pager.c |   15 +++
 tools/perf/util/pmu.c   |   12 ++--
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/cache.h b/tools/perf/util/cache.h
index d04d770..f1990c9 100644
--- a/tools/perf/util/cache.h
+++ b/tools/perf/util/cache.h
@@ -32,6 +32,7 @@ extern void setup_pager(void);
 extern const char *pager_program;
 extern int pager_in_use(void);
 extern int pager_use_color;
+int pager_get_columns(void);
 
 char *alias_lookup(const char *alias);
 int split_cmdline(char *cmdline, const char ***argv);
diff --git a/tools/perf/util/pager.c b/tools/perf/util/pager.c
index 31ee02d..9761202 100644
--- a/tools/perf/util/pager.c
+++ b/tools/perf/util/pager.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "run-command.h"
 #include "sigchain.h"
+#include 
 
 /*
  * This is split up from the rest of git so that we can do
@@ -8,6 +9,7 @@
  */
 
 static int spawned_pager;
+static int pager_columns;
 
 static void pager_preexec(void)
 {
@@ -47,9 +49,12 @@ static void wait_for_pager_signal(int signo)
 void setup_pager(void)
 {
const char *pager = getenv("PERF_PAGER");
+   struct winsize sz;
 
if (!isatty(1))
return;
+   if (ioctl(1, TIOCGWINSZ, &sz) == 0)
+   pager_columns = sz.ws_col;
if (!pager) {
if (!pager_program)
perf_config(perf_default_config, NULL);
@@ -98,3 +103,13 @@ int pager_in_use(void)
env = getenv("PERF_PAGER_IN_USE");
return env ? perf_config_bool("PERF_PAGER_IN_USE", env) : 0;
 }
+
+int pager_get_columns(void)
+{
+   char *s;
+
+   s = getenv("COLUMNS");
+   if (s)
+   return atoi(s);
+   return (pager_columns ? pager_columns : 80) - 2;
+}
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 274aa18..2150455 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -11,6 +11,7 @@
 #include "pmu.h"
 #include "parse-events.h"
 #include "cpumap.h"
+#include "cache.h"
 #include "jevents.h"
 
 const char *json_file;
@@ -929,15 +930,6 @@ static void wordwrap(char *s, int start, int max, int corr)
}
 }
 
-static int get_columns(void)
-{
-   /*
-* Should ask the terminal with TIOCGWINSZ here, but we
-* need the original fd before the pager.
-*/
-   return 79;
-}
-
 void print_pmu_events(const char *event_glob, bool name_only)
 {
struct perf_pmu *pmu;
@@ -947,7 +939,7 @@ void print_pmu_events(const char *event_glob, bool name_only)
int len, j;
struct pair *aliases;
int numdesc = 0;
-   int columns = get_columns();
+   int columns = pager_get_columns();
 
pmu = NULL;
len = 0;
-- 
1.7.9.5



Re: [PATCH 0/3] dm-crypt: Adds support for wiping key when doing suspend/hibernation

2015-04-13 Thread Pavel Machek
On Thu 2015-04-09 09:12:08, Mike Snitzer wrote:
> On Mon, Apr 06 2015 at  9:29am -0400,
> Pali Rohár  wrote:
> 
> > On Monday 06 April 2015 15:00:46 Mike Snitzer wrote:
> > > On Sun, Apr 05 2015 at  1:20pm -0400,
> > > 
> > > Pali Rohár  wrote:
> > > > This patch series increase security of suspend and hibernate
> > > > actions. It allows user to safely wipe crypto keys before
> > > > suspend and hibernate actions starts without race
> > > > conditions on userspace process with heavy I/O.
> > > > 
> > > > To automatically wipe crypto key for  before
> > > > hibernate action call: $ dmsetup message  0 key
> > > > wipe_on_hibernation 1
> > > > 
> > > > To automatically wipe crypto key for  before suspend
> > > > action call: $ dmsetup message  0 key
> > > > wipe_on_suspend 1
> > > > 
> > > > (Value 0 after wipe_* string reverts original behaviour - to
> > > > not wipe key)
> > > 
> > > Can you elaborate on the attack vector your changes are meant
> > > to protect against?  The user already authorized access, why
> > > is it inherently dangerous to _not_ wipe the associated key
> > > across these events?
> > 
> > Hi,
> > 
> > yes, I will try to explain current problems with cryptsetup 
> > luksSuspend command and hibernation.
> > 
> > First, sometimes it is needed to put machine into other hands. 
> > You can still watch other person what is doing with machine, but 
> > once if you let machine unlocked (e.g opened luks disk), she/he 
> > can access encrypted data.
> > 
> > If you turn off machine, it could be safe, because luks disk 
> > devices are locked. But if you enter machine into suspend or 
> > hibernate state luks devices are still open. And my patches try 
> > to achieve similar security as when machine is off (= no crypto 
> > keys in RAM or on swap).
> > 
> > When doing hibernate on unencrypted swap it is to prevent leaking 
> > crypto keys to hibernate image (which is stored in swap).
> > 
> > When doing suspend action it is again to prevent leaking crypto 
> > keys. E.g when you suspend laptop and put it off (somebody can 
> > remove RAMs and do some cold boot attack).
> > 
> > The most common situation is:
> > You have mounted partition from dm-crypt device (e.g. /home/), 
> > some userspace processes access it (e.g opened firefox which 
> > still reads/writes to cache ~/.firefox/) and you want to drop 
> > crypto keys from kernel for some time.
> > 
> > For that operation there is command cryptsetup luksSuspend, which 
> > suspend dm device and then tell kernel to wipe crypto keys. All 
> > I/O operations are then stopped and userspace processes which 
> > want to do some those I/O operations are stopped too (until you 
> > call cryptsetup luksResume and enter correct key).
> > 
> > Now if you want to suspend/hibernate your machine (when some of dm 
> > devices are suspended and some processes are stopped due to 
> > pending I/O) it is not possible. Kernel freeze_processes function 
> > will fail because userspace processes are still stopped inside 
> > some I/O syscall (read/write, etc,...).
> > 
> > My patches fix this problem and do those operations (suspend dm 
> > device, wipe crypto keys, enter suspend/hibernate) in correct 
> > order and without race condition.
> > 
> > dm device is suspended *after* userspace processes are frozen 
> > and after that are crypto keys wiped. And then computer/laptop 
> > enters into suspend/hibernate state.
> 
> Wouldn't it be better to fix freeze_processes() to be tolerant of
> processes that are hung as a side-effect of their backing storage being
> suspended?  A hibernate shouldn't fail simply because a user chose to
> suspend a DM device.

That would be nice, I agree. But that's a non-trivial amount of work
and might be (close to) impossible.
Pavel
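The ordering described in this thread (freeze userspace, then suspend the dm device, then wipe the keys, then enter sleep) can be illustrated with a small user-space simulation. Every function below is a hypothetical stand-in, not a kernel or cryptsetup API, and the rollback on failure is an assumption added for illustration.

```c
#include <string.h>

/* Records the order in which the (hypothetical) steps ran. */
static char step_log[128];

static void log_step(const char *name)
{
	size_t room;

	if (step_log[0] != '\0') {
		room = sizeof(step_log) - strlen(step_log) - 1;
		strncat(step_log, ",", room);
	}
	room = sizeof(step_log) - strlen(step_log) - 1;
	strncat(step_log, name, room);
}

/* Hypothetical stand-ins for the real freezer, dm and key operations. */
static int freeze_userspace(void) { log_step("freeze"); return 0; }
static int suspend_dm_device(int fail) { log_step("dm_suspend"); return fail ? -1 : 0; }
static void wipe_crypto_keys(void) { log_step("wipe_keys"); }
static void enter_sleep(void) { log_step("sleep"); }
static void thaw_userspace(void) { log_step("thaw"); }

/*
 * Userspace is frozen *before* the dm device is suspended, so no task can
 * be blocked on dm I/O when the freezer runs; keys are wiped only after
 * the device is suspended.
 */
static int suspend_with_key_wipe(int dm_fail)
{
	step_log[0] = '\0';
	if (freeze_userspace() != 0)
		return -1;
	if (suspend_dm_device(dm_fail) != 0) {
		thaw_userspace();	/* assumed rollback: keys stay in place */
		return -1;
	}
	wipe_crypto_keys();
	enter_sleep();
	return 0;
}
```

For example, suspend_with_key_wipe(0) leaves step_log as "freeze,dm_suspend,wipe_keys,sleep".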
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 08/11] perf, tools: Add a new pmu interface to iterate over all events

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

It calls a callback for every event. To be used by test code added in
the next patch.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix conflicts in:
tools/perf/util/pmu.c
tools/perf/util/pmu.h
---
 tools/perf/util/pmu.c |   18 ++
 tools/perf/util/pmu.h |    1 +
 2 files changed, 19 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2150455..82f7654 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1068,3 +1068,21 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
va_end(args);
return ret;
 }
+
+int pmu_iterate_events(int (*func)(const char *pmu, const char *name))
+{
+   int ret = 0;
+   struct perf_pmu *pmu;
+   struct perf_pmu_alias *alias;
+
+   perf_pmu__find("cpu"); /* Load PMUs */
+   pmu = NULL;
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   list_for_each_entry(alias, &pmu->aliases, list) {
+   ret = func(pmu->name, alias->name);
+   if (ret != 0)
+   return ret;
+   }
+   }
+   return ret;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index f8dac0f..889cadf 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -76,6 +76,7 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 int perf_pmu__test(void);
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu);
+int pmu_iterate_events(int (*func)(const char *, const char *name));
 
 extern const char *json_file;
 #endif /* __PMU_H */
-- 
1.7.9.5
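The behaviour pmu_iterate_events() is meant to have (visit every event of every PMU, stop at the first non-zero callback result and return it) can be shown with a self-contained sketch. The fixed table of PMUs and event names is hypothetical and stands in for the alias lists perf loads at runtime.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for perf's per-PMU alias lists. */
struct fake_pmu {
	const char *name;
	const char *events[4];	/* NULL-terminated */
};

static const struct fake_pmu fake_pmus[] = {
	{ "cpu",    { "cycles", "instructions", NULL } },
	{ "uncore", { "clockticks", NULL } },
};

/* Call func(pmu, event) for every event; stop at the first non-zero return. */
static int iterate_events(int (*func)(const char *pmu, const char *name))
{
	size_t i, j;
	int ret = 0;

	for (i = 0; i < sizeof(fake_pmus) / sizeof(fake_pmus[0]); i++) {
		for (j = 0; fake_pmus[i].events[j] != NULL; j++) {
			ret = func(fake_pmus[i].name, fake_pmus[i].events[j]);
			if (ret != 0)
				return ret;	/* leave both loops, not just the inner one */
		}
	}
	return ret;
}

static int visited;

/* Example callback: count visited events, stop at "clockticks". */
static int find_clockticks(const char *pmu, const char *name)
{
	(void)pmu;
	visited++;
	return strcmp(name, "clockticks") == 0 ? 1 : 0;
}
```

Returning from inside the inner loop, rather than using break, is what keeps a non-zero result from being lost when the outer loop moves on to the next PMU.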



[PATCH v9 00/11] Add support for JSON event files.

2015-04-13 Thread Sukadev Bhattiprolu
This is another attempt to resurrect Andi Kleen's patchset so users
can specify perf events by their event names rather than raw codes.

This is a rebase of Andi Kleen's patchset from Jul 30, 2014[1] to 4.0
(I fixed both minor and not-so-minor conflicts).

This patchset also includes the perf-download tool, which had been dropped,
and sets the default download location to the tools/perf/pmu-events/arch/...
directory in Linus's tree.

A follow-on patchset will include the actual JSON files for Powerpc, which
are currently available on github[2].

[1] https://lkml.org/lkml/2014/7/30/693
[2] https://github.com/open-power/power-pmu-events

Andi Kleen (9):
  perf, tools: Add jsmn `jasmine' JSON parser
  perf, tools: Add support for text descriptions of events and alias add
  perf, tools, list: Update perf list to output descriptions
  perf, tools: Add support for reading JSON event files
  perf, tools: Automatically look for event file name for cpu
  perf, tools: Query terminal width and use in perf list
  perf, tools: Add a new pmu interface to iterate over all events
  perf, tools, test: Add test case for alias and JSON parsing
  perf, tools: Add a --no-desc flag to perf list

Sukadev Bhattiprolu (2):
  powerpc/perf: Implement get_cpu_str()
  perf-download: Download the events json file

 tools/perf/Documentation/perf-download.txt |   31 +++
 tools/perf/Documentation/perf-list.txt     |   29 ++-
 tools/perf/Documentation/perf-record.txt   |    9 +-
 tools/perf/Documentation/perf-stat.txt     |    8 +-
 tools/perf/Makefile.perf                   |   12 +-
 tools/perf/arch/powerpc/util/header.c      |   12 ++
 tools/perf/arch/x86/util/header.c          |   19 +-
 tools/perf/builtin-list.c                  |   18 +-
 tools/perf/builtin-record.c                |    3 +
 tools/perf/builtin-stat.c                  |    2 +
 tools/perf/perf-download.sh                |  171 +++
 tools/perf/tests/aliases.c                 |   59 ++
 tools/perf/tests/builtin-test.c            |    4 +
 tools/perf/tests/tests.h                   |    1 +
 tools/perf/util/cache.h                    |    1 +
 tools/perf/util/jevents.c                  |  287 +
 tools/perf/util/jevents.h                  |    9 +
 tools/perf/util/jsmn.c                     |  313
 tools/perf/util/jsmn.h                     |   67 ++
 tools/perf/util/json.c                     |  162 ++
 tools/perf/util/json.h                     |   13 ++
 tools/perf/util/pager.c                    |   15 ++
 tools/perf/util/parse-events.c             |    4 +-
 tools/perf/util/parse-events.h             |    2 +-
 tools/perf/util/pmu.c                      |  160 +++---
 tools/perf/util/pmu.h                      |    5 +-
 26 files changed, 1365 insertions(+), 51 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-download.txt
 create mode 100755 tools/perf/perf-download.sh
 create mode 100644 tools/perf/tests/aliases.c
 create mode 100644 tools/perf/util/jevents.c
 create mode 100644 tools/perf/util/jevents.h
 create mode 100644 tools/perf/util/jsmn.c
 create mode 100644 tools/perf/util/jsmn.h
 create mode 100644 tools/perf/util/json.c
 create mode 100644 tools/perf/util/json.h

-- 
1.7.9.5



[PATCH v9 03/11] perf, tools, list: Update perf list to output descriptions

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add support to perf list to print aliases with descriptions.
Support word wrapping for descriptions.
Fix up the sorting code to put aliases with descriptions
last.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---

Changelog[v9] by Sukadev Bhattiprolu
Rebase to 4.0 and fix conflicts in tools/perf/util/pmu.c
---
 tools/perf/util/pmu.c |   86 -
 1 file changed, 70 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 527da74..623b107 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -877,11 +877,51 @@ static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu,
return buf;
 }
 
-static int cmp_string(const void *a, const void *b)
+struct pair {
+   char *name;
+   char *desc;
+};
+
+static int cmp_pair(const void *a, const void *b)
 {
-   const char * const *as = a;
-   const char * const *bs = b;
-   return strcmp(*as, *bs);
+   const struct pair *as = a;
+   const struct pair *bs = b;
+
+   /* Put downloaded event list last */
+   if (!!as->desc != !!bs->desc)
+   return !!as->desc - !!bs->desc;
+   return strcmp(as->name, bs->name);
+}
+
+static void wordwrap(char *s, int start, int max, int corr)
+{
+   int column = start;
+   int n;
+
+   while (*s) {
+   int wlen = strcspn(s, " \t");
+
+   if (column + wlen >= max && column > start) {
+   printf("\n%*s", start, "");
+   column = start + corr;
+   }
+   n = printf("%s%.*s", column > start ? " " : "", wlen, s);
+   if (n <= 0)
+   break;
+   s += wlen;
+   column += n;
+   while (isspace(*s))
+   s++;
+   }
+}
+
+static int get_columns(void)
+{
+   /*
+* Should ask the terminal with TIOCGWINSZ here, but we
+* need the original fd before the pager.
+*/
+   return 79;
 }
 
 void print_pmu_events(const char *event_glob, bool name_only)
@@ -891,7 +931,9 @@ void print_pmu_events(const char *event_glob, bool name_only)
char buf[1024];
int printed = 0;
int len, j;
-   char **aliases;
+   struct pair *aliases;
+   int numdesc = 0;
+   int columns = get_columns();
 
pmu = NULL;
len = 0;
@@ -901,14 +943,15 @@ void print_pmu_events(const char *event_glob, bool name_only)
if (pmu->selectable)
len++;
}
-   aliases = zalloc(sizeof(char *) * len);
+   aliases = zalloc(sizeof(struct pair) * len);
if (!aliases)
goto out_enomem;
pmu = NULL;
j = 0;
while ((pmu = perf_pmu__scan(pmu)) != NULL) {
list_for_each_entry(alias, &pmu->aliases, list) {
-   char *name = format_alias(buf, sizeof(buf), pmu, alias);
+   char *name = alias->desc ? alias->name :
+   format_alias(buf, sizeof(buf), pmu, alias);
bool is_cpu = !strcmp(pmu->name, "cpu");
 
if (event_glob != NULL &&
@@ -917,11 +960,14 @@ void print_pmu_events(const char *event_glob, bool name_only)
   event_glob
continue;
 
-   if (is_cpu && !name_only)
-   name = format_alias_or(buf, sizeof(buf), pmu, alias);
+   aliases[j].name = name;
+   if (is_cpu && !name_only && !alias->desc)
+   aliases[j].name = format_alias_or(buf,
+   sizeof(buf), pmu, alias);
 
-   aliases[j] = strdup(name);
-   if (aliases[j] == NULL)
+   aliases[j].name = strdup(aliases[j].name);
+   aliases[j].desc = alias->desc;
+   if (aliases[j].name == NULL)
goto out_enomem;
j++;
}
@@ -929,25 +975,33 @@ void print_pmu_events(const char *event_glob, bool name_only)
char *s;
if (asprintf(&s, "%s//", pmu->name) < 0)
goto out_enomem;
-   aliases[j] = s;
+   aliases[j].name = s;
j++;
}
}
len = j;
-   qsort(aliases, len, sizeof(char *), cmp_string);
+   qsort(aliases, len, sizeof(struct pair), cmp_pair);
for (j = 0; j < len; j++) {
if (name_only) {
-   printf("%s ", aliases[j]);
+   printf("%s ", aliases[j].name);
continue;
}
-  
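
A standalone sketch of the two behaviours described in the commit message above: aliases without descriptions sort before those with descriptions, and descriptions are word-wrapped greedily. struct pair and cmp_pair() follow the patch; wrap_into() is a hypothetical variant of wordwrap() that writes into a buffer instead of printing, and it omits the patch's corr column correction.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pair {
	const char *name;
	const char *desc;
};

/* Same ordering as cmp_pair() in the patch: no description first, then by name. */
static int cmp_pair(const void *a, const void *b)
{
	const struct pair *as = a;
	const struct pair *bs = b;

	if (!!as->desc != !!bs->desc)
		return !!as->desc - !!bs->desc;
	return strcmp(as->name, bs->name);
}

/*
 * Greedy word wrap: when the next word would reach column 'max', start a
 * new line indented by 'start' spaces. Output goes to buf (size len).
 */
static void wrap_into(char *buf, size_t len, const char *s, int start, int max)
{
	size_t used = 0;
	int column = start;
	int n;

	buf[0] = '\0';
	while (*s) {
		int wlen = (int)strcspn(s, " \t");

		if (column + wlen >= max && column > start) {
			n = snprintf(buf + used, len - used, "\n%*s", start, "");
			if (n < 0 || (size_t)n >= len - used)
				return;		/* out of space */
			used += (size_t)n;
			column = start;
		}
		n = snprintf(buf + used, len - used, "%s%.*s",
			     column > start ? " " : "", wlen, s);
		if (n < 0 || (size_t)n >= len - used)
			return;
		used += (size_t)n;
		column += n;
		s += wlen;
		while (*s == ' ' || *s == '\t')
			s++;
	}
}
```

The first line is not indented because, in perf list, the caller has already printed the event name up to column start.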

[PATCH v9 01/11] perf, tools: Add jsmn `jasmine' JSON parser

2015-04-13 Thread Sukadev Bhattiprolu
From: Andi Kleen 

I need a JSON parser. This adds the simplest JSON
parser I could find -- Serge Zaitsev's jsmn `jasmine' --
to the perf library. I merely converted it to (mostly)
Linux style and added support for non 0 terminated input.

The parser is quite straightforward and does not
copy any data, just returns tokens with offsets
into the input buffer. So it's relatively efficient
and simple to use.
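
To illustrate the zero-copy design described above: tokens hold only offsets into the caller's buffer, and text is compared or printed straight from the input using a length, so the input need not be NUL-terminated. The tokenizer below is a deliberately tiny, hypothetical example that only recognizes string tokens; it is not the jsmn API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token: offsets into the caller's buffer, nothing copied. */
struct str_token {
	int start;	/* first character after the opening quote */
	int end;	/* position of the closing quote */
};

/*
 * Record up to max string tokens found in js[0..len). Reading stops at len,
 * so the input does not need to be NUL-terminated. Escapes are not handled.
 * Returns the number of tokens stored, or -1 for an unterminated string.
 */
static int find_strings(const char *js, size_t len,
			struct str_token *toks, int max)
{
	size_t pos, start;
	int n = 0;

	for (pos = 0; pos < len && n < max; pos++) {
		if (js[pos] != '"')
			continue;
		start = pos + 1;
		for (pos = start; pos < len && js[pos] != '"'; pos++)
			;
		if (pos >= len)
			return -1;
		toks[n].start = (int)start;
		toks[n].end = (int)pos;
		n++;
	}
	return n;
}

/* Compare a token with a C string directly in the input buffer. */
static int tok_eq(const char *js, const struct str_token *t, const char *s)
{
	size_t tlen = (size_t)(t->end - t->start);

	return strlen(s) == tlen && strncmp(js + t->start, s, tlen) == 0;
}
```

Because a token is just a pair of offsets, it can also be printed without copying, e.g. with printf("%.*s", t.end - t.start, js + t.start).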

The code is not fully checkpatch clean, but I didn't
want to completely fork the upstream code.

Original source: http://zserge.bitbucket.org/jsmn.html

In addition I added a simple wrapper that mmaps a json
file and provides some straightforward access functions.

Used in follow-on patches to parse event files.

Acked-by: Namhyung Kim 
Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
---

Changelog[v9] (by Sukadev Bhattiprolu)
Rebase to 4.0 and fix minor conflicts in tools/perf/Makefile.perf
Report error if specified events file is invalid.

v2: Address review feedback.
v3: Minor checkpatch fixes.
---
 tools/perf/Makefile.perf |    4 +
 tools/perf/util/jsmn.c   |  313 ++
 tools/perf/util/jsmn.h   |   67 ++
 tools/perf/util/json.c   |  160 
 tools/perf/util/json.h   |   13 ++
 5 files changed, 557 insertions(+)
 create mode 100644 tools/perf/util/jsmn.c
 create mode 100644 tools/perf/util/jsmn.h
 create mode 100644 tools/perf/util/json.c
 create mode 100644 tools/perf/util/json.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index aa6a504..a558eb3 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -331,6 +331,8 @@ LIB_H += ui/ui.h
 LIB_H += util/data.h
 LIB_H += util/kvm-stat.h
 LIB_H += util/thread-stack.h
+LIB_H += util/jsmn.h
+LIB_H += util/json.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -410,6 +412,8 @@ LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/cloexec.o
 LIB_OBJS += $(OUTPUT)util/thread-stack.o
+LIB_OBJS += $(OUTPUT)util/jsmn.o
+LIB_OBJS += $(OUTPUT)util/json.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/util/jsmn.c b/tools/perf/util/jsmn.c
new file mode 100644
index 000..11d1fa1
--- /dev/null
+++ b/tools/perf/util/jsmn.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright (c) 2010 Serge A. Zaitsev
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * Slightly modified by AK to not assume 0 terminated input.
+ */
+
+#include 
+#include "jsmn.h"
+
+/*
+ * Allocates a fresh unused token from the token pool.
+ */
+static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser,
+  jsmntok_t *tokens, size_t num_tokens)
+{
+   jsmntok_t *tok;
+
+   if ((unsigned)parser->toknext >= num_tokens)
+   return NULL;
+   tok = &tokens[parser->toknext++];
+   tok->start = tok->end = -1;
+   tok->size = 0;
+   return tok;
+}
+
+/*
+ * Fills token type and boundaries.
+ */
+static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type,
+   int start, int end)
+{
+   token->type = type;
+   token->start = start;
+   token->end = end;
+   token->size = 0;
+}
+
+/*
+ * Fills next available token with JSON primitive.
+ */
+static jsmnerr_t jsmn_parse_primitive(jsmn_parser *parser, const char *js,
+ size_t len,
+ jsmntok_t *tokens, size_t num_tokens)
+{
+   jsmntok_t *token;
+   int start;
+
+   start = parser->pos;
+
+   for (; parser->pos < len; parser->pos++) {
+   switch (js[parser->pos]) {
+#ifndef JSMN_STRICT
+   /*
+* In strict mode primitive must be followed by ","
+* or "}" or "]"
+*/
+   case ':':
+#endif
+   case '\t':
+  

[PATCHv2 net-next 2/2] net: Export IGMP/MLD message validation code

2015-04-13 Thread Linus Lüssing
With this patch, the IGMP and MLD message validation functions are moved
from the bridge code to the IPv4/IPv6 multicast files. Some small
refactoring was done to enhance readability and to iron out some
differences in behaviour between the IGMP and MLD parsing code (e.g. the
skb-cloning of MLD messages is now only done if necessary, just like the
IGMP part always did).

Finally, these IGMP and MLD message validation functions are exported so
that not only the bridge but, later, batman-adv can use them too.

Signed-off-by: Linus Lüssing 
---
 include/linux/igmp.h      |    1 +
 include/linux/skbuff.h    |    3 +
 include/net/addrconf.h    |    1 +
 net/bridge/br_multicast.c |  218 +++--
 net/core/skbuff.c         |   38
 net/ipv4/igmp.c           |  152 +++
 net/ipv6/Makefile         |    1 +
 net/ipv6/mcast_snoop.c    |  202 +
 8 files changed, 428 insertions(+), 188 deletions(-)
 create mode 100644 net/ipv6/mcast_snoop.c

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 2c677af..193ad48 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -130,5 +130,6 @@ extern void ip_mc_unmap(struct in_device *);
 extern void ip_mc_remap(struct in_device *);
 extern void ip_mc_dec_group(struct in_device *in_dev, __be32 addr);
 extern void ip_mc_inc_group(struct in_device *in_dev, __be32 addr);
+int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed);
 
 #endif
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0991259..79d8e8b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3404,6 +3404,9 @@ static inline void skb_checksum_none_assert(const struct sk_buff *skb)
 bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off);
 
 int skb_checksum_setup(struct sk_buff *skb, bool recalculate);
+int skb_checksum_trimmed(struct sk_buff *skb, unsigned int transport_len,
+__sum16(*skb_check_func)(struct sk_buff *skb),
+struct sk_buff **skb_trimmed);
 
 u32 skb_get_poff(const struct sk_buff *skb);
 u32 __skb_get_poff(const struct sk_buff *skb, void *data,
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 80456f7..def59d3 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -142,6 +142,7 @@ void ipv6_mc_unmap(struct inet6_dev *idev);
 void ipv6_mc_remap(struct inet6_dev *idev);
 void ipv6_mc_init_dev(struct inet6_dev *idev);
 void ipv6_mc_destroy_dev(struct inet6_dev *idev);
+int ipv6_mc_check_mld(struct sk_buff *skb, struct sk_buff **skb_trimmed);
 void addrconf_dad_failure(struct inet6_ifaddr *ifp);
 
 bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b52f4cb..c2115b1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -975,9 +975,6 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
int err = 0;
__be32 group;
 
-   if (!pskb_may_pull(skb, sizeof(*ih)))
-   return -EINVAL;
-
ih = igmpv3_report_hdr(skb);
num = ntohs(ih->ngrec);
len = sizeof(*ih);
@@ -1248,25 +1245,14 @@ static int br_ip4_multicast_query(struct net_bridge *br,
max_delay = 10 * HZ;
group = 0;
}
-   } else {
-   if (!pskb_may_pull(skb, sizeof(struct igmpv3_query))) {
-   err = -EINVAL;
-   goto out;
-   }
-
+   } else if (skb->len >= sizeof(*ih3)) {
ih3 = igmpv3_query_hdr(skb);
if (ih3->nsrcs)
goto out;
 
max_delay = ih3->code ?
IGMPV3_MRC(ih3->code) * (HZ / IGMP_TIMER_SCALE) : 1;
-   }
-
-   /* RFC2236+RFC3376 (IGMPv2+IGMPv3) require the multicast link layer
-* all-systems destination addresses (224.0.0.1) for general queries
-*/
-   if (!group && iph->daddr != htonl(INADDR_ALLHOSTS_GROUP)) {
-   err = -EINVAL;
+   } else {
goto out;
}
 
@@ -1329,12 +1315,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
(port && port->state == BR_STATE_DISABLED))
goto out;
 
-   /* RFC2710+RFC3810 (MLDv1+MLDv2) require link-local source addresses */
-   if (!(ipv6_addr_type(&ip6h->saddr) & IPV6_ADDR_LINKLOCAL)) {
-   err = -EINVAL;
-   goto out;
-   }
-
if (skb->len == sizeof(*mld)) {
if (!pskb_may_pull(skb, sizeof(*mld))) {
err = -EINVAL;
@@ -1358,14 +1338,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 
is_general_query = group && ipv6_addr_any(group);
 
-   /* RFC2710+RFC3810 (MLDv1+MLDv2) require the multicast link layer
-* all-nodes destination address (ff02:
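
The two address rules being removed from br_multicast.c above (presumably now enforced by the exported validation code) can be sketched in user space. The helper names are hypothetical; the kernel versions read the addresses from the skb's IP headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef INADDR_ALLHOSTS_GROUP
#define INADDR_ALLHOSTS_GROUP ((in_addr_t)0xe0000001)	/* 224.0.0.1 */
#endif

/*
 * RFC 2236/3376: a general query (group 0) must be sent to the all-systems
 * group 224.0.0.1. Both arguments are in network byte order.
 */
static bool igmp_query_dst_ok(uint32_t group, uint32_t daddr)
{
	return group != 0 || daddr == htonl(INADDR_ALLHOSTS_GROUP);
}

/* RFC 2710/3810: MLD queries must have a link-local (fe80::/10) source. */
static bool mld_query_src_ok(const struct in6_addr *saddr)
{
	return saddr->s6_addr[0] == 0xfe && (saddr->s6_addr[1] & 0xc0) == 0x80;
}
```

Group-specific queries pass igmp_query_dst_ok() regardless of destination, matching the removed check, which only constrained general queries.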

Re: [PATCH v6 1/3] nand: pl353: Add basic driver for arm pl353 smc nand interface

2015-04-13 Thread punnaiah choudary kalluri
Hi Paul Bolle,

On Tue, Apr 14, 2015 at 12:27 AM, Paul Bolle  wrote:
> On Mon, 2015-04-13 at 21:42 +0530, Punnaiah Choudary Kalluri wrote:
>
>> --- a/drivers/mtd/nand/Makefile
>> +++ b/drivers/mtd/nand/Makefile
>
>> +obj-$(CONFIG_MTD_NAND_PL353) += pl353_nand.o
>
> (I think pl353_nand.o can be part of a module. If that's incorrect, you
> can stop reading here.)
>
>> --- /dev/null
>> +++ b/drivers/mtd/nand/pl353_nand.c
>
>> + * This program is free software; you can redistribute it and/or modify it under
>> + * the terms of the GNU General Public License version 2 as published by the
>> + * Free Software Foundation; either version 2 of the License, or (at your
>> + * option) any later version.
>
> This states the license of this driver is GPL v2 or later.
>
>> +MODULE_LICENSE("GPL v2");
>
> And according to include/linux/module.h this states the license is GPL
> v2. So either the comment at the top of this file or the license ident
> used in the MODULE_LICENSE() macro needs to change.

Thanks for the review. I will change the license ident to "GPL".

I will wait some time for further functional comments on this driver
before sending the next version of the patches.

Regards,
Punnaiah

>
> Thanks,
>
>
> Paul Bolle
>


[PATCHv2 net-next 1/2] bridge: multicast: call skb_checksum_{simple_, }validate

2015-04-13 Thread Linus Lüssing
Let's use these new, neat helpers.

Signed-off-by: Linus Lüssing 
---
 net/bridge/br_multicast.c |   28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 4b6722f..b52f4cb 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1610,16 +1610,8 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
if (!pskb_may_pull(skb2, sizeof(*ih)))
goto out;
 
-   switch (skb2->ip_summed) {
-   case CHECKSUM_COMPLETE:
-   if (!csum_fold(skb2->csum))
-   break;
-   /* fall through */
-   case CHECKSUM_NONE:
-   skb2->csum = 0;
-   if (skb_checksum_complete(skb2))
-   goto out;
-   }
+   if (skb_checksum_simple_validate(skb2))
+   goto out;
 
err = 0;
 
@@ -1737,20 +1729,8 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 
ip6h = ipv6_hdr(skb2);
 
-   switch (skb2->ip_summed) {
-   case CHECKSUM_COMPLETE:
-   if (!csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, skb2->len,
-   IPPROTO_ICMPV6, skb2->csum))
-   break;
-   /*FALLTHROUGH*/
-   case CHECKSUM_NONE:
-   skb2->csum = ~csum_unfold(csum_ipv6_magic(&ip6h->saddr,
-   &ip6h->daddr,
-   skb2->len,
-   IPPROTO_ICMPV6, 0));
-   if (__skb_checksum_complete(skb2))
-   goto out;
-   }
+   if (skb_checksum_validate(skb2, IPPROTO_ICMPV6, ip6_compute_pseudo))
+   goto out;
 
err = 0;
 
-- 
1.7.10.4
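
skb_checksum_simple_validate() ultimately verifies the Internet checksum (RFC 1071) that IGMP uses. The arithmetic itself can be sketched over a plain byte buffer; the helper names are hypothetical, and the sketch ignores skb handling, CHECKSUM_COMPLETE offload and the IPv6 pseudo-header that ip6_compute_pseudo contributes for MLD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement sum over len bytes, folded to 16 bits. */
static uint16_t csum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Fill in a big-endian checksum field at offset csum_off. */
static void csum_set(uint8_t *msg, size_t len, size_t csum_off)
{
	uint16_t c;

	msg[csum_off] = 0;
	msg[csum_off + 1] = 0;
	c = (uint16_t)~csum16(msg, len);
	msg[csum_off] = (uint8_t)(c >> 8);
	msg[csum_off + 1] = (uint8_t)(c & 0xff);
}

/* A message with a correct checksum field sums to 0xffff. */
static bool csum_valid(const uint8_t *msg, size_t len)
{
	return csum16(msg, len) == 0xffff;
}
```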



[PATCHv2 net-next 0/2] Exporting IGMP/MLD checking from bridge code

2015-04-13 Thread Linus Lüssing
The multicast optimizations in batman-adv are so far only usable and
enabled in non-bridged scenarios. To be able to support bridged setups,
batman-adv needs to be able to detect IGMP/MLD queriers and reports on
mesh nodes without bridges, too. See the following link for details:

http://www.open-mesh.org/projects/batman-adv/wiki/Multicast-optimizations-listener-reports

To avoid duplicate code between the bridge and batman-adv, the IGMP/MLD
message validation code is moved from the bridge to the IPv4/IPv6 stack.

Along the way, some refactoring is done to increase readability and to
iron out some subtle differences between the IGMP and MLD parsing code.

Cheers, Linus


Changelog v2:
* Updated copyright for mcast_snoop.c



Re: [PATCH v3] perf probe: Find compilation directory path for lazy matching

2015-04-13 Thread Masami Hiramatsu
(2015/04/14 8:10), Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 13, 2015 at 02:18:40PM +0900, Naohiro Aota escreveu:
>> When lazy matching is used, perf fails to open the source file if the perf
>> command is invoked outside of the compilation directory:
>>
>> $ perf probe -a '__schedule;clear_*'
>> Failed to open kernel/sched/core.c: No such file or directory
>>   Error: Failed to add events. (-2)
> 
> Masami, you mean this one, right?

Yes, this is what I meant :)

Thank you!

> 
> - Arnaldo
>  
>> OTOH, other commands like "probe -L" can resolve the source directory by
>> themselves. Let's make it possible for lazy matching too!
>>
>> Signed-off-by: Naohiro Aota 
>> ---
>>  tools/perf/util/probe-event.c  | 59 ---
>>  tools/perf/util/probe-finder.c | 71 +-
>>  tools/perf/util/probe-finder.h |  4 +++
>>  3 files changed, 74 insertions(+), 60 deletions(-)
>>
>> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
>> index f272a71..32a429b 100644
>> --- a/tools/perf/util/probe-event.c
>> +++ b/tools/perf/util/probe-event.c
>> @@ -648,65 +648,6 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
>>  return ntevs;
>>  }
>>  
>> -/*
>> - * Find a src file from a DWARF tag path. Prepend optional source path prefix
>> - * and chop off leading directories that do not exist. Result is passed back as
>> - * a newly allocated path on success.
>> - * Return 0 if file was found and readable, -errno otherwise.
>> - */
>> -static int get_real_path(const char *raw_path, const char *comp_dir,
>> - char **new_path)
>> -{
>> -const char *prefix = symbol_conf.source_prefix;
>> -
>> -if (!prefix) {
>> -if (raw_path[0] != '/' && comp_dir)
>> -/* If not an absolute path, try to use comp_dir */
>> -prefix = comp_dir;
>> -else {
>> -if (access(raw_path, R_OK) == 0) {
>> -*new_path = strdup(raw_path);
>> -return *new_path ? 0 : -ENOMEM;
>> -} else
>> -return -errno;
>> -}
>> -}
>> -
>> -*new_path = malloc((strlen(prefix) + strlen(raw_path) + 2));
>> -if (!*new_path)
>> -return -ENOMEM;
>> -
>> -for (;;) {
>> -sprintf(*new_path, "%s/%s", prefix, raw_path);
>> -
>> -if (access(*new_path, R_OK) == 0)
>> -return 0;
>> -
>> -if (!symbol_conf.source_prefix) {
>> -/* In case of searching comp_dir, don't retry */
>> -zfree(new_path);
>> -return -errno;
>> -}
>> -
>> -switch (errno) {
>> -case ENAMETOOLONG:
>> -case ENOENT:
>> -case EROFS:
>> -case EFAULT:
>> -raw_path = strchr(++raw_path, '/');
>> -if (!raw_path) {
>> -zfree(new_path);
>> -return -ENOENT;
>> -}
>> -continue;
>> -
>> -default:
>> -zfree(new_path);
>> -return -errno;
>> -}
>> -}
>> -}
>> -
>>  #define LINEBUF_SIZE 256
>>  #define NR_ADDITIONAL_LINES 2
>>  
>> diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
>> index 46f009a..0fd2df4 100644
>> --- a/tools/perf/util/probe-finder.c
>> +++ b/tools/perf/util/probe-finder.c
>> @@ -849,11 +849,22 @@ static int probe_point_lazy_walker(const char *fname, int lineno,
>>  static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
>>  {
>>  int ret = 0;
>> +char *fpath;
>>  
>>  if (intlist__empty(pf->lcache)) {
>> +const char *comp_dir;
>> +
>> +comp_dir = cu_get_comp_dir(&pf->cu_die);
>> +ret = get_real_path(pf->fname, comp_dir, &fpath);
>> +if (ret < 0) {
>> +pr_warning("Failed to find source file path.\n");
>> +return ret;
>> +}
>> +
>>  /* Matching lazy line pattern */
>> -ret = find_lazy_match_lines(pf->lcache, pf->fname,
>> +ret = find_lazy_match_lines(pf->lcache, fpath,
>>  pf->pev->point.lazy_line);
>> +free(fpath);
>>  if (ret <= 0)
>>  return ret;
>>  }
>> @@ -1616,3 +1627,61 @@ found:
>>  return (ret < 0) ? ret : lf.found;
>>  }
>>  
>> +/*
>> + * Find a src file from a DWARF tag path. Prepend optional source path prefix
>> + * and chop off leading directories that do not exist. Result is passed back as
>> + * a newly allocated path on success.
>> + * Return 0 if file was found and readable, -errno otherwise.
>> + */
>> +int get_real_path(const char *raw_path, const char *comp_d
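
The search get_real_path() performs when a source prefix is set (try prefix/raw_path, then repeatedly chop the leading directory off raw_path) can be sketched without touching the file system by passing in an existence check. find_real_path() and fake_exists() are hypothetical; the real function uses access(..., R_OK), only retries on errors such as ENOENT, and does not retry at all when it falls back to comp_dir.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Try prefix/raw_path; if it does not exist, drop the leading directory of
 * raw_path and retry. Returns 0 and fills out on success, -1 otherwise.
 */
static int find_real_path(const char *prefix, const char *raw_path,
			  bool (*exists)(const char *path),
			  char *out, size_t outlen)
{
	const char *p = raw_path;

	while (p != NULL && *p != '\0') {
		int n = snprintf(out, outlen, "%s/%s", prefix, p);

		if (n < 0 || (size_t)n >= outlen)
			return -1;	/* candidate does not fit */
		if (exists(out))
			return 0;
		p = strchr(p, '/');	/* chop the leading component */
		if (p != NULL)
			p++;		/* skip the '/' itself */
	}
	return -1;
}

/* Hypothetical file system containing a single source file. */
static bool fake_exists(const char *path)
{
	return strcmp(path, "/src/linux/kernel/sched/core.c") == 0;
}
```

With prefix "/src/linux", a DWARF path like "/build/linux/kernel/sched/core.c" resolves after three chops to "/src/linux/kernel/sched/core.c".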

Re: [PATCH v2 04/10] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-04-13 Thread David Hildenbrand
> On Tue, Mar 31, 2015 at 04:08:02PM +0100, Alex Bennée wrote:
> > This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
> > ioctl. Currently any operation flag will return EINVAL. Actual
> > functionality will be added with further patches.
> > 
> > Signed-off-by: Alex Bennée .
> > 
> > ---
> > v2
> >   - simplified form of the ioctl (stuff will go into setup_debug)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index b112efc..06c5064 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2604,7 +2604,7 @@ handled.
> >  4.87 KVM_SET_GUEST_DEBUG
> >  
> >  Capability: KVM_CAP_SET_GUEST_DEBUG
> > -Architectures: x86, s390, ppc
> > +Architectures: x86, s390, ppc, arm64
> >  Type: vcpu ioctl
> >  Parameters: struct kvm_guest_debug (in)
> >  Returns: 0 on success; -1 on error
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 5560f74..445933d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -183,6 +183,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_ARM_PSCI:
> > case KVM_CAP_ARM_PSCI_0_2:
> > case KVM_CAP_READONLY_MEM:
> > +   case KVM_CAP_SET_GUEST_DEBUG:
> > r = 1;
> > break;
> 
> shouldn't you wait with advertising this capability until you've
> implemented support for it?
> 

I think this would work for now; however, it's not very practical:
in the end, one has to probe which debug flags are actually supported.

The question is whether he wants to add initial support and extend the
functionality and flags with each patch, or enable the whole set of
features in one shot at the end.

Doing the latter seems more practical to me (especially as the debug
features are added in the same patch series).

David
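
As a hypothetical illustration of the probing described above: once KVM_CAP_SET_GUEST_DEBUG is advertised, userspace can only learn which control flags really work by trying them. The kernel side is simulated here by a function that rejects unsupported flags with -EINVAL, as this patch's stub does for every flag; the DBG_* values mirror KVM_GUESTDBG_ENABLE and KVM_GUESTDBG_SINGLESTEP from <linux/kvm.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror KVM_GUESTDBG_ENABLE and KVM_GUESTDBG_SINGLESTEP. */
#define DBG_ENABLE	0x00000001u
#define DBG_SINGLESTEP	0x00000002u

/* Hypothetical stand-in for the ioctl: reject any flag not in 'supported'. */
static int fake_set_guest_debug(uint32_t supported, uint32_t control)
{
	return (control & ~supported) ? -EINVAL : 0;
}

/* Userspace-side probe: returns the mask of flags that were accepted. */
static uint32_t probe_debug_flags(uint32_t supported)
{
	uint32_t ok = 0;

	if (fake_set_guest_debug(supported, DBG_ENABLE) != 0)
		return 0;		/* nothing usable */
	ok |= DBG_ENABLE;
	if (fake_set_guest_debug(supported, DBG_ENABLE | DBG_SINGLESTEP) == 0)
		ok |= DBG_SINGLESTEP;
	return ok;
}
```

With the stub from this patch (supported == 0), the capability is advertised but the probe finds nothing usable, which is the practical problem raised above.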



Re: [PATCH 2/5] iommu/mediatek: Add mt8173 IOMMU driver

2015-04-13 Thread Yong Wu
Hi Tomasz,

Thanks very much for your suggestions and the detailed explanation.
Please check my replies below.

On Fri, 2015-03-27 at 18:41 +0900, Tomasz Figa wrote:
> Hi Yong Wu,
> 
> Sorry for long delay, I had to figure out some time to look at this again.
> 
> On Wed, Mar 18, 2015 at 8:22 PM, Yong Wu  wrote:
> >>
> >> > +   imudev = piommu->dev;
> >> > +
> >> > +   spin_lock_irqsave(&priv->portlock, flags);
> >>
> >> What is protected by this spinlock?
> > We write a register of the local arbiter while configuring the port.
> > If several modules are behind the same local arbiter, the register may
> > be overwritten, so I added the lock here.
> >>
> 
> OK. Maybe it could be called larb_lock then? It would be good to have
> structures or code that should be running under this spinlock
> annotated with proper comments. And purpose of the lock documented in
> a comment as well (probably in a kerneldoc-style documentation of
> priv).
Thanks. I have moved the spinlock into the smi driver; it now only
protects the write to the local arbiter register.
> 
> >> > +static void mtk_iommu_detach_device(struct iommu_domain *domain,
> >> > +   struct device *dev)
> >> > +{
> >>
> >> No hardware (de)configuration or clean-up necessary?
> > I will add it. Actually, our design is: once a device has been attached
> > to the iommu domain, it is never detached from it.
> 
> Isn't proper clean-up required for module removal? Some drivers might
> be required to be loadable modules, which should be unloadable.
> 
> >>
> >> > +
>> > +   piommu->protect_va = devm_kmalloc(piommu->dev, MTK_PROTECT_PA_ALIGN*2,
> >>
> >> style: Operators like * should have space on both sides.
> >>
> >> > + GFP_KERNEL);
> >>
> >> Shouldn't dma_alloc_coherent() be used for this?
> > We don't care about the data in it, so I think they are equivalent.
> > Could you explain why dma_alloc_coherent() may be better?
> 
> Can you guarantee that at the time you allocate the memory using
> devm_kmalloc() the memory is not dirty (i.e. some write back data are
> stored in CPU cache) and is not going to be written back in some time,
> overwriting data put there by IOMMU hardware?
> 
As I noted in the function "mtk_iommu_hw_init":

   /* protect memory,HW will write here while translation fault */
   protectpa = __virt_to_phys(piommu->protect_va);

We don't care about the contents of this buffer; it is OK even if its
data is dirty. It serves as protection memory: when a translation fault
happens, the IOMMU HW writes here instead of writing to the faulting
physical address, which may be 0 or some random address.

> >> > +
> >> > +   iommu_set_fault_handler(domain, mtk_iommu_fault_handler, piommu);
> >>
> >> I don't see any other drivers doing this. Isn't this for upper layers,
> >> so that they can set their own generic fault handlers?
> > I think that this function is related to the iommu domain, and we
> > have only one multimedia iommu domain, so I add it after the iommu
> > domain is created.
> 
> No, this function is for drivers of IOMMU clients (i.e. master IP
> blocks) which want to subscribe to page fault to do things like paging
> on demand and so on. It shouldn't be called by IOMMU driver. Please
> see other IOMMU drivers, for example rockchip-iommu.c.
Thanks, I have read it. I will delete it, print the error info in the
ISR, and also call report_iommu_fault() from the ISR.
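
To illustrate the split Tomasz describes, here is a toy userspace model: the client (master IP) driver subscribes to faults, and the IOMMU driver's ISR only logs and reports them. All `model_*` names and `example_client_handler` are hypothetical; in the kernel the two roles correspond to `iommu_set_fault_handler()` and `report_iommu_fault()`, and the `-ENOSYS` result for a domain with no subscriber is an assumption about the core's behaviour, not something stated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical handler type, loosely modelled on iommu_fault_handler_t. */
typedef int (*model_fault_handler_t)(unsigned long iova, int flags, void *token);

/* Hypothetical domain: only the fields needed for fault reporting. */
struct model_domain {
	model_fault_handler_t handler;
	void *handler_token;
};

/* Client (master IP) driver side: subscribe to faults on the domain. */
static void model_set_fault_handler(struct model_domain *d,
				    model_fault_handler_t handler, void *token)
{
	d->handler = handler;
	d->handler_token = token;
}

/* Core side: forward the fault to the subscriber, if there is one. */
static int model_report_fault(struct model_domain *d, unsigned long iova, int flags)
{
	if (!d->handler)
		return -ENOSYS; /* nobody subscribed */
	return d->handler(iova, flags, d->handler_token);
}

/* IOMMU driver ISR: report the fault, log it if no client handled it. */
static int model_iommu_isr(struct model_domain *d, unsigned long iova, int flags)
{
	int ret = model_report_fault(d, iova, flags);

	if (ret)
		fprintf(stderr, "iommu fault at iova 0x%lx, flags 0x%x (ret %d)\n",
			iova, flags, ret);
	return ret;
}

/* Example client handler: counts faults through its token and claims them. */
static int example_client_handler(unsigned long iova, int flags, void *token)
{
	(void)iova;
	(void)flags;
	++*(int *)token;
	return 0;
}
```

The IOMMU driver never registers a handler itself; whether a fault is "handled" is decided entirely by whichever client subscribed.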

> Best regards,
> Tomasz


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] ARM: fix module-bound check in setting page attributes

2015-04-13 Thread Hillf Danton
Ping

> -Original Message-
> From: Hillf Danton [mailto:hillf...@alibaba-inc.com]
> Sent: Tuesday, April 07, 2015 4:31 PM
> To: Hillf Danton; 'Laura Abbott'
> Cc: 'Russell King'; 'linux-kernel'; li...@arm.linux.org.uk
> Subject: Re: [patch] ARM: fix module-bound check in setting page attributes
> 
> Ping
> >
> > It was introduced in commit f2ca09f381a59
> > (ARM: 8311/1: Don't use is_module_addr in setting page attributes)
> >
> > There is no need to check start twice; check whether end is also in range instead.
> >
> > Signed-off-by: Hillf Danton 
> > ---
> >
> > --- a/arch/arm/mm/pageattr.c	Wed Mar 25 11:55:13 2015
> > +++ b/arch/arm/mm/pageattr.c	Wed Mar 25 11:57:31 2015
> > @@ -52,7 +52,7 @@ static int change_memory_common(unsigned
> > if (start < MODULES_VADDR || start >= MODULES_END)
> > return -EINVAL;
> >
> > -   if (end < MODULES_VADDR || start >= MODULES_END)
> > +   if (end < MODULES_VADDR || end >= MODULES_END)
> > return -EINVAL;
> >
> > data.set_mask = set_mask;
> > --
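
A userspace sketch of what the one-character fix changes. The `EX_MODULES_VADDR`/`EX_MODULES_END` values are illustrative placeholders, not taken from any particular ARM configuration, and `start`/`end` are passed in directly rather than derived from a page count as `change_memory_common()` does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real bounds depend on the kernel config. */
#define EX_MODULES_VADDR 0xbf000000UL
#define EX_MODULES_END   0xc0000000UL

/* Check as it stood before the fix: the second test repeats 'start'. */
static bool range_ok_before(unsigned long start, unsigned long end)
{
	if (start < EX_MODULES_VADDR || start >= EX_MODULES_END)
		return false;
	if (end < EX_MODULES_VADDR || start >= EX_MODULES_END)
		return false;
	return true;
}

/* Check with the fix applied: the second test now bounds 'end'. */
static bool range_ok_after(unsigned long start, unsigned long end)
{
	if (start < EX_MODULES_VADDR || start >= EX_MODULES_END)
		return false;
	if (end < EX_MODULES_VADDR || end >= EX_MODULES_END)
		return false;
	return true;
}
```

A range that starts inside the module area but runs past its end passes the old check and is rejected by the new one.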




[PATCH 2/2] kvm: mmu: don't do overflow memslot check

2015-04-13 Thread Wanpeng Li
As Andres pointed out:

| I don't understand the value of this check here. Are we looking for a
| broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
| about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
| 2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
| would be trouble in many other ways.

This patch drops the memslot overflow check to make the code simpler.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/mmu.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a0d77e..9265fda 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4505,19 +4505,12 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
bool flush = false;
unsigned long *rmapp;
unsigned long last_index, index;
-   gfn_t gfn_start, gfn_end;
 
spin_lock(&kvm->mmu_lock);
 
-   gfn_start = memslot->base_gfn;
-   gfn_end = memslot->base_gfn + memslot->npages - 1;
-
-   if (gfn_start >= gfn_end)
-   goto out;
-
rmapp = memslot->arch.rmap[0];
-   last_index = gfn_to_index(gfn_end, memslot->base_gfn,
-   PT_PAGE_TABLE_LEVEL);
+   last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
+   memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
 
for (index = 0; index <= last_index; ++index, ++rmapp) {
if (*rmapp)
@@ -4535,7 +4528,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
if (flush)
kvm_flush_remote_tlbs(kvm);
 
-out:
spin_unlock(&kvm->mmu_lock);
 }
 
-- 
1.9.1
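
A userspace sketch of the index arithmetic the simplified code relies on. The `gfn_to_index()` and `KVM_HPAGE_GFN_SHIFT()` bodies below are assumed to mirror the x86 KVM headers of this period; `last_rmap_index()` is a hypothetical helper. At `PT_PAGE_TABLE_LEVEL` the shift is 0, so `last_index` is simply `npages - 1`, and with `npages` capped at 2^31 it cannot wrap a 64-bit `gfn_t` for any sane `base_gfn`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PT_PAGE_TABLE_LEVEL 1
/* Assumed x86 definition: each paging level above 4K covers 9 more gfn bits. */
#define KVM_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Index of 'gfn' within a slot starting at 'base_gfn', at the given level. */
static gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
	return (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
}

/* Hypothetical helper: last_index exactly as the patched code computes it. */
static gfn_t last_rmap_index(gfn_t base_gfn, unsigned long npages)
{
	return gfn_to_index(base_gfn + npages - 1, base_gfn, PT_PAGE_TABLE_LEVEL);
}
```

With a one-page slot `last_index` is 0 and the loop visits a single rmap entry; the removed `gfn_start >= gfn_end` test was already true in that case, so it used to skip such slots.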



[PATCH 1/2] kvm: mmu: fix catch transparent huge page backing

2015-04-13 Thread Wanpeng Li
PageTransCompound() can't guarantee that the page is a transparent huge
page, since it returns true for both transparent huge pages and hugetlbfs pages.

This patch fixes it by also checking that the page is not a hugetlbfs page.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 146f295..2a0d77e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4487,7 +4487,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 */
if (sp->role.direct &&
!kvm_is_reserved_pfn(pfn) &&
-   PageTransCompound(pfn_to_page(pfn))) {
+   !PageHuge(pfn_to_page(pfn)) &&
+   PageTransHuge(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
sptep = rmap_get_first(*rmapp, &iter);
need_tlb_flush = 1;
-- 
1.9.1



Re: [PATCH v3] kvm: mmu: lazy collapse small sptes into large sptes

2015-04-13 Thread Andres Lagar-Cavilla
On Mon, Apr 13, 2015 at 10:25 PM, Wanpeng Li  wrote:
> Hi Andres,
> On Fri, Apr 10, 2015 at 11:05:26AM -0700, Andres Lagar-Cavilla wrote:
> [...]
>>> +   if (sp->role.direct &&
>>> +   !kvm_is_reserved_pfn(pfn) &&
>>> +   PageTransCompound(pfn_to_page(pfn))) {
>>
>>Not your fault, but PageTransCompound is very unhappy naming, as it
>>also yields true for PageHuge. Suggestion: document this check covers
>>static hugetlbfs, or switch to PageCompound() check.
>>
>>A slightly bolder approach would be to refactor and reuse the nearly
>>identical check done in transparent_hugepage_adjust, instead of
>>open-coding here. In essence this code is asking for the same check,
>>plus the out-of-band check for static hugepages.
>
> PageCompound() check still returns true for both transparent huge pages
> and hugetlbfs pages, !PageHuge(page) && PageTransHuge(page) check can
> guarantee to catch the right transparent huge pages just as my old commit
> e76d30e20be5fc ("mm/hwpoison: fix test for a transparent huge page").
> I will send a patch to fix this.
>
Why would you want to "fix" it that way? Aren't static hugepages supported?

(PageAnon is an inline check and much cheaper than !PageHuge(), which
is an actual function call)

Please consider my suggestion about refactoring the similar checks in
transparent_hugepage_adjust.

Thanks a ton
Andres
>>
>>
>>> +   drop_spte(kvm, sptep);
>>> +   sptep = rmap_get_first(*rmapp, &iter);
>>> +   need_tlb_flush = 1;
>>> +   } else
>>> +   sptep = rmap_get_next(&iter);
>>> +   }
>>> +
>>> +   return need_tlb_flush;
>>> +}
>>> +
>>> +void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>>> +   struct kvm_memory_slot *memslot)
>>> +{
>>> +   bool flush = false;
>>> +   unsigned long *rmapp;
>>> +   unsigned long last_index, index;
>>> +   gfn_t gfn_start, gfn_end;
>>> +
>>> +   spin_lock(&kvm->mmu_lock);
>>> +
>>> +   gfn_start = memslot->base_gfn;
>>> +   gfn_end = memslot->base_gfn + memslot->npages - 1;
>>> +
>>> +   if (gfn_start >= gfn_end)
>>> +   goto out;
>>
>>I don't understand the value of this check here. Are we looking for a
>>broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
>>about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
>>2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
>>would be trouble in many other ways.
>>
>>All this to say: please remove the above 5 lines and make code simpler.
>
> I will send a patch to cleanup it. Thanks for your review. :)
>
> Regards,
> Wanpeng Li
>



-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com


Re: [PATCH v4 3/3] powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior

2015-04-13 Thread Preeti U Murthy
Hi Shreyas,

On 04/14/2015 07:26 AM, Shreyas B. Prabhu wrote:
> Fastsleep is one of the idle states which the cpuidle subsystem currently
> uses on power8 machines. In this state L2 cache is brought down to a
> threshold voltage. Therefore when the core is in fastsleep, the
> communication between L2 and L3 needs to be fenced. But there is a bug
> in the current power8 chips surrounding this fencing.
> 
> OPAL provides a workaround which precludes the possibility of hitting
> this bug. But running with this workaround applied causes checkstop
> if any correctable error in L2 cache directory is detected. Hence OPAL
> also provides a way to undo the workaround.
> 
> In the existing implementation, the workaround is applied by the last thread
> of the core entering fastsleep and undone by the first thread waking up.
> But this has a performance cost. These OPAL calls account for roughly
> 4000 cycles every time the core has to enter or wake up from fastsleep.
> 
> This patch introduces a sysfs attribute (fastsleep_workaround_state)
> to choose the behavior of this workaround.
> 
> By default, fastsleep_workaround_state = dynamic. In this case, the workaround
> is applied/undone every time the core enters/exits fastsleep.
> 
> fastsleep_workaround_state = applyonce. In this case the workaround is
> applied once on all the cores and never undone. This can be triggered by
> echo applyonce > /sys/devices/system/cpu/fastsleep_workaround_state
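
A single-threaded sketch of the two modes described above, counting how many apply/undo calls each one makes. `struct core_state` and both functions are hypothetical bookkeeping, not the powernv idle code, and real code would need locking or atomics around the per-core count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-core bookkeeping for the workaround. */
struct core_state {
	int nr_threads;          /* threads per core */
	int nr_in_fastsleep;     /* threads currently in fastsleep */
	bool apply_once;         /* applyonce mode: applied up front, never undone */
	bool workaround_applied;
	int opal_calls;          /* apply/undo calls made (~4000 cycles each) */
};

static void thread_enter_fastsleep(struct core_state *c)
{
	c->nr_in_fastsleep++;
	/* dynamic mode: the last thread to enter applies the workaround */
	if (!c->apply_once && c->nr_in_fastsleep == c->nr_threads) {
		c->workaround_applied = true;
		c->opal_calls++;
	}
}

static void thread_exit_fastsleep(struct core_state *c)
{
	/* dynamic mode: the first thread to wake up undoes the workaround */
	if (!c->apply_once && c->nr_in_fastsleep == c->nr_threads) {
		c->workaround_applied = false;
		c->opal_calls++;
	}
	c->nr_in_fastsleep--;
}
```

In dynamic mode every full sleep/wake cycle of a core costs two calls; in applyonce mode the count stays at zero after setup.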

I was wondering if we really need such an elaborate design for this
sysfs file. Why not a sysfs file called fastsleep_workaround_apply_once,
which is set to '0' by default and can only be set to '1'? The name
clearly implies that the workaround is applied only once
if it is set. I can see that this can cut down a good chunk of code from
this patch. I just didn't find too much value in having so much code for
a simple 'on' knob.
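
A sketch of the simpler knob Preeti suggests, written as plain userspace functions rather than real sysfs callbacks. The `apply_once_store()`/`apply_once_show()` names and the "accept only 1, never clear" policy are assumptions drawn from the suggestion above, not an existing interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical state behind a fastsleep_workaround_apply_once file. */
static bool apply_once_set;

/*
 * Store side: accept only "1" (optionally newline-terminated, as written by
 * echo). Once set it stays set, because the workaround is never undone.
 */
static ssize_t apply_once_store(const char *buf, size_t count)
{
	size_t len = count;

	if (len && buf[len - 1] == '\n')
		len--;
	if (len != 1 || buf[0] != '1')
		return -EINVAL;
	/* presumably the workaround would be applied on all cores here */
	apply_once_set = true;
	return (ssize_t)count;
}

/* Show side: report the current value as "0\n" or "1\n". */
static int apply_once_show(char *buf, size_t size)
{
	return snprintf(buf, size, "%d\n", apply_once_set ? 1 : 0);
}
```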

Regards
Preeti U Murthy



Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Minfei Huang
On 04/14/15 at 12:32P, Josh Poimboeuf wrote:
> On Tue, Apr 14, 2015 at 01:29:50PM +0800, Minfei Huang wrote:
> > On 04/14/15 at 12:11P, Josh Poimboeuf wrote:
> > > On Tue, Apr 14, 2015 at 01:03:48PM +0800, Minfei Huang wrote:
> > > > On 04/13/15 at 11:57P, Josh Poimboeuf wrote:
> > > > > On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> > > > > > On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > > > > > > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > > > > > > For now, kallsyms only stores the first (KSYM_NAME_LEN-1)
> > > > > > > > characters of a name, so functions whose first (KSYM_NAME_LEN-1)
> > > > > > > > characters are the same, but whose remaining characters differ,
> > > > > > > > end up with the same kallsyms name.
> > > > > > > > 
> > > > > > > > Such a function will then never be patched, even though both its
> > > > > > > > name and address are provided, because livepatch cannot recognize
> > > > > > > > the function name.
> > > > > > > > 
> > > > > > > > Now, livepatch will verify the first (KSYM_NAME_LEN-1) characters
> > > > > > > > of the function name and the address, if provided. Once both
> > > > > > > > match, we can confirm that the function to be patched is found.
> > > > > > > 
> > > > > > > From scripts/kallsyms.c:
> > > > > > > 
> > > > > > >   if (strlen(str) > KSYM_NAME_LEN) {
> > > > > > >     fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > > > > > >             "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > > > > > >             str, strlen(str), KSYM_NAME_LEN);
> > > > > > >     return -1;
> > > > > > >   }
> > > > > > > 
> > > > > > > So I think such a long symbol name wouldn't be added to the kallsyms
> > > > > > > database in the first place.
> > > > > > > 
> > > > > > 
> > > > > > Actually, the kernel allows overlength function names to be used.
> > > > > > The following is my test module.
> > > > > > 
> > > > > > We can get the address from /proc/kallsyms.
> > > > > > $ cat /proc/kallsyms | grep sysfs_print
> > > > > > a000 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri [sysfs_print]
> > > > > > a010 t kobj_release [sysfs_print]
> > > > > > a020 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri [sysfs_print]
> > > > > > a4e0 b root_kobj    [sysfs_print]
> > > > > > a200 d print_ktype  [sysfs_print]
> > > > > > a4a0 b print_kobj   [sysfs_print]
> > > > > > a04c t sys_print_exit   [sysfs_print]
> > > > > > a144 r __func__.14514   [sysfs_print]
> > > > > > a230 d kobj_attrs   [sysfs_print]
> > > > > > a240 d sys_print_kobj_attr  [sysfs_print]
> > > > > > a260 d __this_module    [sysfs_print]
> > > > > > a04c t cleanup_module   [sysfs_print]
> > > > > > 
> > > > > > Code:
> > > > > > 
> > > > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct kobject *kobj, s
> > > > > > 		const char *buf, size_t count)
> > > > > > {
> > > > > > 	return count;
> > > > > > }
> > > > > > 
> > > > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct kobject *kobj,
> > > > > > 		struct kobj_attribute *attr, char *buf)
> > > > > > {
> > > > > > 	return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> > > > > > }
> > > > > > 
> > > > > > static struct kobj_attribute sys_print_kobj_attr = __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> > > > > > static struct attribute *kobj_attrs[] = {
> > > > > > 	&sys_print_kobj_attr.attr,
> > > > > > 	NULL
> > > > > > };
> > > > > > 
> > > > > 
> > > > > Hm, this seems like a kallsyms bug.  IMO it should either fail the build
> > > > > or omit the symbol from the kallsyms db.  Truncating it seems dangerous
> > > > > and counterintuitive.
> > > > > 
> > > > 
> > > > Kallsyms records the full function name without truncating it, but
> > > > kallsyms returns the function name truncated to at most 127
> > > > characters.
> > > >
> > > > > But regardless I really don't see a good reason to encourage this kind
> > > > > of insanity in the livepatch code.
> > > > > 
> > > > 
> > > > Yes, the above code is terrible, but we cannot stop users from writing
> > > > code like that.
> > > > 
> > > > Once the fu


Re: [PATCH v3] kvm: mmu: lazy collapse small sptes into large sptes

2015-04-13 Thread Wanpeng Li
Hi Andres,
On Fri, Apr 10, 2015 at 11:05:26AM -0700, Andres Lagar-Cavilla wrote:
[...]
>> +   if (sp->role.direct &&
>> +   !kvm_is_reserved_pfn(pfn) &&
>> +   PageTransCompound(pfn_to_page(pfn))) {
>
>Not your fault, but PageTransCompound is very unhappy naming, as it
>also yields true for PageHuge. Suggestion: document this check covers
>static hugetlbfs, or switch to PageCompound() check.
>
>A slightly bolder approach would be to refactor and reuse the nearly
>identical check done in transparent_hugepage_adjust, instead of
>open-coding here. In essence this code is asking for the same check,
>plus the out-of-band check for static hugepages.

PageCompound() check still returns true for both transparent huge pages
and hugetlbfs pages, !PageHuge(page) && PageTransHuge(page) check can 
guarantee to catch the right transparent huge pages just as my old commit 
e76d30e20be5fc ("mm/hwpoison: fix test for a transparent huge page"). 
I will send a patch to fix this.

>
>
>> +   drop_spte(kvm, sptep);
>> +   sptep = rmap_get_first(*rmapp, &iter);
>> +   need_tlb_flush = 1;
>> +   } else
>> +   sptep = rmap_get_next(&iter);
>> +   }
>> +
>> +   return need_tlb_flush;
>> +}
>> +
>> +void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>> +   struct kvm_memory_slot *memslot)
>> +{
>> +   bool flush = false;
>> +   unsigned long *rmapp;
>> +   unsigned long last_index, index;
>> +   gfn_t gfn_start, gfn_end;
>> +
>> +   spin_lock(&kvm->mmu_lock);
>> +
>> +   gfn_start = memslot->base_gfn;
>> +   gfn_end = memslot->base_gfn + memslot->npages - 1;
>> +
>> +   if (gfn_start >= gfn_end)
>> +   goto out;
>
>I don't understand the value of this check here. Are we looking for a
>broken memslot? Shouldn't this be a BUG_ON? Is this the place to care
>about these things? npages is capped to KVM_MEM_MAX_NR_PAGES, i.e.
>2^31. A 64 bit overflow would be caused by a gigantic gfn_start which
>would be trouble in many other ways.
>
>All this to say: please remove the above 5 lines and make code simpler.

I will send a patch to cleanup it. Thanks for your review. :)

Regards,
Wanpeng Li



Re: [PATCH] x86: Align jump targets to 1 byte boundaries

2015-04-13 Thread Ingo Molnar

* Markus Trippelsdorf  wrote:

> On 2015.04.13 at 11:31 -0700, Linus Torvalds wrote:
> > On Mon, Apr 13, 2015 at 10:26 AM, Markus Trippelsdorf
> >  wrote:
> > >
> > > I must have made a measurement mistake above, because the actual code
> > > size savings are roughly 5%:
> > 
> > Can you check against the -fno-guess-branch-probability output?
> 
>     text    data     bss      dec filename
>  8746230  970072  802816 10519118 ./vmlinux gcc-5 (lto)
>  9202488  978512  811008 10992008 ./vmlinux gcc-5
>  8036915  970296  802816  9810027 ./vmlinux gcc-5 (lto -fno-guess-branch-probability)
>  8593615  978512  811008 10383135 ./vmlinux gcc-5 (-fno-guess-branch-probability)
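
Working the relative savings out of the `text` column above with a small helper (the `text_*` constant names are just labels for the table rows):

```c
#include <assert.h>

/* .text sizes in bytes, copied from the size table above. */
static const unsigned long text_gcc5           = 9202488;
static const unsigned long text_gcc5_lto       = 8746230;
static const unsigned long text_gcc5_nogbp     = 8593615;
static const unsigned long text_gcc5_lto_nogbp = 8036915;

/* Percentage by which 'smaller' is below 'baseline'. */
static double savings_pct(unsigned long baseline, unsigned long smaller)
{
	return 100.0 * (double)(baseline - smaller) / (double)baseline;
}
```

By this measure LTO alone trims about 5.0% of `text`, `-fno-guess-branch-probability` alone about 6.6%, and LTO on top of `-fno-guess-branch-probability` about 6.5%.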

Just to make sure, could you please also apply the 3 alignment patches 
attached below? There's a lot of noise from extra alignment.

Having said that, LTO should have three main effects:

 1) better cross-unit inlining decisions

 2) better register allocation and clobbering knowledge (if a small 
function is known not to clobber caller-saved registers, then the 
saving can be skipped)

 3) better dead code elimination

1)-2) is probably worth the price, 3) in isolation isn't. So we'd have 
to estimate which one is how significant, to judge the value of LTO - 
but I haven't seen any effort so far to disambiguate it.

_Possibly_ if you build kernel/built-in.o only, and compared its 
sizes, that would help a bit, because the core kernel has very little 
dead code, giving a fairer estimation of 'true' optimizations.

Thanks,

Ingo

==

 arch/x86/Makefile | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 5ba2d9ce82dc..10989a73b986 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -77,6 +77,15 @@ else
 KBUILD_AFLAGS += -m64
 KBUILD_CFLAGS += -m64
 
+# Pack jump targets tightly, don't align them to the default 16 bytes:
+KBUILD_CFLAGS += -falign-jumps=1
+
+# Pack functions tightly as well:
+KBUILD_CFLAGS += -falign-functions=1
+
+# Pack loops tightly as well:
+KBUILD_CFLAGS += -falign-loops=1
+
 # Don't autogenerate traditional x87 instructions
 KBUILD_CFLAGS += $(call cc-option,-mno-80387)
 KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)


Re: [PATCHSET 0/6] perf kmem: Implement page allocation analysis (v7)

2015-04-13 Thread Pekka Enberg
On Tue, Apr 14, 2015 at 5:52 AM, Namhyung Kim  wrote:
> Currently the perf kmem command only analyzes SLAB memory allocation, and
> I'd like to introduce page allocation analysis as well.  Users can use the
> --slab and/or --page options to select it.  If neither option is
> used, it does slab allocation analysis for backward compatibility.
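
The option defaulting described above amounts to a few lines; `struct kmem_opts` and `kmem_apply_default()` are hypothetical names for illustration, not perf's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags set by the --slab and --page options. */
struct kmem_opts {
	bool slab;
	bool page;
};

/* If neither option was given, analyze slab allocations as before. */
static void kmem_apply_default(struct kmem_opts *o)
{
	if (!o->slab && !o->page)
		o->slab = true;
}
```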

Nice addition!

Acked-by: Pekka Enberg 

for the whole series.

- Pekka


Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Josh Poimboeuf
On Tue, Apr 14, 2015 at 01:29:50PM +0800, Minfei Huang wrote:
> On 04/14/15 at 12:11P, Josh Poimboeuf wrote:
> > On Tue, Apr 14, 2015 at 01:03:48PM +0800, Minfei Huang wrote:
> > > On 04/13/15 at 11:57P, Josh Poimboeuf wrote:
> > > > On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> > > > > On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > > > > > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > > > > > For now, kallsyms only stores the first (KSYM_NAME_LEN-1)
> > > > > > > characters of a name, so functions whose first (KSYM_NAME_LEN-1)
> > > > > > > characters are the same, but whose remaining characters differ,
> > > > > > > end up with the same kallsyms name.
> > > > > > > 
> > > > > > > Such a function will then never be patched, even though both its
> > > > > > > name and address are provided, because livepatch cannot recognize
> > > > > > > the function name.
> > > > > > > 
> > > > > > > Now, livepatch will verify the first (KSYM_NAME_LEN-1) characters
> > > > > > > of the function name and the address, if provided. Once both
> > > > > > > match, we can confirm that the function to be patched is found.
> > > > > > 
> > > > > > From scripts/kallsyms.c:
> > > > > > 
> > > > > > if (strlen(str) > KSYM_NAME_LEN) {
> > > > > >   fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > > > > >           "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > > > > >           str, strlen(str), KSYM_NAME_LEN);
> > > > > >   return -1;
> > > > > > }
> > > > > > 
> > > > > > So I think such a long symbol name wouldn't be added to the kallsyms
> > > > > > database in the first place.
> > > > > > 
> > > > > 
> > > > > Actually, the kernel allows overlength function names to be used.
> > > > > The following is my test module.
> > > > > 
> > > > > We can get the address from /proc/kallsyms.
> > > > > $ cat /proc/kallsyms | grep sysfs_print
> > > > > a000 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri [sysfs_print]
> > > > > a010 t kobj_release [sysfs_print]
> > > > > a020 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri [sysfs_print]
> > > > > a4e0 b root_kobj    [sysfs_print]
> > > > > a200 d print_ktype  [sysfs_print]
> > > > > a4a0 b print_kobj   [sysfs_print]
> > > > > a04c t sys_print_exit   [sysfs_print]
> > > > > a144 r __func__.14514   [sysfs_print]
> > > > > a230 d kobj_attrs   [sysfs_print]
> > > > > a240 d sys_print_kobj_attr  [sysfs_print]
> > > > > a260 d __this_module    [sysfs_print]
> > > > > a04c t cleanup_module   [sysfs_print]
> > > > > 
> > > > > Code:
> > > > > 
> > > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct kobject *kobj, s
> > > > > 		const char *buf, size_t count)
> > > > > {
> > > > > 	return count;
> > > > > }
> > > > > 
> > > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct kobject *kobj,
> > > > > 		struct kobj_attribute *attr, char *buf)
> > > > > {
> > > > > 	return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> > > > > }
> > > > > 
> > > > > static struct kobj_attribute sys_print_kobj_attr = __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> > > > > static struct attribute *kobj_attrs[] = {
> > > > > 	&sys_print_kobj_attr.attr,
> > > > > 	NULL
> > > > > };
> > > > > 
> > > > 
> > > > Hm, this seems like a kallsyms bug.  IMO it should either fail the build
> > > > or omit the symbol from the kallsyms db.  Truncating it seems dangerous
> > > > and counterintuitive.
> > > > 
> > > 
> > > Kallsyms records the full function name without truncating it, but
> > > kallsyms returns the function name truncated to at most 127
> > > characters.
> > >
> > > > But regardless I really don't see a good reason to encourage this kind
> > > > of insanity in the livepatch code.
> > > > 
> > > 
> > > Yes, the above code is terrible, but we cannot stop users from writing
> > > code like that.
> > > 
> > > Once a function name is that long, the user will never have a chance to
> > > use livepatch on it.
> > 
> > Again, this seems like a kallsyms bug.  Fix the bug and the real world
> > need for this patch set goes away.  The user will be forced to either
> > shorten their function name or increase KSYM_NAME_LEN.
>
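
A userspace sketch of the collision being argued about: two names that agree in their first `KSYM_NAME_LEN - 1` characters come back identical from a lookup, so an exact name comparison fails, while the patch's idea is to compare only that prefix plus the address. `KSYM_NAME_LEN` = 128 is an assumption consistent with the 127-character limit mentioned above, and all function names are hypothetical. Josh's position is that kallsyms itself should reject such names instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define KSYM_NAME_LEN 128 /* assumed: 127 characters plus the NUL */

/* Model of what a lookup returns: at most KSYM_NAME_LEN - 1 characters. */
static void truncated_name(const char *full, char out[KSYM_NAME_LEN])
{
	strncpy(out, full, KSYM_NAME_LEN - 1);
	out[KSYM_NAME_LEN - 1] = '\0';
}

/* Exact comparison against the truncated name: fails for any longer name. */
static bool exact_name_match(const char *wanted, const char *symbol_full)
{
	char buf[KSYM_NAME_LEN];

	truncated_name(symbol_full, buf);
	return strcmp(wanted, buf) == 0;
}

/* Patch-style comparison: first KSYM_NAME_LEN - 1 characters plus address. */
static bool prefix_and_addr_match(const char *wanted, unsigned long wanted_addr,
				  const char *symbol_full, unsigned long symbol_addr)
{
	char buf[KSYM_NAME_LEN];

	truncated_name(symbol_full, buf);
	return strncmp(wanted, buf, KSYM_NAME_LEN - 1) == 0 &&
	       wanted_addr == symbol_addr;
}
```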

[PATCH v2 2/3] Remove celleb-only SCC PATA drivers

2015-04-13 Thread Daniel Axtens
The SCC PATA interface is only used by celleb.
celleb has been dropped [1], so drop the drivers.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: Bartlomiej Zolnierkiewicz 
CC: Tejun Heo 
CC: "David S. Miller" 
CC: linux-...@vger.kernel.org
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlabs.org
Signed-off-by: Daniel Axtens 

---
v2: get name of ozlab*s*.org right. Sorry all.
---
 drivers/ata/Kconfig    |    9 -
 drivers/ata/Makefile   |    1 -
 drivers/ata/pata_scc.c | 1110 
 drivers/ide/Kconfig    |    9 -
 drivers/ide/Makefile   |    1 -
 drivers/ide/scc_pata.c |  887 --
 6 files changed, 2017 deletions(-)
 delete mode 100644 drivers/ata/pata_scc.c
 delete mode 100644 drivers/ide/scc_pata.c

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 5f60155..ee5209f 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -729,15 +729,6 @@ config PATA_SC1200
 
  If unsure, say N.
 
-config PATA_SCC
-   tristate "Toshiba's Cell Reference Set IDE support"
-   depends on PCI && PPC_CELLEB
-   help
- This option enables support for the built-in IDE controller on
- Toshiba Cell Reference Board.
-
- If unsure, say N.
-
 config PATA_SCH
tristate "Intel SCH PATA support"
depends on PCI
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index b67e995..40f7865 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -75,7 +75,6 @@ obj-$(CONFIG_PATA_PDC_OLD) += pata_pdc202xx_old.o
 obj-$(CONFIG_PATA_RADISYS) += pata_radisys.o
 obj-$(CONFIG_PATA_RDC) += pata_rdc.o
 obj-$(CONFIG_PATA_SC1200)  += pata_sc1200.o
-obj-$(CONFIG_PATA_SCC) += pata_scc.o
 obj-$(CONFIG_PATA_SCH) += pata_sch.o
 obj-$(CONFIG_PATA_SERVERWORKS) += pata_serverworks.o
 obj-$(CONFIG_PATA_SIL680)  += pata_sil680.o
diff --git a/drivers/ata/pata_scc.c b/drivers/ata/pata_scc.c
deleted file mode 100644
index 5cd60d6..000
--- a/drivers/ata/pata_scc.c
+++ /dev/null
@@ -1,1110 +0,0 @@
-/*
- * Support for IDE interfaces on Celleb platform
- *
- * (C) Copyright 2006 TOSHIBA CORPORATION
- *
- * This code is based on drivers/ata/ata_piix.c:
- *  Copyright 2003-2005 Red Hat Inc
- *  Copyright 2003-2005 Jeff Garzik
- *  Copyright (C) 1998-1999 Andrzej Krzysztofowicz, Author and Maintainer
- *  Copyright (C) 1998-2000 Andre Hedrick 
- *  Copyright (C) 2003 Red Hat Inc
- *
- * and drivers/ata/ahci.c:
- *  Copyright 2004-2005 Red Hat, Inc.
- *
- * and drivers/ata/libata-core.c:
- *  Copyright 2003-2004 Red Hat, Inc.  All rights reserved.
- *  Copyright 2003-2004 Jeff Garzik
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define DRV_NAME   "pata_scc"
-#define DRV_VERSION "0.3"
-
-#define PCI_DEVICE_ID_TOSHIBA_SCC_ATA  0x01b4
-
-/* PCI BARs */
-#define SCC_CTRL_BAR   0
-#define SCC_BMID_BAR   1
-
-/* offset of CTRL registers */
-#define SCC_CTL_PIOSHT 0x000
-#define SCC_CTL_PIOCT  0x004
-#define SCC_CTL_MDMACT 0x008
-#define SCC_CTL_MCRCST 0x00C
-#define SCC_CTL_SDMACT 0x010
-#define SCC_CTL_SCRCST 0x014
-#define SCC_CTL_UDENVT 0x018
-#define SCC_CTL_TDVHSEL 0x020
-#define SCC_CTL_MODEREG 0x024
-#define SCC_CTL_ECMODE 0xF00
-#define SCC_CTL_MAEA0  0xF50
-#define SCC_CTL_MAEC0  0xF54
-#define SCC_CTL_CCKCTRL 0xFF0
-
-/* offset of BMID registers */
-#define SCC_DMA_CMD 0x000
-#define SCC_DMA_STATUS 0x004
-#define SCC_DMA_TABLE_OFS  0x008
-#define SCC_DMA_INTMASK 0x010
-#define SCC_DMA_INTST  0x014
-#define SCC_DMA_PTERADD 0x018
-#define SCC_REG_CMD_ADDR   0x020
-#define SCC_REG_DATA   0x000
-#define SCC_REG_ERR 0x004
-#define SCC_REG_FEATURE 0x004
-#define SCC_REG_NSECT  0x008
-#define SCC_REG_LBAL   0x00C
-#define SCC_REG_LBAM   0x010
-#define SCC_REG_LBAH   0x014
-#define SCC_REG_DEVICE 0x018
-#define SCC_REG_STATUS 0x01C
-#define SCC_REG_CMD 0x01C
-#define SCC_REG_ALTSTATUS  0x020
-
-/

Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Minfei Huang
On 04/14/15 at 12:11P, Josh Poimboeuf wrote:
> On Tue, Apr 14, 2015 at 01:03:48PM +0800, Minfei Huang wrote:
> > On 04/13/15 at 11:57P, Josh Poimboeuf wrote:
> > > On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> > > > On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > > > > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > > > > For now, kallsyms only stores the first (KSYM_NAME_LEN-1)
> > > > > > characters of a name, so functions whose first (KSYM_NAME_LEN-1)
> > > > > > characters are the same, but whose remaining characters differ,
> > > > > > end up with the same kallsyms name.
> > > > > > 
> > > > > > Such a function will then never be patched, even though both its
> > > > > > name and address are provided, because livepatch cannot recognize
> > > > > > the function name.
> > > > > > 
> > > > > > Now, livepatch will verify the first (KSYM_NAME_LEN-1) characters
> > > > > > of the function name and the address, if provided. Once both
> > > > > > match, we can confirm that the function to be patched is found.
> > > > > 
> > > > > From scripts/kallsyms.c:
> > > > > 
> > > > >   if (strlen(str) > KSYM_NAME_LEN) {
> > > > >     fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > > > >             "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > > > >             str, strlen(str), KSYM_NAME_LEN);
> > > > >     return -1;
> > > > >   }
> > > > > 
> > > > > So I think such a long symbol name wouldn't be added to the kallsyms
> > > > > database in the first place.
> > > > > 
> > > > 
> > > > Actually, kernel allows overlength function name to be used. Following
> > > > is my testing module.
> > > > 
> > > > We can got the address in /proc/kallsyms.
> > > > $ cat /proc/kallsyms | grep sysfs_print
> > > > a000 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri  [sysfs_print]
> > > > a010 t kobj_release [sysfs_print]
> > > > a020 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri  [sysfs_print]
> > > > a4e0 b root_kobj    [sysfs_print]
> > > > a200 d print_ktype  [sysfs_print]
> > > > a4a0 b print_kobj   [sysfs_print]
> > > > a04c t sys_print_exit   [sysfs_print]
> > > > a144 r __func__.14514   [sysfs_print]
> > > > a230 d kobj_attrs   [sysfs_print]
> > > > a240 d sys_print_kobj_attr  [sysfs_print]
> > > > a260 d __this_module    [sysfs_print]
> > > > a04c t cleanup_module   [sysfs_print]
> > > > 
> > > > Code:
> > > > 
> > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct kobject *kobj, s
> > > > const char *buf, size_t count)
> > > > {
> > > > return count;
> > > > }
> > > > 
> > > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct kobject *kobj,
> > > > struct kobj_attribute *attr, char *buf)
> > > > {
> > > > return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> > > > }
> > > > 
> > > > static struct kobj_attribute sys_print_kobj_attr = __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> > > > static struct attribute *kobj_attrs[] = {
> > > > &sys_print_kobj_attr.attr,
> > > > NULL
> > > > };
> > > > 
> > > 
> > > Hm, this seems like a kallsyms bug.  IMO it should either fail the build
> > > or omit the symbol from the kallsyms db.  Truncating it seems dangerous
> > > and counterintuitive.
> > > 
> > 
> > Kallsyms will record all of the function name, without truncating it.
> > But the kallsyms will return the truncated function name which is max to
> > 127.
> >
> > > But regardless I really don't see a good reason to encourage this kind
> > > of insanity in the livepatch code.
> > > 
> > 
> > Yes, the above code is terrible, but we cannt stop user composing like
> > that.
> > 
> > Once the function name is like above, user will never have chance to use
> > livepatch.
> 
> Again, this seems like a kallsyms bug.  Fix the bug and the real world
> need for this patch set goes away.  The user will be forced to either
> shorten their function name or increase KSYM_NAME_LEN.
> 

A kallsyms bug? I do not think increasing KSYM_NAME_LEN is a good idea.

End users may know little about the restrictions of kallsyms and
livepatch. How would they know that function names are limited to 127
characters?
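The collision discussed in this thread can be modelled in user space. The sketch below assumes KSYM_NAME_LEN is 128 and that a lookup hands names back through a KSYM_NAME_LEN-byte buffer, so only the first 127 characters survive; `struct fake_sym` and its helpers are hypothetical and are not the kernel's kallsyms or livepatch code. It contrasts an exact-name match with the name-prefix-plus-address match the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of KSYM_NAME_LEN, copied so the sketch stands alone. */
#define KSYM_NAME_LEN 128

/* Hypothetical stand-in for one kallsyms entry. */
struct fake_sym {
    const char *name;   /* full name as written in the source */
    unsigned long addr; /* address reported for the symbol */
};

/* Model a lookup that returns the name through a KSYM_NAME_LEN buffer:
 * at most KSYM_NAME_LEN - 1 characters plus the NUL survive. */
static void fake_lookup_name(const struct fake_sym *s, char buf[KSYM_NAME_LEN])
{
    snprintf(buf, KSYM_NAME_LEN, "%s", s->name);
}

/* Exact name match: can never succeed for a name longer than 127 chars. */
static bool match_by_name(const struct fake_sym *s, const char *want)
{
    char buf[KSYM_NAME_LEN];

    fake_lookup_name(s, buf);
    return strcmp(buf, want) == 0;
}

/* Prefix-plus-address match: compare only the first KSYM_NAME_LEN - 1
 * characters, then require the address to agree when one is supplied
 * (0 means "no address given"). */
static bool match_by_prefix_and_addr(const struct fake_sym *s,
                                     const char *want, unsigned long want_addr)
{
    char buf[KSYM_NAME_LEN];

    assert(want != NULL);
    fake_lookup_name(s, buf);
    if (strncmp(buf, want, KSYM_NAME_LEN - 1) != 0)
        return false;
    return want_addr == 0 || s->addr == want_addr;
}
```

With the 135-character `..._store` name from the test module, the exact match always fails; the prefix match alone cannot tell `..._store` from `..._show` (both share the first 130 characters); adding the address resolves both cases.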

It is significant that livepatch suppo

[PATCH v2 3/3] tty/hvc: remove celleb-only beat driver

2015-04-13 Thread Daniel Axtens
The beat hvc driver is only used by celleb.
celleb has been dropped [1], so drop the driver.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: Greg Kroah-Hartman 
CC: Jiri Slaby 
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlabs.org
Signed-off-by: Daniel Axtens 

---
v2: get name of ozlab*s*.org right. Sorry all.
---
 drivers/tty/hvc/Kconfig    |   7 ---
 drivers/tty/hvc/Makefile   |   1 -
 drivers/tty/hvc/hvc_beat.c | 134 -
 3 files changed, 142 deletions(-)
 delete mode 100644 drivers/tty/hvc/hvc_beat.c

diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index 8902f9b..2509d05 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -42,13 +42,6 @@ config HVC_RTAS
help
  IBM Console device driver which makes use of RTAS
 
-config HVC_BEAT
-   bool "Toshiba's Beat Hypervisor Console support"
-   depends on PPC_CELLEB
-   select HVC_DRIVER
-   help
- Toshiba's Cell Reference Set Beat Console device driver
-
 config HVC_IUCV
bool "z/VM IUCV Hypervisor console support (VM only)"
depends on S390
diff --git a/drivers/tty/hvc/Makefile b/drivers/tty/hvc/Makefile
index 4ca3723..6a2702b 100644
--- a/drivers/tty/hvc/Makefile
+++ b/drivers/tty/hvc/Makefile
@@ -4,7 +4,6 @@ obj-$(CONFIG_HVC_OLD_HVSI)  += hvsi.o
 obj-$(CONFIG_HVC_RTAS) += hvc_rtas.o
 obj-$(CONFIG_HVC_TILE) += hvc_tile.o
 obj-$(CONFIG_HVC_DCC)  += hvc_dcc.o
-obj-$(CONFIG_HVC_BEAT) += hvc_beat.o
 obj-$(CONFIG_HVC_DRIVER)   += hvc_console.o
 obj-$(CONFIG_HVC_IRQ)  += hvc_irq.o
 obj-$(CONFIG_HVC_XEN)  += hvc_xen.o
diff --git a/drivers/tty/hvc/hvc_beat.c b/drivers/tty/hvc/hvc_beat.c
deleted file mode 100644
index 1560d23..000
--- a/drivers/tty/hvc/hvc_beat.c
+++ /dev/null
@@ -1,134 +0,0 @@
-/*
- * Beat hypervisor console driver
- *
- * (C) Copyright 2006 TOSHIBA CORPORATION
- *
- * This code is based on drivers/char/hvc_rtas.c:
- * (C) Copyright IBM Corporation 2001-2005
- * (C) Copyright Red Hat, Inc. 2005
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "hvc_console.h"
-
-extern int64_t beat_get_term_char(uint64_t, uint64_t *, uint64_t *, uint64_t *);
-extern int64_t beat_put_term_char(uint64_t, uint64_t, uint64_t, uint64_t);
-
-struct hvc_struct *hvc_beat_dev = NULL;
-
-/* bug: only one queue is available regardless of vtermno */
-static int hvc_beat_get_chars(uint32_t vtermno, char *buf, int cnt)
-{
-   static unsigned char q[sizeof(unsigned long) * 2]
-   __attribute__((aligned(sizeof(unsigned long))));
-   static int qlen = 0;
-   u64 got;
-
-again:
-   if (qlen) {
-   if (qlen > cnt) {
-   memcpy(buf, q, cnt);
-   qlen -= cnt;
-   memmove(q + cnt, q, qlen);
-   return cnt;
-   } else {/* qlen <= cnt */
-   int r;
-
-   memcpy(buf, q, qlen);
-   r = qlen;
-   qlen = 0;
-   return r;
-   }
-   }
-   if (beat_get_term_char(vtermno, &got,
-   ((u64 *)q), ((u64 *)q) + 1) == 0) {
-   qlen = got;
-   goto again;
-   }
-   return 0;
-}
-
-static int hvc_beat_put_chars(uint32_t vtermno, const char *buf, int cnt)
-{
-   unsigned long kb[2];
-   int rest, nlen;
-
-   for (rest = cnt; rest > 0; rest -= nlen) {
-   nlen = (rest > 16) ? 16 : rest;
-   memcpy(kb, buf, nlen);
-   beat_put_term_char(vtermno, nlen, kb[0], kb[1]);
-   buf += nlen;
-   }
-   return cnt;
-}
-
-static const struct hv_ops hvc_beat_get_put_ops = {
-   .get_chars = hvc_beat_get_chars,
-   .put_chars = hvc_beat_put_chars,
-};
-
-static int hvc_beat_useit = 1;
-
-static int hvc_beat_config(char *p)
-{
-   hvc_beat_useit = simple_strtoul(p, NULL, 0);
-   return 0;
-}
-
-static int __init hvc_beat_console_init(void)
-{
-   if (hvc_beat_useit && of_machine_is_compatible("Beat")) {
-   hvc_instantiate(0, 0, &

[PATCH v2 1/3] toshiba: Remove celleb from Kconfig options

2015-04-13 Thread Daniel Axtens
The toshiba drivers had celleb as an optional dependency.
celleb has been dropped [1], so clean that out of Kconfig.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: net...@vger.kernel.org
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlabs.org
Signed-off-by: Daniel Axtens 

---
v2: get name of ozlab*s*.org right. Sorry all.
---
 drivers/net/ethernet/toshiba/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/Kconfig b/drivers/net/ethernet/toshiba/Kconfig
index 74acb5c..5d244b6 100644
--- a/drivers/net/ethernet/toshiba/Kconfig
+++ b/drivers/net/ethernet/toshiba/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_TOSHIBA
bool "Toshiba devices"
default y
-   depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB || MIPS) || PPC_PS3
+   depends on PCI && (PPC_IBM_CELL_BLADE || MIPS) || PPC_PS3
---help---
  If you have a network (Ethernet) card belonging to this class, say Y
  and read the Ethernet-HOWTO, available from
@@ -42,7 +42,7 @@ config GELIC_WIRELESS
 
 config SPIDER_NET
tristate "Spider Gigabit Ethernet driver"
-   depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB)
+   depends on PCI && PPC_IBM_CELL_BLADE
select FW_LOADER
select SUNGEM_PHY
---help---
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
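The NET_VENDOR_TOSHIBA edit above removes only one alternative from the parenthesised group. Kconfig, like C, gives `&&` higher precedence than `||`, so the expression reads as `(PCI && (...)) || PPC_PS3` and PS3 keeps its PCI-independent path. A small C transcription, a sketch that treats each symbol as a plain bool (which these options are), checks that the old and new forms agree whenever PPC_CELLEB is n.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB || MIPS) || PPC_PS3 */
static bool old_dep(bool pci, bool blade, bool celleb, bool mips, bool ps3)
{
    return (pci && (blade || celleb || mips)) || ps3;
}

/* After: PCI && (PPC_IBM_CELL_BLADE || MIPS) || PPC_PS3 */
static bool new_dep(bool pci, bool blade, bool mips, bool ps3)
{
    return (pci && (blade || mips)) || ps3;
}

/* With PPC_CELLEB gone (always n), both forms agree on all 16 inputs. */
static void check_equivalent(void)
{
    for (int m = 0; m < 16; m++) {
        bool pci = m & 1, blade = m & 2, mips = m & 4, ps3 = m & 8;

        assert(old_dep(pci, blade, false, mips, ps3) ==
               new_dep(pci, blade, mips, ps3));
    }
}
```

The SPIDER_NET change reduces the same way: `PCI && (PPC_IBM_CELL_BLADE || n)` is just `PCI && PPC_IBM_CELL_BLADE`.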


[PATCH 3/3] tty/hvc: remove celleb-only beat driver

2015-04-13 Thread Daniel Axtens
The beat hvc driver is only used by celleb.
celleb has been dropped [1], so drop the driver.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: Greg Kroah-Hartman 
CC: Jiri Slaby 
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlab.org
Signed-off-by: Daniel Axtens 
---
 drivers/tty/hvc/Kconfig    |   7 ---
 drivers/tty/hvc/Makefile   |   1 -
 drivers/tty/hvc/hvc_beat.c | 134 -
 3 files changed, 142 deletions(-)
 delete mode 100644 drivers/tty/hvc/hvc_beat.c

diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index 8902f9b..2509d05 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -42,13 +42,6 @@ config HVC_RTAS
help
  IBM Console device driver which makes use of RTAS
 
-config HVC_BEAT
-   bool "Toshiba's Beat Hypervisor Console support"
-   depends on PPC_CELLEB
-   select HVC_DRIVER
-   help
- Toshiba's Cell Reference Set Beat Console device driver
-
 config HVC_IUCV
bool "z/VM IUCV Hypervisor console support (VM only)"
depends on S390
diff --git a/drivers/tty/hvc/Makefile b/drivers/tty/hvc/Makefile
index 4ca3723..6a2702b 100644
--- a/drivers/tty/hvc/Makefile
+++ b/drivers/tty/hvc/Makefile
@@ -4,7 +4,6 @@ obj-$(CONFIG_HVC_OLD_HVSI)  += hvsi.o
 obj-$(CONFIG_HVC_RTAS) += hvc_rtas.o
 obj-$(CONFIG_HVC_TILE) += hvc_tile.o
 obj-$(CONFIG_HVC_DCC)  += hvc_dcc.o
-obj-$(CONFIG_HVC_BEAT) += hvc_beat.o
 obj-$(CONFIG_HVC_DRIVER)   += hvc_console.o
 obj-$(CONFIG_HVC_IRQ)  += hvc_irq.o
 obj-$(CONFIG_HVC_XEN)  += hvc_xen.o
diff --git a/drivers/tty/hvc/hvc_beat.c b/drivers/tty/hvc/hvc_beat.c
deleted file mode 100644
index 1560d23..000
--- a/drivers/tty/hvc/hvc_beat.c
+++ /dev/null
@@ -1,134 +0,0 @@
-/*
- * Beat hypervisor console driver
- *
- * (C) Copyright 2006 TOSHIBA CORPORATION
- *
- * This code is based on drivers/char/hvc_rtas.c:
- * (C) Copyright IBM Corporation 2001-2005
- * (C) Copyright Red Hat, Inc. 2005
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "hvc_console.h"
-
-extern int64_t beat_get_term_char(uint64_t, uint64_t *, uint64_t *, uint64_t *);
-extern int64_t beat_put_term_char(uint64_t, uint64_t, uint64_t, uint64_t);
-
-struct hvc_struct *hvc_beat_dev = NULL;
-
-/* bug: only one queue is available regardless of vtermno */
-static int hvc_beat_get_chars(uint32_t vtermno, char *buf, int cnt)
-{
-   static unsigned char q[sizeof(unsigned long) * 2]
-   __attribute__((aligned(sizeof(unsigned long))));
-   static int qlen = 0;
-   u64 got;
-
-again:
-   if (qlen) {
-   if (qlen > cnt) {
-   memcpy(buf, q, cnt);
-   qlen -= cnt;
-   memmove(q + cnt, q, qlen);
-   return cnt;
-   } else {/* qlen <= cnt */
-   int r;
-
-   memcpy(buf, q, qlen);
-   r = qlen;
-   qlen = 0;
-   return r;
-   }
-   }
-   if (beat_get_term_char(vtermno, &got,
-   ((u64 *)q), ((u64 *)q) + 1) == 0) {
-   qlen = got;
-   goto again;
-   }
-   return 0;
-}
-
-static int hvc_beat_put_chars(uint32_t vtermno, const char *buf, int cnt)
-{
-   unsigned long kb[2];
-   int rest, nlen;
-
-   for (rest = cnt; rest > 0; rest -= nlen) {
-   nlen = (rest > 16) ? 16 : rest;
-   memcpy(kb, buf, nlen);
-   beat_put_term_char(vtermno, nlen, kb[0], kb[1]);
-   buf += nlen;
-   }
-   return cnt;
-}
-
-static const struct hv_ops hvc_beat_get_put_ops = {
-   .get_chars = hvc_beat_get_chars,
-   .put_chars = hvc_beat_put_chars,
-};
-
-static int hvc_beat_useit = 1;
-
-static int hvc_beat_config(char *p)
-{
-   hvc_beat_useit = simple_strtoul(p, NULL, 0);
-   return 0;
-}
-
-static int __init hvc_beat_console_init(void)
-{
-   if (hvc_beat_useit && of_machine_is_compatible("Beat")) {
-   hvc_instantiate(0, 0, &hvc_beat_get_put_ops);
-   }
-   return 0;
-}
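In the deleted hvc_beat_get_chars() above, the partial-read branch calls `memmove(q + cnt, q, qlen)`. That appears to move the leftover bytes in the wrong direction: after `cnt` bytes are consumed, the unread data sits at `q + cnt` and should be shifted to the start of `q`. The driver is being removed, so this does not affect the patch, but a minimal user-space sketch of the intended queue logic may help readers of the old code; `struct rx_queue` and `rx_queue_drain()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the receive queue in hvc_beat_get_chars(): the
 * hypervisor hands over up to two 64-bit words (16 bytes) per call, and
 * callers may ask for fewer bytes than are queued. */
struct rx_queue {
    unsigned char q[16];
    int qlen;               /* number of unread bytes at the front of q */
};

/* Copy up to cnt queued bytes into buf and keep the unread remainder at
 * the front of the queue; returns the number of bytes copied. */
static int rx_queue_drain(struct rx_queue *rq, char *buf, int cnt)
{
    int n;

    assert(cnt >= 0 && rq->qlen >= 0 && rq->qlen <= (int)sizeof(rq->q));
    n = rq->qlen > cnt ? cnt : rq->qlen;
    memcpy(buf, rq->q, n);
    rq->qlen -= n;
    /* Shift the leftover bytes down: destination q, source q + n. */
    memmove(rq->q, rq->q + n, rq->qlen);
    return n;
}
```

With the argument order of the deleted code, a second read after a partial one would return bytes that had already been delivered.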

[PATCH 1/3] toshiba: Remove celleb from Kconfig options

2015-04-13 Thread Daniel Axtens
The toshiba drivers had celleb as an optional dependency.
celleb has been dropped [1], so clean that out of Kconfig.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: net...@vger.kernel.org
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlab.org
Signed-off-by: Daniel Axtens 
---
 drivers/net/ethernet/toshiba/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/Kconfig b/drivers/net/ethernet/toshiba/Kconfig
index 74acb5c..5d244b6 100644
--- a/drivers/net/ethernet/toshiba/Kconfig
+++ b/drivers/net/ethernet/toshiba/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_TOSHIBA
bool "Toshiba devices"
default y
-   depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB || MIPS) || PPC_PS3
+   depends on PCI && (PPC_IBM_CELL_BLADE || MIPS) || PPC_PS3
---help---
  If you have a network (Ethernet) card belonging to this class, say Y
  and read the Ethernet-HOWTO, available from
@@ -42,7 +42,7 @@ config GELIC_WIRELESS
 
 config SPIDER_NET
tristate "Spider Gigabit Ethernet driver"
-   depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB)
+   depends on PCI && PPC_IBM_CELL_BLADE
select FW_LOADER
select SUNGEM_PHY
---help---
-- 
2.1.4



[PATCH 2/3] Remove celleb-only SCC PATA drivers

2015-04-13 Thread Daniel Axtens
The SCC PATA interface is only used by celleb.
celleb has been dropped [1], so drop the drivers.

[1] http://patchwork.ozlabs.org/patch/451730/

CC: Bartlomiej Zolnierkiewicz 
CC: Tejun Heo 
CC: "David S. Miller" 
CC: linux-...@vger.kernel.org
CC: Valentin Rothberg 
CC: m...@ellerman.id.au
CC: linuxppc-...@lists.ozlab.org
Signed-off-by: Daniel Axtens 
---
 drivers/ata/Kconfig    |    9 -
 drivers/ata/Makefile   |    1 -
 drivers/ata/pata_scc.c | 1110 
 drivers/ide/Kconfig    |    9 -
 drivers/ide/Makefile   |    1 -
 drivers/ide/scc_pata.c |  887 --
 6 files changed, 2017 deletions(-)
 delete mode 100644 drivers/ata/pata_scc.c
 delete mode 100644 drivers/ide/scc_pata.c

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 5f60155..ee5209f 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -729,15 +729,6 @@ config PATA_SC1200
 
  If unsure, say N.
 
-config PATA_SCC
-   tristate "Toshiba's Cell Reference Set IDE support"
-   depends on PCI && PPC_CELLEB
-   help
- This option enables support for the built-in IDE controller on
- Toshiba Cell Reference Board.
-
- If unsure, say N.
-
 config PATA_SCH
tristate "Intel SCH PATA support"
depends on PCI
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index b67e995..40f7865 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -75,7 +75,6 @@ obj-$(CONFIG_PATA_PDC_OLD)    += pata_pdc202xx_old.o
 obj-$(CONFIG_PATA_RADISYS) += pata_radisys.o
 obj-$(CONFIG_PATA_RDC) += pata_rdc.o
 obj-$(CONFIG_PATA_SC1200)  += pata_sc1200.o
-obj-$(CONFIG_PATA_SCC) += pata_scc.o
 obj-$(CONFIG_PATA_SCH) += pata_sch.o
 obj-$(CONFIG_PATA_SERVERWORKS) += pata_serverworks.o
 obj-$(CONFIG_PATA_SIL680)  += pata_sil680.o
diff --git a/drivers/ata/pata_scc.c b/drivers/ata/pata_scc.c
deleted file mode 100644
index 5cd60d6..000
--- a/drivers/ata/pata_scc.c
+++ /dev/null
@@ -1,1110 +0,0 @@
-/*
- * Support for IDE interfaces on Celleb platform
- *
- * (C) Copyright 2006 TOSHIBA CORPORATION
- *
- * This code is based on drivers/ata/ata_piix.c:
- *  Copyright 2003-2005 Red Hat Inc
- *  Copyright 2003-2005 Jeff Garzik
- *  Copyright (C) 1998-1999 Andrzej Krzysztofowicz, Author and Maintainer
- *  Copyright (C) 1998-2000 Andre Hedrick 
- *  Copyright (C) 2003 Red Hat Inc
- *
- * and drivers/ata/ahci.c:
- *  Copyright 2004-2005 Red Hat, Inc.
- *
- * and drivers/ata/libata-core.c:
- *  Copyright 2003-2004 Red Hat, Inc.  All rights reserved.
- *  Copyright 2003-2004 Jeff Garzik
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define DRV_NAME   "pata_scc"
-#define DRV_VERSION "0.3"
-
-#define PCI_DEVICE_ID_TOSHIBA_SCC_ATA  0x01b4
-
-/* PCI BARs */
-#define SCC_CTRL_BAR   0
-#define SCC_BMID_BAR   1
-
-/* offset of CTRL registers */
-#define SCC_CTL_PIOSHT 0x000
-#define SCC_CTL_PIOCT  0x004
-#define SCC_CTL_MDMACT 0x008
-#define SCC_CTL_MCRCST 0x00C
-#define SCC_CTL_SDMACT 0x010
-#define SCC_CTL_SCRCST 0x014
-#define SCC_CTL_UDENVT 0x018
-#define SCC_CTL_TDVHSEL 0x020
-#define SCC_CTL_MODEREG 0x024
-#define SCC_CTL_ECMODE 0xF00
-#define SCC_CTL_MAEA0  0xF50
-#define SCC_CTL_MAEC0  0xF54
-#define SCC_CTL_CCKCTRL 0xFF0
-
-/* offset of BMID registers */
-#define SCC_DMA_CMD    0x000
-#define SCC_DMA_STATUS 0x004
-#define SCC_DMA_TABLE_OFS  0x008
-#define SCC_DMA_INTMASK 0x010
-#define SCC_DMA_INTST  0x014
-#define SCC_DMA_PTERADD 0x018
-#define SCC_REG_CMD_ADDR   0x020
-#define SCC_REG_DATA   0x000
-#define SCC_REG_ERR    0x004
-#define SCC_REG_FEATURE 0x004
-#define SCC_REG_NSECT  0x008
-#define SCC_REG_LBAL   0x00C
-#define SCC_REG_LBAM   0x010
-#define SCC_REG_LBAH   0x014
-#define SCC_REG_DEVICE 0x018
-#define SCC_REG_STATUS 0x01C
-#define SCC_REG_CMD    0x01C
-#define SCC_REG_ALTSTATUS  0x020
-
-/* register value */
-#define TDVHSEL_MASTER 0

linux-next: manual merge of the ftrace tree with the net-next tree

2015-04-13 Thread Stephen Rothwell
Hi Steven,

Today's linux-next merge of the ftrace tree got a conflict in
net/mac80211/trace.h between commit ba8c3d6f16a1 ("mac80211: add an
intermediate software queue implementation") from the net-next tree and
commit 1bc1e4d048d3 ("mac80211: Move message tracepoints to their own
header") from the ftrace tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc net/mac80211/trace.h
index 790bd45081c4,755a5388dbca..
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@@ -2312,75 -2312,6 +2312,37 @@@ TRACE_EVENT(drv_tdls_recv_channel_switc
)
  );
  
 +TRACE_EVENT(drv_wake_tx_queue,
 +  TP_PROTO(struct ieee80211_local *local,
 +   struct ieee80211_sub_if_data *sdata,
 +   struct txq_info *txq),
 +
 +  TP_ARGS(local, sdata, txq),
 +
 +  TP_STRUCT__entry(
 +  LOCAL_ENTRY
 +  VIF_ENTRY
 +  STA_ENTRY
 +  __field(u8, ac)
 +  __field(u8, tid)
 +  ),
 +
 +  TP_fast_assign(
 +  struct ieee80211_sta *sta = txq->txq.sta;
 +
 +  LOCAL_ASSIGN;
 +  VIF_ASSIGN;
 +  STA_ASSIGN;
 +  __entry->ac = txq->txq.ac;
 +  __entry->tid = txq->txq.tid;
 +  ),
 +
 +  TP_printk(
 +  LOCAL_PR_FMT  VIF_PR_FMT  STA_PR_FMT " ac:%d tid:%d",
 +  LOCAL_PR_ARG, VIF_PR_ARG, STA_PR_ARG, __entry->ac, __entry->tid
 +  )
 +);
 +
- #ifdef CONFIG_MAC80211_MESSAGE_TRACING
- #undef TRACE_SYSTEM
- #define TRACE_SYSTEM mac80211_msg
- 
- #define MAX_MSG_LEN   100
- 
- DECLARE_EVENT_CLASS(mac80211_msg_event,
-   TP_PROTO(struct va_format *vaf),
- 
-   TP_ARGS(vaf),
- 
-   TP_STRUCT__entry(
-   __dynamic_array(char, msg, MAX_MSG_LEN)
-   ),
- 
-   TP_fast_assign(
-   WARN_ON_ONCE(vsnprintf(__get_dynamic_array(msg),
-  MAX_MSG_LEN, vaf->fmt,
-  *vaf->va) >= MAX_MSG_LEN);
-   ),
- 
-   TP_printk("%s", __get_str(msg))
- );
- 
- DEFINE_EVENT(mac80211_msg_event, mac80211_info,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
- );
- DEFINE_EVENT(mac80211_msg_event, mac80211_dbg,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
- );
- DEFINE_EVENT(mac80211_msg_event, mac80211_err,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
- );
- #endif
- 
  #endif /* !__MAC80211_DRIVER_TRACE || TRACE_HEADER_MULTI_READ */
  
  #undef TRACE_INCLUDE_PATH




Re: [PATCH] serial: of-serial: Remove device_type = "serial" registration

2015-04-13 Thread Michal Simek
On 04/13/2015 05:50 PM, Peter Hurley wrote:
> [ + Arnd ]
> 
> Hi Michal,
> 
> On 04/13/2015 10:35 AM, Michal Simek wrote:
>> Do not probe all serial drivers by of_serial.c which are using
>> device_type = "serial"; property. Only drivers which have valid
>> compatible strings listed in the driver should be probed.
> 
> What does this fix?
> Is there some kind of probe problem you're trying to address?
> Are you trying to silence the error message?
> 
>> When PORT_UNKNOWN probe will fail anyway.
> 
> Ok, but doesn't device_attach() just continue to try to match other
> drivers on the platform bus?


Please look at my response to Greg's reply. I hope that explains it.

Thanks,
Michal



Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Josh Poimboeuf
On Tue, Apr 14, 2015 at 01:03:48PM +0800, Minfei Huang wrote:
> On 04/13/15 at 11:57P, Josh Poimboeuf wrote:
> > On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> > > On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > > > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > > > For now, the kallsyms will only store the first (KSYM_NAME_LEN-1). The
> > > > > kallsyms name is same for the function which first (KSYM_NAME_LEN-1) is
> > > > > same, but the rest is not.
> > > > > 
> > > > > Then function will never be patched, although function name and address
> > > > > are provided both. The reason caused this bug is livepatch cannt
> > > > > recognize the function name.
> > > > > 
> > > > > Now, livepatch will verify the function name with first (KSYM_NAME_LEN-1)
> > > > > and address, if provided. Once they are matched, we can confirm that the
> > > > > patched function is found.
> > > > 
> > > > From scripts/kallsyms.c:
> > > > 
> > > >   if (strlen(str) > KSYM_NAME_LEN) {
> > > >           fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > > >                   "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > > >                   str, strlen(str), KSYM_NAME_LEN);
> > > >           return -1;
> > > >   }
> > > > 
> > > > So I think such a long symbol name wouldn't be added to the kallsyms
> > > > database in the first place.
> > > > 
> > > 
> > > Actually, kernel allows overlength function name to be used. Following
> > > is my testing module.
> > > 
> > > We can got the address in /proc/kallsyms.
> > > $ cat /proc/kallsyms | grep sysfs_print
> > > a000 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri  [sysfs_print]
> > > a010 t kobj_release [sysfs_print]
> > > a020 t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri  [sysfs_print]
> > > a4e0 b root_kobj    [sysfs_print]
> > > a200 d print_ktype  [sysfs_print]
> > > a4a0 b print_kobj   [sysfs_print]
> > > a04c t sys_print_exit   [sysfs_print]
> > > a144 r __func__.14514   [sysfs_print]
> > > a230 d kobj_attrs   [sysfs_print]
> > > a240 d sys_print_kobj_attr  [sysfs_print]
> > > a260 d __this_module    [sysfs_print]
> > > a04c t cleanup_module   [sysfs_print]
> > > 
> > > Code:
> > > 
> > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct kobject *kobj, s
> > > const char *buf, size_t count)
> > > {
> > > return count;
> > > }
> > > 
> > > static ssize_t sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct kobject *kobj,
> > > struct kobj_attribute *attr, char *buf)
> > > {
> > > return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> > > }
> > > 
> > > static struct kobj_attribute sys_print_kobj_attr = __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> > > static struct attribute *kobj_attrs[] = {
> > > &sys_print_kobj_attr.attr,
> > > NULL
> > > };
> > > 
> > 
> > Hm, this seems like a kallsyms bug.  IMO it should either fail the build
> > or omit the symbol from the kallsyms db.  Truncating it seems dangerous
> > and counterintuitive.
> > 
> 
> Kallsyms will record all of the function name, without truncating it.
> But the kallsyms will return the truncated function name which is max to
> 127.
>
> > But regardless I really don't see a good reason to encourage this kind
> > of insanity in the livepatch code.
> > 
> 
> Yes, the above code is terrible, but we cannt stop user composing like
> that.
> 
> Once the function name is like above, user will never have chance to use
> livepatch.

Again, this seems like a kallsyms bug.  Fix the bug and the real world
need for this patch set goes away.  The user will be forced to either
shorten their function name or increase KSYM_NAME_LEN.
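Josh's alternative is to make over-long names fail loudly instead of being silently truncated. A minimal sketch of such a policy follows; `check_symbol_len()` is a hypothetical helper, not code from scripts/kallsyms.c or kernel/module.c, and its threshold is an assumption chosen so that every accepted name survives a KSYM_NAME_LEN-byte buffer, terminator included, unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of KSYM_NAME_LEN, copied so the sketch stands alone. */
#define KSYM_NAME_LEN 128

/* Reject, rather than truncate, any name that would not fit in a
 * KSYM_NAME_LEN buffer together with its NUL terminator.
 * Returns 0 if the name is acceptable, -1 otherwise. */
static int check_symbol_len(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len > KSYM_NAME_LEN - 1) {
        fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n",
                name, len, KSYM_NAME_LEN - 1);
        return -1;
    }
    return 0;
}
```

Under this policy the test module in the thread would fail at build or load time, forcing the author to shorten the name, rather than producing a symbol that livepatch can never find by name.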

-- 
Josh


Re: [PATCH] serial: of-serial: Remove device_type = "serial" registration

2015-04-13 Thread Michal Simek
Hi Greg,

On 04/13/2015 07:00 PM, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2015 at 04:35:27PM +0200, Michal Simek wrote:
>> Do not probe all serial drivers by of_serial.c which are using
>> device_type = "serial"; property. Only drivers which have valid
>> compatible strings listed in the driver should be probed.
> 
> Why?  This was added for some reason, what has changed since then?

I was discussing this patch with Arnd over IRC.

This is what Arnd was saying yesterday.
"when I wrote that driver initially, the idea was that it would get used
as a stub to hook up all other serial drivers
but after that, the common code learned to create platform devices from DT"

and the resolution of our discussion was to remove this line, because it
makes no sense to probe every driver whose node has device_type = "serial".
It was causing a problem on one system with xilinx_uartps and a 16550a
IP: of_serial tried and failed to register for the xilinx_uartps device,
and its irq_dispose_mapping() call removed the irq_desc. Then, when
xilinx_uartps asked for the irq with request_irq(), it got EINVAL.

The first problem was that of_serial tried to bind to the device because
of device_type = "serial"; the second problem was in the xilinx_uartps
driver, which incorrectly used platform_get_resource(), which doesn't
create the irq mapping. That is fixed by the second patch.
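The first problem can be modelled in user space: a generic match table that carries a type-only entry for "serial" matches every node with device_type = "serial", whatever its compatible string. The sketch below uses simplified, hypothetical stand-ins for a device-tree node and an of_device_id entry (one compatible string per node; the kernel's of_match_node() also ranks candidate entries, which is skipped here), and the compatible strings in the test are only examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for a DT node and a match entry. */
struct dt_node {
    const char *compatible;
    const char *device_type;
};

struct match_entry {
    const char *compatible; /* NULL: field not checked */
    const char *type;       /* NULL: field not checked */
};

/* True when s is non-NULL and equal to want. */
static bool str_eq(const char *s, const char *want)
{
    return s != NULL && strcmp(s, want) == 0;
}

/* An entry matches when every field it sets agrees with the node. */
static bool entry_matches(const struct match_entry *e, const struct dt_node *n)
{
    if (e->compatible && !str_eq(n->compatible, e->compatible))
        return false;
    if (e->type && !str_eq(n->device_type, e->type))
        return false;
    return true;
}

/* Walk a table terminated by an all-NULL sentinel entry. */
static bool table_matches(const struct match_entry *table, const struct dt_node *n)
{
    assert(table != NULL && n != NULL);
    for (; table->compatible || table->type; table++)
        if (entry_matches(table, n))
            return true;
    return false;
}
```

Dropping the type-only entry leaves the generic driver matching only the compatible strings it actually lists, so a node meant for another serial driver is no longer claimed by it.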

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform




Re: [PATCH v4 2/3] powerpc/powernv: Move cpuidle related code from setup.c to new file

2015-04-13 Thread Preeti U Murthy
On 04/14/2015 07:26 AM, Shreyas B. Prabhu wrote:
> This is a cleanup patch; doesn't change any functionality. Moves
> all cpuidle related code from setup.c to a new file.
> 
> Signed-off-by: Shreyas B. Prabhu 

Reviewed-by: Preeti U Murthy 

Regards
Preeti U Murthy
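The pnv_save_sprs_for_winkle() hunk quoted below relies on pointer tagging: per its comment, the paca pointer kept in HSPRG0 has its last 3 bits clear, so the code sets bit 63 (IBM numbering, i.e. the least significant bit, hence `hsprg0_val |= 1`) so that a thread waking at 0x100 can tell deep winkle from fastsleep. A user-space sketch of the same encode/decode idea follows; `struct fake_paca`, `WINKLE_TAG` and the helpers are hypothetical and only mirror the technique, not the powernv code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte-aligned object standing in for a paca entry. */
struct fake_paca {
    _Alignas(8) uint64_t data;
};

/* IBM bit 63 is the least significant bit, hence the value 1. */
#define WINKLE_TAG UINT64_C(1)

/* Encode an aligned pointer plus a one-bit flag in one 64-bit value. */
static uint64_t tag_ptr(const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;

    assert((v & 7) == 0);   /* alignment leaves the low 3 bits free */
    return v | WINKLE_TAG;
}

/* Test the flag without touching the pointer bits. */
static int is_tagged(uint64_t v)
{
    return (v & WINKLE_TAG) != 0;
}

/* Recover the original pointer by clearing the tag bit. */
static void *untag_ptr(uint64_t v)
{
    return (void *)(uintptr_t)(v & ~WINKLE_TAG);
}
```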
> ---
>  arch/powerpc/platforms/powernv/Makefile |   2 +-
>  arch/powerpc/platforms/powernv/idle.c   | 191 
>  arch/powerpc/platforms/powernv/setup.c  | 171 
>  3 files changed, 192 insertions(+), 172 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/idle.c
> 
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 33e44f3..bee9235 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -1,4 +1,4 @@
> -obj-y                  += setup.o opal-wrappers.o opal.o opal-async.o
> +obj-y                  += setup.o opal-wrappers.o opal.o opal-async.o idle.o
>  obj-y                  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>  obj-y                  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y                  += opal-msglog.o opal-hmi.o opal-power.o
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> new file mode 100644
> index 000..104235a
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -0,0 +1,191 @@
> +/*
> + * PowerNV cpuidle code
> + *
> + * Copyright 2015 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "powernv.h"
> +#include "subcore.h"
> +
> +static u32 supported_cpuidle_states;
> +
> +int pnv_save_sprs_for_winkle(void)
> +{
> + int cpu;
> + int rc;
> +
> + /*
> +  * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric across
> +  * all cpus at boot. Get these reg values of current cpu and use the
> +  * same across all cpus.
> +  */
> + uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
> + uint64_t hid0_val = mfspr(SPRN_HID0);
> + uint64_t hid1_val = mfspr(SPRN_HID1);
> + uint64_t hid4_val = mfspr(SPRN_HID4);
> + uint64_t hid5_val = mfspr(SPRN_HID5);
> + uint64_t hmeer_val = mfspr(SPRN_HMEER);
> +
> + for_each_possible_cpu(cpu) {
> + uint64_t pir = get_hard_smp_processor_id(cpu);
> + uint64_t hsprg0_val = (uint64_t)&paca[cpu];
> +
> + /*
> +  * HSPRG0 is used to store the cpu's pointer to paca. Hence last
> +  * 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
> +  * with 63rd bit set, so that when a thread wakes up at 0x100 we
> +  * can use this bit to distinguish between fastsleep and
> +  * deep winkle.
> +  */
> + hsprg0_val |= 1;
> +
> + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
> + if (rc != 0)
> + return rc;
> +
> + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> + if (rc != 0)
> + return rc;
> +
> + /* HIDs are per core registers */
> + if (cpu_thread_in_core(cpu) == 0) {
> +
> + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val);
> + if (rc != 0)
> + return rc;
> +
> + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val);
> + if (rc != 0)
> + return rc;
> +
> + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
> + if (rc != 0)
> + return rc;
> +
> + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
> + if (rc != 0)
> + return rc;
> +
> + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
> + if (rc != 0)
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static void pnv_alloc_idle_core_states(void)
> +{
> + int i, j;
> + int nr_cores = cpu_nr_cores();
> + u32 *core_idle_state;
> +
> + /*
> +  * core_idle_state - First 8 bits track the idle state of each thread
> +  * of the core. Bit 8 (counting from 0) is the lock bit. Initially all thread bits
> +  * are set. They are cleared when the thread enters deep idle state
> +  * like sleep and winkle. Initially the lock bit is cleared.
> +  * The lock bit has 2 purposes
> +  * a. While the first

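The two bit-level encodings described in the idle.c comments quoted above (tagging HSPRG0 and the core_idle_state word) can be sketched in userspace C. This is a hedged illustration only: the helper names, THREADS_PER_CORE and LOCK_BIT are assumptions chosen to mirror the comment text, not the kernel's definitions, and no real locking is performed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical constants mirroring the comments; not kernel definitions. */
#define THREADS_PER_CORE 8
#define THREAD_BITS_MASK ((1u << THREADS_PER_CORE) - 1) /* bits 0..7 */
#define LOCK_BIT         (1u << THREADS_PER_CORE)      /* bit 8 */

/* An 8-byte-aligned pointer value has its low 3 bits clear, so bit 0
 * (bit 63 in IBM bit numbering) can carry a one-bit flag. */
static uint64_t tag_hsprg0(uint64_t paca_addr)
{
	return paca_addr | 1;
}

static bool hsprg0_is_tagged(uint64_t val)
{
	return (val & 1) != 0;
}

/* Recover the original pointer value by clearing the flag bit. */
static uint64_t untag_hsprg0(uint64_t val)
{
	return val & ~(uint64_t)1;
}

/* Initially every thread bit is set and the lock bit is clear. */
static uint32_t initial_core_idle_state(void)
{
	return THREAD_BITS_MASK;
}

/* Clear a thread's bit when it enters deep idle; returns the new state. */
static uint32_t thread_enter_deep_idle(uint32_t state, int thread)
{
	return state & ~(1u << thread);
}

/* True when no thread of the core is still marked as running. */
static bool core_fully_idle(uint32_t state)
{
	return (state & THREAD_BITS_MASK) == 0;
}
```

Keeping the thread bits and the lock bit in one word is what lets the real code update both with a single atomic operation; the sketch only models the bit layout.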
Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Minfei Huang
On 04/13/15 at 11:57P, Josh Poimboeuf wrote:
> On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> > On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > > For now, kallsyms only stores the first (KSYM_NAME_LEN-1) characters of
> > > > a symbol name, so two functions whose names share the same first
> > > > (KSYM_NAME_LEN-1) characters but differ afterwards end up with the same
> > > > kallsyms name.
> > > > 
> > > > Such a function can then never be patched, even though both its name
> > > > and address are provided, because livepatch cannot recognize the
> > > > function name.
> > > > 
> > > > With this patch, livepatch verifies the first (KSYM_NAME_LEN-1)
> > > > characters of the function name and, if provided, the address. Once
> > > > both match, we can be sure the function to be patched has been found.
> > > 
> > > From scripts/kallsyms.c:
> > > 
> > >   if (strlen(str) > KSYM_NAME_LEN) {
> > >   fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > >   "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > >   str, strlen(str), KSYM_NAME_LEN);
> > >   return -1;
> > >   }
> > > 
> > > So I think such a long symbol name wouldn't be added to the kallsyms
> > > database in the first place.
> > > 
> > 
> > Actually, the kernel allows overlong function names to be used. The
> > following is my test module.
> > 
> > We can get the addresses from /proc/kallsyms:
> > $ cat /proc/kallsyms | grep sysfs_print
> > a000 t 
> > sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri
> >   [sysfs_print]
> > a010 t kobj_release [sysfs_print]
> > a020 t 
> > sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri
> >   [sysfs_print]
> > a4e0 b root_kobj[sysfs_print]
> > a200 d print_ktype  [sysfs_print]
> > a4a0 b print_kobj   [sysfs_print]
> > a04c t sys_print_exit   [sysfs_print]
> > a144 r __func__.14514   [sysfs_print]
> > a230 d kobj_attrs   [sysfs_print]
> > a240 d sys_print_kobj_attr  [sysfs_print]
> > a260 d __this_module[sysfs_print]
> > a04c t cleanup_module   [sysfs_print]
> > 
> > Code:
> > 
> > static ssize_t 
> > sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct
> >  kobject *kobj, s
> > const char *buf, size_t count)
> > {
> > return count;
> > }
> > 
> > static ssize_t 
> > sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct
> >  kobject *kobj,
> > struct kobj_attribute *attr, char *buf)
> > {
> > return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> > }
> > 
> > static struct kobj_attribute sys_print_kobj_attr = 
> > __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> > static struct attribute *kobj_attrs[] = {
> > &sys_print_kobj_attr.attr,
> > NULL
> > };
> > 
> 
> Hm, this seems like a kallsyms bug.  IMO it should either fail the build
> or omit the symbol from the kallsyms db.  Truncating it seems dangerous
> and counterintuitive.
> 

Kallsyms records the full function name without truncating it, but
kallsyms lookups return a truncated function name of at most 127
characters.
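The effect described here can be illustrated in userspace: a lookup hands back at most KSYM_NAME_LEN-1 characters, so an exact string comparison against the full name fails, while comparing only the first KSYM_NAME_LEN-1 characters succeeds but can no longer tell apart names that share that prefix. A minimal sketch, assuming KSYM_NAME_LEN is 128 and using hypothetical helper names (this is not the livepatch code itself):

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: KSYM_NAME_LEN is 128, matching the "max 127" above. */
#define KSYM_NAME_LEN 128

/* Copy a name the way a truncating lookup would: at most
 * KSYM_NAME_LEN - 1 characters plus the terminating NUL. */
static void truncate_ksym(char out[KSYM_NAME_LEN], const char *name)
{
	strncpy(out, name, KSYM_NAME_LEN - 1);
	out[KSYM_NAME_LEN - 1] = '\0';
}

/* Exact comparison fails for a long name once it has been truncated. */
static bool exact_match(const char *wanted, const char *ksym)
{
	return strcmp(wanted, ksym) == 0;
}

/* Prefix comparison on the first KSYM_NAME_LEN - 1 characters still
 * matches, but cannot distinguish names sharing that prefix. */
static bool prefix_match(const char *wanted, const char *ksym)
{
	return strncmp(wanted, ksym, KSYM_NAME_LEN - 1) == 0;
}
```

The ambiguity left by `prefix_match` is why the patch also proposes checking the address.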

> But regardless I really don't see a good reason to encourage this kind
> of insanity in the livepatch code.
> 

Yes, the above code is terrible, but we cannot stop users from writing
code like that.

Once a function name is that long, the user will never have a chance to
use livepatch on it.

Thanks
Minfei

> -- 
> Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1

2015-04-13 Thread Josh Poimboeuf
On Tue, Apr 14, 2015 at 08:26:29AM +0800, Minfei Huang wrote:
> On 04/13/15 at 06:13P, Josh Poimboeuf wrote:
> > On Sun, Apr 12, 2015 at 09:15:54PM +0800, Minfei Huang wrote:
> > > For now, kallsyms only stores the first (KSYM_NAME_LEN-1) characters of
> > > a symbol name, so two functions whose names share the same first
> > > (KSYM_NAME_LEN-1) characters but differ afterwards end up with the same
> > > kallsyms name.
> > > 
> > > Such a function can then never be patched, even though both its name
> > > and address are provided, because livepatch cannot recognize the
> > > function name.
> > > 
> > > With this patch, livepatch verifies the first (KSYM_NAME_LEN-1)
> > > characters of the function name and, if provided, the address. Once
> > > both match, we can be sure the function to be patched has been found.
> > 
> > From scripts/kallsyms.c:
> > 
> > if (strlen(str) > KSYM_NAME_LEN) {
> > fprintf(stderr, "Symbol %s too long for kallsyms (%zu vs %d).\n"
> > "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
> > str, strlen(str), KSYM_NAME_LEN);
> > return -1;
> > }
> > 
> > So I think such a long symbol name wouldn't be added to the kallsyms
> > database in the first place.
> > 
> 
> Actually, the kernel allows overlong function names to be used. The
> following is my test module.
> 
> We can get the addresses from /proc/kallsyms:
> $ cat /proc/kallsyms | grep sysfs_print
> a000 t 
> sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri
>   [sysfs_print]
> a010 t kobj_release [sysfs_print]
> a020 t 
> sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_pri
>   [sysfs_print]
> a4e0 b root_kobj[sysfs_print]
> a200 d print_ktype  [sysfs_print]
> a4a0 b print_kobj   [sysfs_print]
> a04c t sys_print_exit   [sysfs_print]
> a144 r __func__.14514   [sysfs_print]
> a230 d kobj_attrs   [sysfs_print]
> a240 d sys_print_kobj_attr  [sysfs_print]
> a260 d __this_module[sysfs_print]
> a04c t cleanup_module   [sysfs_print]
> 
> Code:
> 
> static ssize_t 
> sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_store(struct
>  kobject *kobj, s
> const char *buf, size_t count)
> {
> return count;
> }
> 
> static ssize_t 
> sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_show(struct
>  kobject *kobj,
> struct kobj_attribute *attr, char *buf)
> {
> return snprintf(buf, PAGE_SIZE-1, "%s\n", "This is printed by module");
> }
> 
> static struct kobj_attribute sys_print_kobj_attr = 
> __ATTR_RW(sys_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_print_p
> static struct attribute *kobj_attrs[] = {
> &sys_print_kobj_attr.attr,
> NULL
> };
> 

Hm, this seems like a kallsyms bug.  IMO it should either fail the build
or omit the symbol from the kallsyms db.  Truncating it seems dangerous
and counterintuitive.

But regardless I really don't see a good reason to encourage this kind
of insanity in the livepatch code.
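The alternative suggested here, omitting an over-long symbol rather than truncating it, can be sketched as a simple filter. This is hypothetical userspace code, not scripts/kallsyms.c: the helper names are illustrative, KSYM_NAME_LEN is assumed to be 128, and the sketch uses its own boundary (a name fits if it and its NUL terminator fit in KSYM_NAME_LEN bytes). Note also that scripts/kallsyms.c runs on vmlinux at build time, while a loadable module's symbols are taken from the module's own symbol table when it is loaded, which likely explains why the over-long name in the test module was not rejected by the quoted check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: same KSYM_NAME_LEN as discussed in this thread. */
#define KSYM_NAME_LEN 128

/* A name fits only if it and its NUL terminator fit in KSYM_NAME_LEN bytes. */
static bool symbol_name_fits(const char *name)
{
	return strlen(name) < KSYM_NAME_LEN;
}

/* Hypothetical "omit" policy: compact the array in place, dropping names
 * that would otherwise be truncated. Returns the number of names kept. */
static size_t omit_long_symbols(const char **names, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (symbol_name_fits(names[i]))
			names[kept++] = names[i];
	}
	return kept;
}
```

Failing the build instead would amount to treating any `false` from `symbol_name_fits()` as a fatal error.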

-- 
Josh


Re: [PATCH 1/2] livepatch: Add a new function to verify the address and name match for extra module

2015-04-13 Thread Minfei Huang
On 04/13/15 at 11:05P, Josh Poimboeuf wrote:
> On Tue, Apr 14, 2015 at 08:48:11AM +0800, Minfei Huang wrote:
> > On 04/14/15 at 08:17P, Minfei Huang wrote:
> > > On 04/13/15 at 05:58P, Josh Poimboeuf wrote:
> > > > On Mon, Apr 13, 2015 at 06:37:10PM +0800, Minfei Huang wrote:
> > > > > For my patches, I think they will be used by people who write
> > > > > patches themselves, not by the manufacturer.
> > > > > 
> > > > > Yes, verifying the address of a function in an extra module is
> > > > > generally less useful, since the address changes from system to
> > > > > system.
> > > > > 
> > > > > IMO, we should do our best to make livepatch more robust.
> > > > 
> > > > IIUC, to use this, you'd have to load the module first, manually look up
> > > > the module function's address, and _then_ build the patch for the
> > > > running system.  And the resulting patch wouldn't work on other systems.
> > > > 
> > > > Do you have concrete plans to use it this way?
> > > > 
> > > > Just trying to understand if this is needed for a real world usage
> > > > scenario.
> > > 
> > > Some companies (like cloud computing companies) write their own
> > > modules to improve performance.
> > > 
> > > When there is a bug in such a module, they cannot reboot to reload
> > > the fixed module, so livepatch seems to be the best way to fix the
> > > issue.
> > > 
> > > Before livepatch was integrated into the kernel, we usually used
> > > ksplice to apply such fixes.
> > > 
> > > I ran into the above scenario in my previous job.
> > > 
> > > For now, livepatch cannot patch a function in an extra module if the
> > > function name is longer than 127 characters.
> > > 
> > 
> > Also, maybe some day we can use a script to detect the function name
> > and address in userspace, then generate a patch for the defective
> > kernel or extra module.
> 
> I'd rather wait until we have a real world use case before adding
> support for that.  Otherwise we end up bloating the code and have to
> support a nebulous feature which nobody uses.
> 

Hi, Josh.

The above scenario was not made up to suit these patches. It is normal
for end users to write patches for the kernel or for extra modules, so
it is worthwhile for livepatch to support patching extra modules.

Livepatch matters most for systems that cannot be rebooted outside a
scheduled window.

> > So people who want to use livepatch would never need to care how to
> > turn a fix for the kernel or an extra module into a livepatch module.
> > All they would do is provide an ordinary patch against the original
> > code.
> 
> We already have a kpatch tool named kpatch-build which does this.  It is
> not yet upstreamed into Linux.  The key difference is that it creates
> the patch at compile time rather than runtime.  The resulting patch
> works for _all_ systems running the given version of kernel, rather than
> only the current system.
> 

Yes, Linda mentioned kpatch in one of the meetings. But we cannot
only consider what we know, because end users' environments are
complicated.

In my previous job, the extra module we used to improve performance
ran on CentOS 6.3 and CentOS 6.5. For each fix, we had to build the
patch for different kernel versions (possibly different z-stream
kernel versions).

Meanwhile, with these patches I just want to add a new function so
that end users have a chance to match a function in an extra module by
both its name and its address, not only by its name. Such a patch may
also be meant only for the current system.
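Matching a function in an extra module by its (possibly truncated) name plus an optional address can be sketched as below. The struct and helper are hypothetical and the lookup is simplified to a flat array (the real code walks kallsyms); requiring a unique match when no address is given is this sketch's own design choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: KSYM_NAME_LEN of 128, as elsewhere in this thread. */
#define KSYM_NAME_LEN 128

/* Hypothetical symbol record: a (possibly truncated) name and an address. */
struct sym {
	const char *name;
	unsigned long addr;
};

/*
 * Look up @wanted among @syms. Names are compared on their first
 * KSYM_NAME_LEN - 1 characters. If @addr is non-zero it must also match.
 * The lookup succeeds only if exactly one symbol qualifies.
 * Returns the index of that symbol, or -1.
 */
static int find_sym(const struct sym *syms, size_t n,
		    const char *wanted, unsigned long addr)
{
	int found = -1;
	size_t matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (strncmp(syms[i].name, wanted, KSYM_NAME_LEN - 1) != 0)
			continue;
		if (addr && syms[i].addr != addr)
			continue;
		found = (int)i;
		matches++;
	}
	return matches == 1 ? found : -1;
}
```

With an address, two symbols that share a truncated name can still be told apart; without one, the ambiguous lookup fails instead of silently picking one.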

Thanks
Minfei

> -- 
> Josh


[PATCH] scripts/extract-ikconfig: Support LZ4-compressed images.

2015-04-13 Thread Alex Pilon
Support for LZ4 compression of the kernel image was added around 3.11,
but the corresponding kernel .config extraction was not.

This makes it possible to extract the kernel config from LZ4-compressed
kernels you're not running, or from the current LZ4-compressed kernel if
it was compiled without /proc/config.gz support.

Signed-off-by: Alex Pilon 
---
 scripts/extract-ikconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/extract-ikconfig b/scripts/extract-ikconfig
index e186242..3b42f25 100755
--- a/scripts/extract-ikconfig
+++ b/scripts/extract-ikconfig
@@ -61,6 +61,7 @@ try_decompress '\3757zXZ\000' abcde unxz
 try_decompress 'BZh'          xy    bunzip2
 try_decompress '\135\0\0\0'   xxx   unlzma
 try_decompress '\211\114\132' xy    'lzop -d'
+try_decompress '\002\041\114\030' xyy 'lz4 -d -l'
 
 # Bail out:
 echo "$me: Cannot find kernel config." >&2
-- 
2.3.5
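The octal escape sequence in the new try_decompress line, '\002\041\114\030', is the byte string 0x02 0x21 0x4C 0x18, the little-endian encoding of the LZ4 legacy-frame magic number 0x184C2102. Locating a compressed stream by its magic bytes can be sketched in C as a plain byte search; the function name is hypothetical, and the script itself does this with shell tools rather than C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Octal 002 041 114 030 == 0x02 0x21 0x4C 0x18: the LZ4 legacy-frame
 * magic 0x184C2102 stored little-endian. */
static const uint8_t lz4_legacy_magic[4] = { 0x02, 0x21, 0x4C, 0x18 };

/* Return the offset of the first occurrence of the magic in buf, or -1. */
static long find_lz4_legacy_magic(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i + sizeof(lz4_legacy_magic) <= len; i++) {
		if (memcmp(buf + i, lz4_legacy_magic,
			   sizeof(lz4_legacy_magic)) == 0)
			return (long)i;
	}
	return -1;
}
```

Once the offset is known, the data from that point on can be handed to the matching decompressor, which is what the `lz4 -d -l` argument supplies in the script.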



[PATCH] of/fdt: fix allocation size for device node path

2015-04-13 Thread Ricky Liang
The allocation size for the device node path is off by one, which drops
the '\0' terminator.

Signed-off-by: Ricky Liang 
---
 drivers/of/fdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3a896c9..98a9e6e 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -172,7 +172,7 @@ static void * unflatten_dt_node(void *blob,
if (!pathp)
return mem;
 
-   allocl = l++;
+   allocl = ++l;
 
/* version 0x10 has a more compact unit name here instead of the full
 * path. we accumulate the full path size using "fpsize", we'll rebuild
-- 
2.1.2
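Why `++l` fixes the size: with post-increment, `allocl` receives the length before the increment, leaving no room for the terminator, even though `l` itself then counts the '\0'. A tiny sketch with hypothetical helper names, assuming `l` holds the name length without the terminator on entry, as the commit message implies:

```c
#include <stddef.h>

/* allocl = l++ : returns the old length; *l becomes length + 1. */
static size_t alloc_len_post(size_t *l)
{
	return (*l)++;
}

/* allocl = ++l : *l becomes length + 1, and that value is returned. */
static size_t alloc_len_pre(size_t *l)
{
	return ++(*l);
}
```

For a 5-character name, the post-increment form allocates 5 bytes while `l` says 6; the pre-increment form makes both 6.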



[PATCH] spi: bcm2835: Add GPIOLIB dependency

2015-04-13 Thread Guenter Roeck
Fix:

drivers/spi/spi-bcm2835.c: In function 'chip_match_name':
drivers/spi/spi-bcm2835.c:356:21: error:
dereferencing pointer to incomplete type
drivers/spi/spi-bcm2835.c: In function 'bcm2835_spi_setup':
drivers/spi/spi-bcm2835.c:382:2: error:
implicit declaration of function 'gpiochip_find'
drivers/spi/spi-bcm2835.c:387:21: error:
dereferencing pointer to incomplete type

by adding the now mandatory GPIOLIB dependency.

Fixes: a30a555d7435 ("spi: bcm2835: transform native-cs to gpio-cs on first spi_setup")
Cc: Martin Sperl 
Signed-off-by: Guenter Roeck 
---
 drivers/spi/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index 198f96b7fb45..a132180a9251 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -78,6 +78,7 @@ config SPI_ATMEL
 config SPI_BCM2835
tristate "BCM2835 SPI controller"
depends on ARCH_BCM2835 || COMPILE_TEST
+   depends on GPIOLIB
help
  This selects a driver for the Broadcom BCM2835 SPI master.
 
-- 
2.1.0



[PATCH 2/2] mmc: cast unsigned int to typeof(sector_t) to avoid unexpected error

2015-04-13 Thread Kuninori Morimoto
From: Kuninori Morimoto 

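card->csd.capacity is defined as "unsigned int", while sector_t is
defined as "u64" or "unsigned long" (depending on CONFIG_LBDAF).
Because the shift is evaluated in unsigned int, any bits shifted past
the top of the 32-bit value are lost before the result is stored in
sector_t, so the size can be wrong when the high bits of capacity are
set. This patch casts capacity to typeof(sector_t) before shifting.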

ex) if sector_t was u64

unsigned int data;
sector_t sector;

data = 0x80;
sector = (data << 8); // 0x8000
sector = (((typeof(sector_t))data) << 8); // 0x8000

or

data = 0x80000000;
sector = (data << 8); // 0x0
sector = (((typeof(sector_t))data) << 8); // 0x8000000000

Reported-by: coverity 
Signed-off-by: Kuninori Morimoto 
---
 drivers/mmc/card/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index c69afb5..4d09b0c 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -2205,7 +2205,7 @@ static struct mmc_blk_data *mmc_blk_alloc(struct mmc_card *card)
 * The CSD capacity field is in units of read_blkbits.
 * set_capacity takes units of 512 bytes.
 */
-   size = card->csd.capacity << (card->csd.read_blkbits - 9);
+   size = (typeof(sector_t))card->csd.capacity << (card->csd.read_blkbits - 9);
}
 
return mmc_blk_alloc_req(card, &card->dev, size, false, NULL,
-- 
1.9.1
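The difference between shifting before and after widening can be shown in a few lines. This sketch assumes a 32-bit unsigned int and a 64-bit sector_t (as with CONFIG_LBDAF=y); the helper names are hypothetical. Applied to a type name, `typeof(sector_t)` is simply `sector_t`, so the patch's cast is equivalent to a plain `(sector_t)` cast.

```c
#include <stdint.h>

/* Assumption: 64-bit sector_t, as with CONFIG_LBDAF=y. */
typedef uint64_t sector_t;

/* Shift evaluated in 32-bit unsigned int: high bits are discarded
 * before the result is widened to sector_t. */
static sector_t capacity_no_cast(unsigned int capacity, unsigned int shift)
{
	return capacity << shift;
}

/* Widen first, then shift in 64 bits, as the patch does. */
static sector_t capacity_with_cast(unsigned int capacity, unsigned int shift)
{
	return (sector_t)capacity << shift;
}
```

For small capacities both forms agree; they diverge only once the shifted value no longer fits in 32 bits.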



[PATCH 1/2] mmc: cast u8 to unsigned long long to avoid unexpected error

2015-04-13 Thread Kuninori Morimoto
From: Kuninori Morimoto 

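card->ext_csd.enhanced_area_offset is defined as "unsigned long long",
while ext_csd[] is defined as u8. Each u8 is promoted to (signed) int
before being shifted, so if the top bit of ext_csd[139] is 1, the
shifted value overflows into the sign bit and gets sign-extended when
it is stored in the unsigned long long field. This patch casts each
byte to (unsigned long long) before shifting.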
ex)
u8  data8;
u64 data64;

data8 = 0x80;
data64 = (data8 << 24); // typically 0xffffffff80000000
data64 = (((unsigned long long)data8) << 24); // 0x80000000

Reported-by: coverity 
Signed-off-by: Kuninori Morimoto 
---
 drivers/mmc/core/mmc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index c84131e..c6bb577 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -266,8 +266,10 @@ static void mmc_manage_enhanced_area(struct mmc_card *card, u8 *ext_csd)
 * calculate the enhanced data area offset, in bytes
 */
card->ext_csd.enhanced_area_offset =
-   (ext_csd[139] << 24) + (ext_csd[138] << 16) +
-   (ext_csd[137] << 8) + ext_csd[136];
+   (((unsigned long long)ext_csd[139]) << 24) +
+   (((unsigned long long)ext_csd[138]) << 16) +
+   (((unsigned long long)ext_csd[137]) << 8) +
+   (((unsigned long long)ext_csd[136]));
if (mmc_card_blockaddr(card))
card->ext_csd.enhanced_area_offset <<= 9;
/*
-- 
1.9.1
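Without the cast, each u8 is promoted to int, so `ext_csd[139] << 24` with the top bit set overflows a 32-bit int. That is undefined behaviour in C, and in practice the negative result is commonly sign-extended to 0xffffffff80000000 when converted to unsigned long long. A minimal sketch of the cast-first form the patch uses (the helper name is hypothetical), which stays well-defined:

```c
#include <stdint.h>

/* Assemble the little-endian 32-bit enhanced-area offset from
 * ext_csd[136..139], widening each byte before shifting. */
static unsigned long long enhanced_area_offset(const uint8_t *ext_csd)
{
	return ((unsigned long long)ext_csd[139] << 24) +
	       ((unsigned long long)ext_csd[138] << 16) +
	       ((unsigned long long)ext_csd[137] << 8) +
	       (unsigned long long)ext_csd[136];
}
```

Casting only the byte shifted by 24 would be enough to avoid the overflow; casting all four keeps the expression uniform, matching the patch.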



[PATCH] KVM: MMU: fix comment in kvm_mmu_zap_collapsible_spte

2015-04-13 Thread Xiao Guangrong

The soft MMU uses direct shadow pages to back a guest large mapping with
small pages if a huge mapping is disallowed on the host. So zapping
direct shadow pages works well for both the soft MMU and the hard MMU.

Fix the comment to reflect this.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 146f295..68c5487 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4481,9 +4481,11 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
pfn = spte_to_pfn(*sptep);

/*
-* Only EPT supported for now; otherwise, one would need to
-* find out efficiently whether the guest page tables are
-* also using huge pages.
+* We cannot do huge page mapping for the indirect shadow
+* page (sp) found on the last rmap (level = 1) since
+* indirect sp is synced with the page table in guest and
+* indirect sp->level = 1 means the guest page table is
+* using 4K page size mapping.
 */
if (sp->role.direct &&
!kvm_is_reserved_pfn(pfn) &&
--
2.1.0
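The condition guarded by the updated comment reduces to a boolean predicate. This is a hedged userspace model: the struct and function names are hypothetical, and each field stands in for the kernel call named in its comment.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the state the check looks at. */
struct shadow_page_info {
	bool direct;   /* sp->role.direct */
};

struct pfn_info {
	bool reserved; /* kvm_is_reserved_pfn(pfn) */
	bool compound; /* PageTransCompound(pfn_to_page(pfn)) */
};

/*
 * A last-level spte is a candidate for zapping (so the next fault can
 * install a large mapping) only if its shadow page is direct: an indirect
 * sp at level 1 mirrors a guest 4K mapping, which must not be widened.
 */
static bool collapsible(const struct shadow_page_info *sp,
			const struct pfn_info *pfn)
{
	return sp->direct && !pfn->reserved && pfn->compound;
}
```

The `direct` test is what makes the same check valid for both the soft MMU and the hard MMU, which is the point of the comment fix.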




[PATCH 0/2] mmc: cast to avoid unexpected error

2015-04-13 Thread Kuninori Morimoto
Hi Ulf

These are non RFC version of mmc data cast patches
which were posted in
Subject: [PATCH 0/2][RFC] mmc: cast to avoid unexpected error
Date: Wed, 8 Apr 2015 07:32:35 +

These 2 patches add casts to avoid unexpected errors.
The current code copies to a u64 without a cast,
so the data will be 0xfff... if the top bit was 1.
These were reported by the Coverity tool.
I'd be happy if someone could test them or give them a deep review.

Kuninori Morimoto (2):
  mmc: cast u8 to unsigned long long to avoid unexpected error
  mmc: cast unsigned int to typeof(sector_t) to avoid unexpected error

 drivers/mmc/card/block.c | 2 +-
 drivers/mmc/core/mmc.c   | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] livepatch: Add a new function to verify the address and name match for extra module

2015-04-13 Thread Josh Poimboeuf
On Tue, Apr 14, 2015 at 08:48:11AM +0800, Minfei Huang wrote:
> On 04/14/15 at 08:17P, Minfei Huang wrote:
> > On 04/13/15 at 05:58P, Josh Poimboeuf wrote:
> > > On Mon, Apr 13, 2015 at 06:37:10PM +0800, Minfei Huang wrote:
> > > > For my patches, I think they will be used by people who write
> > > > patches themselves, not by the manufacturer.
> > > > 
> > > > Yes, verifying the address of a function in an extra module is
> > > > generally less useful, since the address changes from system to
> > > > system.
> > > > 
> > > > IMO, we should do our best to make livepatch more robust.
> > > 
> > > IIUC, to use this, you'd have to load the module first, manually look up
> > > the module function's address, and _then_ build the patch for the
> > > running system.  And the resulting patch wouldn't work on other systems.
> > > 
> > > Do you have concrete plans to use it this way?
> > > 
> > > Just trying to understand if this is needed for a real world usage
> > > scenario.
> > 
> > Some companies (like cloud computing companies) write their own
> > modules to improve performance.
> > 
> > When there is a bug in such a module, they cannot reboot to reload
> > the fixed module, so livepatch seems to be the best way to fix the
> > issue.
> > 
> > Before livepatch was integrated into the kernel, we usually used
> > ksplice to apply such fixes.
> > 
> > I ran into the above scenario in my previous job.
> > 
> > For now, livepatch cannot patch a function in an extra module if the
> > function name is longer than 127 characters.
> > 
> 
> Also, maybe some day we can use a script to detect the function name
> and address in userspace, then generate a patch for the defective
> kernel or extra module.

I'd rather wait until we have a real world use case before adding
support for that.  Otherwise we end up bloating the code and have to
support a nebulous feature which nobody uses.

> So people who want to use livepatch would never need to care how to
> turn a fix for the kernel or an extra module into a livepatch module.
> All they would do is provide an ordinary patch against the original
> code.

We already have a kpatch tool named kpatch-build which does this.  It is
not yet upstreamed into Linux.  The key difference is that it creates
the patch at compile time rather than runtime.  The resulting patch
works for _all_ systems running the given version of kernel, rather than
only the current system.

-- 
Josh


[PATCH] MAINTAINERS: fix incorrect email address of docking station

2015-04-13 Thread Chao Yu
Shaohua's old email address is no longer in use; update it to the last
valid one.

Signed-off-by: Chao Yu 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a36be4e..bbcbb83 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3279,7 +3279,7 @@ F:drivers/firmware/dmi_scan.c
 F: include/linux/dmi.h
 
 DOCKING STATION DRIVER
-M: Shaohua Li 
+M: Shaohua Li 
 L: linux-a...@vger.kernel.org
 S: Supported
 F: drivers/acpi/dock.c
-- 
2.3.3




Re: [PATCH v2 3/6] clk: hi6220: Document devicetree bindings for hi6220 clock

2015-04-13 Thread Bintian

Hello Arnd,

On 2015/4/13 23:32, Arnd Bergmann wrote:

On Monday 13 April 2015 17:17:37 Bintian Wang wrote:

+- compatible: the compatible should be one of the following strings to
+   indicate the clock controller functionality.
+
+   - "hisilicon,aoctrl"
+   - "hisilicon,sysctrl"
+   - "hisilicon,mediactrl"
+   - "hisilicon,pmctrl"
+



These ones already have bindings, you can't reuse the strings.
Please work with someone in hisilicon to set up a registry of
device names so you can avoid conflicts in the future.
All the clock registers are under the above four system controllers.
I discussed this with Mark and Haojian two months ago, and I think using
the same four binding strings is enough for the clk module.
On second thought, this really would cause problems for future HiSilicon
code going upstream. How about changing back to the first version of this
patch set, like the following:
+   sys_ctrl: sys_ctrl {
+   compatible = "hisilicon,sysctrl", "syscon";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x0 0xf703 0x0 0x2000>;
+   ranges = <0 0x0 0xf703 0x2000>;
+
+   clock_sys: clock1@0 {
+   compatible = "hisilicon,hi6220-clock-sys";
+   reg = <0 0x1000>;
+   #clock-cells = <1>;
+   };
+   };

Thanks,

Bintian


Arnd

.





Re: [RFC PATCH 00/11] an introduction of library operating system for Linux (LibOS)

2015-04-13 Thread Hajime Tazaki

At Thu, 09 Apr 2015 10:36:23 +0200,
Richard Weinberger wrote:
> 
> Am 31.03.2015 um 09:47 schrieb Hajime Tazaki:
> > right now arch/lib/Makefile isn't fully on the Kbuild
> > system: build file dependency is not tracked at all.
> > 
> > while I should learn more about Kbuild, I'd be happy if you
> > would suggest how the Makefile should be.
> 
> You definitely have to use Kbuild.
> Please bite the bullet and dig into it. Maybe we
> need also new functions in Kbuild to support a library mode.
> Who knows? ;)

Thanks, Richard, for the comment. I've been struggling with this
and created a GitHub issue to struggle some more ;)

https://github.com/libos-nuse/net-next-nuse/issues/26

I'd be happy if any of you are interested in tackling a
Kbuild-ish Makefile for LibOS.

I'll be back here once I come up with a good one.

-- Hajime


Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-13 Thread Jason Low
On Mon, 2015-04-13 at 15:49 -0700, Jason Low wrote:

> hmm, so taking a look at the patch again, it looks like we pass nohz
> balance even when the NOHZ_BALANCE_KICK is not set on the current CPU.
> We should separate the 2 conditions:
> 
> if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> return;
> 
> if (idle != CPU_IDLE) {
> /* another CPU continue balancing */
> pass_nohz_balance(this_rq, this_cpu);
> return;
> }

Here's the example patch with the above update.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ffeaa41..9aa48f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7622,6 +7622,16 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline bool nohz_kick_needed(struct rq *rq);
+
+static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
+{
+   clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+   nohz.next_balance = jiffies;
+   if (nohz_kick_needed(this_rq))
+   nohz_balancer_kick();
+}
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7632,9 +7642,13 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
struct rq *rq;
int balance_cpu;
 
-   if (idle != CPU_IDLE ||
-   !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
-   goto end;
+   if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
+   return;
+
+   if (idle != CPU_IDLE) {
+   pass_nohz_balance(this_rq, this_cpu);
+   return;
+   }
 
for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
@@ -7645,8 +7659,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 * work being done for other cpus. Next load
 * balancing owner will pick it up.
 */
-   if (need_resched())
-   break;
+   if (need_resched()) {
+   pass_nohz_balance(this_rq, this_cpu);
+   return;
+   }
 
rq = cpu_rq(balance_cpu);
 
@@ -7666,7 +7682,6 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
this_rq->next_balance = rq->next_balance;
}
nohz.next_balance = this_rq->next_balance;
-end:
clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
 }
 
@@ -7689,7 +7704,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
int nr_busy, cpu = rq->cpu;
bool kick = false;
 
-   if (unlikely(rq->idle_balance))
+   if (unlikely(idle_cpu(cpu)))
return false;
 
/*
@@ -7709,7 +7724,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
if (time_before(now, nohz.next_balance))
return false;
 
-   if (rq->nr_running >= 2)
+   if (rq->nr_running >= 2 || rq->rd->overload)
return true;
 
rcu_read_lock();
@@ -7759,16 +7774,14 @@ static void run_rebalance_domains(struct softirq_action *h)
enum cpu_idle_type idle = this_rq->idle_balance ?
CPU_IDLE : CPU_NOT_IDLE;
 
+   rebalance_domains(this_rq, idle);
+
/*
 * If this cpu has a pending nohz_balance_kick, then do the
 * balancing on behalf of the other idle cpus whose ticks are
-* stopped. Do nohz_idle_balance *before* rebalance_domains to
-* give the idle cpus a chance to load balance. Else we may
-* load balance only within the local sched_domain hierarchy
-* and abort nohz_idle_balance altogether if we pull some load.
+* stopped.
 */
nohz_idle_balance(this_rq, idle);
-   rebalance_domains(this_rq, idle);
 }
 
 /*




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/6] perf kmem: Show warning when trying to run stat without record

2015-04-13 Thread Namhyung Kim
Sometimes one can mistakenly run perf kmem stat without running perf kmem
record first, or with a different configuration, e.g. recording with --slab
and then running stat with --page.  Show a warning message like the one
below to inform the user:

  # perf kmem stat --page --caller
  Not found page events.  Have you run 'perf kmem record --page' before?
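The check in this patch is a linear search of the recorded events for one
tracepoint name per mode.  A minimal self-contained sketch of the same idea,
assuming the event names are available as a plain string array (the helpers
has_event() and check_recorded() are hypothetical; the patch itself iterates
session->evlist with evlist__for_each()):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 'names' stands in for the event names stored in
 * perf.data; the real patch walks the session's evlist instead. */
static bool has_event(const char *const *names, size_t n, const char *wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(names[i], wanted))
			return true;
	}
	return false;
}

/* Returns 0 if 'event' was recorded; otherwise prints the warning for
 * 'mode' ("slab" or "page") to stderr and returns -1. */
static int check_recorded(const char *const *names, size_t n,
			  const char *mode, const char *event)
{
	if (!has_event(names, n, event)) {
		fprintf(stderr, "Not found %s events.  "
			"Have you run 'perf kmem record --%s' before?\n",
			mode, mode);
		return -1;
	}
	return 0;
}
```

For example, a file containing only kmem:kmalloc and kmem:kfree passes the
slab check, while check_recorded(..., "page", "kmem:mm_page_alloc") prints
the warning and returns -1.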

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-kmem.c | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index f0d018179e1c..ddb6ccb88b45 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -1882,6 +1882,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
};
struct perf_session *session;
int ret = -1;
+   const char errmsg[] = "Not found %s events.  Have you run 'perf kmem record --%s' before?\n";
 
perf_config(kmem_config, NULL);
argc = parse_options_subcommand(argc, argv, kmem_options,
@@ -1908,11 +1909,35 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
if (session == NULL)
return -1;
 
+   if (kmem_slab) {
+   struct perf_evsel *evsel;
+   bool found = false;
+
+   evlist__for_each(session->evlist, evsel) {
+   if (!strcmp(perf_evsel__name(evsel), "kmem:kmalloc")) {
+   found = true;
+   break;
+   }
+   }
+   if (!found) {
+   pr_err(errmsg, "slab", "slab");
+   return -1;
+   }
+   }
+
if (kmem_page) {
-   struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+   struct perf_evsel *evsel;
+   bool found = false;
 
-   if (evsel == NULL || evsel->tp_format == NULL) {
-   pr_err("invalid event found.. aborting\n");
+   evlist__for_each(session->evlist, evsel) {
+   if (!strcmp(perf_evsel__name(evsel),
+   "kmem:mm_page_alloc")) {
+   found = true;
+   break;
+   }
+   }
+   if (!found) {
+   pr_err(errmsg, "page", "page");
return -1;
}
 
-- 
2.3.4



[PATCH 3/6] perf kmem: Add --live option for current allocation stat

2015-04-13 Thread Namhyung Kim
Currently perf kmem shows the total (page) allocation stat by default, but
sometimes one might want to see only the live (alloc-only) requests/pages.
The new --live option does this by subtracting freed allocations from the
stat.
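Conceptually, --live keeps a per-page record for every allocation and drops
it again when the matching free event arrives, so whatever remains is still
allocated.  A simplified sketch of that bookkeeping, assuming a fixed-size
array in place of the page_live_tree rb-tree and hypothetical names
(struct page_acct, on_alloc(), on_free()):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LIVE 64  /* arbitrary capacity for this sketch */

/* Hypothetical, simplified per-page record keyed by page (or pfn). */
struct live_page {
	unsigned long long page;
	unsigned long bytes;
	bool used;
};

struct page_acct {
	struct live_page live[MAX_LIVE];
	unsigned long total_bytes;  /* every allocation ever seen */
	unsigned long live_bytes;   /* allocations not yet freed */
};

static void on_alloc(struct page_acct *a, unsigned long long page, int order,
		     unsigned long page_size)
{
	unsigned long bytes = page_size << order;

	a->total_bytes += bytes;
	for (size_t i = 0; i < MAX_LIVE; i++) {
		if (!a->live[i].used) {
			a->live[i] = (struct live_page){ page, bytes, true };
			a->live_bytes += bytes;
			return;
		}
	}
	/* table full: a real tool would grow its tree instead */
}

/* A free only shrinks the live view; the total view keeps counting it. */
static void on_free(struct page_acct *a, unsigned long long page)
{
	for (size_t i = 0; i < MAX_LIVE; i++) {
		if (a->live[i].used && a->live[i].page == page) {
			a->live_bytes -= a->live[i].bytes;
			a->live[i].used = false;
			return;
		}
	}
	/* free of a page allocated before recording started: ignored */
}
```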

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-kmem.txt |   5 ++
 tools/perf/builtin-kmem.c  | 110 -
 2 files changed, 73 insertions(+), 42 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 69e181272c51..ff0f433b3fce 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -56,6 +56,11 @@ OPTIONS
 --page::
Analyze page allocator events
 
+--live::
+   Show live page stat.  The perf kmem shows total allocation stat by
+   default, but this option shows live (currently allocated) pages
+   instead.  (This option works with --page option only)
+
 SEE ALSO
 
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index a9dd73f2a5d9..44a100caa172 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -244,6 +244,7 @@ static unsigned long nr_page_fails;
 static unsigned long nr_page_nomatch;
 
 static bool use_pfn;
+static bool live_page;
 static struct perf_session *kmem_session;
 
 #define MAX_MIGRATE_TYPES  6
@@ -264,7 +265,7 @@ struct page_stat {
int nr_free;
 };
 
-static struct rb_root page_tree;
+static struct rb_root page_live_tree;
 static struct rb_root page_alloc_tree;
 static struct rb_root page_alloc_sorted;
 static struct rb_root page_caller_tree;
@@ -403,10 +404,19 @@ static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
return sample->ip;
 }
 
+struct sort_dimension {
+   const char  name[20];
+   sort_fn_t   cmp;
+   struct list_head list;
+};
+
+static LIST_HEAD(page_alloc_sort_input);
+static LIST_HEAD(page_caller_sort_input);
+
 static struct page_stat *
-__page_stat__findnew_page(u64 page, bool create)
+__page_stat__findnew_page(struct page_stat *this, bool create)
 {
-   struct rb_node **node = &page_tree.rb_node;
+   struct rb_node **node = &page_live_tree.rb_node;
struct rb_node *parent = NULL;
struct page_stat *data;
 
@@ -416,7 +426,7 @@ __page_stat__findnew_page(u64 page, bool create)
parent = *node;
data = rb_entry(*node, struct page_stat, node);
 
-   cmp = data->page - page;
+   cmp = data->page - this->page;
if (cmp < 0)
node = &parent->rb_left;
else if (cmp > 0)
@@ -430,34 +440,28 @@ __page_stat__findnew_page(u64 page, bool create)
 
data = zalloc(sizeof(*data));
if (data != NULL) {
-   data->page = page;
+   data->page = this->page;
+   data->order = this->order;
+   data->migrate_type = this->migrate_type;
+   data->gfp_flags = this->gfp_flags;
 
rb_link_node(&data->node, parent, node);
-   rb_insert_color(&data->node, &page_tree);
+   rb_insert_color(&data->node, &page_live_tree);
}
 
return data;
 }
 
-static struct page_stat *page_stat__find_page(u64 page)
+static struct page_stat *page_stat__find_page(struct page_stat *stat)
 {
-   return __page_stat__findnew_page(page, false);
+   return __page_stat__findnew_page(stat, false);
 }
 
-static struct page_stat *page_stat__findnew_page(u64 page)
+static struct page_stat *page_stat__findnew_page(struct page_stat *stat)
 {
-   return __page_stat__findnew_page(page, true);
+   return __page_stat__findnew_page(stat, true);
 }
 
-struct sort_dimension {
-   const char  name[20];
-   sort_fn_t   cmp;
-   struct list_head list;
-};
-
-static LIST_HEAD(page_alloc_sort_input);
-static LIST_HEAD(page_caller_sort_input);
-
 static struct page_stat *
 __page_stat__findnew_alloc(struct page_stat *this, bool create)
 {
@@ -615,17 +619,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 * This is to find the current page (with correct gfp flags and
 * migrate type) at free event.
 */
-   stat = page_stat__findnew_page(page);
-   if (stat == NULL)
-   return -ENOMEM;
-
-   stat->order = order;
-   stat->gfp_flags = gfp_flags;
-   stat->migrate_type = migrate_type;
-   stat->callsite = callsite;
-
this.page = page;
-   stat = page_stat__findnew_alloc(&this);
+   stat = page_stat__findnew_page(&this);
if (stat == NULL)
return -ENOMEM;
 
@@ -633,6 +628,16 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
stat->alloc_bytes += bytes;
stat->callsite = callsite;
 
+   if (!live_page) {
+

[PATCHSET 0/6] perf kmem: Implement page allocation analysis (v7)

2015-04-13 Thread Namhyung Kim
Hello,

Currently the perf kmem command analyzes only SLAB memory allocation, and
I'd like to introduce page allocation analysis as well.  Users can select
it with the --slab and/or --page options.  If neither option is given,
it does slab allocation analysis for backward compatibility.

 * changes in v7)
   - drop already merged patches
   - check return value of map__load()  (Arnaldo)
   - rename to page_stat__findnew_*() functions  (Arnaldo)
   - show a warning when trying to run stat before record

 * changes in v6)
   - add -i option fix  (Jiri)
   - libtraceevent operator priority fix

 * changes in v5)
   - print migration type and gfp flags in a more compact form  (Arnaldo)
   - add kmem.default config option

 * changes in v4)
   - use pfn instead of struct page * in tracepoints  (Joonsoo, Ingo)
   - print gfp flags in human readable string  (Joonsoo, Minchan)

 * changes in v3)
   - add live page statistics

 * changes in v2)
   - Use thousand grouping for big numbers - e.g. 12345 -> 12,345  (Ingo)
   - Improve output stat readability  (Ingo)
   - Remove alloc size column as it can be calculated from hits and order

In this patchset, I used two kmem events, kmem:mm_page_alloc and
kmem:mm_page_free, for analysis, as they can track almost all memory
allocation/free paths AFAIK.  However, unlike the slab tracepoint events,
these page allocation events don't provide callsite info directly.  So
I recorded callchains and extracted the callsites from them, as shown below:

Normal page allocation callchains look like this:

  360a7e __alloc_pages_nodemask
  3a711c alloc_pages_current
  357bc7 __page_cache_alloc   <-- callsite
  357cf6 pagecache_get_page
   48b0a prepare_pages
   494d3 __btrfs_buffered_write
   49cdf btrfs_file_write_iter
  3ceb6e new_sync_write
  3cf447 vfs_write
  3cff99 sys_write
  7556e9 system_call
    f880 __write_nocancel
   33eb9 cmd_record
   4b38e cmd_kmem
   7aa23 run_builtin
   27a9a main
   20800 __libc_start_main

But the first two are internal page allocation functions, so they should be
skipped.  To identify such allocation functions, I used the following regex:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me the following list of functions (you can see it with -v):

  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid

After skipping those functions, it got '__page_cache_alloc'.
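The skipping step can be sketched with POSIX regcomp()/regexec() applied
directly to the symbol names of the callchain above.  This is a
simplification: the patch uses the regex to build a list of allocator
address ranges and then checks each callchain IP against them.
find_callsite() is a hypothetical name:

```c
#include <regex.h>
#include <stddef.h>

/* Pattern from the text above; REG_EXTENDED is needed for ?, | and (). */
static const char alloc_pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";

/* Returns the first frame (innermost first) that is not an allocator
 * function, or NULL if every frame matches or the regex fails to compile. */
static const char *find_callsite(const char *const *frames, size_t n)
{
	regex_t re;
	const char *callsite = NULL;

	if (regcomp(&re, alloc_pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		/* regexec() returns 0 on a match, i.e. an allocator frame */
		if (regexec(&re, frames[i], 0, NULL, 0) != 0) {
			callsite = frames[i];
			break;
		}
	}
	regfree(&re);
	return callsite;
}
```

Given the frames __alloc_pages_nodemask, alloc_pages_current,
__page_cache_alloc, pagecache_get_page, it returns __page_cache_alloc.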

Other information such as allocation order, migration type and gfp
flags are provided by tracepoint events.

By default the output is sorted by total allocation bytes, but you can
change this with the -s/--sort option.  The following sort keys are
added to support page analysis: page, order, migtype, gfp.  The existing
'callsite', 'bytes' and 'hit' sort keys can also be used.

An example follows:

  # perf kmem record --page sleep 5
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1.065 MB perf.data (2949 samples) ]

  # perf kmem stat --page --caller -l 10
  #
  # GFP flags
  # ---------
  # 00000010:         NI: GFP_NOIO
  # 000000d0:          K: GFP_KERNEL
  # 00000200:        NWR: GFP_NOWARN
  # 000052d0: K|NWR|NR|C: GFP_KERNEL|GFP_NOWARN|GFP_NORETRY|GFP_COMP
  # 000084d0:      K|R|Z: GFP_KERNEL|GFP_REPEAT|GFP_ZERO
  # 000200d0:          U: GFP_USER
  # 000200d2:         HU: GFP_HIGHUSER
  # 000200da:        HUM: GFP_HIGHUSER_MOVABLE
  # 000280da:      HUM|Z: GFP_HIGHUSER_MOVABLE|GFP_ZERO
  # 002084d0:   K|R|Z|NT: GFP_KERNEL|GFP_REPEAT|GFP_ZERO|GFP_NOTRACK
  # 0102005a:    NF|HW|M: GFP_NOFS|GFP_HARDWALL|GFP_MOVABLE
  ------------------------------------------------------------------------------------
   Total alloc (KB) | Hits  | Order | Mig.type | GFP flags  | Callsite
  ------------------------------------------------------------------------------------
                 16 |     1 |     2 | UNMOVABL | K|NWR|NR|C | alloc_skb_with_frags
                 24 |     3 |     1 | UNMOVABL | K|NWR|NR|C | alloc_skb_with_frags
              3,876 |   969 |     0 |  MOVABLE | HUM        | shmem_alloc_page
                972 |   243 |     0 | UNMOVABL | K          | __pollwait
                624 |   156 |     0 |  MOVABLE | NF|HW|M    | __page_cache_alloc
                304 |    76 |     0 | UNMOVABL | U          | dma_generic_alloc_coherent
                108 |    27 |     0 |  MOVABLE | HUM|Z      | handle_mm_fault
                 56 |    14 |     0 | UNMOVABL | K|R|Z|NT   | pte_alloc_one
                 24 |     6 |     0 |  MOVABLE | HUM        | do_wp_page
                 16 |     4 |     0 | UNMOVABL | NWR        | __tlb_remove_page
                ... | ...   | ...   | ...      | ...        | ...
  ------------------------------------------------------------------------------------

[PATCH 1/6] perf kmem: Implement stat --page --caller

2015-04-13 Thread Namhyung Kim
Make perf kmem support caller statistics for pages.  Unlike the slab case,
the tracepoints in the page allocator don't provide callsite info, so
it records callchains and extracts the callsite info from them.

Note that the callchain contains several memory allocation functions
which have no meaning for users, so skip those functions to get proper
callsites.  I used the following regex pattern to skip the allocator
functions:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me the following list of functions:

  # perf kmem record --page sleep 3
  # perf kmem stat --page -v
  ...
  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid
  ...
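Once the matching functions are known, the patch sorts them by start
address (funcmp) and looks up each callchain IP with a range comparator
(callcmp).  A minimal sketch of that lookup with qsort()/bsearch(),
assuming half-open [start, end) ranges; the comparator and helper names
are hypothetical, and the addresses in the note below are made up:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Same shape as struct alloc_func in the patch: [start, end) of a
 * kernel function whose name matched the allocator regex. */
struct alloc_func {
	unsigned long long start;
	unsigned long long end;
	const char *name;
};

/* qsort() comparator: order ranges by start address. */
static int func_start_cmp(const void *a, const void *b)
{
	const struct alloc_func *fa = a, *fb = b;

	if (fa->start < fb->start)
		return -1;
	return fa->start > fb->start;
}

/* bsearch() comparator: the key is a one-address range {ip, ip + 1};
 * it matches an element when ip falls inside that element's range. */
static int ip_in_func_cmp(const void *key, const void *elem)
{
	const struct alloc_func *k = key, *f = elem;

	if (k->start < f->start)
		return -1;
	if (k->start >= f->end)
		return 1;
	return 0;
}

static void sort_alloc_funcs(struct alloc_func *funcs, size_t n)
{
	qsort(funcs, n, sizeof(*funcs), func_start_cmp);
}

/* funcs must already be sorted with sort_alloc_funcs(). */
static bool is_alloc_func(const struct alloc_func *funcs, size_t n,
			  unsigned long long ip)
{
	struct alloc_func key = { .start = ip, .end = ip + 1 };

	return bsearch(&key, funcs, n, sizeof(*funcs), ip_in_func_cmp) != NULL;
}
```

With hypothetical ranges 0x360000-0x360b00 (__alloc_pages_nodemask) and
0x3a7000-0x3a7200 (alloc_pages_current), is_alloc_func() is true for
0x360a7e and 0x3a711c and false for 0x357bc7, so the latter frame would be
reported as the callsite.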

The output looks mostly the same as --alloc (I also added a callsite column
to that) but groups entries by callsite.  Currently, the order, migrate
type and GFP flag info is taken from the last allocation and is not
guaranteed to be the same for all allocations from the callsite.

  -------------------------------------------------------------------------------
   Total_alloc (KB) | Hits  | Order | Mig.type | GFP flags | Callsite
  -------------------------------------------------------------------------------
              1,064 |   266 |     0 | UNMOVABL |  000000d0 | __pollwait
                 52 |    13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 44 |    11 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 20 |     5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 20 |     5 |     0 |  MOVABLE |  000200da | do_wp_page
                 16 |     4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                 16 |     4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 12 |     3 |     0 | UNMOVABL |  000084d0 | __pud_alloc
                  8 |     2 |     0 | UNMOVABL |  00000010 | bio_copy_user_iov
                  4 |     1 |     0 | UNMOVABL |  000200d2 | pipe_write
                  4 |     1 |     0 |  MOVABLE |  000280da | do_wp_page
                  4 |     1 |     0 | UNMOVABL |  002084d0 | pgd_alloc
  -------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-kmem.c | 327 +++---
 1 file changed, 306 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 63ea01349b6e..cda36c5533d7 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -10,6 +10,7 @@
 #include "util/header.h"
 #include "util/session.h"
 #include "util/tool.h"
+#include "util/callchain.h"
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
@@ -21,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int kmem_slab;
 static int kmem_page;
@@ -241,6 +243,7 @@ static unsigned long nr_page_fails;
 static unsigned long nr_page_nomatch;
 
 static bool use_pfn;
+static struct perf_session *kmem_session;
 
 #define MAX_MIGRATE_TYPES  6
 #define MAX_PAGE_ORDER 11
@@ -250,6 +253,7 @@ static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
 struct page_stat {
struct rb_node  node;
u64 page;
+   u64 callsite;
int order;
unsigned gfp_flags;
unsigned migrate_type;
@@ -262,8 +266,144 @@ struct page_stat {
 static struct rb_root page_tree;
 static struct rb_root page_alloc_tree;
 static struct rb_root page_alloc_sorted;
+static struct rb_root page_caller_tree;
+static struct rb_root page_caller_sorted;
 
-static struct page_stat *search_page(unsigned long page, bool create)
+struct alloc_func {
+   u64 start;
+   u64 end;
+   char *name;
+};
+
+static int nr_alloc_funcs;
+static struct alloc_func *alloc_func_list;
+
+static int funcmp(const void *a, const void *b)
+{
+   const struct alloc_func *fa = a;
+   const struct alloc_func *fb = b;
+
+   if (fa->start > fb->start)
+   return 1;
+   else
+   return -1;
+}
+
+static int callcmp(const void *a, const void *b)
+{
+   const struct alloc_func *fa = a;
+   const struct alloc_func *fb = b;
+
+   if (fb->start <= fa->start && fa->end < fb->end)
+   return 0;
+
+   if (fa->start > fb->start)
+   return 1;
+   else
+   return -1;
+}
+
+static int build_alloc_func_list(void)
+{
+   int ret;
+   struct map *kernel_map;
+   struct symbol *sym;
+   struct rb_node *node;
+   struct alloc_func *func;
+   struct machine *machine = &kmem_session->machines.host;
+   regex_t alloc_func_regex;
+   const char pattern[]

[PATCH 2/6] perf kmem: Support sort keys on page analysis

2015-04-13 Thread Namhyung Kim
Add new sort keys for page analysis: page, order, migtype and gfp.  The
existing 'bytes', 'hit' and 'callsite' sort keys also work for pages.
Note that the -s/--sort option should be preceded by either --slab or
--page to determine which analysis the sort keys apply to.

Now it properly groups and sorts allocation stats, so the same
page/caller with a different order/migtype/gfp is printed on a
separate line.
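Sorting with several keys means comparing by the first key and falling
through to the next one only on a tie, which is what the list of struct
sort_dimension entries in this patch provides.  A minimal sketch with a
fixed array of comparators instead of a list, assuming a hypothetical
struct page_row that holds only the hit and order columns:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical row type holding just the columns used below. */
struct page_row {
	unsigned long hits;
	int order;
};

typedef int (*row_cmp_fn)(const struct page_row *, const struct page_row *);

/* Larger values sort first, matching the descending perf kmem output. */
static int order_cmp(const struct page_row *a, const struct page_row *b)
{
	return (a->order < b->order) - (a->order > b->order);
}

static int hit_cmp(const struct page_row *a, const struct page_row *b)
{
	return (a->hits < b->hits) - (a->hits > b->hits);
}

/* Keys in the order given to -s, here "order,hit". */
static row_cmp_fn sort_keys[] = { order_cmp, hit_cmp };

static int multi_key_cmp(const void *pa, const void *pb)
{
	for (size_t i = 0; i < sizeof(sort_keys) / sizeof(sort_keys[0]); i++) {
		int r = sort_keys[i](pa, pb);

		if (r)
			return r;  /* the first differing key decides */
	}
	return 0;
}
```

Passing multi_key_cmp to qsort() puts an order-2 row first and then orders
the remaining rows by descending hit count.  The (a < b) - (a > b) form
avoids the overflow a plain subtraction can hit with unsigned values.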

  # perf kmem stat --page --caller -l 10 -s order,hit

  -------------------------------------------------------------------------------
   Total alloc (KB) |  Hits | Order | Mig.type | GFP flags | Callsite
  -------------------------------------------------------------------------------
                 64 |     4 |     2 |  RECLAIM |  00285250 | new_slab
             50,144 |12,536 |     0 |  MOVABLE |  0102005a | __page_cache_alloc
                 52 |    13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 40 |    10 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 28 |     7 |     0 | UNMOVABL |  000000d0 | __pollwait
                 20 |     5 |     0 |  MOVABLE |  000200da | do_wp_page
                 20 |     5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 16 |     4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 16 |     4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                  8 |     2 |     0 | UNMOVABL |  000084d0 | __pud_alloc
                ... | ...   | ...   | ...      | ...       | ...
  -------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-kmem.txt |   6 +-
 tools/perf/builtin-kmem.c  | 403 +
 2 files changed, 318 insertions(+), 91 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 23219c65c16f..69e181272c51 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -37,7 +37,11 @@ OPTIONS
 
 -s ::
 --sort=::
-   Sort the output (default: frag,hit,bytes)
+   Sort the output (default: 'frag,hit,bytes' for slab and 'bytes,hit'
+   for page).  Available sort keys are 'ptr, callsite, bytes, hit,
+   pingpong, frag' for slab and 'page, callsite, bytes, hit, order,
+   migtype, gfp' for page.  This option should be preceded by one of the
+   mode selection options - i.e. --slab, --page, --alloc and/or --caller.
 
 -l ::
 --line=::
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index cda36c5533d7..a9dd73f2a5d9 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -30,7 +30,7 @@ static int   kmem_page;
 static long  kmem_page_size;
 
 struct alloc_stat;
-typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
+typedef int (*sort_fn_t)(void *, void *);
 
 static int alloc_flag;
 static int caller_flag;
@@ -181,8 +181,8 @@ static int perf_evsel__process_alloc_node_event(struct perf_evsel *evsel,
return ret;
 }
 
-static int ptr_cmp(struct alloc_stat *, struct alloc_stat *);
-static int callsite_cmp(struct alloc_stat *, struct alloc_stat *);
+static int ptr_cmp(void *, void *);
+static int slab_callsite_cmp(void *, void *);
 
 static struct alloc_stat *search_alloc_stat(unsigned long ptr,
unsigned long call_site,
@@ -223,7 +223,8 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
s_alloc->pingpong++;
 
s_caller = search_alloc_stat(0, s_alloc->call_site,
-&root_caller_stat, callsite_cmp);
+&root_caller_stat,
+slab_callsite_cmp);
if (!s_caller)
return -1;
s_caller->pingpong++;
@@ -448,41 +449,35 @@ static struct page_stat *page_stat__findnew_page(u64 page)
return __page_stat__findnew_page(page, true);
 }
 
-static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
-{
-   if (a->page > b->page)
-   return -1;
-   if (a->page < b->page)
-   return 1;
-   if (a->order > b->order)
-   return -1;
-   if (a->order < b->order)
-   return 1;
-   if (a->migrate_type > b->migrate_type)
-   return -1;
-   if (a->migrate_type < b->migrate_type)
-   return 1;
-   if (a->gfp_flags > b->gfp_flags)
-   return -1;
-   if (a->gfp_flags < b->gfp_flags)
-   return 1;
-   return 0;
-}
+struct sort_dimension {
+   const char  name[20];
+   sort_fn_t   cmp;
+   struct list_head list;
+};
+
+static LIST_HEAD(page_alloc_sort_input);
+static 

[PATCH 4/6] perf kmem: Print gfp flags in human readable string

2015-04-13 Thread Namhyung Kim
Save the libtraceevent output for gfp flags and print it in the header.

  # perf kmem stat --page --caller
  #
  # GFP flags
  # ---------
  # 00000010:       NI: GFP_NOIO
  # 000000d0:        K: GFP_KERNEL
  # 00000200:      NWR: GFP_NOWARN
  # 000084d0:    K|R|Z: GFP_KERNEL|GFP_REPEAT|GFP_ZERO
  # 000200d2:       HU: GFP_HIGHUSER
  # 000200da:      HUM: GFP_HIGHUSER_MOVABLE
  # 000280da:    HUM|Z: GFP_HIGHUSER_MOVABLE|GFP_ZERO
  # 002084d0: K|R|Z|NT: GFP_KERNEL|GFP_REPEAT|GFP_ZERO|GFP_NOTRACK
  # 0102005a:  NF|HW|M: GFP_NOFS|GFP_HARDWALL|GFP_MOVABLE

  -------------------------------------------------------------------------------
   Total alloc (KB) | Hits  | Order | Mig.type | GFP flags | Callsite
  -------------------------------------------------------------------------------
                 60 |    15 |     0 | UNMOVABL | K|R|Z|NT  | pte_alloc_one
                 40 |    10 |     0 |  MOVABLE | HUM|Z     | handle_mm_fault
                 24 |     6 |     0 |  MOVABLE | HUM       | do_wp_page
                 24 |     6 |     0 | UNMOVABL | K         | __pollwait
   ...
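The compact form is produced by splitting libtraceevent's "GFP_A|GFP_B"
string on '|' and replacing each known name with its abbreviation from
gfp_compact_table.  A minimal sketch of that mapping, using only a subset
of the table and a hypothetical compact_gfp() helper; unlike the patch, it
sizes the output buffer once up front instead of calling realloc() for
each token:

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and strtok_r() */
#include <stdlib.h>
#include <string.h>

/* A few entries from gfp_compact_table in the patch. */
static const struct {
	const char *original;
	const char *compact;
} gfp_table[] = {
	{ "GFP_HIGHUSER_MOVABLE", "HUM" },
	{ "GFP_KERNEL",           "K" },
	{ "GFP_REPEAT",           "R" },
	{ "GFP_ZERO",             "Z" },
	{ "GFP_NOTRACK",          "NT" },
};

/* Turns "GFP_KERNEL|GFP_ZERO" into "K|Z"; unknown names are dropped.
 * Returns a malloc'd string (caller frees) or NULL on allocation failure. */
static char *compact_gfp(const char *flags)
{
	char *copy = strdup(flags);
	/* every abbreviation is shorter than its name, so this is enough */
	char *out = calloc(1, strlen(flags) + 1);
	char *tok, *save;

	if (copy == NULL || out == NULL) {
		free(copy);
		free(out);
		return NULL;
	}

	for (tok = strtok_r(copy, "|", &save); tok;
	     tok = strtok_r(NULL, "|", &save)) {
		for (size_t i = 0; i < sizeof(gfp_table) / sizeof(gfp_table[0]); i++) {
			if (strcmp(gfp_table[i].original, tok))
				continue;
			if (out[0])
				strcat(out, "|");
			strcat(out, gfp_table[i].compact);
			break;
		}
	}
	free(copy);
	return out;
}
```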

Requested-by: Joonsoo Kim 
Suggested-by: Minchan Kim 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-kmem.c | 222 +++---
 1 file changed, 209 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 44a100caa172..8c1673961067 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -581,6 +581,176 @@ static bool valid_page(u64 pfn_or_page)
return true;
 }
 
+struct gfp_flag {
+   unsigned int flags;
+   char *compact_str;
+   char *human_readable;
+};
+
+static struct gfp_flag *gfps;
+static int nr_gfps;
+
+static int gfpcmp(const void *a, const void *b)
+{
+   const struct gfp_flag *fa = a;
+   const struct gfp_flag *fb = b;
+
+   return fa->flags - fb->flags;
+}
+
+/* see include/trace/events/gfpflags.h */
+static const struct {
+   const char *original;
+   const char *compact;
+} gfp_compact_table[] = {
+   { "GFP_TRANSHUGE",  "THP" },
+   { "GFP_HIGHUSER_MOVABLE",   "HUM" },
+   { "GFP_HIGHUSER",   "HU" },
+   { "GFP_USER",   "U" },
+   { "GFP_TEMPORARY",  "TMP" },
+   { "GFP_KERNEL", "K" },
+   { "GFP_NOFS",   "NF" },
+   { "GFP_ATOMIC", "A" },
+   { "GFP_NOIO",   "NI" },
+   { "GFP_HIGH",   "H" },
+   { "GFP_WAIT",   "W" },
+   { "GFP_IO", "I" },
+   { "GFP_COLD",   "CO" },
+   { "GFP_NOWARN", "NWR" },
+   { "GFP_REPEAT", "R" },
+   { "GFP_NOFAIL", "NF" },
+   { "GFP_NORETRY","NR" },
+   { "GFP_COMP",   "C" },
+   { "GFP_ZERO",   "Z" },
+   { "GFP_NOMEMALLOC", "NMA" },
+   { "GFP_MEMALLOC",   "MA" },
+   { "GFP_HARDWALL",   "HW" },
+   { "GFP_THISNODE",   "TN" },
+   { "GFP_RECLAIMABLE","RC" },
+   { "GFP_MOVABLE","M" },
+   { "GFP_NOTRACK","NT" },
+   { "GFP_NO_KSWAPD",  "NK" },
+   { "GFP_OTHER_NODE", "ON" },
+   { "GFP_NOWAIT", "NW" },
+};
+
+static size_t max_gfp_len;
+
+static char *compact_gfp_flags(char *gfp_flags)
+{
+   char *orig_flags = strdup(gfp_flags);
+   char *new_flags = NULL;
+   char *str, *pos;
+   size_t len = 0;
+
+   if (orig_flags == NULL)
+   return NULL;
+
+   str = strtok_r(orig_flags, "|", &pos);
+   while (str) {
+   size_t i;
+   char *new;
+   const char *cpt;
+
+   for (i = 0; i < ARRAY_SIZE(gfp_compact_table); i++) {
+   if (strcmp(gfp_compact_table[i].original, str))
+   continue;
+
+   cpt = gfp_compact_table[i].compact;
+   new = realloc(new_flags, len + strlen(cpt) + 2);
+   if (new == NULL) {
+   free(new_flags);
+   return NULL;
+   }
+
+   new_flags = new;
+
+   if (!len) {
+   strcpy(new_flags, cpt);
+   } else {
+   strcat(new_flags, "|");
+   strcat(new_flags, cpt);
+   len++;
+   }
+
+   len += strlen(cpt);
+   }
+
+   str = strtok_r(NULL, "|", &pos);
+   }
+
+   if (max_gfp_len < len)
+   max_gfp_len = len;
+
+   free(orig_flags);
+

[PATCH 5/6] perf kmem: Add kmem.default config option

2015-04-13 Thread Namhyung Kim
Currently the perf kmem command selects --slab if neither --slab nor
--page is given, for backward compatibility.  Add a kmem.default config
option to select the default mode ('page' or 'slab').

  # cat ~/.perfconfig
  [kmem]
      default = page

  # perf kmem stat

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :    1,518   [    6,096 KB ]
  Total free requests           :    1,431   [    5,748 KB ]

  Total alloc+freed requests    :    1,330   [    5,344 KB ]
  Total alloc-only requests     :      188   [      752 KB ]
  Total free-only requests      :      101   [      404 KB ]

  Total allocation failures     :        0   [        0 KB ]
  ...
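The resulting precedence is: an explicit --slab or --page always wins, and
kmem.default only matters when neither is given.  A small sketch of that
logic with hypothetical helper names; unlike the patch's kmem_config(),
which prints an error and carries on, parse_kmem_default() here simply
returns -1 for an invalid value:

```c
#include <string.h>

enum kmem_mode { KMEM_SLAB, KMEM_PAGE };

/* Parses a kmem.default value; an invalid value returns -1 and leaves
 * the current default unchanged. */
static int parse_kmem_default(const char *value, enum kmem_mode *mode)
{
	if (!strcmp(value, "slab"))
		*mode = KMEM_SLAB;
	else if (!strcmp(value, "page"))
		*mode = KMEM_PAGE;
	else
		return -1;
	return 0;
}

/* Explicit --slab/--page win; the config default only applies when
 * neither option was given on the command line. */
static void resolve_mode(int *slab, int *page, enum kmem_mode def)
{
	if (*slab == 0 && *page == 0) {
		if (def == KMEM_SLAB)
			*slab = 1;
		else
			*page = 1;
	}
}
```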

Cc: Taeung Song 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-kmem.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 8c1673961067..f0d018179e1c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -28,6 +28,10 @@ static int   kmem_slab;
 static int kmem_page;
 
 static long  kmem_page_size;
+static enum {
+   KMEM_SLAB,
+   KMEM_PAGE,
+} kmem_default = KMEM_SLAB;  /* for backward compatibility */
 
 struct alloc_stat;
 typedef int (*sort_fn_t)(void *, void *);
@@ -1710,7 +1714,8 @@ static int parse_sort_opt(const struct option *opt __maybe_unused,
if (!arg)
return -1;
 
-   if (kmem_page > kmem_slab) {
+   if (kmem_page > kmem_slab ||
+   (kmem_page == 0 && kmem_slab == 0 && kmem_default == KMEM_PAGE)) {
if (caller_flag > alloc_flag)
return setup_page_sorting(&page_caller_sort, arg);
else
@@ -1826,6 +1831,22 @@ static int __cmd_record(int argc, const char **argv)
return cmd_record(i, rec_argv, NULL);
 }
 
+static int kmem_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "kmem.default")) {
+   if (!strcmp(value, "slab"))
+   kmem_default = KMEM_SLAB;
+   else if (!strcmp(value, "page"))
+   kmem_default = KMEM_PAGE;
+   else
+   pr_err("invalid default value ('slab' or 'page' required): %s\n",
+  value);
+   return 0;
+   }
+
+   return perf_default_config(var, value, cb);
+}
+
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
const char * const default_slab_sort = "frag,hit,bytes";
@@ -1862,14 +1883,19 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
struct perf_session *session;
int ret = -1;
 
+   perf_config(kmem_config, NULL);
argc = parse_options_subcommand(argc, argv, kmem_options,
kmem_subcommands, kmem_usage, 0);
 
if (!argc)
usage_with_options(kmem_usage, kmem_options);
 
-   if (kmem_slab == 0 && kmem_page == 0)
-   kmem_slab = 1;  /* for backward compatibility */
+   if (kmem_slab == 0 && kmem_page == 0) {
+   if (kmem_default == KMEM_SLAB)
+   kmem_slab = 1;
+   else
+   kmem_page = 1;
+   }
 
if (!strncmp(argv[0], "rec", 3)) {
symbol__init(NULL);
-- 
2.3.4



Re: [PATCH v5 10/10] module: Rework module_addr_{min,max}

2015-04-13 Thread Rusty Russell
Ingo Molnar  writes:
> * Peter Zijlstra  wrote:
>
>> __module_address() does an initial bound check before doing the 
>> {list/tree} iteration to find the actual module. The bound variables 
>> are nowhere near the mod_tree cacheline, in fact they're nowhere 
>> near one another.
>> 
>> module_addr_min lives in .data while module_addr_max lives in .bss 
>> (smarty pants GCC thinks the explicit 0 assignment is a mistake).
>> 
>> Rectify this by moving the two variables into a structure together 
>> with the latch_tree_root to guarantee they all share the same 
>> cacheline and avoid hitting two extra cachelines for the lookup.
>> 
>> While reworking the bounds code, move the bound update from 
>> allocation to insertion time, this avoids updating the bounds for a 
>> few error paths.
>
>> +static struct mod_tree_root {
>> +struct latch_tree_root root;
>> +unsigned long addr_min;
>> +unsigned long addr_max;
>> +} mod_tree __cacheline_aligned = {
>> +.addr_min = -1UL,
>> +};
>> +
>> +#define module_addr_min mod_tree.addr_min
>> +#define module_addr_max mod_tree.addr_max

Nice catch.

Does the min/max comparison still win us anything?  (I'm guessing yes...)

In general, I'm happy with this series.  Assume you want another
go-round for Ingo's tweaks, then I'll take them for 4.2.

Thanks,
Rusty.
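The min/max bounds quoted above act as a cheap filter in front of the tree
lookup: an address outside [addr_min, addr_max) cannot belong to any
module.  A minimal userspace sketch, assuming half-open bounds and
hypothetical names (struct mod_bounds, bounds_insert(), maybe_module_addr());
as in the quoted patch, the bounds are updated at insertion time and
addr_min starts at the maximum value (-1UL):

```c
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-in for the addr_min/addr_max fields that the patch
 * keeps next to the latch_tree_root in struct mod_tree_root. */
struct mod_bounds {
	unsigned long addr_min;  /* start at ULONG_MAX, i.e. -1UL */
	unsigned long addr_max;  /* start at 0 */
};

/* Widen the bounds when a module's [base, base + size) is inserted. */
static void bounds_insert(struct mod_bounds *b, unsigned long base,
			  unsigned long size)
{
	if (base < b->addr_min)
		b->addr_min = base;
	if (base + size > b->addr_max)
		b->addr_max = base + size;
}

/* Cheap pre-check done before the more expensive tree lookup. */
static bool maybe_module_addr(const struct mod_bounds *b, unsigned long addr)
{
	return addr >= b->addr_min && addr < b->addr_max;
}
```

An address that passes the check may still fall in a gap between two
modules, so the tree lookup is still needed for a definite answer.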


Re: [PATCH v5 00/10] latched RB-trees and __module_address()

2015-04-13 Thread Rusty Russell
Peter Zijlstra  writes:
> This series is aimed at making __module_address() go fast(er).
>
> The reason for doing so is that most stack unwinders use kernel_text_address()
> to validate each frame. Perf and ftrace (can) end up doing a lot of stack
> traces from performance sensitive code.
>
> On the way there it:
>  - annotates and sanitizes module locking
>  - introduces the latched RB-tree
>  - employs it to make __module_address() go fast.
>
> I've built and boot tested this on x86_64 with modules and lockdep
> enabled.  Performance numbers (below) are done with lockdep disabled.
>
> As previously mentioned; the reason for writing the latched RB-tree as generic
> code is mostly for clarity/documentation purposes; as there are a number of
> separate and non trivial bits to the complete solution.
>
> As measured on my ivb-ep system with 84 modules loaded; prior to patching
> the test module (below) reports (cache hot, performance cpufreq):
>
>           avg +- stdev
> Before:   611 +- 10 [ns] per __module_address() call
> After:     17 +-  5 [ns] per __module_address() call
>
> PMI measurements for a cpu running loops in a module (also [ns]):
>
> Before:  Mean: 2719 +- 1, Stdev: 214, Samples: 40036
> After:   Mean:  947 +- 0, Stdev: 132, Samples: 40037
>
> Note; I have also tested things like: perf record -a -g modprobe
> mod_test, to make 'sure' to hit some of the more interesting paths.
>
> Changes since last time:
>
>  - reworked generic latch_tree API (Lai Jiangshan)
>  - reworked module bounds (me)
>  - reworked all the testing code (not included)
>
> Rusty, please consider merging this (for 4.2, I know it's the merge window, no
> rush)

I was tempted to sneak in those module rcu fixes for 4.1, but seeing
Ingo's comments I'll wait for 4.2.

Thanks,
Rusty.


linux-next: manual merge of the idle tree with the pm tree

2015-04-13 Thread Stephen Rothwell
Hi Len,

Today's linux-next merge of the idle tree got a conflict in
tools/power/x86/turbostat/turbostat.c between commits from the pm tree
and similar commits from the idle tree.

There seem to be two different versions of these patches, so I just
dropped the idle tree for today.  Please sort this out.

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




[PATCH 2/2] More precise timestamps for nested writes

2015-04-13 Thread Suresh E. Warrier
When tracing the behavior of multiple fio jobs running in parallel,
our performance team observed that some scsi_dispatch_cmd_done events
appeared to occur earlier, often several microseconds earlier, than
their associated scsi_dispatch_cmd_start events in the trace records.
Other interrupt events were also observed to have earlier timestamps
than the events that caused them.

This incorrect chronological ordering of trace records occurs because
ALL nested writes have the same time stamp as the first writer, that is,
the first writer in the stack, which was preempted. In workloads where
interrupts occur frequently, the first writer can stay preempted across
multiple interrupts, and nested trace events can record time stamps that
are many microseconds earlier than their actual values, resulting in
the wrong ordering.

For example, a scsi_dispatch_cmd_start on CPU A occurs, say at time
t1, and the corresponding scsi_dispatch_cmd_done occurs on another
CPU B, say at time t2. However, CPU B is in the middle of a nested
write, with the first writer having recorded its event at time t0, where
t0 < t1 < t2. Eventually, we get out of all the nested writes and
the first writer commits its trace event with a time of t0, and the
nested scsi_dispatch_cmd_done gets assigned the same timestamp t0.
In the trace report, the scsi_dispatch_cmd_done thus appears to have
occurred before the scsi_dispatch_cmd_start.

In some cases, on larger systems with multiple fio jobs running and
interrupts being sent to a single CPU, we have noticed more than
400 trace events all with the same timestamp.

All nested writes have the same time stamp because all nested trace
events are assigned a time delta of 0. A correct time delta cannot be
computed for them because the first interrupted write has not been
committed yet, so the commit timestamp in the CPU buffer does not
hold the time stamp of the immediately preceding event.

One way of fixing this is to keep a timestamp of the last event we
reserved space for in the ring buffer, so that we can compute the
correct time delta for each event. We also need a short critical
section where we cannot be interrupted, in which we:
 1. Read the current time
 2. Compute the time delta for this event from the last time stamp
 3. Update the value of the last time stamp
 4. Reserve space in the ring buffer for the event.

Although the ring buffer design is carefully coded to avoid disabling
interrupts, in this case there does not appear to be a practical way
to solve this problem without disabling interrupts for a short time.
To accommodate those architectures where disabling interrupts is
expensive, a kernel tunable (/sys/kernel/debug/tracing/nested_precise_ts)
needs to be set to 1 to enable this behavior and get accurate
timestamps.

Signed-off-by: Suresh Warrier 
---
 kernel/trace/ring_buffer.c | 71 +++---
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c9b3005..0a2d862 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -482,6 +482,7 @@ struct ring_buffer_per_cpu {
unsigned long   read_bytes;
u64 write_stamp;
u64 read_stamp;
+   u64 last_stamp;
/* ring buffer pages to update, > 0 to add, < 0 to remove */
int nr_pages_to_update;
struct list_head        new_pages; /* new pages to add */
@@ -2010,7 +2011,8 @@ rb_update_event(struct ring_buffer_per_cpu *cpu_buffer,
 {
/* Only a commit updates the timestamp */
if (unlikely(!rb_event_is_commit(cpu_buffer, event)))
-   delta = 0;
+   if (!rb_precise_nested_write_ts())
+   delta = 0;
 
/*
 * If we need to add a timestamp, then we
@@ -2534,6 +2536,60 @@ void rb_disable_precise_nested_write_ts(void)
static_key_slow_dec(&__precise_nested_write_ts);
 }
 
+/**
+ * get_write_timestamp:
+ * Must be called before we read the current time stamp
+ * It returns a pointer to the location of the last
+ * time stamp to be used in the delta calculation.
+ * If  precise nested write timestamps are enabled, it first
+ * disables interrupts on the current processor so that
+ * we can reserve space on the buffer and save the event's
+ * timestamp without being preempted.
+ *
+ * put_write_timestamp:
+ * Must be called after we reserve space in the ring buffer.
+ * If precise nested write timestamps are enabled, it saves the
+ * timestamp in the specified timestamp location (passed in
+ * as *pstamp) so that the nested writers always have a valid
+ * timestamp to compute the timestamp deltas for their events.
+ * This must be done before we re-enable interrupts.
+ *
+ */
+static u64 *get_write_timestam

Re: [PATCH] serial: bfin: ctsrts: enforce Kconfig naming convention

2015-04-13 Thread Sonic Zhang
Acked-by: Sonic Zhang 

On Sun, Apr 12, 2015 at 11:54 PM, Valentin Rothberg
 wrote:
> The CONFIG_ prefix is reserved for Kconfig options in Make and CPP
> syntax; static analysis tools rely on this convention.  This patch
> enforces this behavior for SERIAL_BFIN_{HARD_}CTSRTS.
>
> Signed-off-by: Valentin Rothberg 
> ---
> I found this issue with ./scripts/checkkconfigsymbols.py
> ---
>  arch/blackfin/include/asm/bfin_serial.h |  8 
>  drivers/tty/serial/bfin_uart.c  | 24 
>  2 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/arch/blackfin/include/asm/bfin_serial.h b/arch/blackfin/include/asm/bfin_serial.h
> index d00d732784b1..b550ada7321b 100644
> --- a/arch/blackfin/include/asm/bfin_serial.h
> +++ b/arch/blackfin/include/asm/bfin_serial.h
> @@ -22,9 +22,9 @@
>  defined(CONFIG_BFIN_UART2_CTSRTS) || \
>  defined(CONFIG_BFIN_UART3_CTSRTS)
>  # if defined(BFIN_UART_BF54X_STYLE) || defined(BFIN_UART_BF60X_STYLE)
> -#  define CONFIG_SERIAL_BFIN_HARD_CTSRTS
> +#  define SERIAL_BFIN_HARD_CTSRTS
>  # else
> -#  define CONFIG_SERIAL_BFIN_CTSRTS
> +#  define SERIAL_BFIN_CTSRTS
>  # endif
>  #endif
>
> @@ -50,8 +50,8 @@ struct bfin_serial_port {
>  #elif ANOMALY_05000363
> unsigned int anomaly_threshold;
>  #endif
> -#if defined(CONFIG_SERIAL_BFIN_CTSRTS) || \
> -   defined(CONFIG_SERIAL_BFIN_HARD_CTSRTS)
> +#if defined(SERIAL_BFIN_CTSRTS) || \
> +   defined(SERIAL_BFIN_HARD_CTSRTS)
> int cts_pin;
> int rts_pin;
>  #endif
> diff --git a/drivers/tty/serial/bfin_uart.c b/drivers/tty/serial/bfin_uart.c
> index 155781ece050..ae3cf94b146b 100644
> --- a/drivers/tty/serial/bfin_uart.c
> +++ b/drivers/tty/serial/bfin_uart.c
> @@ -74,8 +74,8 @@ static void bfin_serial_tx_chars(struct bfin_serial_port *uart);
>
>  static void bfin_serial_reset_irda(struct uart_port *port);
>
> -#if defined(CONFIG_SERIAL_BFIN_CTSRTS) || \
> -   defined(CONFIG_SERIAL_BFIN_HARD_CTSRTS)
> +#if defined(SERIAL_BFIN_CTSRTS) || \
> +   defined(SERIAL_BFIN_HARD_CTSRTS)
>  static unsigned int bfin_serial_get_mctrl(struct uart_port *port)
>  {
> struct bfin_serial_port *uart = (struct bfin_serial_port *)port;
> @@ -110,7 +110,7 @@ static irqreturn_t bfin_serial_mctrl_cts_int(int irq, void *dev_id)
> struct bfin_serial_port *uart = dev_id;
> struct uart_port *uport = &uart->port;
> unsigned int status = bfin_serial_get_mctrl(uport);
> -#ifdef CONFIG_SERIAL_BFIN_HARD_CTSRTS
> +#ifdef SERIAL_BFIN_HARD_CTSRTS
>
> UART_CLEAR_SCTS(uart);
> if (uport->hw_stopped) {
> @@ -700,7 +700,7 @@ static int bfin_serial_startup(struct uart_port *port)
>  # endif
>  #endif
>
> -#ifdef CONFIG_SERIAL_BFIN_CTSRTS
> +#ifdef SERIAL_BFIN_CTSRTS
> if (uart->cts_pin >= 0) {
> if (request_irq(gpio_to_irq(uart->cts_pin),
> bfin_serial_mctrl_cts_int,
> @@ -718,7 +718,7 @@ static int bfin_serial_startup(struct uart_port *port)
> gpio_direction_output(uart->rts_pin, 0);
> }
>  #endif
> -#ifdef CONFIG_SERIAL_BFIN_HARD_CTSRTS
> +#ifdef SERIAL_BFIN_HARD_CTSRTS
> if (uart->cts_pin >= 0) {
> if (request_irq(uart->status_irq, bfin_serial_mctrl_cts_int,
> 0, "BFIN_UART_MODEM_STATUS", uart)) {
> @@ -766,13 +766,13 @@ static void bfin_serial_shutdown(struct uart_port *port)
> free_irq(uart->tx_irq, uart);
>  #endif
>
> -#ifdef CONFIG_SERIAL_BFIN_CTSRTS
> +#ifdef SERIAL_BFIN_CTSRTS
> if (uart->cts_pin >= 0)
> free_irq(gpio_to_irq(uart->cts_pin), uart);
> if (uart->rts_pin >= 0)
> gpio_free(uart->rts_pin);
>  #endif
> -#ifdef CONFIG_SERIAL_BFIN_HARD_CTSRTS
> +#ifdef SERIAL_BFIN_HARD_CTSRTS
> if (uart->cts_pin >= 0)
> free_irq(uart->status_irq, uart);
>  #endif
> @@ -788,7 +788,7 @@ bfin_serial_set_termios(struct uart_port *port, struct ktermios *termios,
> unsigned int ier, lcr = 0;
> unsigned long timeout;
>
> -#ifdef CONFIG_SERIAL_BFIN_CTSRTS
> +#ifdef SERIAL_BFIN_CTSRTS
> if (old == NULL && uart->cts_pin != -1)
> termios->c_cflag |= CRTSCTS;
> else if (uart->cts_pin == -1)
> @@ -1110,8 +1110,8 @@ bfin_serial_console_setup(struct console *co, char *options)
> int baud = 57600;
> int bits = 8;
> int parity = 'n';
> -# if defined(CONFIG_SERIAL_BFIN_CTSRTS) || \
> -   defined(CONFIG_SERIAL_BFIN_HARD_CTSRTS)
> +# if defined(SERIAL_BFIN_CTSRTS) || \
> +   defined(SERIAL_BFIN_HARD_CTSRTS)
> int flow = 'r';
>  # else
> int flow = 'n';
> @@ -1322,8 +1322,8 @@ static int bfin_serial_probe(struct platform_device *pdev)
> init_timer(&(uart->rx_dma_timer));
>  #endif
>
> -#if defined(CONFIG_SERIAL_BFIN_CTSRTS) || \
> -   defined(CONFIG_SERIAL_BFIN_HARD_CTSRTS)
> +#if defined(SERIAL_BFIN_CTSRTS) || \
> +   defined(S

[Patch Part2 v5 13/33] x86/irq: Kill irq_cfg.irq_remapped

2015-04-13 Thread Jiang Liu
Now there is no user of irq_cfg.irq_remapped, so kill it.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Greg Kroah-Hartman 
Cc: io...@lists.linux-foundation.org
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Link: 
http://lkml.kernel.org/r/1416901802-24211-23-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h       |    1 -
 drivers/iommu/amd_iommu.c           |    1 -
 drivers/iommu/intel_irq_remapping.c |    2 --
 3 files changed, 4 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index bbf90fe2a224..88632ea75fe0 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -193,7 +193,6 @@ struct irq_cfg {
u8  vector;
u8  move_in_progress : 1;
 #ifdef CONFIG_IRQ_REMAP
-   u8  remapped : 1;
union {
struct irq_2_iommu irq_2_iommu;
struct irq_2_irte  irq_2_irte;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 20abdbaf80d9..fa26c742bc39 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4131,7 +4131,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
struct msi_msg *msg = &data->msi_entry;
struct IO_APIC_route_entry *entry;
 
-   irq_cfg->remapped = 1;
data->irq_2_irte.devid = devid;
data->irq_2_irte.index = index + sub_handle;
 
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index e76e5723ae87..21fc899e7c49 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -67,7 +67,6 @@ static int alloc_irte(struct intel_iommu *iommu, int irq,
  struct irq_2_iommu *irq_iommu, u16 count)
 {
struct ir_table *table = iommu->ir_table;
-   struct irq_cfg *cfg = irq_cfg(irq);
unsigned int mask = 0;
unsigned long flags;
int index;
@@ -94,7 +93,6 @@ static int alloc_irte(struct intel_iommu *iommu, int irq,
if (index < 0) {
pr_warn("IR%d: can't allocate an IRTE\n", iommu->seq_id);
} else {
-   cfg->remapped = 1;
irq_iommu->iommu = iommu;
irq_iommu->irte_index =  index;
irq_iommu->sub_handle = 0;
-- 
1.7.10.4



[Patch Part2 v5 32/33] x86/irq: Move irqdomain specific code into asm/irqdomain.h

2015-04-13 Thread Jiang Liu
Now that we have a dedicated asm/irqdomain.h, move the irqdomain-specific
code into it.

Signed-off-by: Jiang Liu 
---
 arch/x86/include/asm/hw_irq.h        |   24 ---
 arch/x86/include/asm/irq_remapping.h |    2 +-
 arch/x86/include/asm/irqdomain.h     |   35 +++---
 arch/x86/kernel/apic/htirq.c         |    2 +-
 arch/x86/kernel/apic/msi.c           |    2 +-
 arch/x86/kernel/apic/vector.c        |    2 +-
 arch/x86/kernel/hpet.c               |    2 +-
 arch/x86/platform/uv/uv_irq.c        |    2 +-
 8 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e7ae6eb84934..f671d4331e7e 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -94,8 +94,6 @@ extern void trace_call_function_single_interrupt(void);
 #define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
 #endif /* CONFIG_TRACING */
 
-struct irq_domain;
-
 #ifdef CONFIG_X86_LOCAL_APIC
 struct irq_data;
 struct pci_dev;
@@ -165,22 +163,11 @@ struct irq_alloc_info {
};
 };
 
-enum {
-   /* Allocate contigious CPU vectors */
-   X86_IRQ_ALLOC_CONTIGOUS_VECTORS = 0x1,
-};
-
 struct irq_cfg {
unsigned int    dest_apicid;
u8  vector;
 };
 
-extern struct irq_domain *x86_vector_domain;
-
-extern void init_irq_alloc_info(struct irq_alloc_info *info,
-   const struct cpumask *mask);
-extern void copy_irq_alloc_info(struct irq_alloc_info *dst,
-   struct irq_alloc_info *src);
 extern struct irq_cfg *irq_cfg(unsigned int irq);
 extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
 extern void lock_vector_lock(void);
@@ -200,17 +187,6 @@ static inline void lock_vector_lock(void) {}
 static inline void unlock_vector_lock(void) {}
 #endif /* CONFIG_X86_LOCAL_APIC */
 
-#ifdef CONFIG_PCI_MSI
-extern void arch_init_msi_domain(struct irq_domain *domain);
-#else
-static inline void arch_init_msi_domain(struct irq_domain *domain) { }
-#endif
-#ifdef CONFIG_HT_IRQ
-extern void arch_init_htirq_domain(struct irq_domain *domain);
-#else
-static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
-#endif
-
 /* Statistics */
 extern atomic_t irq_err_count;
 extern atomic_t irq_mis_count;
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 09efa358c831..78974fbc33b4 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -22,7 +22,7 @@
 #ifndef __X86_IRQ_REMAPPING_H
 #define __X86_IRQ_REMAPPING_H
 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index fe0d4c6636ec..ae941c3f8c51 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -2,6 +2,25 @@
 #define _ASM_IRQDOMAIN_H
 
 #include 
+#include 
+
+#ifdef CONFIG_X86_LOCAL_APIC
+enum {
+   /* Allocate contigious CPU vectors */
+   X86_IRQ_ALLOC_CONTIGOUS_VECTORS = 0x1,
+};
+
+extern struct irq_domain *x86_vector_domain;
+
+extern void init_irq_alloc_info(struct irq_alloc_info *info,
+   const struct cpumask *mask);
+extern void copy_irq_alloc_info(struct irq_alloc_info *dst,
+   struct irq_alloc_info *src);
+#endif /* CONFIG_X86_LOCAL_APIC */
+
+#ifdef CONFIG_X86_IO_APIC
+struct device_node;
+struct irq_data;
 
 enum ioapic_domain_type {
IOAPIC_DOMAIN_INVALID,
@@ -10,9 +29,6 @@ enum ioapic_domain_type {
IOAPIC_DOMAIN_DYNAMIC,
 };
 
-struct device_node;
-struct irq_data;
-
 struct ioapic_domain_cfg {
enum ioapic_domain_type type;
const struct irq_domain_ops *ops;
@@ -30,5 +46,18 @@ extern void mp_irqdomain_activate(struct irq_domain *domain,
 extern void mp_irqdomain_deactivate(struct irq_domain *domain,
struct irq_data *irq_data);
 extern int mp_irqdomain_ioapic_idx(struct irq_domain *domain);
+#endif /* CONFIG_X86_IO_APIC */
+
+#ifdef CONFIG_PCI_MSI
+extern void arch_init_msi_domain(struct irq_domain *domain);
+#else
+static inline void arch_init_msi_domain(struct irq_domain *domain) { }
+#endif
+
+#ifdef CONFIG_HT_IRQ
+extern void arch_init_htirq_domain(struct irq_domain *domain);
+#else
+static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
+#endif
 
 #endif
diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c
index 4ba6b3ae7a95..94bca925bf7d 100644
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -16,7 +16,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index da163da5fdee..4ca80bbcd632 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -16,7 +16,7 @@
 #include 
 #include 
 #include 
-#include 
+#in

[Patch Part2 v5 17/33] x86/irq: Kill struct io_apic_irq_attr

2015-04-13 Thread Jiang Liu
Now there's no user of struct io_apic_irq_attr anymore, so kill it.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Grant Likely 
Link: 
http://lkml.kernel.org/r/1416901802-24211-27-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/io_apic.h |    7 ---
 arch/x86/kernel/apic/io_apic.c |   10 --
 2 files changed, 17 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index fa4b25ebd658..4eb4bcc5f219 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -157,13 +157,6 @@ extern int restore_ioapic_entries(void);
 extern void setup_ioapic_ids_from_mpc(void);
 extern void setup_ioapic_ids_from_mpc_nocheck(void);
 
-struct io_apic_irq_attr {
-   int ioapic;
-   int ioapic_pin;
-   int trigger;
-   int polarity;
-};
-
 enum ioapic_domain_type {
IOAPIC_DOMAIN_INVALID,
IOAPIC_DOMAIN_LEGACY,
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a1abdcf2cb5f..76dc9f5bfdbc 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2959,16 +2959,6 @@ int mp_ioapic_registered(u32 gsi_base)
return 0;
 }
 
-static inline void set_io_apic_irq_attr(struct io_apic_irq_attr *irq_attr,
-   int ioapic, int ioapic_pin,
-   int trigger, int polarity)
-{
-   irq_attr->ioapic    = ioapic;
-   irq_attr->ioapic_pin = ioapic_pin;
-   irq_attr->trigger   = trigger;
-   irq_attr->polarity  = polarity;
-}
-
 static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
  struct irq_alloc_info *info)
 {
-- 
1.7.10.4



[Patch Part2 v5 33/33] x86/irq: Avoid memory allocation in __assign_irq_vector()

2015-04-13 Thread Jiang Liu
Function __assign_irq_vector() is protected by vector_lock, so use
a global temporary cpumask to avoid allocating and freeing a cpumask
on every call.

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f5eb3999383f..4a04b25cdcdf 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,6 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
+static cpumask_var_t vector_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -116,14 +117,10 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
-   cpumask_var_t tmp_mask;
 
if (d->move_in_progress)
return -EBUSY;
 
-   if (!alloc_cpumask_var(&tmp_mask, GFP_ATOMIC))
-   return -ENOMEM;
-
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
@@ -131,21 +128,22 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
 
-   apic->vector_allocation_domain(cpu, tmp_mask, mask);
+   apic->vector_allocation_domain(cpu, vector_cpumask, mask);
 
-   if (cpumask_subset(tmp_mask, d->domain)) {
+   if (cpumask_subset(vector_cpumask, d->domain)) {
err = 0;
-   if (cpumask_equal(tmp_mask, d->domain))
+   if (cpumask_equal(vector_cpumask, d->domain))
break;
/*
 * New cpumask using the vector is a proper subset of
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
-   cpumask_andnot(d->old_domain, d->domain, tmp_mask);
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, tmp_mask);
+   cpumask_and(d->domain, d->domain, vector_cpumask);
break;
}
 
@@ -159,16 +157,18 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain, tmp_mask);
-   cpumask_andnot(tmp_mask, mask, d->old_domain);
-   cpu = cpumask_first_and(tmp_mask, cpu_online_mask);
+   cpumask_or(d->old_domain, d->old_domain,
+  vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpu = cpumask_first_and(vector_cpumask,
+   cpu_online_mask);
continue;
}
 
if (test_bit(vector, used_vectors))
goto next;
 
-   for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask) {
+   for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (per_cpu(vector_irq, new_cpu)[vector] >
VECTOR_UNDEFINED)
goto next;
@@ -181,14 +181,13 @@ next:
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
}
-   for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
+   for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq;
d->cfg.vector = vector;
-   cpumask_copy(d->domain, tmp_mask);
+   cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
}
-   free_cpumask_var(tmp_mask);
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
@@ -397,6 +396,8 @@ int __init arch_early_irq_init(void)
arch_init_msi_domain(x86_vector_domain);
arch_init_htirq_domain(x86_vector_domain);
 
+   BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+
return arch_early_ioapic_init();
 }
 
-- 
1.7.10.4


[Patch Part2 v5 10/33] irq_remapping/vt-d: Clean up unused code

2015-04-13 Thread Jiang Liu
Now that we have converted to hierarchical irqdomains, clean up the unused code.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Greg Kroah-Hartman 
Cc: io...@lists.linux-foundation.org
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Link: 
http://lkml.kernel.org/r/1416901802-24211-20-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 drivers/iommu/intel_irq_remapping.c |  187 +--
 1 file changed, 1 insertion(+), 186 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 05941118a179..e76e5723ae87 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -63,35 +63,6 @@ static struct irq_domain_ops intel_ir_domain_ops;
 
 static int __init parse_ioapics_under_ir(void);
 
-static struct irq_2_iommu *irq_2_iommu(unsigned int irq)
-{
-   struct irq_cfg *cfg = irq_cfg(irq);
-   return cfg ? &cfg->irq_2_iommu : NULL;
-}
-
-static int get_irte(int irq, struct irte *entry)
-{
-   struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
-   unsigned long flags;
-   int index;
-
-   if (!entry || !irq_iommu)
-   return -1;
-
-   raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
-
-   if (unlikely(!irq_iommu->iommu)) {
-   raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
-   return -1;
-   }
-
-   index = irq_iommu->irte_index + irq_iommu->sub_handle;
-   *entry = *(irq_iommu->iommu->ir_table->base + index);
-
-   raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
-   return 0;
-}
-
 static int alloc_irte(struct intel_iommu *iommu, int irq,
  struct irq_2_iommu *irq_iommu, u16 count)
 {
@@ -229,29 +200,6 @@ static int clear_entries(struct irq_2_iommu *irq_iommu)
return qi_flush_iec(iommu, index, irq_iommu->irte_mask);
 }
 
-static int free_irte(int irq)
-{
-   struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
-   unsigned long flags;
-   int rc;
-
-   if (!irq_iommu || irq_iommu->iommu == NULL)
-   return -1;
-
-   raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
-
-   rc = clear_entries(irq_iommu);
-
-   irq_iommu->iommu = NULL;
-   irq_iommu->irte_index = 0;
-   irq_iommu->sub_handle = 0;
-   irq_iommu->irte_mask = 0;
-
-   raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
-
-   return rc;
-}
-
 /*
  * source validation type
  */
@@ -932,8 +880,7 @@ error:
return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector,
-unsigned int dest)
+static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 {
memset(irte, 0, sizeof(*irte));
 
@@ -953,135 +900,6 @@ static void prepare_irte(struct irte *irte, int vector,
irte->redir_hint = 1;
 }
 
-static int intel_setup_ioapic_entry(int irq,
-   struct IO_APIC_route_entry *route_entry,
-   unsigned int destination, int vector,
-   struct io_apic_irq_attr *attr)
-{
-   int ioapic_id = mpc_ioapic_id(attr->ioapic);
-   struct intel_iommu *iommu;
-   struct IR_IO_APIC_route_entry *entry;
-   struct irte irte;
-   int index;
-
-   down_read(&dmar_global_lock);
-   iommu = map_ioapic_to_ir(ioapic_id);
-   if (!iommu) {
-   pr_warn("No mapping iommu for ioapic %d\n", ioapic_id);
-   index = -ENODEV;
-   } else {
-   index = alloc_irte(iommu, irq, irq_2_iommu(irq), 1);
-   if (index < 0) {
-   pr_warn("Failed to allocate IRTE for ioapic %d\n",
-   ioapic_id);
-   index = -ENOMEM;
-   }
-   }
-   up_read(&dmar_global_lock);
-   if (index < 0)
-   return index;
-
-   prepare_irte(&irte, vector, destination);
-
-   /* Set source-id of interrupt request */
-   set_ioapic_sid(&irte, ioapic_id);
-
-   modify_irte(irq_2_iommu(irq), &irte);
-
-   apic_printk(APIC_VERBOSE, KERN_DEBUG "IOAPIC[%d]: "
-   "Set IRTE entry (P:%d FPD:%d Dst_Mode:%d "
-   "Redir_hint:%d Trig_Mode:%d Dlvry_Mode:%X "
-   "Avail:%X Vector:%02X Dest:%08X "
-   "SID:%04X SQ:%X SVT:%X)\n",
-   attr->ioapic, irte.present, irte.fpd, irte.dst_mode,
-   irte.redir_hint, irte.trigger_mode, irte.dlvry_mode,
-   irte.avail, irte.vector, irte.dest_id,
-   irte.sid, irte.sq, irte.svt);
-
-   entry = (struct IR_IO_APIC_route_entry *)route_entry;
-   memset(entry, 0, sizeof(*entry));
-
-   entry->index2   = (index >> 15) & 0x1;
-   entry->zero = 0;
-   entry->format   = 1;
-   entry->index= (inde

[Patch Part2 v5 29/33] x86, ioapic: Use proper defines for the entry fields

2015-04-13 Thread Jiang Liu
From: Thomas Gleixner 

While looking at the printout issue, I stumbled more than once over
the various 0/1 assignments, which are either commented in strange ways
or force the reader to look up their meaning.

Use proper constants and fix the misleading comments. While at it,
remove the pointless 0 assignments in native_disable_io_apic(), which
add no value for understanding the code.

Signed-off-by: Thomas Gleixner 
Cc: Jiang Liu 
Cc: x...@kernel.org
Signed-off-by: Jiang Liu 
---
 arch/x86/include/asm/io_apic.h |   16 +--
 arch/x86/kernel/apic/io_apic.c |  100 
 2 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index cca97c961641..53a70a30b674 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -98,9 +98,19 @@ struct IR_IO_APIC_route_entry {
 struct irq_alloc_info;
 struct irq_data;
 
-#define IOAPIC_AUTO -1
-#define IOAPIC_EDGE 0
-#define IOAPIC_LEVEL 1
+#define IOAPIC_AUTO                -1
+#define IOAPIC_EDGE                0
+#define IOAPIC_LEVEL               1
+
+#define IOAPIC_MASKED              1
+#define IOAPIC_UNMASKED            0
+
+#define IOAPIC_POL_HIGH            0
+#define IOAPIC_POL_LOW             1
+
+#define IOAPIC_DEST_MODE_PHYSICAL  0
+#define IOAPIC_DEST_MODE_LOGICAL   1
+
 #define IOAPIC_MAP_ALLOC           0x1
 #define IOAPIC_MAP_CHECK           0x2
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 71971b89da59..cf42a6adf9c0 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -356,7 +356,7 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 static void ioapic_mask_entry(int apic, int pin)
 {
unsigned long flags;
-   union entry_union eu = { .entry.mask = 1 };
+   union entry_union eu = { .entry.mask = IOAPIC_MASKED };
 
raw_spin_lock_irqsave(&ioapic_lock, flags);
io_apic_write(apic, 0x10 + 2*pin, eu.w1);
@@ -517,7 +517,7 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
/*
 * Mask the entry and change the trigger mode to edge.
 */
-   entry1.mask = 1;
+   entry1.mask = IOAPIC_MASKED;
entry1.trigger = IOAPIC_EDGE;
 
__ioapic_write_entry(apic, pin, entry1);
@@ -553,8 +553,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 * Make sure the entry is masked and re-read the contents to check
 * if it is a level triggered pin and if the remote-IRR is set.
 */
-   if (!entry.mask) {
-   entry.mask = 1;
+   if (entry.mask == IOAPIC_UNMASKED) {
+   entry.mask = IOAPIC_MASKED;
ioapic_write_entry(apic, pin, entry);
entry = ioapic_read_entry(apic, pin);
}
@@ -567,7 +567,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 * doesn't clear the remote-IRR if the trigger mode is not
 * set to level.
 */
-   if (!entry.trigger) {
+   if (entry.trigger == IOAPIC_EDGE) {
entry.trigger = IOAPIC_LEVEL;
ioapic_write_entry(apic, pin, entry);
}
@@ -670,8 +670,8 @@ void mask_ioapic_entries(void)
struct IO_APIC_route_entry entry;
 
entry = ioapics[apic].saved_registers[pin];
-   if (!entry.mask) {
-   entry.mask = 1;
+   if (entry.mask == IOAPIC_UNMASKED) {
+   entry.mask = IOAPIC_MASKED;
ioapic_write_entry(apic, pin, entry);
}
}
@@ -773,11 +773,11 @@ static int EISA_ELCR(unsigned int irq)
 
 #endif
 
-/* ISA interrupts are always polarity zero edge triggered,
+/* ISA interrupts are always active high edge triggered,
  * when listed as conforming in the MP table. */
 
-#define default_ISA_trigger(idx)   (0)
-#define default_ISA_polarity(idx)  (0)
+#define default_ISA_trigger(idx)   (IOAPIC_EDGE)
+#define default_ISA_polarity(idx)  (IOAPIC_POL_HIGH)
 
 /* EISA interrupts are always polarity zero and can be edge or level
  * trigger depending on the ELCR value.  If an interrupt is listed as
@@ -787,11 +787,11 @@ static int EISA_ELCR(unsigned int irq)
 #define default_EISA_trigger(idx)  (EISA_ELCR(mp_irqs[idx].srcbusirq))
 #define default_EISA_polarity(idx) default_ISA_polarity(idx)
 
-/* PCI interrupts are always polarity one level triggered,
+/* PCI interrupts are always active low level triggered,
  * when listed as conforming in the MP table. */
 
-#define default_PCI_trigger(idx)   (1)
-#define default_PCI_polarity(idx)  (1)
+

[Patch Part2 v5 24/33] x86/irq: Kill function apic_set_affinity()

2015-04-13 Thread Jiang Liu
Now there's no user of apic_set_affinity(), so kill it.
Also rename vector_set_affinity() to apic_set_affinity() for consistency.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Link: 
http://lkml.kernel.org/r/1416901802-24211-33-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h |    2 --
 arch/x86/kernel/apic/vector.c |   40 +++-
 2 files changed, 3 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index ea9aebc65ff6..330445236484 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -198,8 +198,6 @@ static inline void irq_complete_move(struct irq_cfg *c) { }
 #endif
 
 extern void apic_ack_edge(struct irq_data *data);
-extern int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
-unsigned int *dest_id);
 #else  /*  CONFIG_X86_LOCAL_APIC */
 static inline void lock_vector_lock(void) {}
 static inline void unlock_vector_lock(void) {}
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index fba4958b6139..436a3400d9ac 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -463,42 +463,8 @@ void apic_ack_edge(struct irq_data *data)
ack_APIC_irq();
 }
 
-/*
- * Either sets data->affinity to a valid value, and returns
- * ->cpu_mask_to_apicid of that in dest_id, or returns -1 and
- * leaves data->affinity untouched.
- */
-int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
- unsigned int *dest_id)
-{
-   struct irq_cfg *cfg = irqd_cfg(data);
-   unsigned int irq = data->irq;
-   int err;
-
-   if (!config_enabled(CONFIG_SMP))
-   return -EPERM;
-
-   if (!cpumask_intersects(mask, cpu_online_mask))
-   return -EINVAL;
-
-   err = assign_irq_vector(irq, cfg, mask);
-   if (err)
-   return err;
-
-   err = apic->cpu_mask_to_apicid_and(mask, cfg->domain, dest_id);
-   if (err) {
-   if (assign_irq_vector(irq, cfg, data->affinity))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
-
-   cpumask_copy(data->affinity, mask);
-
-   return 0;
-}
-
-static int vector_set_affinity(struct irq_data *irq_data,
-  const struct cpumask *dest, bool force)
+static int apic_set_affinity(struct irq_data *irq_data,
+const struct cpumask *dest, bool force)
 {
int err;
int irq = irq_data->irq;
@@ -524,7 +490,7 @@ static int vector_set_affinity(struct irq_data *irq_data,
 
 static struct irq_chip lapic_controller = {
.irq_ack        = apic_ack_edge,
-   .irq_set_affinity   = vector_set_affinity,
+   .irq_set_affinity   = apic_set_affinity,
.irq_retrigger  = apic_retrigger_irq,
 };
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Patch Part2 v5 23/33] x86/irq: Change functions only used in vector.c as static

2015-04-13 Thread Jiang Liu
The functions {assign|clear}_irq_vector() and apic_retrigger_irq() are only
used in vector.c, so make them static.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Link: http://lkml.kernel.org/r/1416901802-24211-32-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h |3 ---
 arch/x86/kernel/apic/vector.c |7 ---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a88f5b325bf2..ea9aebc65ff6 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -188,8 +188,6 @@ extern struct irq_cfg *irq_cfg(unsigned int irq);
 extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
 extern void lock_vector_lock(void);
 extern void unlock_vector_lock(void);
-extern int assign_irq_vector(int, struct irq_cfg *, const struct cpumask *);
-extern void clear_irq_vector(int irq, struct irq_cfg *cfg);
 extern void setup_vector_irq(int cpu);
 #ifdef CONFIG_SMP
 extern void send_cleanup_vector(struct irq_cfg *);
@@ -199,7 +197,6 @@ static inline void send_cleanup_vector(struct irq_cfg *c) { }
 static inline void irq_complete_move(struct irq_cfg *c) { }
 #endif
 
-extern int apic_retrigger_irq(struct irq_data *data);
 extern void apic_ack_edge(struct irq_data *data);
 extern int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 unsigned int *dest_id);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 8467ca47bd4a..fba4958b6139 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -185,7 +185,8 @@ next:
return err;
 }
 
-int assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
+static int assign_irq_vector(int irq, struct irq_cfg *cfg,
+const struct cpumask *mask)
 {
int err;
unsigned long flags;
@@ -196,7 +197,7 @@ int assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
return err;
 }
 
-void clear_irq_vector(int irq, struct irq_cfg *cfg)
+static void clear_irq_vector(int irq, struct irq_cfg *cfg)
 {
int cpu, vector;
unsigned long flags;
@@ -441,7 +442,7 @@ void setup_vector_irq(int cpu)
__setup_vector_irq(cpu);
 }
 
-int apic_retrigger_irq(struct irq_data *data)
+static int apic_retrigger_irq(struct irq_data *data)
 {
struct irq_cfg *cfg = irqd_cfg(data);
unsigned long flags;
-- 
1.7.10.4


[Patch Part2 v5 25/33] x86/irq: Move check of cfg->move_in_progress into send_cleanup_vector()

2015-04-13 Thread Jiang Liu
Move the check of cfg->move_in_progress into send_cleanup_vector() to
prepare for simplifying struct irq_cfg.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Greg Kroah-Hartman 
Cc: io...@lists.linux-foundation.org
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Link: http://lkml.kernel.org/r/1416901802-24211-34-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/kernel/apic/vector.c   |   10 --
 arch/x86/platform/uv/uv_irq.c   |3 +--
 drivers/iommu/amd_iommu.c   |3 +--
 drivers/iommu/intel_irq_remapping.c |3 +--
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 436a3400d9ac..a5ce2eef0528 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -495,7 +495,7 @@ static struct irq_chip lapic_controller = {
 };
 
 #ifdef CONFIG_SMP
-void send_cleanup_vector(struct irq_cfg *cfg)
+static void __send_cleanup_vector(struct irq_cfg *cfg)
 {
cpumask_var_t cleanup_mask;
 
@@ -513,6 +513,12 @@ void send_cleanup_vector(struct irq_cfg *cfg)
cfg->move_in_progress = 0;
 }
 
+void send_cleanup_vector(struct irq_cfg *cfg)
+{
+   if (cfg->move_in_progress)
+   __send_cleanup_vector(cfg);
+}
+
 asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
 {
unsigned vector, me;
@@ -583,7 +589,7 @@ static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
me = smp_processor_id();
 
if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
-   send_cleanup_vector(cfg);
+   __send_cleanup_vector(cfg);
 }
 
 void irq_complete_move(struct irq_cfg *cfg)
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 54af6e388a12..091b36ac44c4 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -63,8 +63,7 @@ uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret >= 0) {
uv_program_mmr(cfg, data->chip_data);
-   if (cfg->move_in_progress)
-   send_cleanup_vector(cfg);
+   send_cleanup_vector(cfg);
}
 
return ret;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a5a17f50691c..a8901fbaee0e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4333,8 +4333,7 @@ static int amd_ir_set_affinity(struct irq_data *data,
 * at the new destination. So, time to cleanup the previous
 * vector allocation.
 */
-   if (cfg->move_in_progress)
-   send_cleanup_vector(cfg);
+   send_cleanup_vector(cfg);
 
return IRQ_SET_MASK_OK_DONE;
 }
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 8a71ef6af93c..55e72ce52fc4 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1003,8 +1003,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 * at the new destination. So, time to cleanup the previous
 * vector allocation.
 */
-   if (cfg->move_in_progress)
-   send_cleanup_vector(cfg);
+   send_cleanup_vector(cfg);
 
return IRQ_SET_MASK_OK_DONE;
 }
-- 
1.7.10.4


[Patch Part2 v5 26/33] x86/irq: Move private data in struct irq_cfg into dedicated data structure

2015-04-13 Thread Jiang Liu
Several fields in struct irq_cfg are private to vector.c, so move them
into a dedicated data structure. This helps hide implementation
details.
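The resulting layout can be modelled in user space: the public struct is embedded in the private one, and other files only ever see a pointer to the embedded part. All names are hypothetical, cpumask_var_t is reduced to an unsigned long, and the container_of() round trip is added for illustration only; the patch itself reaches the wrapper through irq_data->chip_data.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal container_of(), modelled on the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public part, visible to other files (cf. the slimmed struct irq_cfg). */
struct irq_cfg_model {
	unsigned int dest_apicid;
	unsigned char vector;
};

/* Private part, visible only to one file (cf. struct apic_chip_data). */
struct apic_chip_data_model {
	struct irq_cfg_model cfg;
	unsigned long domain;		/* stand-in for cpumask_var_t */
	unsigned long old_domain;	/* stand-in for cpumask_var_t */
	unsigned int move_in_progress : 1;
};

/* Hand out only the public member, as irqd_cfg() does after the patch. */
struct irq_cfg_model *cfg_of(struct apic_chip_data_model *data)
{
	return data ? &data->cfg : NULL;
}

/* Recover the private wrapper from a public pointer. */
struct apic_chip_data_model *data_of(struct irq_cfg_model *cfg)
{
	assert(cfg != NULL);
	return container_of(cfg, struct apic_chip_data_model, cfg);
}
```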

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Link: http://lkml.kernel.org/r/1416901802-24211-35-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h |3 -
 arch/x86/kernel/apic/vector.c |  221 ++---
 2 files changed, 119 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 330445236484..e7ae6eb84934 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -171,11 +171,8 @@ enum {
 };
 
 struct irq_cfg {
-   cpumask_var_t   domain;
-   cpumask_var_t   old_domain;
unsigned int    dest_apicid;
u8  vector;
-   u8  move_in_progress : 1;
 };
 
 extern struct irq_domain *x86_vector_domain;
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a5ce2eef0528..0e7c39beefed 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -21,11 +21,18 @@
 #include 
 #include 
 
+struct apic_chip_data {
+   struct irq_cfg  cfg;
+   cpumask_var_t   domain;
+   cpumask_var_t   old_domain;
+   u8  move_in_progress : 1;
+};
+
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
-static struct irq_cfg *legacy_irq_cfgs[NR_IRQS_LEGACY];
+static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
 #endif
 
 void lock_vector_lock(void)
@@ -41,12 +48,7 @@ void unlock_vector_lock(void)
raw_spin_unlock(&vector_lock);
 }
 
-struct irq_cfg *irq_cfg(unsigned int irq)
-{
-   return irqd_cfg(irq_get_irq_data(irq));
-}
-
-struct irq_cfg *irqd_cfg(struct irq_data *irq_data)
+static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data)
 {
if (!irq_data)
return NULL;
@@ -57,36 +59,48 @@ struct irq_cfg *irqd_cfg(struct irq_data *irq_data)
return irq_data->chip_data;
 }
 
-static struct irq_cfg *alloc_irq_cfg(int node)
+struct irq_cfg *irqd_cfg(struct irq_data *irq_data)
+{
+   struct apic_chip_data *data = apic_chip_data(irq_data);
+
+   return data ? &data->cfg : NULL;
+}
+
+struct irq_cfg *irq_cfg(unsigned int irq)
 {
-   struct irq_cfg *cfg;
+   return irqd_cfg(irq_get_irq_data(irq));
+}
 
-   cfg = kzalloc_node(sizeof(*cfg), GFP_KERNEL, node);
-   if (!cfg)
+static struct apic_chip_data *alloc_apic_chip_data(int node)
+{
+   struct apic_chip_data *data;
+
+   data = kzalloc_node(sizeof(*data), GFP_KERNEL, node);
+   if (!data)
return NULL;
-   if (!zalloc_cpumask_var_node(&cfg->domain, GFP_KERNEL, node))
-   goto out_cfg;
-   if (!zalloc_cpumask_var_node(&cfg->old_domain, GFP_KERNEL, node))
+   if (!zalloc_cpumask_var_node(&data->domain, GFP_KERNEL, node))
+   goto out_data;
+   if (!zalloc_cpumask_var_node(&data->old_domain, GFP_KERNEL, node))
goto out_domain;
-   return cfg;
+   return data;
 out_domain:
-   free_cpumask_var(cfg->domain);
-out_cfg:
-   kfree(cfg);
+   free_cpumask_var(data->domain);
+out_data:
+   kfree(data);
return NULL;
 }
 
-static void free_irq_cfg(struct irq_cfg *cfg)
+static void free_apic_chip_data(struct apic_chip_data *data)
 {
-   if (cfg) {
-   free_cpumask_var(cfg->domain);
-   free_cpumask_var(cfg->old_domain);
-   kfree(cfg);
+   if (data) {
+   free_cpumask_var(data->domain);
+   free_cpumask_var(data->old_domain);
+   kfree(data);
}
 }
 
-static int
-__assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
+static int __assign_irq_vector(int irq, struct apic_chip_data *d,
+  const struct cpumask *mask)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -104,7 +118,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
int cpu, err;
cpumask_var_t tmp_mask;
 
-   if (cfg->move_in_progress)
+   if (d->move_in_progress)
return -EBUSY;
 
if (!alloc_cpumask_var(&tmp_mask, GFP_ATOMIC))
@@ -112,26 +126,26 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
-   cpumask_clear(cfg->old_domain);
+   cpumask_clear(d->old_domain);
cp

[Patch Part2 v5 30/33] x86,ioapic: Cleanup irq_trigger/polarity()

2015-04-13 Thread Jiang Liu
From: Thomas Gleixner 

These functions are full of pointless indentations, useless comments
and even more useless printks.

Clean them up.
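For reference, the MP table irqflag field decoded by these functions packs polarity into bits 0-1 (00 conforms to the bus, 01 active high, 10 reserved, 11 active low) and trigger mode into bits 2-3 (00 conforms to the bus, 01 edge, 10 reserved, 11 level). The user-space sketch below decodes both fields. The POL_*/TRIG_* values are assumed to mirror the kernel's IOAPIC_* encodings, the bus handling is simplified to PCI versus everything else (EISA is ignored), and the fallback for the reserved trigger value is an assumption because the irq_trigger() hunk is cut off in this archive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Local stand-ins, assumed to match IOAPIC_POL_HIGH/LOW and IOAPIC_EDGE/LEVEL. */
enum { POL_HIGH = 0, POL_LOW = 1 };
enum { TRIG_EDGE = 0, TRIG_LEVEL = 1 };

/* Bits 0-1: polarity. A conforming entry uses the bus default. */
int decode_polarity(unsigned int irqflag, bool bus_is_pci)
{
	switch (irqflag & 0x03) {
	case 0:		/* ISA defaults to active high, PCI to active low */
		return bus_is_pci ? POL_LOW : POL_HIGH;
	case 1:
		return POL_HIGH;
	case 2:
		fprintf(stderr, "Invalid polarity: 2, defaulting to low\n");
		/* fall through */
	default:	/* 3: active low */
		return POL_LOW;
	}
}

/* Bits 2-3: trigger mode. A conforming entry uses the bus default. */
int decode_trigger(unsigned int irqflag, bool bus_is_pci)
{
	switch ((irqflag >> 2) & 0x03) {
	case 0:		/* ISA defaults to edge, PCI to level */
		return bus_is_pci ? TRIG_LEVEL : TRIG_EDGE;
	case 1:
		return TRIG_EDGE;
	case 2:		/* assumed fallback for the reserved encoding */
		fprintf(stderr, "Invalid trigger mode: 2, defaulting to level\n");
		/* fall through */
	default:	/* 3: level */
		return TRIG_LEVEL;
	}
}
```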

Signed-off-by: Thomas Gleixner 
Cc: Jiang Liu 
Cc: x...@kernel.org
Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/io_apic.c |  138 +++-
 1 file changed, 50 insertions(+), 88 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index cf42a6adf9c0..ce7063087f74 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -796,45 +796,47 @@ static int EISA_ELCR(unsigned int irq)
 static int irq_polarity(int idx)
 {
int bus = mp_irqs[idx].srcbus;
-   int polarity;
 
/*
 * Determine IRQ line polarity (high active or low active):
 */
-   switch (mp_irqs[idx].irqflag & 3)
-   {
-   case 0: /* conforms, ie. bus-type dependent polarity */
-   if (test_bit(bus, mp_bus_not_pci))
-   polarity = default_ISA_polarity(idx);
-   else
-   polarity = default_PCI_polarity(idx);
-   break;
-   case 1: /* high active */
-   {
-   polarity = IOAPIC_POL_HIGH;
-   break;
-   }
-   case 2: /* reserved */
-   {
-   pr_warn("broken BIOS!!\n");
-   polarity = IOAPIC_POL_LOW;
-   break;
-   }
-   case 3: /* low active */
-   {
-   polarity = IOAPIC_POL_LOW;
-   break;
-   }
-   default: /* invalid */
-   {
-   pr_warn("broken BIOS!!\n");
-   polarity = IOAPIC_POL_LOW;
-   break;
-   }
+   switch (mp_irqs[idx].irqflag & 0x03) {
+   case 0:
+   /* conforms to spec, ie. bus-type dependent polarity */
+   if (test_bit(bus, mp_bus_not_pci))
+   return default_ISA_polarity(idx);
+   else
+   return default_PCI_polarity(idx);
+   case 1:
+   return IOAPIC_POL_HIGH;
+   case 2:
+   pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n");
+   case 3:
+   default: /* Pointless default required due to do gcc stupidity */
+   return IOAPIC_POL_LOW;
}
-   return polarity;
 }
 
+#ifdef CONFIG_EISA
+static int eisa_irq_trigger(int idx, int bus, int trigger)
+{
+   switch (mp_bus_id_to_type[bus]) {
+   case MP_BUS_PCI:
+   case MP_BUS_ISA:
+   return trigger;
+   case MP_BUS_EISA:
+   return default_EISA_trigger(idx);
+   }
+   pr_warn("IOAPIC: Invalid srcbus: %d defaulting to level\n", bus);
+   return IOAPIC_LEVEL;
+}
+#else
+static inline int eisa_irq_trigger(int idx, int bus, int trigger)
+{
+   return trigger;
+}
+#endif
+
 static int irq_trigger(int idx)
 {
int bus = mp_irqs[idx].srcbus;
@@ -843,63 +845,23 @@ static int irq_trigger(int idx)
/*
 * Determine IRQ trigger mode (edge or level sensitive):
 */
-   switch ((mp_irqs[idx].irqflag>>2) & 3)
-   {
-   case 0: /* conforms, ie. bus-type dependent */
-   if (test_bit(bus, mp_bus_not_pci))
-   trigger = default_ISA_trigger(idx);
-   else
-   trigger = default_PCI_trigger(idx);
-#ifdef CONFIG_EISA
-   switch (mp_bus_id_to_type[bus]) {
-   case MP_BUS_ISA: /* ISA pin */
-   {
-   /* set before the switch */
-   break;
-   }
-   case MP_BUS_EISA: /* EISA pin */
-   {
-   trigger = default_EISA_trigger(idx);
-   break;
-   }
-   case MP_BUS_PCI: /* PCI pin */
-   {
-   /* set before the switch */
-   break;
-   }
-   default:
-   {
-   pr_warn("broken BIOS!!\n");
-   trigger = IOAPIC_LEVEL;
-   break;
-   }
-   }
-#endif
-   break;
-   case 1: /* edge */
-   {
-   trigger = IOAPIC_EDGE;
-   break;
-   }
-   case 

[Patch Part2 v5 22/33] x86/irq: Kill unused alloc_irq_and_cfg_at()

2015-04-13 Thread Jiang Liu
There's no caller of alloc_irq_and_cfg_at() anymore, so kill it.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Link: http://lkml.kernel.org/r/1416901802-24211-31-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h |1 -
 arch/x86/kernel/apic/vector.c |   21 -
 2 files changed, 22 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 1ce5f8164c64..a88f5b325bf2 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -186,7 +186,6 @@ extern void copy_irq_alloc_info(struct irq_alloc_info *dst,
struct irq_alloc_info *src);
 extern struct irq_cfg *irq_cfg(unsigned int irq);
 extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
-extern struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node);
 extern void lock_vector_lock(void);
 extern void unlock_vector_lock(void);
 extern int assign_irq_vector(int, struct irq_cfg *, const struct cpumask *);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 91a89500f88f..8467ca47bd4a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -76,27 +76,6 @@ out_cfg:
return NULL;
 }
 
-struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node)
-{
-   int res = irq_alloc_desc_at(at, node);
-   struct irq_cfg *cfg;
-
-   if (res < 0) {
-   if (res != -EEXIST)
-   return NULL;
-   cfg = irq_cfg(at);
-   if (cfg)
-   return cfg;
-   }
-
-   cfg = alloc_irq_cfg(node);
-   if (cfg)
-   irq_set_chip_data(at, cfg);
-   else
-   irq_free_desc(at);
-   return cfg;
-}
-
 static void free_irq_cfg(struct irq_cfg *cfg)
 {
if (cfg) {
-- 
1.7.10.4


[Patch Part2 v5 28/33] x86/irq, ACPI: Kill private function mp_register_gsi()/ mp_unregister_gsi()

2015-04-13 Thread Jiang Liu
mp_register_gsi() is only called once, so fold it into its caller,
acpi_register_gsi_ioapic(). Do the same for mp_unregister_gsi().
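The folded code converts ACPI's trigger/polarity encoding into the IOAPIC one before calling ioapic_set_alloc_attr(). A standalone sketch of just that conversion follows. The MODEL_ACPI_* values are assumed to match ACPICA's actypes.h, and struct ioapic_attr_model is a hypothetical stand-in for the relevant fields of struct irq_alloc_info.

```c
#include <assert.h>

/* Local copies of the ACPI encodings; values assumed to match ACPICA
 * (level = 0, edge = 1; active high = 0, active low = 1). */
#define MODEL_ACPI_LEVEL_SENSITIVE	0
#define MODEL_ACPI_EDGE_SENSITIVE	1
#define MODEL_ACPI_ACTIVE_HIGH		0
#define MODEL_ACPI_ACTIVE_LOW		1

/* Hypothetical stand-in for the irq_alloc_info fields being filled in. */
struct ioapic_attr_model {
	int node;
	int trigger;	/* IOAPIC encoding: 0 = edge, 1 = level */
	int polarity;	/* IOAPIC encoding: 0 = active high, 1 = active low */
};

/* Mirrors the conversion now done inline in acpi_register_gsi_ioapic(). */
struct ioapic_attr_model acpi_to_ioapic_attr(int node, int trigger, int polarity)
{
	struct ioapic_attr_model a;

	a.node = node;
	a.trigger = trigger == MODEL_ACPI_EDGE_SENSITIVE ? 0 : 1;
	a.polarity = polarity == MODEL_ACPI_ACTIVE_HIGH ? 0 : 1;
	return a;
}
```

If those ACPI values hold, only the trigger encoding actually flips (ACPI edge = 1 becomes IOAPIC edge = 0); polarity keeps the same numeric value.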

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Len Brown 
Cc: Pavel Machek 
Link: http://lkml.kernel.org/r/1416901802-24211-37-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/kernel/acpi/boot.c |   57 ++-
 1 file changed, 18 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 21e460b3b360..91a10120ed10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -400,42 +400,6 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
return 0;
 }
 
-static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
-  int polarity)
-{
-   int irq, node;
-   struct irq_alloc_info info;
-
-   if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
-   return gsi;
-
-   trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
-   polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
-   node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
-   ioapic_set_alloc_attr(&info, node, trigger, polarity);
-   irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC, &info);
-   if (irq < 0)
-   return irq;
-
-   /* Don't set up the ACPI SCI because it's already set up */
-   if (enable_update_mptable && acpi_gbl_FADT.sci_interrupt != gsi)
-   mp_config_acpi_gsi(dev, gsi, trigger, polarity);
-
-   return irq;
-}
-
-static void mp_unregister_gsi(u32 gsi)
-{
-   int irq;
-
-   if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
-   return;
-
-   irq = mp_map_gsi_to_irq(gsi, 0, NULL);
-   if (irq > 0)
-   mp_unmap_irq(irq);
-}
-
 static struct irq_domain_ops acpi_irqdomain_ops = {
.alloc = mp_irqdomain_alloc,
.free = mp_irqdomain_free,
@@ -662,10 +626,21 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
int trigger, int polarity)
 {
int irq = gsi;
-
 #ifdef CONFIG_X86_IO_APIC
+   int node;
+   struct irq_alloc_info info;
+
+   node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
+   trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
+   polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
+   ioapic_set_alloc_attr(&info, node, trigger, polarity);
+
mutex_lock(&acpi_ioapic_lock);
-   irq = mp_register_gsi(dev, gsi, trigger, polarity);
+   irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC, &info);
+   /* Don't set up the ACPI SCI because it's already set up */
+   if (irq >= 0 && enable_update_mptable &&
+   acpi_gbl_FADT.sci_interrupt != gsi)
+   mp_config_acpi_gsi(dev, gsi, trigger, polarity);
mutex_unlock(&acpi_ioapic_lock);
 #endif
 
@@ -675,8 +650,12 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
 static void acpi_unregister_gsi_ioapic(u32 gsi)
 {
 #ifdef CONFIG_X86_IO_APIC
+   int irq;
+
mutex_lock(&acpi_ioapic_lock);
-   mp_unregister_gsi(gsi);
+   irq = mp_map_gsi_to_irq(gsi, 0, NULL);
+   if (irq > 0)
+   mp_unmap_irq(irq);
mutex_unlock(&acpi_ioapic_lock);
 #endif
 }
-- 
1.7.10.4


[Patch Part2 v5 31/33] x86: Cleanup irq_domain ops

2015-04-13 Thread Jiang Liu
From: Thomas Gleixner 

We have 3 identical copies of the ioapic domain ops for acpi, mpparse,
and sfi. Have a global one in the io_apic code and be done with it.

To avoid include hell in io_apic.h, create a private irqdomain header
and include the generic irqdomain header from there.
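The pattern is a single const ops table defined once and referenced by every user, instead of one private copy per file. A hypothetical user-space miniature (the struct, callbacks and owner names are illustrative and much smaller than the kernel's struct irq_domain_ops):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an ops table: a struct of callbacks. */
struct domain_ops_model {
	int (*alloc)(unsigned int virq, unsigned int nr_irqs);
	void (*free)(unsigned int virq, unsigned int nr_irqs);
};

static int model_alloc(unsigned int virq, unsigned int nr_irqs)
{
	(void)virq;
	return nr_irqs > 0 ? 0 : -1;
}

static void model_free(unsigned int virq, unsigned int nr_irqs)
{
	(void)virq;
	(void)nr_irqs;
}

/* One const instance, defined once (cf. mp_ioapic_irqdomain_ops). */
const struct domain_ops_model shared_ioapic_ops = {
	.alloc = model_alloc,
	.free  = model_free,
};

/* Each former owner of a private copy now points at the shared table. */
struct domain_cfg_model {
	const char *owner;
	const struct domain_ops_model *ops;
};

const struct domain_cfg_model acpi_cfg    = { "acpi",    &shared_ioapic_ops };
const struct domain_cfg_model mpparse_cfg = { "mpparse", &shared_ioapic_ops };
const struct domain_cfg_model sfi_cfg     = { "sfi",     &shared_ioapic_ops };
```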

Signed-off-by: Thomas Gleixner 
Cc: Jiang Liu 
Cc: x...@kernel.org
Signed-off-by: Jiang Liu 
---
 arch/x86/include/asm/io_apic.h   |   29 ++---
 arch/x86/include/asm/irqdomain.h |   34 ++
 arch/x86/kernel/acpi/boot.c  |   13 +++--
 arch/x86/kernel/apic/io_apic.c   |9 -
 arch/x86/kernel/devicetree.c |   12 ++--
 arch/x86/kernel/mpparse.c|9 +
 arch/x86/platform/sfi/sfi.c  |   10 ++
 7 files changed, 56 insertions(+), 60 deletions(-)
 create mode 100644 arch/x86/include/asm/irqdomain.h

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 53a70a30b674..144c4d37ae86 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -96,7 +96,7 @@ struct IR_IO_APIC_route_entry {
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
-struct irq_data;
+struct ioapic_domain_cfg;
 
 #define IOAPIC_AUTO     -1
 #define IOAPIC_EDGE     0
@@ -163,23 +163,6 @@ extern int restore_ioapic_entries(void);
 extern void setup_ioapic_ids_from_mpc(void);
 extern void setup_ioapic_ids_from_mpc_nocheck(void);
 
-enum ioapic_domain_type {
-   IOAPIC_DOMAIN_INVALID,
-   IOAPIC_DOMAIN_LEGACY,
-   IOAPIC_DOMAIN_STRICT,
-   IOAPIC_DOMAIN_DYNAMIC,
-};
-
-struct device_node;
-struct irq_domain;
-struct irq_domain_ops;
-
-struct ioapic_domain_cfg {
-   enum ioapic_domain_type type;
-   const struct irq_domain_ops *ops;
-   struct device_node  *dev;
-};
-
 extern int mp_find_ioapic(u32 gsi);
 extern int mp_find_ioapic_pin(int ioapic, u32 gsi);
 extern int mp_map_gsi_to_irq(u32 gsi, unsigned int flags,
@@ -189,15 +172,7 @@ extern int mp_register_ioapic(int id, u32 address, u32 gsi_base,
  struct ioapic_domain_cfg *cfg);
 extern int mp_unregister_ioapic(u32 gsi_base);
 extern int mp_ioapic_registered(u32 gsi_base);
-extern int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
- unsigned int nr_irqs, void *arg);
-extern void mp_irqdomain_free(struct irq_domain *domain, unsigned int virq,
- unsigned int nr_irqs);
-extern void mp_irqdomain_activate(struct irq_domain *domain,
- struct irq_data *irq_data);
-extern void mp_irqdomain_deactivate(struct irq_domain *domain,
-   struct irq_data *irq_data);
-extern int mp_irqdomain_ioapic_idx(struct irq_domain *domain);
+
 extern void ioapic_set_alloc_attr(struct irq_alloc_info *info,
  int node, int trigger, int polarity);
 
diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
new file mode 100644
index ..fe0d4c6636ec
--- /dev/null
+++ b/arch/x86/include/asm/irqdomain.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_IRQDOMAIN_H
+#define _ASM_IRQDOMAIN_H
+
+#include 
+
+enum ioapic_domain_type {
+   IOAPIC_DOMAIN_INVALID,
+   IOAPIC_DOMAIN_LEGACY,
+   IOAPIC_DOMAIN_STRICT,
+   IOAPIC_DOMAIN_DYNAMIC,
+};
+
+struct device_node;
+struct irq_data;
+
+struct ioapic_domain_cfg {
+   enum ioapic_domain_type type;
+   const struct irq_domain_ops *ops;
+   struct device_node  *dev;
+};
+
+extern const struct irq_domain_ops mp_ioapic_irqdomain_ops;
+
+extern int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *arg);
+extern void mp_irqdomain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs);
+extern void mp_irqdomain_activate(struct irq_domain *domain,
+ struct irq_data *irq_data);
+extern void mp_irqdomain_deactivate(struct irq_domain *domain,
+   struct irq_data *irq_data);
+extern int mp_irqdomain_ioapic_idx(struct irq_domain *domain);
+
+#endif
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 91a10120ed10..cb9f6f12246b 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -31,12 +31,12 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -400,20 +400,13 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
return 0;
 }
 
-static struct irq_domain_ops acpi_irqdomain_ops = {
-   .alloc = mp_irqdomain_alloc,
-   .free = mp_irqdomain_free,
-   .activate = mp_irqdomain_activate,
-   .deactivate = mp_irqdomain_deactivate,
-};
-
 static int __init

[Patch Part2 v5 27/33] x86/irq: Refine the way to calculate NR_IRQS

2015-04-13 Thread Jiang Liu
Now that MSI has been made independent of IOAPIC, refine the way
NR_IRQS is calculated to support configurations with MSI enabled but
IOAPIC disabled.
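The NR_IRQS #if chain in this patch can be read as the following run-time function, which makes the four configurations easier to compare. It is a sketch: NR_VECTORS is taken as 256 (the x86 IDT size), and NR_CPUS and MAX_IO_APICS become parameters.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_VECTORS	256	/* x86 IDT size */
#define MODEL_NR_IRQS_LEGACY	16

/* Mirrors the preprocessor logic in irq_vectors.h as a plain function. */
unsigned int model_nr_irqs(bool io_apic, bool pci_msi,
			   unsigned int nr_cpus, unsigned int max_io_apics)
{
	unsigned int cpu_limit = 64 * nr_cpus;		/* CPU_VECTOR_LIMIT */
	unsigned int ioapic_limit = 32 * max_io_apics;	/* IO_APIC_VECTOR_LIMIT */

	if (io_apic && pci_msi)
		return MODEL_NR_VECTORS +
		       (cpu_limit > ioapic_limit ? cpu_limit : ioapic_limit);
	if (io_apic)
		return MODEL_NR_VECTORS + ioapic_limit;
	if (pci_msi)
		return MODEL_NR_VECTORS + cpu_limit;
	return MODEL_NR_IRQS_LEGACY;
}
```

With nr_cpus = 8 and max_io_apics = 128, for example, IOAPIC+MSI and IOAPIC-only both give 4352, MSI-only gives 768, and neither gives 16.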

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Greg Kroah-Hartman 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Jan Beulich 
Link: http://lkml.kernel.org/r/1416901802-24211-36-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/irq_vectors.h |   18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 666c89ec4bd7..b26cb124a4f1 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -155,18 +155,22 @@ static inline int invalid_vm86_irq(int irq)
  * static arrays.
  */
 
-#define NR_IRQS_LEGACY   16
+#define NR_IRQS_LEGACY 16
 
-#define IO_APIC_VECTOR_LIMIT   ( 32 * MAX_IO_APICS )
+#define CPU_VECTOR_LIMIT   (64 * NR_CPUS)
+#define IO_APIC_VECTOR_LIMIT   (32 * MAX_IO_APICS)
 
-#ifdef CONFIG_X86_IO_APIC
-# define CPU_VECTOR_LIMIT  (64 * NR_CPUS)
-# define NR_IRQS   \
+#if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_PCI_MSI)
+#define NR_IRQS\
(CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ?  \
(NR_VECTORS + CPU_VECTOR_LIMIT)  :  \
(NR_VECTORS + IO_APIC_VECTOR_LIMIT))
-#else /* !CONFIG_X86_IO_APIC: */
-# define NR_IRQS   NR_IRQS_LEGACY
+#elif defined(CONFIG_X86_IO_APIC)
+#define NR_IRQS (NR_VECTORS + IO_APIC_VECTOR_LIMIT)
+#elif defined(CONFIG_PCI_MSI)
+#define NR_IRQS (NR_VECTORS + CPU_VECTOR_LIMIT)
+#else
+#define NR_IRQS NR_IRQS_LEGACY
 #endif
 
 #endif /* _ASM_X86_IRQ_VECTORS_H */
-- 
1.7.10.4


[Patch Part2 v5 21/33] x86/irq: Remove sis apic bug workaround

2015-04-13 Thread Jiang Liu
From: Thomas Gleixner 

The SiS apic bug workaround is now obsolete as we cache the register
values for performance reasons.
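A simplified model of why caching makes the workaround unnecessary: if the old register value comes from a software cache, a read-modify-write turns into a plain select-and-write, so there is no window in which the chip can lose the index that was set up for the read. The register file and all names are hypothetical and far simpler than a real IOAPIC.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_REGS 8

/* Hypothetical indexed register file: software selects a register by
 * writing 'index', then accesses it through a data window. */
struct ioapic_model {
	uint32_t index;
	uint32_t regs[MODEL_NR_REGS];
	uint32_t cache[MODEL_NR_REGS];	/* software copy of last written values */
};

static void ioapic_write(struct ioapic_model *io, unsigned int reg, uint32_t val)
{
	io->index = reg;		/* select the register ... */
	io->regs[io->index] = val;	/* ... then write through the window */
	io->cache[reg] = val;		/* keep the cache in sync */
}

/* Read-modify-write without a hardware read: the old value comes from
 * the cache, so only one index selection is ever needed. */
void ioapic_modify(struct ioapic_model *io, unsigned int reg,
		   uint32_t set, uint32_t clear)
{
	assert(reg < MODEL_NR_REGS);
	ioapic_write(io, reg, (io->cache[reg] & ~clear) | set);
}
```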

Signed-off-by: Thomas Gleixner 
Cc: Jiang Liu 
Signed-off-by: Jiang Liu 
---
 arch/x86/include/asm/io_apic.h |3 ---
 arch/x86/kernel/apic/io_apic.c |   35 ++-
 drivers/pci/quirks.c   |7 ---
 3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 67bdb72caeaf..cca97c961641 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -120,9 +120,6 @@ extern int mp_irq_entries;
 /* MP IRQ source entries */
 extern struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES];
 
-/* Older SiS APIC requires we rewrite the index register */
-extern int sis_apic_bug;
-
 /* 1 if "noapic" boot option passed */
 extern int skip_ioapic_setup;
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c06f4b531392..71971b89da59 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -18,6 +18,16 @@
  * and Rolf G. Tews
  * for testing these extensively
  * Paul Diefenbaugh:   Added full ACPI support
+ *
+ * Historical information which is worth to be preserved:
+ *
+ * - SiS APIC rmw bug:
+ *
+ * We used to have a workaround for a bug in SiS chips which
+ * required to rewrite the index register for a read-modify-write
+ * operation as the chip lost the index information which was
+ * setup for the read already. We cache the data now, so that
+ * workaround has been removed.
  */
 
 #include 
@@ -66,17 +76,6 @@
 #define for_each_irq_pin(entry, head) \
list_for_each_entry(entry, &head, list)
 
-/*
- * Is the SiS APIC rmw bug present ?
- *  -1 = don't know, 0 = no, 1 = yes
- * When doing a read-modify-write operation on IOAPIC registers, older SiS APIC
- * requires we rewrite the index register again where the read already set up
- * the index register.
- * The code to make use of sis_apic_bug has been removed, but we don't want to
- * loss this knowledge.
- */
-int sis_apic_bug = -1;
-
 static DEFINE_RAW_SPINLOCK(ioapic_lock);
 static DEFINE_MUTEX(ioapic_mutex);
 static unsigned int ioapic_dynirq_base;
@@ -2320,20 +2319,6 @@ void __init setup_IO_APIC(void)
ioapic_initialized = 1;
 }
 
-/*
- *  Called after all the initialization is done. If we didn't find any
- *  APIC bugs then we can allow the modify fast path
- */
-
-static int __init io_apic_bug_finalize(void)
-{
-   if (sis_apic_bug == -1)
-   sis_apic_bug = 0;
-   return 0;
-}
-
-late_initcall(io_apic_bug_finalize);
-
 static void resume_ioapic_id(int ioapic_idx)
 {
unsigned long flags;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 85f247e28a80..d532b8ebf460 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -819,13 +819,6 @@ static void quirk_amd_ioapic(struct pci_dev *dev)
}
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_VIPER_7410, quirk_amd_ioapic);
-
-static void quirk_ioapic_rmw(struct pci_dev *dev)
-{
-   if (dev->devfn == 0 && dev->bus->number == 0)
-   sis_apic_bug = 1;
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SI,  PCI_ANY_ID, quirk_ioapic_rmw);
 #endif /* CONFIG_X86_IO_APIC */
 
 /*
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

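For readers following the SiS discussion: the removed comment and the new header text both turn on how the IOAPIC is reached through an index register (IOREGSEL) plus a data window (IOWIN). Below is a minimal user-space sketch, assuming a simulated register file instead of MMIO; `struct sim_ioapic`, `sim_write()`, `rte_write()` and `rte_set_mask()` are hypothetical names, not the io_apic.c API. It illustrates the point of the new comment: a modify starts from a cached copy of the redirection entry, so the modify path performs no hardware read at all, and every write programs the index register first. The offsets used (low dword of pin N at 0x10 + 2 * N, high dword at 0x11 + 2 * N, mask bit 16 in the low dword) follow the usual IOAPIC register layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulated IOAPIC: 24 pins, registers 0x00..0x3f. */
#define SIM_NR_PINS   24u
#define SIM_NR_REGS   (0x10u + 2u * SIM_NR_PINS)
#define RTE_LOW(pin)  (0x10u + 2u * (pin))  /* low dword of an entry  */
#define RTE_HIGH(pin) (0x11u + 2u * (pin))  /* high dword of an entry */
#define RTE_MASKED    (1u << 16)            /* mask bit (low dword)   */

struct sim_ioapic {
	uint32_t index;                  /* models IOREGSEL              */
	uint32_t regs[SIM_NR_REGS];      /* register file behind IOWIN   */
	unsigned int hw_reads;           /* counts data-window reads     */
	uint64_t rte_cache[SIM_NR_PINS]; /* software copy of each entry  */
};

static void sim_init(struct sim_ioapic *a)
{
	memset(a, 0, sizeof(*a));
}

/* Hardware read: select the register, then read the data window. */
static uint32_t sim_read(struct sim_ioapic *a, uint32_t reg)
{
	a->index = reg;
	a->hw_reads++;
	return a->regs[a->index];
}

/* Hardware write: always (re)program the index before the data. */
static void sim_write(struct sim_ioapic *a, uint32_t reg, uint32_t val)
{
	a->index = reg;
	a->regs[a->index] = val;
}

/* Write a whole entry to "hardware" and keep the cache in sync. */
static void rte_write(struct sim_ioapic *a, unsigned int pin, uint64_t e)
{
	assert(pin < SIM_NR_PINS);
	sim_write(a, RTE_HIGH(pin), (uint32_t)(e >> 32));
	sim_write(a, RTE_LOW(pin), (uint32_t)e);
	a->rte_cache[pin] = e;
}

/* Read-modify-write done purely on the cached copy: no hardware read. */
static void rte_set_mask(struct sim_ioapic *a, unsigned int pin, int masked)
{
	uint64_t e;

	assert(pin < SIM_NR_PINS);
	e = a->rte_cache[pin];
	if (masked)
		e |= RTE_MASKED;
	else
		e &= ~(uint64_t)RTE_MASKED;
	rte_write(a, pin, e);
}
```

The trade-off is that every write must also refresh the cache (as rte_write() does here); otherwise a later modify would start from stale data.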

[Patch Part2 v5 14/33] irq_remapping/vt-d: Move struct irq_2_iommu into intel_irq_remapping.c

2015-04-13 Thread Jiang Liu
Now only intel_irq_remapping.c accesses struct irq_2_iommu, so move its
definition from hw_irq.h into intel_irq_remapping.c.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Greg Kroah-Hartman 
Cc: io...@lists.linux-foundation.org
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Link: http://lkml.kernel.org/r/1416901802-24211-24-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 arch/x86/include/asm/hw_irq.h       |    9 ---------
 drivers/iommu/intel_irq_remapping.c |    7 +++++++
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 88632ea75fe0..3520f71f168b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -95,14 +95,6 @@ extern void trace_call_function_single_interrupt(void);
 #endif /* CONFIG_TRACING */
 
 #ifdef CONFIG_IRQ_REMAP
-/* Intel specific interrupt remapping information */
-struct irq_2_iommu {
-   struct intel_iommu *iommu;
-   u16 irte_index;
-   u16 sub_handle;
-   u8  irte_mask;
-};
-
 /* AMD specific interrupt remapping information */
 struct irq_2_irte {
u16 devid; /* Device ID for IRTE table */
@@ -194,7 +186,6 @@ struct irq_cfg {
u8  move_in_progress : 1;
 #ifdef CONFIG_IRQ_REMAP
union {
-   struct irq_2_iommu irq_2_iommu;
struct irq_2_irte  irq_2_irte;
};
 #endif
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 21fc899e7c49..8a71ef6af93c 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -32,6 +32,13 @@ struct hpet_scope {
unsigned int devfn;
 };
 
+struct irq_2_iommu {
+   struct intel_iommu *iommu;
+   u16 irte_index;
+   u16 sub_handle;
+   u8  irte_mask;
+};
+
 struct intel_ir_data {
struct irq_2_iommu  irq_2_iommu;
struct irte irte_entry;
-- 
1.7.10.4


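A minimal sketch of the layout after this move, assuming simplified stand-ins: `struct irte_sketch`, `struct intel_ir_data_sketch`, `ir_data_of()` and `irte_slot()` are hypothetical, not driver code, and u16/u8 are spelled as stdint types so it builds outside the kernel. The moved struct is embedded in the per-IRQ data private to intel_irq_remapping.c, so code holding a pointer to the member can recover the enclosing object container_of-style. `irte_slot()` assumes the table entry in use is `irte_index + sub_handle`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct intel_iommu;                 /* opaque: only a pointer is stored */

/* Same fields as the struct moved by the patch. */
struct irq_2_iommu {
	struct intel_iommu *iommu;
	uint16_t irte_index;
	uint16_t sub_handle;
	uint8_t  irte_mask;
};

/* Hypothetical stand-in for struct irte (a 128-bit table entry). */
struct irte_sketch {
	uint64_t low;
	uint64_t high;
};

/* Simplified per-IRQ data embedding the now file-local type. */
struct intel_ir_data_sketch {
	struct irq_2_iommu irq_2_iommu;
	struct irte_sketch irte_entry;
};

/* container_of-style: recover the outer object from the embedded member. */
#define ir_data_of(p)						\
	((struct intel_ir_data_sketch *)((char *)(p) -		\
		offsetof(struct intel_ir_data_sketch, irq_2_iommu)))

/* Assumed: the IRTE used by this IRQ is base index plus sub-handle. */
static unsigned int irte_slot(const struct irq_2_iommu *i)
{
	assert(i != NULL);
	return (unsigned int)i->irte_index + i->sub_handle;
}
```

With the definition in the .c file, the irq_cfg union in hw_irq.h keeps only the AMD irq_2_irte member, and the Intel-specific fields are visible only to the one driver that uses them.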

[Patch Part2 v5 12/33] irq_remapping: Clean up unused interfaces

2015-04-13 Thread Jiang Liu
Now that we have converted to hierarchical irqdomains, clean up the
unused interfaces in struct irq_remap_ops and their forward declarations.

Signed-off-by: Jiang Liu 
Cc: Konrad Rzeszutek Wilk 
Cc: Tony Luck 
Cc: Greg Kroah-Hartman 
Cc: io...@lists.linux-foundation.org
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Rafael J. Wysocki 
Cc: Randy Dunlap 
Cc: Yinghai Lu 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Link: http://lkml.kernel.org/r/1416901802-24211-22-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
Tested-by: Joerg Roedel 
---
 drivers/iommu/irq_remapping.h |   16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 16b7d814e6fe..91d5a119956a 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -24,11 +24,7 @@
 
 #ifdef CONFIG_IRQ_REMAP
 
-struct IO_APIC_route_entry;
-struct io_apic_irq_attr;
 struct irq_data;
-struct cpumask;
-struct pci_dev;
 struct msi_msg;
 struct irq_domain;
 struct irq_alloc_info;
@@ -54,18 +50,6 @@ struct irq_remap_ops {
/* Enable fault handling */
int  (*enable_faulting)(void);
 
-   /* IO-APIC setup routine */
-   int (*setup_ioapic_entry)(int irq, struct IO_APIC_route_entry *,
- unsigned int, int,
- struct io_apic_irq_attr *);
-
-   /* Set the CPU affinity of a remapped interrupt */
-   int (*set_affinity)(struct irq_data *data, const struct cpumask *mask,
-   bool force);
-
-   /* Free an IRQ */
-   int (*free_irq)(int);
-
/* Get the irqdomain associated the IOMMU device */
struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
 
-- 
1.7.10.4


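With hierarchical irqdomains, IO-APIC entry setup, affinity changes and teardown are handled by the remapping domain's own callbacks, so the per-operation hooks removed here have no callers left; what remains in the ops table is mostly a way to look up that domain. A minimal sketch of that lookup-with-fallback, assuming hypothetical stand-ins for `struct irq_domain` and `struct irq_alloc_info`, a trimmed `struct irq_remap_ops_sketch`, and a hypothetical `pick_domain()` helper (none of these are the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types named in the header. */
struct irq_domain { const char *name; };
struct irq_alloc_info { int devid; };

/* Trimmed ops table: only hooks that survive the cleanup are modelled. */
struct irq_remap_ops_sketch {
	int (*enable_faulting)(void);
	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
};

static const struct irq_remap_ops_sketch *remap_ops; /* NULL: no IOMMU */
static int irq_remapping_enabled;

/*
 * Choose the parent domain for an allocation: the IOMMU's remapping
 * domain when one is active and claims this device, else the caller's
 * default parent (e.g. the vector domain).
 */
static struct irq_domain *pick_domain(struct irq_alloc_info *info,
				      struct irq_domain *parent)
{
	struct irq_domain *d = NULL;

	assert(parent != NULL);
	if (remap_ops && irq_remapping_enabled && remap_ops->get_ir_irq_domain)
		d = remap_ops->get_ir_irq_domain(info);
	return d ? d : parent;
}

/* Example backend: claims every device with a non-negative id. */
static struct irq_domain demo_ir_domain = { "demo-ir" };

static struct irq_domain *demo_get_domain(struct irq_alloc_info *info)
{
	return info->devid >= 0 ? &demo_ir_domain : NULL;
}

static const struct irq_remap_ops_sketch demo_ops = {
	.get_ir_irq_domain = demo_get_domain,
};
```

Once the domain is chosen, affinity and free requests reach it through the irqdomain hierarchy rather than through the ops table, which is why set_affinity and free_irq could be dropped.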
