Re: [ANNOUNCE] 3.10.9-rt5

2013-08-22 Thread Fernando Lopez-Lezcano

On 08/22/2013 11:21 AM, Sebastian Andrzej Siewior wrote:

> Dear RT folks!
>
> I'm pleased to announce the v3.10.9-rt5 patch set.

Thanks!

> Changes since v3.10.9-rt4
> - swait fixes from Steven. It fixed the issues with CONFIG_RCU_NOCB_CPU
>   where the system suddenly froze and RCU wasn't doing its job anymore
> - hwlat improvements by Steven
>
> Known issues:

...
Trying to build I get (in make modules):

ERROR: "__udivdi3" [drivers/misc/hwlat_detector.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

(find attached the final configuration used for building)
-- Fernando
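
For context: an undefined "__udivdi3" at modpost time generally means that some
code did a plain 64-bit division with "/", which on 32-bit builds makes gcc call
a libgcc helper that the kernel does not link in. Kernel code normally uses
helpers such as do_div() or div_u64() instead. The following is only a userspace
sketch of what such a helper does (shift-and-subtract long division). The name
div_u64_sketch() is hypothetical, and whether hwlat_detector needs exactly this
kind of change is an assumption that the log above does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for a kernel 64-by-32 division helper.
 * Divides without using a 64-bit '/' so that no libgcc call is needed.
 */
uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor,
			uint32_t *remainder)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;
	int bit;

	assert(divisor != 0);

	/* Classic long division, one bit of the dividend at a time. */
	for (bit = 63; bit >= 0; bit--) {
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}

	if (remainder)
		*remainder = (uint32_t)rem;
	return quotient;
}
```

A caller would replace an expression like "total / count" with
div_u64_sketch(total, count, NULL), keeping the divisor within 32 bits.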


build.log.bz2
Description: application/bzip


RE: [linux-nfc] [PATCH RFC] nfc: add a driver for pn532 connected on uart

2013-08-22 Thread Rymarkiewicz, WaldemarX
Hi Lars,

>This adds a driver for the nxp pn532 nfc chip.
>It is not meant for merging. Instead it is meant to show that some
>progress has been made and what the current state is and to help
>testing.
>Although I can do some basic things with this driver I expect it to
>contain lots of bugs. Be aware!
>This driver is heavily based on the pn533 driver and duplicates much
>code. This has to be factored out some time.

I'm not sure this is the expected approach for adding new drivers. You duplicate
most of the pn533 code, which is not good.

Also, note that pn533 and pn532 are pretty much the same chip (with small
differences), and it would be quite natural to support both with one driver.
Pn533 already reads the chip version on init, so at that point you already know
which chip you are dealing with.

I suggest separating the transport layer from the core in pn533 and adding
support for uart and usb separately. This is exactly what I had planned while
changing pn533 to support the acr122 device.

Thanks,
/Waldek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] fs: use inode_set_user to set uid/gid of inode

2013-08-22 Thread David Miller
From: Rui Xiang 
Date: Fri, 23 Aug 2013 10:48:38 +0800

> Use the new interface to set i_uid/i_gid in inode struct.
> 
> Signed-off-by: Rui Xiang 

For the networking bits:

Acked-by: David S. Miller 


[PATCH] irqchip: sun4i: Don't write to read-only registers

2013-08-22 Thread Axel Lin
According to the datasheet[1], the Interrupt IRQ Pending Registers are
read-only. The implementation of sun4i_irq_ack() is wrong because it writes
to these read-only registers.

This patch removes the wrong irq_ack callback implementation and all the code
writing to these read-only registers in sun4i_of_init().

[1] 
http://dl.linux-sunxi.org/A10/A10%20User%20Manual%20-%20v1.20%20%282012-04-09%2c%20DECRYPTED%29.pdf

Signed-off-by: Axel Lin 
Acked-by: Maxime Ripard 
---
Hi Thomas,
This patch was sent on https://lkml.org/lkml/2013/7/6/59 with Maxime's Ack.
And re-sent on https://lkml.org/lkml/2013/7/19/229
I changed the subject line, since what the patch does is avoid writing to
read-only registers.

Axel
 drivers/irqchip/irq-sun4i.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/irqchip/irq-sun4i.c b/drivers/irqchip/irq-sun4i.c
index a5438d8..29b75c0a 100644
--- a/drivers/irqchip/irq-sun4i.c
+++ b/drivers/irqchip/irq-sun4i.c
@@ -38,18 +38,6 @@ static struct irq_domain *sun4i_irq_domain;
 
 static asmlinkage void __exception_irq_entry sun4i_handle_irq(struct pt_regs *regs);
 
-static void sun4i_irq_ack(struct irq_data *irqd)
-{
-   unsigned int irq = irqd_to_hwirq(irqd);
-   unsigned int irq_off = irq % 32;
-   int reg = irq / 32;
-   u32 val;
-
-   val = readl(sun4i_irq_base + SUN4I_IRQ_PENDING_REG(reg));
-   writel(val | (1 << irq_off),
-  sun4i_irq_base + SUN4I_IRQ_PENDING_REG(reg));
-}
-
 static void sun4i_irq_mask(struct irq_data *irqd)
 {
    unsigned int irq = irqd_to_hwirq(irqd);
@@ -76,7 +64,6 @@ static void sun4i_irq_unmask(struct irq_data *irqd)
 
 static struct irq_chip sun4i_irq_chip = {
    .name   = "sun4i_irq",
-   .irq_ack= sun4i_irq_ack,
    .irq_mask   = sun4i_irq_mask,
    .irq_unmask = sun4i_irq_unmask,
 };
@@ -114,11 +101,6 @@ static int __init sun4i_of_init(struct device_node *node,
    writel(0, sun4i_irq_base + SUN4I_IRQ_MASK_REG(1));
    writel(0, sun4i_irq_base + SUN4I_IRQ_MASK_REG(2));
 
-   /* Clear all the pending interrupts */
-   writel(0x, sun4i_irq_base + SUN4I_IRQ_PENDING_REG(0));
-   writel(0x, sun4i_irq_base + SUN4I_IRQ_PENDING_REG(1));
-   writel(0x, sun4i_irq_base + SUN4I_IRQ_PENDING_REG(2));
-
    /* Enable protection mode */
    writel(0x01, sun4i_irq_base + SUN4I_IRQ_PROTECTION_REG);
 
-- 
1.8.1.2





RE: [PATCH -next] block: fix error return code in parse_parts()

2013-08-22 Thread Caizhiyong
> -----Original Message-----
> From: Wei Yongjun [mailto:weiyj...@gmail.com]
> Sent: Friday, August 23, 2013 10:48 AM
> To: ax...@kernel.dk; a...@linux-foundation.org; Caizhiyong; k...@redhat.com;
> m...@sysgo.de; dw...@infradead.org; computersforpe...@gmail.com;
> dedek...@infradead.org
> Cc: yongjun_...@trendmicro.com.cn; linux-kernel@vger.kernel.org
> Subject: [PATCH -next] block: fix error return code in parse_parts()
> 
> From: Wei Yongjun 
> 
> Fix to return -EINVAL in the parts parse error handling case instead
> of 0(may overwrite to 0 by parse_subpart()), as done elsewhere in this
> function.
> 
> Signed-off-by: Wei Yongjun 
> ---
>  block/cmdline-parser.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/cmdline-parser.c b/block/cmdline-parser.c
> index 18fb435..cc2637f 100644
> --- a/block/cmdline-parser.c
> +++ b/block/cmdline-parser.c
> @@ -135,6 +135,7 @@ static int parse_parts(struct cmdline_parts **parts, const char *bdevdef)
> 
>   if (!newparts->subpart) {
>   pr_warn("cmdline partition has no valid partition.");
> + ret = -EINVAL;


Seems OK to me.

>   goto fail;
>   }
> 
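
The pattern the patch addresses can be sketched in isolation: an earlier
successful call leaves 0 in "ret", and a later failure path jumps to the common
exit without setting an error code, so the caller sees success. The function
names below are hypothetical and this is not the code in block/cmdline-parser.c;
it is only a minimal userspace illustration of the same control flow.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: always succeeds, so it leaves 0 in the caller's ret. */
static int earlier_step_sketch(void)
{
	return 0;
}

/*
 * Hypothetical parser skeleton. When have_valid_part is 0 the function
 * should fail; fix_applied selects the buggy or the fixed error path.
 */
int parse_sketch(int have_valid_part, int fix_applied)
{
	int ret;

	ret = earlier_step_sketch();	/* ret is now 0 */
	if (ret)
		goto fail;

	if (!have_valid_part) {
		if (fix_applied)
			ret = -EINVAL;	/* the one-line fix */
		goto fail;		/* without it, ret is still 0 */
	}
	return 0;

fail:
	/* cleanup of partially built state would go here */
	return ret;
}

/* Usage: the buggy path reports success for an invalid input. */
void parse_sketch_demo(void)
{
	assert(parse_sketch(0, 0) == 0);	/* bug: failure reported as 0 */
	assert(parse_sketch(0, 1) == -EINVAL);	/* fixed */
	assert(parse_sketch(1, 1) == 0);	/* valid input still succeeds */
}
```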



Re: [PATCH] PCI: exynos: add support for MSI

2013-08-22 Thread Jingoo Han
On Monday, August 12, 2013 7:57 PM, Thierry Reding wrote:
> On Mon, Aug 12, 2013 at 05:56:47PM +0900, Jingoo Han wrote:
> [...]
> > diff --git a/arch/arm/mach-exynos/Kconfig b/arch/arm/mach-exynos/Kconfig
> > index 855d4a7..9ef1c95 100644
> > --- a/arch/arm/mach-exynos/Kconfig
> > +++ b/arch/arm/mach-exynos/Kconfig
> > @@ -93,6 +93,7 @@ config SOC_EXYNOS5440
> > default y
> > depends on ARCH_EXYNOS5
> > select ARCH_HAS_OPP
> > +   select ARCH_SUPPORTS_MSI
> 
> This symbol goes away in Thomas Petazzoni's MSI patch series which is
> targeted at 3.12, so I don't think you should add that here.

OK, I see.
I will remove ARCH_SUPPORTS_MSI.

[...]

> > +#endif
> > +
> >  static void exynos_pcie_enable_interrupts(struct pcie_port *pp)
> >  {
> > exynos_pcie_enable_irq_pulse(pp);
> > +#ifdef CONFIG_PCI_MSI
> > +   exynos_pcie_msi_init(pp);
> > +#endif
> > return;
> >  }
> 
> Instead of the whole #ifdef business above, can't you just use something
> like this in exynos_pcie_enable_interrupts():
> 
>   if (IS_ENABLED(CONFIG_PCI_MSI))
>   exynos_pcie_msi_init(pp);
> 
> Now you can drop the #ifdef guards and the compiler will throw away all
> the related code automatically if PCI_MSI is not selected because the
> functions are all static and unused. This has the advantage of compiling
> all the code whether or not PCI_MSI is selected, therefore
> increasing compile coverage of the driver.

OK, I see.
I will use 'if (IS_ENABLED(CONFIG_PCI_MSI))' and remove the #ifdef guards.
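
To illustrate why the IS_ENABLED() form works as an ordinary C condition, here
is a simplified userspace sketch modelled on the macro trick used in the
kernel's include/linux/kconfig.h. All names ending in _SKETCH or _sketch are
hypothetical, the sketch only handles options defined to 1 (it ignores the =m
module case the real macro also covers), and a third dummy argument is added so
the variadic macro never receives an empty argument list.

```c
#include <assert.h>

/* Expands to 1 if the option macro is defined to 1, otherwise to 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED_SKETCH(option) IS_DEFINED_2(option)

#define CONFIG_PCI_MSI_SKETCH 1	/* pretend this option is selected */
/* CONFIG_OTHER_SKETCH is deliberately left undefined. */

static int msi_init_calls;

/* Always compiled, so it is type-checked even when the option is off. */
static void msi_init_sketch(void)
{
	msi_init_calls++;
}

void enable_interrupts_sketch(void)
{
	/* A constant condition: the compiler can drop the dead branch. */
	if (IS_ENABLED_SKETCH(CONFIG_PCI_MSI_SKETCH))
		msi_init_sketch();
}

/* Usage: a defined option yields 1, an undefined one yields 0. */
void is_enabled_demo(void)
{
	assert(IS_ENABLED_SKETCH(CONFIG_PCI_MSI_SKETCH) == 1);
	assert(IS_ENABLED_SKETCH(CONFIG_OTHER_SKETCH) == 0);
	enable_interrupts_sketch();
	assert(msi_init_calls == 1);
}
```

Because the guarded call sits inside a normal if statement, the guarded code is
always parsed and type-checked, which is the compile-coverage benefit described
in the quoted review.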

[...]

> > +
> > +void arch_teardown_msi_irq(unsigned int irq)
> > +{
> > +   clear_irq(irq);
> > +}
> 
> And we've reworked this largely so that drivers no longer provide arch_*
> functions because that prevents multi-platform support. So I think you
> need to port this to the new msi_chip infrastructure that's being
> introduced in 3.12.

OK, I have looked at the new msi_chip infrastructure made by Thomas Petazzoni.
I will use this msi_chip.

I really appreciate your comment. :)
Thank you.

Best regards,
Jingoo Han




Re: [PATCH] irqchip: gic: Don't complain in gic_get_cpumask() if UP system

2013-08-22 Thread Nicolas Pitre
On Thu, 22 Aug 2013, Stephen Boyd wrote:

> On 08/22, Nicolas Pitre wrote:
> > On Thu, 22 Aug 2013, Stephen Boyd wrote:
> > 
> > > On 07/17, Stephen Boyd wrote:
> > > > On 07/17/13 15:53, Nicolas Pitre wrote:
> > > > > On Wed, 17 Jul 2013, Stephen Boyd wrote:
> > > > >
> > > > >> On 07/17/13 15:34, Nicolas Pitre wrote:
> > > > >>> On Wed, 17 Jul 2013, Stephen Boyd wrote:
> > > > >>>
> > > > >>>> On 07/12/13 05:10, Stephen Boyd wrote:
> > > > >>>>> On 07/12, Javi Merino wrote:
> > > > >>>>>> I agree, we should drop the check.  It's annoying in uniprocessors and
> > > > >>>>>> unlikely to be found in the real world unless your gic entry in the dt
> > > > >>>>>> is wrong.
> > > > >>> And that's a likely outcome in the real world.
> > > > >>>
> > > > >>>>> Ok. How about this?
> > > > >>>> Any comments?
> > > > >>> What about this instead:
> > > > >> Unfortunately arm64 doesn't have SMP_ON_UP.
> > > > > And why does that matter?
> > > > 
> > > > Because the gic driver is compiled on both arm and arm64? I suppose we
> > > > could define is_smp() to 1 on arm64 but its probably better to rely on
> > > > generic kernel things instead of arch specific functions.
> > > > 
> > > > >
> > > > >> It sounds like you preferred the first patch using num_possible_cpus()
> > > > > Probably, yes.  I didn't follow the early conversation though.
> > > > 
> > > > This was the first patch:
> > > > 
> > > > ---8<
> > > > 
> > > > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > > > index 19ceaa6..589c760 100644
> > > > --- a/drivers/irqchip/irq-gic.c
> > > > +++ b/drivers/irqchip/irq-gic.c
> > > > @@ -368,7 +368,7 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic)
> > > > break;
> > > > }
> > > >  
> > > > -   if (!mask)
> > > > +   if (!mask && num_possible_cpus() > 1)
> > > > > pr_crit("GIC CPU mask not found - kernel will fail to boot.\n");
> > > >  
> > > > return mask;
> > > 
> > > Can one of these two patches be picked up?
> > 
> > Sure.  Just send it to RMK's patch system with my ACK.
> > 
> 
> I'm confused on that. MAINTAINERS says this patch should go
> through Thomas Gleixner's irq/core branch but it looks like only
> arm-soc has been taking patches for the current location.

Blah.  OK then, just send it to Thomas.

Initially this code was written and committed by RMK which is why I 
suggested you send him the fix.


Nicolas


Re: [PATCH 02/10] sched: Factor out code to should_we_balance()

2013-08-22 Thread Joonsoo Kim
On Thu, Aug 22, 2013 at 12:42:57PM +0200, Peter Zijlstra wrote:
> > >
> > > +redo:
> > 
> > One behavioral change worth noting here is that in the redo case if a
> > CPU has become idle we'll continue trying to load-balance in the
> > !new-idle case.
> > 
> > This could be unpleasant in the case where a package has a pinned busy
> > core allowing this and a newly idle cpu to start dueling for load.
> > 
> > While more deterministically bad in this case now, it could racily do
> > this before anyway so perhaps not worth worrying about immediately.
> 
> Ah, because the old code would effectively redo the check and find the
> idle cpu and thereby our cpu would no longer be the balance_cpu.
> 
> Indeed. And I don't think this was an intentional change. I'll go put
> the redo back before should_we_balance().

Ah, yes.
It isn't my intention. Please fix it.

Thanks.


Re: [PATCH 1/3] misc: Add crossbar driver

2013-08-22 Thread Rajendra Nayak
On Thursday 22 August 2013 05:03 PM, Sricharan R wrote:
>  maps crossbar number<->  to interrupt number and
>  calls request_irq(int_no, crossbar_handler,..)

So will this mapping happen based on some data passed from DT or
just based on what's available when the device does a request_irq()?

If it's based on what's available then I see an issue when you need
to remap something that's already mapped by default (and not used)
since you run out of all free ones.


Re: [PATCH] [BUGFIX] drivers/base: fix show_mem_removable section count

2013-08-22 Thread Greg Kroah-Hartman
On Thu, Aug 22, 2013 at 11:17:50PM -0500, Russ Anderson wrote:
> On Thu, Aug 22, 2013 at 09:10:45PM -0700, Greg Kroah-Hartman wrote:
> > On Thu, Aug 22, 2013 at 09:38:38PM -0500, Russ Anderson wrote:
> > > "cat /sys/devices/system/memory/memory*/removable" crashed the system.
> > 
> > On what kernels?  linux-next or Linus's tree, or 3.10.y?
> 
> Linus 3.11-rc6

So 3.10 is ok?  Trying to figure out where to send the fix to, thanks.

greg k-h


Re: [PATCH] irqchip: gic: Don't complain in gic_get_cpumask() if UP system

2013-08-22 Thread Stephen Boyd
On 08/22, Nicolas Pitre wrote:
> On Thu, 22 Aug 2013, Stephen Boyd wrote:
> 
> > On 07/17, Stephen Boyd wrote:
> > > On 07/17/13 15:53, Nicolas Pitre wrote:
> > > > On Wed, 17 Jul 2013, Stephen Boyd wrote:
> > > >
> > > >> On 07/17/13 15:34, Nicolas Pitre wrote:
> > > >>> On Wed, 17 Jul 2013, Stephen Boyd wrote:
> > > >>>
> > > >>>> On 07/12/13 05:10, Stephen Boyd wrote:
> > > >>>>> On 07/12, Javi Merino wrote:
> > > >>>>>> I agree, we should drop the check.  It's annoying in uniprocessors and
> > > >>>>>> unlikely to be found in the real world unless your gic entry in the dt
> > > >>>>>> is wrong.
> > > >>> And that's a likely outcome in the real world.
> > > >>>
> > > >>>>> Ok. How about this?
> > > >>>> Any comments?
> > > >>> What about this instead:
> > > >> Unfortunately arm64 doesn't have SMP_ON_UP.
> > > > And why does that matter?
> > > 
> > > Because the gic driver is compiled on both arm and arm64? I suppose we
> > > could define is_smp() to 1 on arm64 but its probably better to rely on
> > > generic kernel things instead of arch specific functions.
> > > 
> > > >
> > > >> It sounds like you preferred the first patch using num_possible_cpus()
> > > > Probably, yes.  I didn't follow the early conversation though.
> > > 
> > > This was the first patch:
> > > 
> > > ---8<
> > > 
> > > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > > index 19ceaa6..589c760 100644
> > > --- a/drivers/irqchip/irq-gic.c
> > > +++ b/drivers/irqchip/irq-gic.c
> > > @@ -368,7 +368,7 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic)
> > >   break;
> > >   }
> > >  
> > > - if (!mask)
> > > + if (!mask && num_possible_cpus() > 1)
> > >   pr_crit("GIC CPU mask not found - kernel will fail to boot.\n");
> > >  
> > >   return mask;
> > 
> > Can one of these two patches be picked up?
> 
> Sure.  Just send it to RMK's patch system with my ACK.
> 

I'm confused on that. MAINTAINERS says this patch should go
through Thomas Gleixner's irq/core branch but it looks like only
arm-soc has been taking patches for the current location.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH v3 0/5] clk: dt: bindings for mux, divider & gate clocks

2013-08-22 Thread Stephen Boyd
On 08/21, Mike Turquette wrote:
> 
> I just happened across a to-do list note telling me to respond to this
> email. Better late than never.
> 
[snip]
>
> This is a way to establish initial configuration from the consumer's
> perspective. Similarly something can be done for the clock rate with
> assigned-clock-rate.

Ok. Thanks for the information. Unfortunately it isn't what I
thought it was.

> 
> With all of that said this is consumer-level stuff. We'll definitely
> talk about the clock provider DT bindings at the ARM Summit, which is
> what you discuss above.

I can't wait another 2 months to start discussing the clock
provider DT bindings. We need to discuss it on the list.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: Re: [PATCH 0/3] kprobes: add new dma insn slot cache for s390

2013-08-22 Thread Masami Hiramatsu
(2013/08/22 14:52), Heiko Carstens wrote:
> Hi Masami,
> 
>> (2013/08/21 21:01), Heiko Carstens wrote:
>>> The current kprobes insn caches allocate memory areas for insn slots with
>>> module_alloc(). The assumption is that the kernel image and module area
>>> are both within the same +/- 2GB memory area.
>>> This however is not true for s390 where the kernel image resides within
>>> the first 2GB (DMA memory area), but the module area is far away in the
>>> vmalloc area, usually somewhere close below the 4TB area.
>>>
>>> For new pc relative instructions s390 needs insn slots that are within
>>> +/- 2GB of each area. That way we can patch displacements of pc-relative
>>> instructions within the insn slots just like x86 and powerpc.
>>>
>>> The module area works already with the normal insn slot allocator, however
>>> there is currently no way to get insn slots that are within the first 2GB
>>> on s390 (aka DMA area).
>>
>> The reason why we allocate instruction buffers from module area is
>> to execute a piece of code on the buffer, which should be executable.
>> I'm not familiar with s390; does it allow the kernel to execute code
>> in such a DMA buffer?
> 
> Yes, the kernel image itself resides in DMA capable memory and it is all
> executable.
> 
>>> Therefore this patch set introduces a third insn slot cache besides the
>>> normal insn and optinsn slot caches: the dmainsn slot cache. Slots can be
>>> allocated and freed with get_dmainsn_slot() and free_dmainsn_slot().
>>
>> OK, but it seems that your patch introduced unneeded complexity. Perhaps,
>> you just have to introduce 2 weak functions to allocate/release such
>> executable and jump-able buffers, like below,
>>
>> void * __weak arch_allocate_executable_page(void)
>> {
>>  return module_alloc(PAGE_SIZE);
>> }
>>
>> void __weak arch_free_executable_page(void *page)
>> {
>>  module_free(NULL, page);
>> }
>>
>> Thus, all you need to do is implement a dmaalloc() version of the above
>> functions on s390. No kconfig, no ifdefs are needed. :)
> 
> Hm, I don't see how that can work, or maybe I just don't get your idea ;)
> Or maybe my intention was not clear? So let me try again:
> 
> If the to be probed instruction resides within the first 2GB of memory
> (aka DMA memory, aka kernel image) the insn slot must be within the first
> 2GB as well, otherwise I can't patch pc-relative instructions.
> 
> On the other hand if the to be probed instruction resides in a module
> (aka part of the vmalloc area), the insn slot must reside within the same
> 2GB area as well.
> 
> Therefore I need two different insn slot caches, where the slots are either
> allocated with __get_free_page(GFP_KERNEL | GFP_DMA) (for the kernel image)
> or module_alloc(PAGE_SIZE) for modules.
> 
> I can't have a single cache which satisfies both areas.

Oh, I see.
Indeed, that is enough reason to add a new cache... By the way, is there
any way to implement it without a new kconfig option like DMAPROBE and a dma
flag? AFAICS, since such a flag strongly depends on the s390 arch, I don't
like putting it in kernel/kprobes.c.

Perhaps we can make the insn slot cache more generic, e.g. create a new slot
type by passing in a page allocator.
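
One way to picture "a slot type with a page allocator passed in" is a cache
structure that carries its own alloc/free callbacks, so generic code never
needs to know about a DMA flag. This is only a userspace sketch under that
assumption: every name here is hypothetical, malloc() stands in for the real
page allocators, and the cache holds a single page for brevity. It is not the
kernel's kprobe insn cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical generic slot cache: the allocator is supplied by the user. */
struct insn_cache_sketch {
	void *(*alloc_page)(void);	/* area-specific page allocator */
	void (*free_page)(void *page);
	size_t slot_size;
	char *page;			/* single backing page, for brevity */
	size_t used;
};

static int kernel_area_allocs;
static int module_area_allocs;

/* Stand-in for an allocator returning memory near the kernel image. */
static void *alloc_near_kernel_sketch(void)
{
	kernel_area_allocs++;
	return malloc(SKETCH_PAGE_SIZE);
}

/* Stand-in for an allocator returning memory near the module area. */
static void *alloc_near_modules_sketch(void)
{
	module_area_allocs++;
	return malloc(SKETCH_PAGE_SIZE);
}

static void free_page_sketch(void *page)
{
	free(page);
}

/* Hands out the next free slot, allocating the page on first use. */
void *get_insn_slot_sketch(struct insn_cache_sketch *c)
{
	assert(c->slot_size > 0 && c->slot_size <= SKETCH_PAGE_SIZE);

	if (!c->page) {
		c->page = c->alloc_page();
		if (!c->page)
			return NULL;
		c->used = 0;
	}
	if (c->used + c->slot_size > SKETCH_PAGE_SIZE)
		return NULL;	/* a real cache would chain another page */

	c->used += c->slot_size;
	return c->page + c->used - c->slot_size;
}

void destroy_insn_cache_sketch(struct insn_cache_sketch *c)
{
	if (c->page)
		c->free_page(c->page);
	c->page = NULL;
	c->used = 0;
}
```

With this shape, an architecture would define one cache instance per memory
area and pick the instance based on where the probed instruction lives, while
the slot bookkeeping stays shared.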

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer

2013-08-22 Thread Masami Hiramatsu
(2013/08/23 8:57), zhangwei(Jovi) wrote:
> On 2013/8/23 0:42, Steven Rostedt wrote:
>> On Fri, 09 Aug 2013 18:56:54 +0900
>> Masami Hiramatsu  wrote:
>>
>>> (2013/08/09 17:45), Namhyung Kim wrote:
>>>> From: Namhyung Kim
>>>>
>>>> Fetching from user space should be done in a non-atomic context.  So
>>>> use a temporary buffer and copy its content to the ring buffer
>>>> atomically.
>>>>
>>>> While at it, use __get_data_size() and store_trace_args() to reduce
>>>> code duplication.
>>>
>>> I'm just concerned about using kmalloc() in the event handler. For fetching
>>> user memory which can be swapped out, that is true. But in most cases,
>>> we can presume that it exists in physical memory.
>>>
>>
>>
>> What about creating a per cpu buffer when uprobes are registered, and
>> delete them when they are finished? Basically what trace_printk() does
>> if it detects that there are users of trace_printk() in the kernel.
>> Note, it does not deallocate them when finished, as it is never
>> finished until reboot ;-)
>>
>> -- Steve
>>
> I also thought of this approach, but the issue is that we cannot fetch user
> memory into a per-cpu buffer, because a per-cpu buffer must be used with
> preemption disabled, and fetching user memory could sleep.

Hm, perhaps we just need a "hot" buffer pool from which buffers can be
allocated and freed quickly; when the pool runs short, the caller just waits or
allocates a new page from a "cold" area. This is a.k.a. kmem_cache :)

Anyway, kmem_cache/kmalloc looks too heavy just to allocate temporary
buffers for a trace handler (and those also have tracepoints), so I think
you may just need a memory pool that has enough slots, with
a semaphore (which will wait if all the slots are currently in use).

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH] [BUGFIX] drivers/base: fix show_mem_removable section count

2013-08-22 Thread Russ Anderson
On Thu, Aug 22, 2013 at 09:10:45PM -0700, Greg Kroah-Hartman wrote:
> On Thu, Aug 22, 2013 at 09:38:38PM -0500, Russ Anderson wrote:
> > "cat /sys/devices/system/memory/memory*/removable" crashed the system.
> 
> On what kernels?  linux-next or Linus's tree, or 3.10.y?

Linus 3.11-rc6

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com


Re: [PATCH] [BUGFIX] drivers/base: fix show_mem_removable section count

2013-08-22 Thread Greg Kroah-Hartman
On Thu, Aug 22, 2013 at 09:38:38PM -0500, Russ Anderson wrote:
> "cat /sys/devices/system/memory/memory*/removable" crashed the system.

On what kernels?  linux-next or Linus's tree, or 3.10.y?

thanks,

greg k-h


Re: [PATCH 0/2] fs: supply inode uid/gid setting interface

2013-08-22 Thread Greg KH
On Fri, Aug 23, 2013 at 10:48:36AM +0800, Rui Xiang wrote:
> This patchset implements accessor functions to set uid/gid
> in the inode struct. It is just a code cleanup.

Why?


Re: unused swap offset / bad page map.

2013-08-22 Thread Dave Jones
On Fri, Aug 23, 2013 at 11:27:29AM +0800, Hillf Danton wrote:
 > On Fri, Aug 23, 2013 at 11:21 AM, Dave Jones  wrote:
 > >
 > > I still see the swap_free messages with this applied.
 > >
 > Decremented?

It actually seems worse, seems I can trigger it even easier now, as if
there's a leak.

Dave



Re: [PATCH v2] kernel/padata.c: share code between CPU_ONLINE and CPU_DOWN_FAILED, same to CPU_DOWN_PREPARE and CPU_UP_CANCELED

2013-08-22 Thread Chen Gang
On 08/22/2013 02:43 PM, Chen Gang wrote:
> Share code between CPU_ONLINE and CPU_DOWN_FAILED, same to
> CPU_DOWN_PREPARE and CPU_UP_CANCELED.
> 
> It will fix 2 bugs:
> 
>   "not check the return value of __padata_remove_cpu() and __padata_add_cpu()".
>   "need add 'break' between CPU_UP_CANCELED and CPU_DOWN_FAILED".
> 

Do we need a more detailed description?

If so, could Steffen provide more expert detail?

Thanks.
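
As a minimal illustration of the second bug (the missing 'break'), the sketch
below reduces the notifier to two events and two counters. The enum, struct and
function names are hypothetical and stand in for the real padata code; the
point is only that without the break, handling "up canceled" also runs the
"down failed" branch and re-adds the CPU that was just removed.

```c
#include <assert.h>
#include <stddef.h>

enum cpu_event_sketch { EV_UP_CANCELED, EV_DOWN_FAILED };

struct counters_sketch {
	int removed;
	int added;
};

/* Buggy shape: the first case falls through into the second. */
void handle_buggy_sketch(enum cpu_event_sketch ev, struct counters_sketch *c)
{
	assert(c != NULL);
	switch (ev) {
	case EV_UP_CANCELED:
		c->removed++;
		/* no break here, so execution continues below */
	case EV_DOWN_FAILED:
		c->added++;
	}
}

/* Fixed shape: each event only performs its own action. */
void handle_fixed_sketch(enum cpu_event_sketch ev, struct counters_sketch *c)
{
	assert(c != NULL);
	switch (ev) {
	case EV_UP_CANCELED:
		c->removed++;
		break;
	case EV_DOWN_FAILED:
		c->added++;
		break;
	}
}
```

Feeding EV_UP_CANCELED to the buggy handler bumps both counters, while the
fixed handler only bumps "removed".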

> 
> Signed-off-by: Chen Gang 
> ---
>  kernel/padata.c |   20 
>  1 files changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 072f4ee..2f0037a 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -846,6 +846,8 @@ static int padata_cpu_callback(struct notifier_block *nfb,
>   switch (action) {
>   case CPU_ONLINE:
>   case CPU_ONLINE_FROZEN:
> + case CPU_DOWN_FAILED:
> + case CPU_DOWN_FAILED_FROZEN:
>   if (!pinst_has_cpu(pinst, cpu))
>   break;
>   mutex_lock(&pinst->lock);
> @@ -857,6 +859,8 @@ static int padata_cpu_callback(struct notifier_block *nfb,
>  
>   case CPU_DOWN_PREPARE:
>   case CPU_DOWN_PREPARE_FROZEN:
> + case CPU_UP_CANCELED:
> + case CPU_UP_CANCELED_FROZEN:
>   if (!pinst_has_cpu(pinst, cpu))
>   break;
>   mutex_lock(&pinst->lock);
> @@ -865,22 +869,6 @@ static int padata_cpu_callback(struct notifier_block *nfb,
>   if (err)
>   return notifier_from_errno(err);
>   break;
> -
> - case CPU_UP_CANCELED:
> - case CPU_UP_CANCELED_FROZEN:
> - if (!pinst_has_cpu(pinst, cpu))
> - break;
> - mutex_lock(&pinst->lock);
> - __padata_remove_cpu(pinst, cpu);
> - mutex_unlock(&pinst->lock);
> -
> - case CPU_DOWN_FAILED:
> - case CPU_DOWN_FAILED_FROZEN:
> - if (!pinst_has_cpu(pinst, cpu))
> - break;
> - mutex_lock(&pinst->lock);
> - __padata_add_cpu(pinst, cpu);
> - mutex_unlock(&pinst->lock);
>   }
>  
>   return NOTIFY_OK;
> 


-- 
Chen Gang


Re: [PATCH 06/10] sched, fair: Make group power more consitent

2013-08-22 Thread Preeti U Murthy
On 08/19/2013 09:31 PM, Peter Zijlstra wrote:


Reviewed-by: Preeti U Murthy 



Re: [PATCH] iommu: WARN_ON when removing a device with no iommu_group associated

2013-08-22 Thread Alex Williamson
[+cc iommu]

On Fri, 2013-08-23 at 09:55 +0800, Wei Yang wrote:
> When removing a device from the system, the iommu_group driver will try to
> disconnect it from its group. In some cases, however, a device may not be
> associated with any iommu_group, for example when there is not enough DMA
> address space.
> 
> In the generic bus notification, it will check dev->iommu_group before calling
> iommu_group_remove_device(). While in some cases, developers may call
> iommu_group_remove_device() in a different code path and without check. For
> those devices with dev->iommu_group set to NULL, kernel will crash.
> 
> This patch gives a warning and return when trying to remove a device from an
> iommu_group with dev->iommu_group set to NULL. This helps to indicate some bad
> behavior and also guard the kernel.
> 
> Signed-off-by: Wei Yang 

Acked-by: Alex Williamson 

> ---
>  drivers/iommu/iommu.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index fbe9ca7..43396f0 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -379,6 +379,9 @@ void iommu_group_remove_device(struct device *dev)
>   struct iommu_group *group = dev->iommu_group;
>   struct iommu_device *tmp_device, *device = NULL;
>  
> + if (WARN_ON(!group))
> + return;
> +
>   /* Pre-notify listeners that a device is being removed. */
>   blocking_notifier_call_chain(&group->notifier,
>IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
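
For anyone unfamiliar with the idiom, the guard in the hunk above can be modelled in userspace roughly as follows. This is only a sketch: warn_on(), struct group_model, struct device_model and group_remove_device() are hypothetical stand-ins for illustration, not the kernel's WARN_ON() or the iommu API. The point is that the warning helper returns the condition it tested, so one statement both reports the problem and bails out before the NULL pointer is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct iommu_group and struct device. */
struct group_model { int nr_devices; };
struct device_model { struct group_model *group; };

/* Like the kernel's WARN_ON(): report the condition and return it. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Returns true if the device was removed, false if it had no group. */
static bool group_remove_device(struct device_model *dev)
{
	struct group_model *group = dev->group;

	/* Warn and return early instead of dereferencing a NULL group. */
	if (warn_on(group == NULL, "device has no group"))
		return false;

	group->nr_devices--;
	dev->group = NULL;
	return true;
}
```

Calling group_remove_device() twice on the same device removes it the first time and only warns the second time.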





Re: [PATCH 3/6] mm/hwpoison: fix num_poisoned_pages error statistics for thp

2013-08-22 Thread Naoya Horiguchi
Hi Wanpeng,

On Fri, Aug 23, 2013 at 07:52:40AM +0800, Wanpeng Li wrote:
> Hi Naoya,
> On Thu, Aug 22, 2013 at 12:43:08PM -0400, Naoya Horiguchi wrote:
> >On Thu, Aug 22, 2013 at 05:48:24PM +0800, Wanpeng Li wrote:
> >> There is a race between hwpoison page and unpoison page, memory_failure 
> >> set the page hwpoison and increase num_poisoned_pages without hold page 
> >> lock, and one page count will be accounted against thp for 
> >> num_poisoned_pages.
> >> However, unpoison can occur before memory_failure hold page lock and 
> >> split transparent hugepage, unpoison will decrease num_poisoned_pages 
> >> by 1 << compound_order since memory_failure has not yet split transparent 
> >> hugepage with page lock held. That means we account one page for hwpoison
> >> and 1 << compound_order for unpoison. This patch fix it by decrease one 
> >> account for num_poisoned_pages against no hugetlbfs pages case.
> >> 
> >> Signed-off-by: Wanpeng Li 
> >
> >I think that a thp never becomes hwpoisoned without splitting, so "trying
> >to unpoison thp" never happens (I think that this implicit fact should be
> 
> There is a race window here for hwpoison thp: 

OK, thanks for the great explanation (it's worth writing in the description.)
And I found my previous comment was completely pointless, sorry :(

>   A                                         B
>   memory_failure
>   TestSetPageHWPoison(p);
>   if (PageHuge(p))
>           nr_pages = 1 << compound_order(hpage);
>   else
>           nr_pages = 1;
>   atomic_long_add(nr_pages, &num_poisoned_pages);
>                                             unpoison_memory
>                                             nr_pages = 1 << compound_trans_order(page);
>                                             if (TestClearPageHWPoison(p))
>                                                     atomic_long_sub(nr_pages, &num_poisoned_pages);
>   lock page
>   if (!PageHWPoison(p))
>           unlock page and return
>   hwpoison_user_mappings
>   if (PageTransHuge(hpage))
>           split_huge_page(hpage);

When this race happens, our expectation is that num_poisoned_pages is
increased by 1, because thread A finally succeeds in hwpoisoning one normal
page. So thread B should fail to unpoison, neither clearing PageHWPoison nor
decreasing num_poisoned_pages.  My suggestion is to insert a PageTransHuge
check before doing TestClearPageHWPoison, as follows:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1cb3b7d..f551b72 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1336,6 +1336,16 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
 
+   /*
+* unpoison_memory() can encounter thp only when the thp is being
+* worked by memory_failure() and the page lock is not held yet.
+* In such case, we yield to memory_failure() and make unpoison fail.
+*/
+   if (PageTransHuge(page)) {
+   pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
+   return 0;
+   }
+
nr_pages = 1 << compound_trans_order(page);
 
if (!get_page_unless_zero(page)) {


I think that replacing atomic_long_sub() with atomic_long_dec() still
has a meaning, so you don't have to drop that.
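
To make the accounting mismatch concrete, here is a small userspace model of the counter. It is only a sketch under stated assumptions: struct page_model, poison(), unpoison_unchecked(), unpoison_checked() and THP_ORDER are names made up for this mail, the model is single-threaded, and it only replays the interleaving from the diagram rather than the real locking.

```c
#include <stdbool.h>

#define THP_ORDER 9	/* assumed: 2MB huge page made of 4KB base pages */

/* Hypothetical model of the two page flags that matter here. */
struct page_model {
	bool hwpoison;
	bool trans_huge;
};

static long num_poisoned;

/* memory_failure() side: a thp is accounted as one page before the split. */
static void poison(struct page_model *p)
{
	if (!p->hwpoison) {
		p->hwpoison = true;
		num_poisoned += 1;
	}
}

/* Unpoison without the check: subtracts 1 << order for a thp. */
static void unpoison_unchecked(struct page_model *p)
{
	long nr_pages = p->trans_huge ? 1L << THP_ORDER : 1;

	if (p->hwpoison) {
		p->hwpoison = false;
		num_poisoned -= nr_pages;
	}
}

/* Unpoison with the suggested check: back off while the page is a thp. */
static bool unpoison_checked(struct page_model *p)
{
	if (p->trans_huge)
		return false;	/* memory_failure() is still working on it */

	if (p->hwpoison) {
		p->hwpoison = false;
		num_poisoned -= 1;
	}
	return true;
}
```

With the unchecked variant, one poison followed by one unpoison leaves the counter at 1 - (1 << THP_ORDER) instead of 0. With the check, the unpoison attempt fails and the counter stays at 1, which is what thread A expects.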

> 
> We increase one page count, however, decrease 1 << compound_trans_order.
> The compound_trans_order you mentioned is used here for thp, that's why 
> I don't drop it in patch 2/6.

I don't think that we have to use compound_trans_order() any more, because
with the above change we don't calculate nr_pages any more for thp.
We can reduce the cost to lock/unlock compound_lock as described in 2/6.

> >commented somewhere or asserted with VM_BUG_ON().)
> 
> I will add the VM_BUG_ON() in unpoison_memory after lock page in next
> version.

Sorry, my previous suggestion didn't make sense.

Thank you!
Naoya Horiguchi

> >And nr_pages in unpoison_memory() can be greater than 1 for hugetlbfs page.
> >So does this patch break counting when unpoisoning free hugetlbfs pages?
> >
> >Thanks,
> >Naoya Horiguchi
> >
> >> ---
> >>  mm/memory-failure.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 5092e06..6bfd51e 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1350,7 +1350,7 @@ int unpoison_memory(unsigned long pfn)
> >>  

RE: [PATCH] cpuidle: coupled: fix dead loop corner case

2013-08-22 Thread Neil Zhang

> -----Original Message-----
> From: Colin Cross [mailto:ccr...@google.com]
> Sent: August 23, 2013 5:08
> To: Neil Zhang
> Cc: Rafael J. Wysocki; Daniel Lezcano; Linux PM list; lkml
> Subject: Re: [PATCH] cpuidle: coupled: fix dead loop corner case
> 
> On Mon, Aug 19, 2013 at 10:17 PM, Neil Zhang 
> wrote:
> > There is a corner case when no peripheral irqs are routed to the
> > secondary cores.
> > Take a dual-core system as an example; the sequence is as follows:
> >
> > Core 0                                  Core 1
> >                                         1. set waiting bit and enter
> >                                            waiting loop
> > 2. set waiting bit and poke core 1
> >                                         3. clear poke in irq and enter
> >                                            safe state
> > 4. set ready bit and enter ready loop
> >
> > Since no peripheral irq is routed to core 1, it will stay in the safe
> > state forever, and core 0 will loop forever in the following code.
> > while (!cpuidle_coupled_cpus_ready(coupled)) {
> > /* Check if any other cpus bailed out of idle. */
> > if (!cpuidle_coupled_cpus_waiting(coupled))
> > }
> >
> > The solution is to not let a secondary core enter the safe state once
> > it has already handled the poke interrupt.
> >
> > Signed-off-by: Neil Zhang 
> > Reviewed-by: Fangsuo Wu 
> > ---
> >  drivers/cpuidle/coupled.c |7 +++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
> > index 2a297f8..a37c718 100644
> > --- a/drivers/cpuidle/coupled.c
> > +++ b/drivers/cpuidle/coupled.c
> > @@ -119,6 +119,7 @@ struct cpuidle_coupled {
> >  #define CPUIDLE_COUPLED_NOT_IDLE   (-1)
> >
> >  static DEFINE_MUTEX(cpuidle_coupled_lock);
> > +static DEFINE_PER_CPU(bool, poke_sync);
> >  static DEFINE_PER_CPU(struct call_single_data,
> > cpuidle_coupled_poke_cb);
> >
> >  /*
> > @@ -295,6 +296,7 @@ static void cpuidle_coupled_poked(void *info)  {
> > int cpu = (unsigned long)info;
> > cpumask_clear_cpu(cpu, _coupled_poked_mask);
> > +   __this_cpu_write(poke_sync, true);
> >  }
> >
> >  /**
> > @@ -473,6 +475,7 @@ retry:
> >  * allowed for a single cpu.
> >  */
> > while (!cpuidle_coupled_cpus_waiting(coupled)) {
> > +   __this_cpu_write(poke_sync, false);
> > if (cpuidle_coupled_clear_pokes(dev->cpu)) {
> > cpuidle_coupled_set_not_waiting(dev->cpu,
> coupled);
> > goto out;
> > @@ -483,6 +486,10 @@ retry:
> > goto out;
> > }
> >
> > +   if (cpuidle_coupled_cpus_waiting(coupled)
> > +   && __this_cpu_read(poke_sync))
> > +   break;
> > +
> > entered_state = cpuidle_enter_state(dev, drv,
> > dev->safe_state_index);
> > }
> > --
> > 1.7.4.1
> >
> 
> I have a similar patch that avoids adding another check for
> cpuidle_coupled_cpus_waiting, and uses the return value from
> cpuidle_coupled_clear_pokes instead of adding a percpu bool.  I will post it
> shortly.
> 
> Do you have a test case that can reproduce this easily?

It's not easy to reproduce.
We have only caught it once so far.

Best Regards,
Neil Zhang


Re: unused swap offset / bad page map.

2013-08-22 Thread Dave Jones
On Thu, Aug 22, 2013 at 11:21:28AM +0800, Hillf Danton wrote:
 > On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones  wrote:
 > >
 > > didn't hit the bug_on, but got a bunch of
 > >
 > > [  424.077993] swap_free: Unused swap offset entry 000187d5
 > > [  439.377194] swap_free: Unused swap offset entry 000187e7
 > > [  441.998411] swap_free: Unused swap offset entry 000187ee
 > > [  446.956551] swap_free: Unused swap offset entry 245f
 > >
 > If page is reused, its swap entry is freed.
 > 
 > reuse_swap_page()
 >   delete_from_swap_cache()
 > swapcache_free()
 >   count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
 > 
 > If count drops to zero, then swap_free() gives warning.
 > 
 > 
 > --- a/mm/memory.c Wed Aug  7 16:29:34 2013
 > +++ b/mm/memory.c Thu Aug 22 10:44:32 2013
 > @@ -3123,6 +3123,7 @@ static int do_swap_page(struct mm_struct
 >   /* It's better to call commit-charge after rmap is established */
 >   mem_cgroup_commit_charge_swapin(page, ptr);
 > 
 > + if (!exclusive)
 >   swap_free(entry);
 >   if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 >   try_to_free_swap(page);
 > --

I still see the swap_free messages with this applied.

Dave



Re: [dm-devel] [PATCH] dm: allow error target to replace either bio-based and request-based targets

2013-08-22 Thread Jun'ichi Nomura
Hello Mike,

On 08/23/13 09:17, Mike Snitzer wrote:
>> I do like the idea of a single error target that is hybrid (supports
>> both bio-based and request-based) but the DM core would need to be
>> updated to support this.
>>
>> Specifically, we'd need to check if the device (and active table) is
>> already bio-based or request-based and select the appropriate type.  If
>> it is a new device, default to selecting bio-based.
>>
>> There are some wrappers and other logic thoughout DM core that will need
>> auditing too.
> 
> Here is a patch that should work for your needs (I tested it to work
> with 'dmsetup wipe_table' on both request-based and bio-based devices):

How about moving the default handling in dm_table_set_type() outside of
the for-each-target loop, like the modified patch below?

For example, if a table has 2 targets, hybrid and request_based,
and live_md_type is DM_TYPE_NONE, the table should be considered as
request_based, not inconsistent.
Although the end result is the same, since such a table is rejected by
another constraint anyway, I think it's good to keep the semantics clean
and the error messages consistent.

I.e. for the above case, the error message should be
"Request-based dm doesn't support multiple targets yet",
not "Inconsistent table: different target types can't be mixed up".
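
The intended semantics can be sketched as a standalone function. This is a simplified userspace model, not the DM code: enum md_type, enum tgt_kind and table_type() are hypothetical names, and the later check that rejects request-based tables with multiple targets is deliberately left out so that only the type decision is shown.

```c
/* Hypothetical stand-ins for the DM_TYPE_* values and target kinds. */
enum md_type { MD_TYPE_NONE, MD_TYPE_BIO_BASED, MD_TYPE_REQUEST_BASED };
enum tgt_kind { TGT_BIO, TGT_REQUEST, TGT_HYBRID };

/*
 * Returns the table type, or -1 if the table mixes bio-based and
 * request-based targets (or has no targets at all).
 */
static int table_type(const enum tgt_kind *tgts, unsigned int n,
		      enum md_type live)
{
	unsigned int i;
	int bio_based = 0, request_based = 0, hybrid = 0;

	for (i = 0; i < n; i++) {
		if (tgts[i] == TGT_HYBRID)
			hybrid = 1;
		else if (tgts[i] == TGT_REQUEST)
			request_based = 1;
		else
			bio_based = 1;
	}

	if (bio_based && request_based)
		return -1;	/* inconsistent table */

	/*
	 * Only when every target is hybrid do we fall back to the live
	 * device type; this is done once, outside the per-target loop.
	 */
	if (hybrid && !bio_based && !request_based) {
		if (live == MD_TYPE_REQUEST_BASED)
			request_based = 1;
		else
			bio_based = 1;
	}

	if (bio_based)
		return MD_TYPE_BIO_BASED;
	if (request_based)
		return MD_TYPE_REQUEST_BASED;
	return -1;		/* empty table */
}
```

In this model a table of { hybrid, request_based } with no live type comes out as request-based rather than inconsistent, which is the case described above.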

---
Jun'ichi Nomura, NEC Corporation


diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f221812..6e683c8 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -860,14 +860,16 @@ EXPORT_SYMBOL(dm_consume_args);
 static int dm_table_set_type(struct dm_table *t)
 {
unsigned i;
-   unsigned bio_based = 0, request_based = 0;
+   unsigned bio_based = 0, request_based = 0, hybrid = 0;
struct dm_target *tgt;
struct dm_dev_internal *dd;
struct list_head *devices;
 
for (i = 0; i < t->num_targets; i++) {
tgt = t->targets + i;
-   if (dm_target_request_based(tgt))
+   if (dm_target_hybrid(tgt))
+   hybrid = 1;
+   else if (dm_target_request_based(tgt))
request_based = 1;
else
bio_based = 1;
@@ -879,6 +881,25 @@ static int dm_table_set_type(struct dm_table *t)
}
}
 
+   if (hybrid && !bio_based && !request_based) {
+   /*
+* The targets can work either way.
+* Determine the type from the live device.
+*/
+   unsigned live_md_type;
+   dm_lock_md_type(t->md);
+   live_md_type = dm_get_md_type(t->md);
+   dm_unlock_md_type(t->md);
+   switch (live_md_type) {
+   case DM_TYPE_REQUEST_BASED:
+   request_based = 1;
+   break;
+   default:
+   bio_based = 1;
+   break;
+   }
+   }
+
if (bio_based) {
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 37ba5db..242e3ce 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -131,12 +131,19 @@ static int io_err_map(struct dm_target *tt, struct bio *bio)
return -EIO;
 }
 
+static int io_err_map_rq(struct dm_target *ti, struct request *clone,
+union map_info *map_context)
+{
+   return -EIO;
+}
+
 static struct target_type error_target = {
.name = "error",
-   .version = {1, 1, 0},
+   .version = {1, 2, 0},
.ctr  = io_err_ctr,
.dtr  = io_err_dtr,
.map  = io_err_map,
+   .map_rq = io_err_map_rq,
 };
 
 int __init dm_target_init(void)
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..8b4c075 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -89,10 +89,21 @@ int dm_setup_md_queue(struct mapped_device *md);
 #define dm_target_is_valid(t) ((t)->table)
 
 /*
+ * To check whether the target type is bio-based or not (request-based).
+ */
+#define dm_target_bio_based(t) ((t)->type->map != NULL)
+
+/*
  * To check whether the target type is request-based or not (bio-based).
  */
 #define dm_target_request_based(t) ((t)->type->map_rq != NULL)
 
+/*
+ * To check whether the target type is a hybrid (capable of being
+ * either request-based or bio-based).
+ */
+#define dm_target_hybrid(t) (dm_target_bio_based(t) && dm_target_request_based(t))
+
 /*-
  * A registry of target types.
  *---*/


Re: [PATCH v7 2/3] DMA: Freescale: Add new 8-channel DMA engine device tree nodes

2013-08-22 Thread Hongbo Zhang

On 08/22/2013 07:16 AM, Stephen Warren wrote:

On 08/21/2013 05:00 PM, Scott Wood wrote:

On Wed, 2013-08-21 at 16:40 -0600, Stephen Warren wrote:

On 07/29/2013 04:49 AM, hongbo.zh...@freescale.com wrote:

+- reg   : 
+- interrupts: 

s/interrupts/specifier/

Do you mean s/interrupt mapping/interrupt specifier/?

And probably s/registers mapping/register specifier/ as well.

Yup.


OK, I will update these descriptions.





RE: [PATCH V5 3/5] POWER/cpuidle: Generic IBM-POWER backend cpuidle driver.

2013-08-22 Thread Wang Dongsheng-B40534

> diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
> index 0e2cd5c..e805dcd 100644
> --- a/drivers/cpuidle/Kconfig
> +++ b/drivers/cpuidle/Kconfig

Maybe drivers/cpuidle/Kconfig.powerpc would be better, as is done for ARM?

> +obj-$(CONFIG_CPU_IDLE_IBM_POWER) += cpuidle-ibm-power.o
> diff --git a/drivers/cpuidle/cpuidle-ibm-power.c
> b/drivers/cpuidle/cpuidle-ibm-power.c
> new file mode 100644
> index 000..4ee5a94
> --- /dev/null
> +++ b/drivers/cpuidle/cpuidle-ibm-power.c
> @@ -0,0 +1,304 @@
> +/*
> + *  cpuidle-ibm-power - idle state cpuidle driver.
> + *  Adapted from drivers/idle/intel_idle.c and
> + *  drivers/acpi/processor_idle.c
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct cpuidle_driver power_idle_driver = {
> + .name = "IBM-POWER-Idle",
> + .owner= THIS_MODULE,
> +};
> +
> +#define MAX_IDLE_STATE_COUNT 2
> +
> +static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;

Again, do not use the macro.

> +static struct cpuidle_state *cpuidle_state_table;
> +
> +static inline void idle_loop_prolog(unsigned long *in_purr)
> +{
> + *in_purr = mfspr(SPRN_PURR);
> + /*
> +  * Indicate to the HV that we are idle. Now would be
> +  * a good time to find other work to dispatch.
> +  */
> + get_lppaca()->idle = 1;
> +}
> +
> +static inline void idle_loop_epilog(unsigned long in_purr)
> +{
> + get_lppaca()->wait_state_cycles += mfspr(SPRN_PURR) - in_purr;
> + get_lppaca()->idle = 0;
> +}
> +
> +static int snooze_loop(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv,
> + int index)
> +{
> + unsigned long in_purr;
> +
> + idle_loop_prolog(&in_purr);
> + local_irq_enable();

snooze_loop is already registered with the cpuidle framework to handle the
snooze state. Where are interrupts disabled? Why enable them here?

> +/*
> + * States for dedicated partition case.
> + */
> +static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
> + { /* Snooze */
> + .name = "snooze",
> + .desc = "snooze",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 0,
> + .target_residency = 0,
> + .enter = &snooze_loop },
> + { /* CEDE */
> + .name = "CEDE",
> + .desc = "CEDE",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 10,
> + .target_residency = 100,
> + .enter = _cede_loop },
> +};
> +
> +/*
> + * States for shared partition case.
> + */
> +static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
> + { /* Shared Cede */
> + .name = "Shared Cede",
> + .desc = "Shared Cede",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 0,
> + .target_residency = 0,
> + .enter = _cede_loop },
> +};
> +
> +static void __exit power_processor_idle_exit(void)
> +{
> +
> + unregister_cpu_notifier(_hotplug_notifier);

Remove a blank line.

> + cpuidle_unregister(&power_idle_driver);
> + return;
> +}
> +
> +module_init(power_processor_idle_init);
> +module_exit(power_processor_idle_exit);
> +

Have you tested this as a module? If not, please don't build it as a module.

> +MODULE_AUTHOR("Deepthi Dharwar ");
> +MODULE_DESCRIPTION("Cpuidle driver for IBM POWER platforms");
> +MODULE_LICENSE("GPL");
> 



[PATCH -next] dma: cppi41: fix error return code in cppi41_dma_probe()

2013-08-22 Thread Wei Yongjun
From: Wei Yongjun 

Fix to return -EINVAL in the irq parse and map error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun 
---
 drivers/dma/cppi41.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
index 5dcebca..49ea05a 100644
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -973,8 +973,10 @@ static int cppi41_dma_probe(struct platform_device *pdev)
goto err_chans;
 
irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
-   if (!irq)
+   if (!irq) {
+   ret = -EINVAL;
goto err_irq;
+   }
 
cppi_writel(USBSS_IRQ_PD_COMP, cdd->usbss_mem + USBSS_IRQ_ENABLER);
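
For reference, the bug class being fixed looks like this when reduced to a userspace sketch. The names map_irq(), probe_buggy() and probe_fixed() are hypothetical and only model the control flow; they are not the cppi41 driver code.

```c
#include <errno.h>

/* Hypothetical stand-in for irq_of_parse_and_map(): 0 means "no irq". */
static unsigned int map_irq(int have_irq)
{
	return have_irq ? 42 : 0;
}

/* Before the fix: the error path is taken, but ret still holds 0. */
static int probe_buggy(int have_irq)
{
	int ret = 0;	/* earlier steps succeeded and left ret at 0 */
	unsigned int irq = map_irq(have_irq);

	if (!irq)
		goto err_irq;
	return 0;

err_irq:
	/* cleanup would go here */
	return ret;	/* reports success although the probe failed */
}

/* After the fix: set the error code before jumping to the cleanup label. */
static int probe_fixed(int have_irq)
{
	int ret = 0;
	unsigned int irq = map_irq(have_irq);

	if (!irq) {
		ret = -EINVAL;
		goto err_irq;
	}
	return 0;

err_irq:
	/* cleanup would go here */
	return ret;
}
```

In the buggy variant the caller sees 0 even though the cleanup path ran, so it would treat the device as probed successfully.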
 



[PATCH 2/2] fs: use inode_set_user to set uid/gid of inode

2013-08-22 Thread Rui Xiang
Use the new interface to set i_uid/i_gid in inode struct.

Signed-off-by: Rui Xiang 
---
 arch/ia64/kernel/perfmon.c|  3 +--
 arch/powerpc/platforms/cell/spufs/inode.c |  3 +--
 arch/s390/hypfs/inode.c   |  3 +--
 drivers/infiniband/hw/qib/qib_fs.c|  3 +--
 drivers/usb/gadget/f_fs.c |  3 +--
 drivers/usb/gadget/inode.c|  5 +++--
 fs/9p/vfs_inode.c |  6 ++
 fs/adfs/inode.c   |  3 +--
 fs/affs/inode.c   |  6 ++
 fs/afs/inode.c|  6 ++
 fs/anon_inodes.c  |  3 +--
 fs/autofs4/inode.c|  4 ++--
 fs/befs/linuxvfs.c|  8 
 fs/ceph/caps.c|  5 +++--
 fs/ceph/inode.c   |  8 
 fs/cifs/inode.c   |  6 ++
 fs/configfs/inode.c   |  3 +--
 fs/debugfs/inode.c|  3 +--
 fs/devpts/inode.c |  7 +++
 fs/ext2/ialloc.c  |  3 +--
 fs/ext3/ialloc.c  |  3 +--
 fs/ext4/ialloc.c  |  3 +--
 fs/fat/inode.c|  6 ++
 fs/fuse/control.c |  3 +--
 fs/fuse/inode.c   |  4 ++--
 fs/hfs/inode.c|  6 ++
 fs/hfsplus/inode.c|  3 +--
 fs/hpfs/inode.c   |  3 +--
 fs/hpfs/namei.c   | 12 
 fs/hugetlbfs/inode.c  |  3 +--
 fs/isofs/inode.c  |  3 +--
 fs/isofs/rock.c   |  3 +--
 fs/ncpfs/inode.c  |  3 +--
 fs/nfs/inode.c|  4 ++--
 fs/ntfs/inode.c   | 12 
 fs/ntfs/mft.c |  3 +--
 fs/ntfs/super.c   |  3 +--
 fs/ocfs2/refcounttree.c   |  3 +--
 fs/omfs/inode.c   |  3 +--
 fs/pipe.c |  3 +--
 fs/proc/base.c| 15 +--
 fs/proc/fd.c  |  8 
 fs/proc/inode.c   |  3 +--
 fs/proc/self.c|  3 +--
 fs/stack.c|  3 +--
 fs/sysfs/inode.c  |  3 +--
 fs/xfs/xfs_iops.c |  4 ++--
 ipc/mqueue.c  |  3 +--
 kernel/cgroup.c   |  3 +--
 mm/shmem.c|  3 +--
 net/socket.c  |  3 +--
 51 files changed, 86 insertions(+), 142 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 5a9ff1c..73e1e55 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2202,8 +2202,7 @@ pfm_alloc_file(pfm_context_t *ctx)
DPRINT(("new inode ino=%ld @%p\n", inode->i_ino, inode));
 
inode->i_mode = S_IFCHR|S_IRUGO;
-   inode->i_uid  = current_fsuid();
-   inode->i_gid  = current_fsgid();
+   inode_set_user(inode, current_fsuid(), current_fsgid());
 
/*
 * allocate a new dcache entry
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
b/arch/powerpc/platforms/cell/spufs/inode.c
index 87ba7cf..4580c9b 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -101,8 +101,7 @@ spufs_new_inode(struct super_block *sb, umode_t mode)
 
inode->i_ino = get_next_ino();
inode->i_mode = mode;
-   inode->i_uid = current_fsuid();
-   inode->i_gid = current_fsgid();
+   inode_set_user(inode, current_fsuid(), current_fsgid());
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 out:
return inode;
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 7a539f4..742e430 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -103,8 +103,7 @@ static struct inode *hypfs_make_inode(struct super_block 
*sb, umode_t mode)
struct hypfs_sb_info *hypfs_info = sb->s_fs_info;
ret->i_ino = get_next_ino();
ret->i_mode = mode;
-   ret->i_uid = hypfs_info->uid;
-   ret->i_gid = hypfs_info->gid;
+   inode_set_user(ret, hypfs_info->uid, hypfs_info->gid);
ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
if (S_ISDIR(mode))
set_nlink(ret, 2);
diff --git a/drivers/infiniband/hw/qib/qib_fs.c 
b/drivers/infiniband/hw/qib/qib_fs.c
index f247fc6e..6683837 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -61,13 +61,12 @@ static int qibfs_mknod(struct inode *dir, struct 

[PATCH 1/2] fs: implement inode uid/gid setting function

2013-08-22 Thread Rui Xiang
Supply an interface, inode_set_user(), to set the uid/gid of inode
structs.

Signed-off-by: Rui Xiang 
---
 fs/inode.c | 7 +++
 include/linux/fs.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index e315c0a..3f90499 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -343,6 +343,13 @@ void inc_nlink(struct inode *inode)
 }
 EXPORT_SYMBOL(inc_nlink);
 
+void inode_set_user(struct inode *inode, kuid_t uid, kgid_t gid)
+{
+   inode->i_uid = uid;
+   inode->i_gid = gid;
+}
+EXPORT_SYMBOL(inode_set_user);
+
 void address_space_init_once(struct address_space *mapping)
 {
memset(mapping, 0, sizeof(*mapping));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 729e81b..36ac51b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2619,6 +2619,7 @@ void __inode_sub_bytes(struct inode *inode, loff_t bytes);
 void inode_sub_bytes(struct inode *inode, loff_t bytes);
 loff_t inode_get_bytes(struct inode *inode);
 void inode_set_bytes(struct inode *inode, loff_t bytes);
+void inode_set_user(struct inode *inode, kuid_t uid, kgid_t gid);
 
 extern int vfs_readdir(struct file *, filldir_t, void *);
 extern int iterate_dir(struct file *, struct dir_context *);
-- 
1.8.2.2
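
As a rough illustration of what the accessor buys callers, here is a userspace model. Everything in it is an assumption made for the example: kuid_model_t, kgid_model_t, struct inode_model and inode_set_user_model() only mimic the shape of kuid_t, kgid_t, struct inode and the proposed inode_set_user().

```c
/* Distinct wrapper types, in the spirit of kuid_t and kgid_t. */
typedef struct { unsigned int val; } kuid_model_t;
typedef struct { unsigned int val; } kgid_model_t;

/* Hypothetical cut-down inode with only the ownership fields. */
struct inode_model {
	kuid_model_t i_uid;
	kgid_model_t i_gid;
};

/* One call replaces the two open-coded assignments at each call site. */
static void inode_set_user_model(struct inode_model *inode,
				 kuid_model_t uid, kgid_model_t gid)
{
	inode->i_uid = uid;
	inode->i_gid = gid;
}
```

Because the uid and gid are different struct types, passing them in the wrong order fails to compile, so the helper shortens call sites without making them easier to get wrong.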




[PATCH 0/2] fs: supply inode uid/gid setting interface

2013-08-22 Thread Rui Xiang
This patchset implements an accessor function to set the uid/gid
in struct inode. It is purely a code cleanup.

Rui Xiang (2):
  fs: implement inode uid/gid setting function
  fs: use inode_set_user to set uid/gid of inode

 arch/ia64/kernel/perfmon.c|  3 +--
 arch/powerpc/platforms/cell/spufs/inode.c |  3 +--
 arch/s390/hypfs/inode.c   |  3 +--
 drivers/infiniband/hw/qib/qib_fs.c|  3 +--
 drivers/usb/gadget/f_fs.c |  3 +--
 drivers/usb/gadget/inode.c|  5 +++--
 fs/9p/vfs_inode.c |  6 ++
 fs/adfs/inode.c   |  3 +--
 fs/affs/inode.c   |  6 ++
 fs/afs/inode.c|  6 ++
 fs/anon_inodes.c  |  3 +--
 fs/autofs4/inode.c|  4 ++--
 fs/befs/linuxvfs.c|  8 
 fs/ceph/caps.c|  5 +++--
 fs/ceph/inode.c   |  8 
 fs/cifs/inode.c   |  6 ++
 fs/configfs/inode.c   |  3 +--
 fs/debugfs/inode.c|  3 +--
 fs/devpts/inode.c |  7 +++
 fs/ext2/ialloc.c  |  3 +--
 fs/ext3/ialloc.c  |  3 +--
 fs/ext4/ialloc.c  |  3 +--
 fs/fat/inode.c|  6 ++
 fs/fuse/control.c |  3 +--
 fs/fuse/inode.c   |  4 ++--
 fs/hfs/inode.c|  6 ++
 fs/hfsplus/inode.c|  3 +--
 fs/hpfs/inode.c   |  3 +--
 fs/hpfs/namei.c   | 12 
 fs/hugetlbfs/inode.c  |  3 +--
 fs/inode.c|  7 +++
 fs/isofs/inode.c  |  3 +--
 fs/isofs/rock.c   |  3 +--
 fs/ncpfs/inode.c  |  3 +--
 fs/nfs/inode.c|  4 ++--
 fs/ntfs/inode.c   | 12 
 fs/ntfs/mft.c |  3 +--
 fs/ntfs/super.c   |  3 +--
 fs/ocfs2/refcounttree.c   |  3 +--
 fs/omfs/inode.c   |  3 +--
 fs/pipe.c |  3 +--
 fs/proc/base.c| 15 +--
 fs/proc/fd.c  |  8 
 fs/proc/inode.c   |  3 +--
 fs/proc/self.c|  3 +--
 fs/stack.c|  3 +--
 fs/sysfs/inode.c  |  3 +--
 fs/xfs/xfs_iops.c |  4 ++--
 include/linux/fs.h|  1 +
 ipc/mqueue.c  |  3 +--
 kernel/cgroup.c   |  3 +--
 mm/shmem.c|  3 +--
 net/socket.c  |  3 +--
 53 files changed, 94 insertions(+), 142 deletions(-)

-- 
1.8.2.2




[PATCH -next] block: fix error return code in parse_parts()

2013-08-22 Thread Wei Yongjun
From: Wei Yongjun 

Fix to return -EINVAL in the parts parse error handling case instead
of 0 (ret may have been overwritten with 0 by parse_subpart()), as done
elsewhere in this function.

Signed-off-by: Wei Yongjun 
---
 block/cmdline-parser.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/cmdline-parser.c b/block/cmdline-parser.c
index 18fb435..cc2637f 100644
--- a/block/cmdline-parser.c
+++ b/block/cmdline-parser.c
@@ -135,6 +135,7 @@ static int parse_parts(struct cmdline_parts **parts, const char *bdevdef)
 
if (!newparts->subpart) {
pr_warn("cmdline partition has no valid partition.");
+   ret = -EINVAL;
goto fail;
}
 



[PATCH 2/9] target: Make spc_parse_naa_6h_vendor_specific non static

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch makes spc_parse_naa_6h_vendor_specific() available to
other target code, which is required by EXTENDED_COPY when comparing
the received NAA WWN device identifier for locating the associated
se_device backend.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_spc.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index ed7077a..bd16a93 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -126,8 +126,8 @@ spc_emulate_evpd_80(struct se_cmd *cmd, unsigned char *buf)
return 0;
 }
 
-static void spc_parse_naa_6h_vendor_specific(struct se_device *dev,
-   unsigned char *buf)
+void spc_parse_naa_6h_vendor_specific(struct se_device *dev,
+ unsigned char *buf)
 {
unsigned char *p = &dev->t10_wwn.unit_serial[0];
int cnt;
-- 
1.7.2.5



[PATCH 4/9] target: Add global device list for EXTENDED_COPY

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

EXTENDED_COPY needs to be able to search a global list of devices
based on NAA WWN device identifiers, so add a simple g_device_list
protected by g_device_mutex.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_device.c |   13 +
 include/target/target_core_base.h   |1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index de89046..458944e 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -47,6 +47,9 @@
 #include "target_core_pr.h"
 #include "target_core_ua.h"
 
+DEFINE_MUTEX(g_device_mutex);
+LIST_HEAD(g_device_list);
+
 static struct se_hba *lun0_hba;
 /* not static, needed by tpg.c */
 struct se_device *g_lun0_dev;
@@ -1406,6 +1409,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
INIT_LIST_HEAD(&dev->delayed_cmd_list);
INIT_LIST_HEAD(&dev->state_list);
INIT_LIST_HEAD(&dev->qf_cmd_list);
+   INIT_LIST_HEAD(&dev->g_dev_node);
spin_lock_init(&dev->stats_lock);
spin_lock_init(&dev->execute_task_lock);
spin_lock_init(&dev->delayed_cmd_lock);
@@ -1525,6 +1529,11 @@ int target_configure_device(struct se_device *dev)
spin_lock(&hba->device_lock);
hba->dev_count++;
spin_unlock(&hba->device_lock);
+
+   mutex_lock(&g_device_mutex);
+   list_add_tail(&dev->g_dev_node, &g_device_list);
+   mutex_unlock(&g_device_mutex);
+
return 0;
 
 out_free_alua:
@@ -1543,6 +1552,10 @@ void target_free_device(struct se_device *dev)
if (dev->dev_flags & DF_CONFIGURED) {
destroy_workqueue(dev->tmr_wq);
 
+   mutex_lock(&g_device_mutex);
+   list_del(&dev->g_dev_node);
+   mutex_unlock(&g_device_mutex);
+
spin_lock(&hba->device_lock);
hba->dev_count--;
spin_unlock(&hba->device_lock);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 0783b2c..6b14f3c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -686,6 +686,7 @@ struct se_device {
struct list_headdelayed_cmd_list;
struct list_headstate_list;
struct list_headqf_cmd_list;
+   struct list_headg_dev_node;
/* Pointer to associated SE HBA */
struct se_hba   *se_hba;
/* T10 Inquiry and VPD WWN Information */
-- 
1.7.2.5
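
To show how such a list is meant to be used for the WWN lookup, here is a userspace sketch of a mutex-protected global device list. It is an illustration under assumptions, not the target core code: struct dev_model, dev_register(), dev_unregister(), dev_find() and WWN_LEN are hypothetical names, a pthread mutex stands in for the kernel mutex, and a hand-rolled singly linked list (inserting at the head) stands in for list_head and list_add_tail().

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define WWN_LEN 16	/* assumed length of the identifier being compared */

/* Hypothetical cut-down device with only what the lookup needs. */
struct dev_model {
	unsigned char wwn[WWN_LEN];
	struct dev_model *next;
};

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dev_model *g_list;

/* Called when a device becomes configured. */
static void dev_register(struct dev_model *dev)
{
	pthread_mutex_lock(&g_mutex);
	dev->next = g_list;
	g_list = dev;
	pthread_mutex_unlock(&g_mutex);
}

/* Called when a configured device is freed. */
static void dev_unregister(struct dev_model *dev)
{
	struct dev_model **pp;

	pthread_mutex_lock(&g_mutex);
	for (pp = &g_list; *pp; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			break;
		}
	}
	pthread_mutex_unlock(&g_mutex);
}

/* Walk the list under the mutex and match on the identifier. */
static struct dev_model *dev_find(const unsigned char *wwn)
{
	struct dev_model *dev;

	pthread_mutex_lock(&g_mutex);
	for (dev = g_list; dev; dev = dev->next)
		if (!memcmp(dev->wwn, wwn, WWN_LEN))
			break;
	pthread_mutex_unlock(&g_mutex);
	return dev;
}
```

Note that the sketch returns the device after dropping the mutex, so it says nothing about keeping the device alive afterwards; a real lookup would need a reference or similar before unlocking.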



[PATCH 6/9] target: Add support for EXTENDED_COPY copy offload emulation

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds support for EXTENDED_COPY emulation from SPC-3, that
enables full copy offload target support within both a single virtual
backend device, and across multiple virtual backend devices.  It also
functions independent of target fabric, and supports copy offload
across multiple target fabric ports.

This implementation supports both EXTENDED_COPY PUSH and PULL models
of operation, so the actual CDB may be received on either source or
destination logical unit.

For Target Descriptors, it currently supports the NAA IEEE Registered
Extended designator (type 0xe4), which allows the reference of target
ports to occur independent of fabric type using EVPD 0x83 WWNs.

For Segment Descriptors, it currently supports copy from block to
block (0x02) mode.

It also honors any present SCSI reservations of the destination target
port.  Note that only Supports No List Identifier (SNLID=1) mode is
supported.

Also included is basic RECEIVE_COPY_RESULTS with service action type
OPERATING PARAMETERS (0x03) required for SNLID=1 operation.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/Makefile|3 +-
 drivers/target/target_core_xcopy.c | 1122 
 drivers/target/target_core_xcopy.h |   62 ++
 include/target/target_core_base.h  |1 +
 4 files changed, 1187 insertions(+), 1 deletions(-)
 create mode 100644 drivers/target/target_core_xcopy.c
 create mode 100644 drivers/target/target_core_xcopy.h

diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 9fdcb56..85b012d 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -13,7 +13,8 @@ target_core_mod-y := target_core_configfs.o \
   target_core_spc.o \
   target_core_ua.o \
   target_core_rd.o \
-  target_core_stat.o
+  target_core_stat.o \
+  target_core_xcopy.o
 
 obj-$(CONFIG_TARGET_CORE)  += target_core_mod.o
 
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
new file mode 100644
index 000..e0fabea
--- /dev/null
+++ b/drivers/target/target_core_xcopy.c
@@ -0,0 +1,1122 @@
+/***
+ * Filename: target_core_xcopy.c
+ *
+ * This file contains support for SPC-4 Extended-Copy offload with generic
+ * TCM backends.
+ *
+ * Copyright (c) 2011-2013 Datera, Inc. All rights reserved.
+ *
+ * Author:
+ * Nicholas A. Bellinger 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ 
**/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "target_core_pr.h"
+#include "target_core_ua.h"
+#include "target_core_xcopy.h"
+
+/* #define XCOPY_DBG_CTL */
+#ifdef XCOPY_DBG_CTL
+#define XCOPY_CTL(x...) printk(KERN_INFO x)
+#else
+#define XCOPY_CTL(x...)
+#endif
+
+/* #define XCOPY_DBG_IO */
+#ifdef XCOPY_DBG_IO
+#define XCOPY_IO(x...) printk(KERN_INFO x)
+#else
+#define XCOPY_IO(x...)
+#endif
+
+static struct workqueue_struct *xcopy_wq = NULL;
+/*
+ * From target_core_spc.c
+ */
+extern void spc_parse_naa_6h_vendor_specific(struct se_device *, unsigned char *);
+/*
+ * From target_core_device.c
+ */
+extern struct mutex g_device_mutex;
+extern struct list_head g_device_list;
+/*
+ * From target_core_configfs.c
+ */
+extern struct configfs_subsystem *target_core_subsystem[];
+
+static int target_xcopy_gen_naa_ieee(struct se_device *dev, unsigned char *buf)
+{
+   int off = 0;
+
+   buf[off++] = (0x6 << 4);
+   buf[off++] = 0x01;
+   buf[off++] = 0x40;
+   buf[off] = (0x5 << 4);
+
+   spc_parse_naa_6h_vendor_specific(dev, &buf[off]);
+   return 0;
+}
+
+static int target_xcopy_locate_se_dev_e4(struct se_cmd *se_cmd, struct xcopy_op *xop,
+   bool src)
+{
+   struct se_device *se_dev;
+   struct configfs_subsystem *subsys = target_core_subsystem[0];
+   unsigned char tmp_dev_wwn[XCOPY_NAA_IEEE_REGEX_LEN], *dev_wwn;
+   int rc;
+
+   if (src == true)
+   dev_wwn = &xop->dst_tid_wwn[0];
+  
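
The four header bytes written by target_xcopy_gen_naa_ieee() can be read back as an NAA type and a 24-bit IEEE company ID. The sketch below is a standalone illustration, not part of the patch: the demo_* helpers are hypothetical, and the vendor-specific bytes that the patch fills via spc_parse_naa_6h_vendor_specific() are left out.

```c
#include <stdint.h>

/* Fill the four fixed header bytes, mirroring the assignments in the patch. */
static void demo_fill_naa_header(uint8_t *buf)
{
    int off = 0;

    buf[off++] = (0x6 << 4);   /* NAA type 6 in the high nibble */
    buf[off++] = 0x01;
    buf[off++] = 0x40;
    buf[off] = (0x5 << 4);     /* low nibble is left for vendor-specific data */
}

/* NAA type: high nibble of byte 0. */
static unsigned int demo_naa_type(const uint8_t *buf)
{
    return buf[0] >> 4;
}

/* 24-bit company ID: low nibble of byte 0 through high nibble of byte 3. */
static uint32_t demo_naa_company_id(const uint8_t *buf)
{
    return ((uint32_t)(buf[0] & 0x0f) << 20) |
           ((uint32_t)buf[1] << 12) |
           ((uint32_t)buf[2] << 4) |
           ((uint32_t)buf[3] >> 4);
}
```

With the bytes 0x60 0x01 0x40 0x50 the type decodes to 6 and the company ID to 0x001405, which matches the 0x6001405 prefix of the WWNs printed in the cover letter's debug log.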

[PATCH 3/9] target: Make helpers non static for EXTENDED_COPY command setup

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Both transport_generic_get_mem() and transport_generic_map_mem_to_cmd()
are required by EXTENDED_COPY logic when setting up internally
dispatched command descriptors, so go ahead and make both of these
non static.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_transport.c |5 ++---
 include/target/target_core_backend.h   |4 
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 3009cda..2f9c402 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -67,7 +67,6 @@ struct kmem_cache *t10_alua_tg_pt_gp_mem_cache;
 static void transport_complete_task_attr(struct se_cmd *cmd);
 static void transport_handle_queue_full(struct se_cmd *cmd,
struct se_device *dev);
-static int transport_generic_get_mem(struct se_cmd *cmd);
 static int transport_put_cmd(struct se_cmd *cmd);
 static void target_complete_ok_work(struct work_struct *work);
 
@@ -1254,7 +1253,7 @@ int transport_handle_cdb_direct(
 }
 EXPORT_SYMBOL(transport_handle_cdb_direct);
 
-static sense_reason_t
+sense_reason_t
 transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl,
u32 sgl_count, struct scatterlist *sgl_bidi, u32 sgl_bidi_count)
 {
@@ -2164,7 +2163,7 @@ out:
return -ENOMEM;
 }
 
-static int
+int
 transport_generic_get_mem(struct se_cmd *cmd)
 {
u32 length = cmd->data_length;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 77f25e0..9f07231 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -74,6 +74,10 @@ int  transport_set_vpd_ident(struct t10_vpd *, unsigned char *);
 /* core helpers also used by command snooping in pscsi */
 void   *transport_kmap_data_sg(struct se_cmd *);
 void   transport_kunmap_data_sg(struct se_cmd *);
+/* core helpers also used by xcopy during internal command setup */
+int	transport_generic_get_mem(struct se_cmd *);
+sense_reason_t transport_generic_map_mem_to_cmd(struct se_cmd *,
+   struct scatterlist *, u32, struct scatterlist *, u32);
 
 void   array_free(void *array, int n);
 
-- 
1.7.2.5



[PATCH 1/9] target: Make target_core_subsystem defined as non static

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch makes the top-level target_core_subsystem array available
to other target code, which is required by EXTENDED_COPY to pin the
backend se_device using configfs_depend_item(), in order to ensure
it can't be removed for the duration of an EXTENDED_COPY operation.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_configfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index 24517d4..939ecc5 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -268,7 +268,7 @@ static struct configfs_subsystem target_core_fabrics = {
},
 };
 
-static struct configfs_subsystem *target_core_subsystem[] = {
+struct configfs_subsystem *target_core_subsystem[] = {
&target_core_fabrics,
NULL,
 };
-- 
1.7.2.5



Re: [PATCH v6 00/10] tracing: trace event triggers

2013-08-22 Thread Steven Rostedt
On Thu, 22 Aug 2013 18:27:16 -0500
Tom Zanussi  wrote:

> Hi,
> 
> This is v6 of the trace event triggers patchset.  This is essentially
> the same as v5, but rebased to trace/for-next, which had a couple of
> new conflicting patches pulled in since I had cut v5.  This version
> just fixes up those conflicts.
> 
> v6:
>  - fixed up the conflicts in trace_events.c related to the actual
>creation of the per-event 'trigger' files.

Thanks Tom!

Just to let you know, I won't be able to take a look at these till
Monday.

-- Steve


[PATCH 7/9] target: Enable EXTENDED_COPY setup in spc_parse_cdb

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Set up the se_cmd->execute_cmd() pointers for EXTENDED_COPY and
RECEIVE_COPY_RESULTS handling within spc_parse_cdb().

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_spc.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index bd16a93..894e83b 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -35,7 +35,7 @@
 #include "target_core_alua.h"
 #include "target_core_pr.h"
 #include "target_core_ua.h"
-
+#include "target_core_xcopy.h"
 
 static void spc_fill_alua_data(struct se_port *port, unsigned char *buf)
 {
@@ -1252,8 +1252,14 @@ spc_parse_cdb(struct se_cmd *cmd, unsigned int *size)
*size = (cdb[6] << 24) | (cdb[7] << 16) | (cdb[8] << 8) | cdb[9];
break;
case EXTENDED_COPY:
-   case READ_ATTRIBUTE:
+   *size = get_unaligned_be32(&cdb[10]);
+   cmd->execute_cmd = target_do_xcopy;
+   break;
case RECEIVE_COPY_RESULTS:
+   *size = get_unaligned_be32(&cdb[10]);
+   cmd->execute_cmd = target_do_receive_copy_results;
+   break;
+   case READ_ATTRIBUTE:
case WRITE_ATTRIBUTE:
*size = (cdb[10] << 24) | (cdb[11] << 16) |
   (cdb[12] << 8) | cdb[13];
-- 
1.7.2.5
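
For readers unfamiliar with the length field handled above: the hunk reads a 32-bit big-endian value starting at CDB byte 10. A portable userspace sketch of that read follows; demo_get_be32() and demo_cdb_param_len() are hypothetical stand-ins, not kernel functions.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value; a portable stand-in for get_unaligned_be32(). */
static uint32_t demo_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Length field of a 16-byte CDB that keeps it in bytes 10..13. */
static uint32_t demo_cdb_param_len(const uint8_t cdb[16])
{
    return demo_get_be32(&cdb[10]);
}
```

This yields the same value as the open-coded shifts still used for READ_ATTRIBUTE and WRITE_ATTRIBUTE in the same switch, so the change in the hunk is one of style rather than of result.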



[PATCH 5/9] target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds a check for a non-existent port->sep_alua_tg_pt_gp_mem
within target_alua_state_check(), which is not present for internally
dispatched EXTENDED_COPY WRITE I/O to the destination target port.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_alua.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index 5403186..ea928c4 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -557,6 +557,9 @@ target_alua_state_check(struct se_cmd *cmd)
 * a ALUA logical unit group.
 */
tg_pt_gp_mem = port->sep_alua_tg_pt_gp_mem;
+   if (!tg_pt_gp_mem)
+   return 0;
+
spin_lock(&tg_pt_gp_mem->tg_pt_gp_mem_lock);
tg_pt_gp = tg_pt_gp_mem->tg_pt_gp;
out_alua_state = atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state);
-- 
1.7.2.5



[PATCH 9/9] target: Enable global EXTENDED_COPY setup/release

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Add calls to target_xcopy_setup_pt() + target_xcopy_release_pt() to
target_core_init_configfs() and target_core_exit_configfs()
respectively.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_configfs.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index 026e42b..328f425 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -48,6 +48,7 @@
 #include "target_core_alua.h"
 #include "target_core_pr.h"
 #include "target_core_rd.h"
+#include "target_core_xcopy.h"
 
 extern struct t10_alua_lu_gp *default_lu_gp;
 
@@ -2935,6 +2936,10 @@ static int __init target_core_init_configfs(void)
if (ret < 0)
goto out;
 
+   ret = target_xcopy_setup_pt();
+   if (ret < 0)
+   goto out;
+
return 0;
 
 out:
@@ -3007,6 +3012,7 @@ static void __exit target_core_exit_configfs(void)
 
core_dev_release_virtual_lun0();
rd_module_exit();
+   target_xcopy_release_pt();
release_se_kmem_caches();
 }
 
-- 
1.7.2.5



Re: [PATCH] w1: mxc_w1: remove unnecessary platform_set_drvdata()

2013-08-22 Thread Shawn Guo
On Thu, Aug 22, 2013 at 11:20:58AM +0900, Jingoo Han wrote:
> The driver core clears the driver data to NULL after device_release
> or on probe failure. Thus, it is not needed to manually clear the
> device driver data to NULL.
> 
> Signed-off-by: Jingoo Han 

Acked-by: Shawn Guo 

> ---
>  drivers/w1/masters/mxc_w1.c |2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/w1/masters/mxc_w1.c b/drivers/w1/masters/mxc_w1.c
> index 47e12cf..15c7251 100644
> --- a/drivers/w1/masters/mxc_w1.c
> +++ b/drivers/w1/masters/mxc_w1.c
> @@ -152,8 +152,6 @@ static int mxc_w1_remove(struct platform_device *pdev)
>  
>   clk_disable_unprepare(mdev->clk);
>  
> - platform_set_drvdata(pdev, NULL);
> -
>   return 0;
>  }
>  
> -- 
> 1.7.10.4
> 
> 



[PATCH 8/9] target: Add Third Party Copy (3PC) bit in INQUIRY response

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds the Third Party Copy (3PC) bit to signal support
for EXTENDED_COPY within standard inquiry response data.

Also add emulate_3pc device attribute in configfs (enabled by default)
to allow the exposure of this bit to be disabled, if necessary.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Martin Petersen 
Cc: Chris Mason 
Cc: Roland Dreier 
Cc: Zach Brown 
Cc: James Bottomley 
Cc: Nicholas Bellinger 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_configfs.c |4 
 drivers/target/target_core_device.c   |   14 ++
 drivers/target/target_core_internal.h |1 +
 drivers/target/target_core_spc.c  |6 ++
 include/target/target_core_base.h |3 +++
 5 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index 939ecc5..026e42b 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -639,6 +639,9 @@ SE_DEV_ATTR(emulate_tpws, S_IRUGO | S_IWUSR);
 DEF_DEV_ATTRIB(emulate_caw);
 SE_DEV_ATTR(emulate_caw, S_IRUGO | S_IWUSR);
 
+DEF_DEV_ATTRIB(emulate_3pc);
+SE_DEV_ATTR(emulate_3pc, S_IRUGO | S_IWUSR);
+
 DEF_DEV_ATTRIB(enforce_pr_isids);
 SE_DEV_ATTR(enforce_pr_isids, S_IRUGO | S_IWUSR);
 
@@ -697,6 +700,7 @@ static struct configfs_attribute *target_core_dev_attrib_attrs[] = {
&target_core_dev_attrib_emulate_tpu.attr,
&target_core_dev_attrib_emulate_tpws.attr,
&target_core_dev_attrib_emulate_caw.attr,
+   &target_core_dev_attrib_emulate_3pc.attr,
&target_core_dev_attrib_enforce_pr_isids.attr,
&target_core_dev_attrib_is_nonrot.attr,
&target_core_dev_attrib_emulate_rest_reord.attr,
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 458944e..6f492c7 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -906,6 +906,19 @@ int se_dev_set_emulate_caw(struct se_device *dev, int flag)
return 0;
 }
 
+int se_dev_set_emulate_3pc(struct se_device *dev, int flag)
+{
+   if (flag != 0 && flag != 1) {
+   pr_err("Illegal value %d\n", flag);
+   return -EINVAL;
+   }
+   dev->dev_attrib.emulate_3pc = flag;
+   pr_debug("dev[%p]: SE Device 3rd Party Copy (EXTENDED_COPY): %d\n",
+   dev, flag);
+
+   return 0;
+}
+
 int se_dev_set_enforce_pr_isids(struct se_device *dev, int flag)
 {
if ((flag != 0) && (flag != 1)) {
@@ -1442,6 +1455,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
dev->dev_attrib.emulate_tpu = DA_EMULATE_TPU;
dev->dev_attrib.emulate_tpws = DA_EMULATE_TPWS;
dev->dev_attrib.emulate_caw = DA_EMULATE_CAW;
+   dev->dev_attrib.emulate_3pc = DA_EMULATE_3PC;
dev->dev_attrib.enforce_pr_isids = DA_ENFORCE_PR_ISIDS;
dev->dev_attrib.is_nonrot = DA_IS_NONROT;
dev->dev_attrib.emulate_rest_reord = DA_EMULATE_REST_REORD;
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 805ceb4..579128a 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -34,6 +34,7 @@ int   se_dev_set_emulate_tas(struct se_device *, int);
 int	se_dev_set_emulate_tpu(struct se_device *, int);
 int	se_dev_set_emulate_tpws(struct se_device *, int);
 int	se_dev_set_emulate_caw(struct se_device *, int);
+int	se_dev_set_emulate_3pc(struct se_device *, int);
 int	se_dev_set_enforce_pr_isids(struct se_device *, int);
 int	se_dev_set_is_nonrot(struct se_device *, int);
 int	se_dev_set_emulate_rest_reord(struct se_device *dev, int);
diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index 894e83b..566dd27 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -95,6 +95,12 @@ spc_emulate_inquiry_std(struct se_cmd *cmd, unsigned char *buf)
 */
spc_fill_alua_data(lun->lun_sep, buf);
 
+   /*
+* Set Third-Party Copy (3PC) bit to indicate support for EXTENDED_COPY
+*/
+   if (dev->dev_attrib.emulate_3pc)
+   buf[5] |= 0x8;
+
buf[7] = 0x2; /* CmdQue=1 */
 
snprintf(&buf[8], 8, "LIO-ORG");
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index f54a015..ba9ca79 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -99,6 +99,8 @@
 #define DA_EMULATE_TPWS 0
 /* Emulation for CompareAndWrite (AtomicTestandSet) by default */
 #define DA_EMULATE_CAW 1
+/* Emulation for 3rd Party Copy (ExtendedCopy) by default */
+#define DA_EMULATE_3PC 1
 /* No Emulation for PSCSI by default */
 #define DA_EMULATE_ALUA 0
 /* Enforce SCSI Initiator Port TransportID with 'ISID' for PR */
@@ -606,6 +608,7 @@ struct se_dev_attrib {
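
The attribute check and the INQUIRY bit handling in this patch can be tried out in isolation. The following is a minimal userspace sketch: the demo_* names and the plain -1 error return are assumptions made for illustration, whereas the kernel code returns -EINVAL and operates on struct se_device.

```c
#include <stdint.h>

#define DEMO_INQUIRY_3PC_BIT 0x08   /* byte 5, bit 3 of standard INQUIRY data */

struct demo_dev_attrib {
    int emulate_3pc;
};

/* Accept only 0 or 1, as se_dev_set_emulate_3pc() does in the patch. */
static int demo_set_emulate_3pc(struct demo_dev_attrib *attr, int flag)
{
    if (flag != 0 && flag != 1)
        return -1;   /* the kernel code returns -EINVAL here */
    attr->emulate_3pc = flag;
    return 0;
}

/* Set the 3PC bit in the INQUIRY buffer only when emulation is enabled. */
static void demo_fill_inquiry_3pc(const struct demo_dev_attrib *attr, uint8_t *buf)
{
    if (attr->emulate_3pc)
        buf[5] |= DEMO_INQUIRY_3PC_BIT;
}
```

Because the bit is OR-ed into byte 5, any bits already placed there by earlier INQUIRY setup are preserved.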

[PATCH 0/9] target: Add support for EXTENDED_COPY (VAAI) offload emulation

2013-08-22 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi folks!

This series adds support to target-core for generic EXTENDED_COPY offload
emulation as defined by SPC-4 using virtual (IBLOCK, FILEIO, RAMDISK)
backends.

EXTENDED_COPY is a VMware ESX VAAI primitive that is used to perform copy
offload, that allows a target to perform local READ + WRITE I/O requests
for bulk data transfers (cloning a virtual machine for example), instead
of requiring these I/Os to actually be sent to/from the requesting SCSI
initiator port.

This implementation fully supports copy offload between the same device
backend, and across multiple device backends.  It supports copy offload
transparently across multiple target ports of different fabrics, eg:
iSCSI -> FC, FC -> iSER, iSER -> FCoE and so on.

It also supports both PUSH and PULL models of operation, so the actual
EXTENDED_COPY CDB may be received on either source or destination logical
unit.

For Target Descriptors, it currently supports the NAA IEEE Registered
Extended designator (type 0xe4), which allows the reference of target
ports to occur independent of fabric type using EVPD 0x83 WWNs.  For
Segment Descriptors, it currently supports copy from block to block
(0x02) mode.

Here's a quick snippet of the code in action with sg_xcopy performing
copy offload between two IBLOCK and FILEIO backends:

[  644.638215] Processing XCOPY with list_id: 0x00 list_id_usage: 0x10 tdll: 64 
sdll: 28 inline_dl: 0
[  644.648227] XCOPY 0xe4: RELATIVE INITIATOR PORT IDENTIFIER: 0
[  644.654639] XCOPY 0xe4: desig_len: 16
[  644.658722] XCOPY 0xe4: Set xop->src_dev 88045d77 from source 
received xop
[  644.667179] XCOPY 0xe4: RELATIVE INITIATOR PORT IDENTIFIER: 0
[  644.673597] XCOPY 0xe4: desig_len: 16
[  644.677699] XCOPY 0xe4: Setting xop->dst_dev: 88045d771048 from located 
se_dev
[  644.686297] Called configfs_depend_item for subsys: a00f2570 se_dev: 
88045d771048 se_dev->se_dev_group: 88045d7714f8
[  644.699607] XCOPY TGT desc: Source dev: 88045d77 NAA IEEE WWN: 
0x6001405d2e0745b08564acea3ca401e5
[  644.710296] XCOPY TGT desc: Dest dev: 88045d771048 NAA IEEE WWN: 
0x60014056da9d8672d4b437596ab764b3
[  644.720782] XCOPY: Processed 2 target descriptors, length: 64
[  644.727203] XCOPY seg desc 0x02: desc_len: 24 stdi: 0 dtdi: 1, DC: 2
[  644.734304] XCOPY seg desc 0x02: nolb: 1 src_lba: 0 dst_lba: 0
[  644.740819] XCOPY seg desc 0x02: DC=1 w/ dbl: 0
[  644.745881] XCOPY: Processed 1 segment descriptors, length: 28
[  644.752402] target_xcopy_do_work: nolb: 1, max_nolb: 1024 end_lba: 1
[  644.759504] target_xcopy_do_work: Starting src_lba: 0, dst_lba: 0
[  644.766303] target_xcopy_do_work: Calling read src_dev: 88045d77 
src_lba: 0, cur_nolb: 1
[  644.776115] XCOPY: Built READ_16: LBA: 0 Sectors: 1 Length: 512
[  644.782751] Honoring local SRC port from ec_cmd->se_dev: 88045d77
[  644.790335] Honoring local SRC port from ec_cmd->se_lun: 88085a1977e0
[  644.797921] XCOPY-READ: Saved xop->xop_data_sg: 880459d3e3a8, num: 1 for 
READ memory
[  644.807203] target_xcopy_issue_pt_cmd(): SCSI status: 0x00
[  644.81] target_xcopy_do_work: Incremented READ src_lba to 1
[  644.819947] target_xcopy_do_work: Calling write dst_dev: 88045d771048 
dst_lba: 0, cur_nolb: 1
[  644.829854] XCOPY: Built WRITE_16: LBA: 0 Sectors: 1 Length: 512
[  644.836568] Setup emulated se_dev: 88045d771048 from se_dev
[  644.843185] Setup emulated se_dev: 88045d771048 to 
pt_cmd->se_lun->lun_se_dev
[  644.851545] Setup emulated remote DEST xcopy_pt_port: a00f7610 to 
cmd->se_lun->lun_sep for X-COPY data PUSH
[  644.863198] Setup PASSTHROUGH_NOALLOC t_data_sg: 880459d3e3a8 
t_data_nents: 1
[  644.895203] target_xcopy_issue_pt_cmd(): SCSI status: 0x00
[  644.901332] target_xcopy_do_work: Incremented WRITE dst_lba to 1
[  644.908044] Calling configfs_undepend_item for subsys: a00f2570 
remote_dev: 88045d771048 remote_dev->dev_group: 88045d7714f8
[  644.922129] target_xcopy_do_work: Final src_lba: 1, dst_lba: 1
[  644.928646] target_xcopy_do_work: Blocks copied: 1, Bytes Copied: 512
[  644.935840] target_xcopy_do_work: Setting X-COPY GOOD status -> sending 
response

For all intents and purposes this code is completely standalone, and the amount
of changes required to enable its function within target-core code is small.

Please review as v3.12 material.

Thank you,

--nab

Nicholas Bellinger (9):
  target: Make target_core_subsystem defined as non static
  target: Make spc_parse_naa_6h_vendor_specific non static
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Add global device list for EXTENDED_COPY
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable global EXTENDED_COPY setup/release
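
The debug log above suggests a loop that copies nolb blocks in chunks of at most max_nolb, issuing a READ and then a WRITE per chunk and advancing src_lba and dst_lba each time. A rough userspace sketch of that shape follows. It is inferred from the log output only, not from the body of target_xcopy_do_work(); memcpy() stands in for the READ_16/WRITE_16 pair, and the demo_* names and fixed 512-byte block size are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 512

/*
 * Copy nolb blocks from src to dst in chunks of at most max_nolb blocks.
 * Returns the number of blocks copied.
 */
static uint64_t demo_xcopy(const uint8_t *src, uint64_t src_lba,
                           uint8_t *dst, uint64_t dst_lba,
                           uint64_t nolb, uint64_t max_nolb)
{
    uint64_t end_lba = src_lba + nolb;
    uint64_t copied = 0;

    if (max_nolb == 0)
        return 0;   /* avoid looping forever on a zero chunk size */

    while (src_lba < end_lba) {
        uint64_t cur_nolb = end_lba - src_lba;

        if (cur_nolb > max_nolb)
            cur_nolb = max_nolb;

        /* stands in for an internal READ followed by a WRITE */
        memcpy(dst + dst_lba * DEMO_BLOCK_SIZE,
               src + src_lba * DEMO_BLOCK_SIZE,
               cur_nolb * DEMO_BLOCK_SIZE);

        src_lba += cur_nolb;
        dst_lba += cur_nolb;
        copied += cur_nolb;
    }
    return copied;
}
```

Bounding each pass by max_nolb keeps the intermediate buffer small regardless of how many blocks a single segment descriptor asks for.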

 

[PATCH v2 2/4] bus: mvebu: add missing of_node_put() to fix reference leak

2013-08-22 Thread Jisheng Zhang
Add of_node_put to properly decrement the refcount when we are
done using a given node.

Signed-off-by: Jisheng Zhang 
---
 drivers/bus/mvebu-mbus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 33c6947..20da90f 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -837,6 +837,7 @@ int __init mvebu_mbus_init(const char *soc, phys_addr_t mbuswins_phys_base,
 {
struct mvebu_mbus_state *mbus = &mbus_state;
const struct of_device_id *of_id;
+   struct device_node *np;
int win;
 
for (of_id = of_mvebu_mbus_ids; of_id->compatible; of_id++)
@@ -860,8 +861,11 @@ int __init mvebu_mbus_init(const char *soc, phys_addr_t mbuswins_phys_base,
return -ENOMEM;
}
 
-   if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
+   np = of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric");
+   if (np) {
mbus->hw_io_coherency = 1;
+   of_node_put(np);
+   }
 
for (win = 0; win < mbus->soc->num_wins; win++)
mvebu_mbus_disable_window(mbus, win);
-- 
1.8.4.rc3
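
The leak fixed here comes from a lookup that returns its node with the reference count raised. A small mock of the get/put pairing, with hypothetical demo_* names standing in for the device-tree API, might look like this:

```c
#include <stddef.h>
#include <string.h>

struct demo_node {
    const char *compatible;
    int refcount;
};

/* Return the first matching node with its reference count raised, or NULL. */
static struct demo_node *demo_find_compatible(struct demo_node *nodes, size_t n,
                                              const char *compatible)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(nodes[i].compatible, compatible) == 0) {
            nodes[i].refcount++;
            return &nodes[i];
        }
    }
    return NULL;
}

/* Drop a reference taken by demo_find_compatible(). */
static void demo_node_put(struct demo_node *np)
{
    if (np)
        np->refcount--;
}

/* Presence check that balances the reference, following the patched pattern. */
static int demo_has_coherency_fabric(struct demo_node *nodes, size_t n)
{
    struct demo_node *np;
    int present = 0;

    np = demo_find_compatible(nodes, n, "marvell,coherency-fabric");
    if (np) {
        present = 1;
        demo_node_put(np);
    }
    return present;
}
```

Using the lookup result directly as an if condition, as the old code did, discards the pointer and leaves no way to drop the reference afterwards.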



[PATCH v2 3/4] clk: mvebu: add missing iounmap

2013-08-22 Thread Jisheng Zhang
Add missing iounmap to setup error path.

Change-Id: I4371569d14d7026aa9f90d7cd53f669d365fe26a
Signed-off-by: Jisheng Zhang 
---
 drivers/clk/mvebu/clk-cpu.c |  4 +++-
 drivers/clk/mvebu/common.c  | 18 --
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/mvebu/clk-cpu.c b/drivers/clk/mvebu/clk-cpu.c
index b0fbc07..1466865 100644
--- a/drivers/clk/mvebu/clk-cpu.c
+++ b/drivers/clk/mvebu/clk-cpu.c
@@ -119,7 +119,7 @@ void __init of_cpu_clk_setup(struct device_node *node)
 
cpuclk = kzalloc(ncpus * sizeof(*cpuclk), GFP_KERNEL);
if (WARN_ON(!cpuclk))
-   return;
+   goto cpuclk_out;
 
clks = kzalloc(ncpus * sizeof(*clks), GFP_KERNEL);
if (WARN_ON(!clks))
@@ -170,6 +170,8 @@ bail_out:
kfree(cpuclk[ncpus].clk_name);
 clks_out:
kfree(cpuclk);
+cpuclk_out:
+   iounmap(clock_complex_base);
 }
 
 CLK_OF_DECLARE(armada_xp_cpu_clock, "marvell,armada-xp-cpu-clock",
diff --git a/drivers/clk/mvebu/common.c b/drivers/clk/mvebu/common.c
index adaa4a1..25ceccf 100644
--- a/drivers/clk/mvebu/common.c
+++ b/drivers/clk/mvebu/common.c
@@ -45,8 +45,10 @@ void __init mvebu_coreclk_setup(struct device_node *np,
clk_data.clk_num = 2 + desc->num_ratios;
clk_data.clks = kzalloc(clk_data.clk_num * sizeof(struct clk *),
GFP_KERNEL);
-   if (WARN_ON(!clk_data.clks))
+   if (WARN_ON(!clk_data.clks)) {
+   iounmap(base);
return;
+   }
 
/* Register TCLK */
of_property_read_string_index(np, "clock-output-names", 0,
@@ -134,7 +136,7 @@ void __init mvebu_clk_gating_setup(struct device_node *np,
 
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (WARN_ON(!ctrl))
-   return;
+   goto ctrl_out;
 
spin_lock_init(&ctrl->lock);
 
@@ -145,10 +147,8 @@ void __init mvebu_clk_gating_setup(struct device_node *np,
ctrl->num_gates = n;
ctrl->gates = kzalloc(ctrl->num_gates * sizeof(struct clk *),
  GFP_KERNEL);
-   if (WARN_ON(!ctrl->gates)) {
-   kfree(ctrl);
-   return;
-   }
+   if (WARN_ON(!ctrl->gates))
+   goto gates_out;
 
for (n = 0; n < ctrl->num_gates; n++) {
const char *parent =
@@ -160,4 +160,10 @@ void __init mvebu_clk_gating_setup(struct device_node *np,
}
 
of_clk_add_provider(np, clk_gating_get_src, ctrl);
+
+   return;
+gates_out:
+   kfree(ctrl);
+ctrl_out:
+   iounmap(base);
 }
-- 
1.8.4.rc3
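
The label ordering used in mvebu_clk_gating_setup() above releases resources in the reverse order of acquisition. A userspace sketch of the same unwinding follows; the mock mapping counter, the fail_gates switch and all demo_* names are assumptions made for illustration only.

```c
#include <stdlib.h>

static int demo_mappings;   /* counts live mock mappings */

/* Mock of mapping a register window. */
static void *demo_iomap(void)
{
    demo_mappings++;
    return &demo_mappings;
}

/* Mock of unmapping a register window. */
static void demo_iounmap(void *base)
{
    (void)base;
    demo_mappings--;
}

struct demo_ctrl {
    void *base;
    void **gates;
    size_t num_gates;
};

/*
 * Set up a controller. fail_gates forces the second allocation to fail so
 * the error path can be exercised. Returns NULL on failure, with everything
 * acquired so far released.
 */
static struct demo_ctrl *demo_gating_setup(size_t num_gates, int fail_gates)
{
    struct demo_ctrl *ctrl;
    void *base = demo_iomap();

    ctrl = calloc(1, sizeof(*ctrl));
    if (!ctrl)
        goto ctrl_out;

    ctrl->gates = fail_gates ? NULL : calloc(num_gates, sizeof(*ctrl->gates));
    if (!ctrl->gates)
        goto gates_out;

    ctrl->base = base;
    ctrl->num_gates = num_gates;
    return ctrl;

gates_out:
    free(ctrl);
ctrl_out:
    demo_iounmap(base);
    return NULL;
}
```

Each later failure point jumps to a label that falls through the earlier cleanups, so the unmap is written once rather than repeated in every error branch.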



[PATCH v2 1/4] arm: mvebu: add missing of_node_put() to fix reference leak

2013-08-22 Thread Jisheng Zhang
Add of_node_put to properly decrement the refcount when we are
done using a given node.

Signed-off-by: Jisheng Zhang 
---
 arch/arm/mach-mvebu/armada-370-xp.c | 1 +
 arch/arm/mach-mvebu/coherency.c | 8 +++-
 arch/arm/mach-mvebu/platsmp.c   | 1 +
 arch/arm/mach-mvebu/pmsu.c  | 1 +
 arch/arm/mach-mvebu/system-controller.c | 1 +
 5 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-mvebu/armada-370-xp.c b/arch/arm/mach-mvebu/armada-370-xp.c
index 97cbb80..8a1ae83 100644
--- a/arch/arm/mach-mvebu/armada-370-xp.c
+++ b/arch/arm/mach-mvebu/armada-370-xp.c
@@ -64,6 +64,7 @@ static void __init armada_370_xp_mbus_init(void)
ARMADA_370_XP_MBUS_WINS_SIZE,
of_translate_address(dn, &sdram_wins_offs),
ARMADA_370_XP_SDRAM_WINS_SIZE);
+   of_node_put(dn);
 }
 
 static void __init armada_370_xp_timer_and_clk_init(void)
diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 4c24303..58adf2f 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -140,6 +140,7 @@ int __init coherency_init(void)
coherency_base = of_iomap(np, 0);
coherency_cpu_base = of_iomap(np, 1);
set_cpu_coherent(cpu_logical_map(smp_processor_id()), 0);
+   of_node_put(np);
}
 
return 0;
@@ -147,9 +148,14 @@ int __init coherency_init(void)
 
 static int __init coherency_late_init(void)
 {
-   if (of_find_matching_node(NULL, of_coherency_table))
+   struct device_node *np;
+
+   np = of_find_matching_node(NULL, of_coherency_table);
+   if (np) {
bus_register_notifier(&platform_bus_type,
  &mvebu_hwcc_platform_nb);
+   of_node_put(np);
+   }
return 0;
 }
 
diff --git a/arch/arm/mach-mvebu/platsmp.c b/arch/arm/mach-mvebu/platsmp.c
index ce81d30..e7edb82 100644
--- a/arch/arm/mach-mvebu/platsmp.c
+++ b/arch/arm/mach-mvebu/platsmp.c
@@ -95,6 +95,7 @@ static void __init armada_xp_smp_init_cpus(void)
panic("No 'cpus' node found\n");
 
ncores = of_get_child_count(np);
+   of_node_put(np);
if (ncores == 0 || ncores > ARMADA_XP_MAX_CPUS)
panic("Invalid number of CPUs in DT\n");
 
diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
index 3cc4bef..27fc4f0 100644
--- a/arch/arm/mach-mvebu/pmsu.c
+++ b/arch/arm/mach-mvebu/pmsu.c
@@ -67,6 +67,7 @@ int __init armada_370_xp_pmsu_init(void)
pr_info("Initializing Power Management Service Unit\n");
pmsu_mp_base = of_iomap(np, 0);
pmsu_reset_base = of_iomap(np, 1);
+   of_node_put(np);
}
 
return 0;
diff --git a/arch/arm/mach-mvebu/system-controller.c b/arch/arm/mach-mvebu/system-controller.c
index f875124..5175083c 100644
--- a/arch/arm/mach-mvebu/system-controller.c
+++ b/arch/arm/mach-mvebu/system-controller.c
@@ -98,6 +98,7 @@ static int __init mvebu_system_controller_init(void)
BUG_ON(!match);
system_controller_base = of_iomap(np, 0);
mvebu_sc = (struct mvebu_system_controller *)match->data;
+   of_node_put(np);
}
 
return 0;
-- 
1.8.4.rc3



[PATCH] [BUGFIX] drivers/base: fix show_mem_removable section count

2013-08-22 Thread Russ Anderson
"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes
if (!node_online(page_to_nid(page))) to blow up.
Why is it passing in a bad pfn?

show_mem_removable() will loop sections_per_block times.
sections_per_block is 16, but mem->section_count is 8
for this memory block.  Changing to loop the actual number
of sections (mem->section_count) fixes the problem.
The assumption that all memory blocks will have the same
sections_per_block is not always true.

I suspect other usages of sections_per_block will also
need to be fixed.
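
The loop-bound problem described above can be sketched in userspace C. This is only a rough model, not the kernel code: struct mem_block_model, section_is_present() and absent_sections_visited() are hypothetical names made up for illustration, and the bound of 16 with a section count of 8 is taken from the numbers in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed global upper bound on sections per block (16 in the report). */
#define MODEL_SECTIONS_PER_BLOCK 16UL

/* Hypothetical stand-in for struct memory_block; only the fields used here. */
struct mem_block_model {
	unsigned long start_section_nr; /* first section of the block */
	unsigned long section_count;    /* sections actually present */
};

/* A section is valid only if it lies inside the block's populated range. */
static bool section_is_present(const struct mem_block_model *mem,
			       unsigned long section_nr)
{
	return section_nr >= mem->start_section_nr &&
	       section_nr < mem->start_section_nr + mem->section_count;
}

/*
 * Walk the block the way the sysfs handler does, using the given loop
 * bound, and count the visited sections that are not present. Any non-zero
 * result corresponds to handing a bad pfn to the removability check.
 */
static unsigned long absent_sections_visited(const struct mem_block_model *mem,
					     unsigned long loop_bound)
{
	unsigned long i, absent = 0;

	assert(mem != NULL);
	for (i = 0; i < loop_bound; i++) {
		if (!section_is_present(mem, mem->start_section_nr + i))
			absent++;
	}
	return absent;
}
```

With a block holding 8 sections, looping MODEL_SECTIONS_PER_BLOCK times visits 8 absent sections, while looping section_count times visits none.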

Signed-off-by: Russ Anderson 


The failing output:
---
harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
0
1
1
1
1
1
1
1
1
1
1
1
1
1
[  372.78] BUG: unable to handle kernel paging request at ea00c320
[  372.119230] IP: [] is_pageblock_removable_nolock+0x1/0x90
[  372.127022] PGD 83ffd4067 PUD 37bdfce067 PMD 0
[  372.132109] Oops:  [#1] SMP
[  372.135730] Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm 
ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad 
iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core 
fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid 
hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 
ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core 
dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore 
usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) 
xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih 
scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix 
ahci libahci libata scsi_mod
[  372.213536] CPU: 4 PID: 5991 Comm: cat Tainted: G   O 
3.11.0-rc5-rja-uv+ #10
[  372.222173] Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series 
BIOS 01/15/2013
[  372.231391] task: 88081f034580 ti: 880820022000 task.ti: 
880820022000
[  372.239737] RIP: 0010:[]  [] 
is_pageblock_removable_nolock+0x1/0x90
[  372.250229] RSP: 0018:880820023df8  EFLAGS: 00010287
[  372.256151] RAX: 0004 RBX: ea00c320 RCX: 0004
[  372.264111] RDX: ea00c30b RSI: 001c RDI: ea00c320
[  372.272071] RBP: 880820023e38 R08:  R09: 0001
[  372.280030] R10:  R11: 0001 R12: ea00c33c
[  372.287987] R13: 1600 R14: 6db6db6db6db6db7 R15: 0001
[  372.295945] FS:  77fb2700() GS:88083fc8() 
knlGS:
[  372.304970] CS:  0010 DS:  ES:  CR0: 80050033
[  372.311378] CR2: ea00c320 CR3: 00081b954000 CR4: 000407e0
[  372.319335] Stack:
[  372.321575]  880820023e38 81161e94 81d9e940 
0009
[  372.329872]   8817bb97b800 88081e928000 
8817bb97b870
[  372.338167]  880820023e68 813730d1 fffb 
81a97600
[  372.346463] Call Trace:
[  372.349201]  [] ? is_mem_section_removable+0x84/0x110
[  372.356579]  [] show_mem_removable+0x41/0x70
[  372.363094]  [] dev_attr_show+0x2a/0x60
[  372.369122]  [] sysfs_read_file+0xf7/0x1c0
[  372.375441]  [] vfs_read+0xc8/0x130
[  372.381076]  [] SyS_read+0x5d/0xa0
[  372.386624]  [] system_call_fastpath+0x16/0x1b
[  372.393313] Code: 01 00 00 00 e9 3c ff ff ff 90 0f b6 4a 30 44 89 d8 d3 e0 
89 c1 83 e9 01 48 63 c9 49 01 c8 eb 92 66 2e 0f 1f 84 00 00 00 00 00 55 <48> 8b 
0f 49 89 f8 48 89 e5 48 89 ca 48 c1 ea 36 0f a3 15 d8 2f
[  372.415032] RIP  [] is_pageblock_removable_nolock+0x1/0x90
[  372.422905]  RSP 
[  372.426792] CR2: ea00c320
-


---
 drivers/base/memory.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/base/memory.c
===
--- linux.orig/drivers/base/memory.c 2013-08-22 21:16:03.477826999 -0500
+++ linux/drivers/base/memory.c 2013-08-22 21:22:38.885478035 -0500
@@ -140,7 +140,7 @@ static ssize_t show_mem_removable(struct
struct memory_block *mem =
container_of(dev, struct memory_block, dev);
 
-   for (i = 0; i < sections_per_block; i++) {
+   for (i = 0; i < mem->section_count; i++) {
pfn = section_nr_to_pfn(mem->start_section_nr + i);
ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
}
-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com

[PATCH v2 0/4] arm: mvebu: fix resource leak

2013-08-22 Thread Jisheng Zhang
These patches fix resource leaks by adding the missing of_node_put() and
iounmap() calls, or by using devm_ioremap_resource() where available.

v2:
  - use devm_ioremap_resource() as suggested by Ezequiel Garcia
  - use gates_out instead of bail_out as suggested by Mike Turquette

Jisheng Zhang (4):
  arm: mvebu: add missing of_node_put() to fix reference leak
  bus: mvebu: add missing of_node_put() to fix reference leak
  clk: mvebu: add missing iounmap
  pinctrl: mvebu: Convert to use devm_ioremap_resource

 arch/arm/mach-mvebu/armada-370-xp.c |  1 +
 arch/arm/mach-mvebu/coherency.c |  8 +++-
 arch/arm/mach-mvebu/platsmp.c   |  1 +
 arch/arm/mach-mvebu/pmsu.c  |  1 +
 arch/arm/mach-mvebu/system-controller.c |  1 +
 drivers/bus/mvebu-mbus.c|  6 +-
 drivers/clk/mvebu/clk-cpu.c |  4 +++-
 drivers/clk/mvebu/common.c  | 18 --
 drivers/pinctrl/mvebu/pinctrl-mvebu.c   | 11 +--
 9 files changed, 36 insertions(+), 15 deletions(-)

-- 
1.8.4.rc3



[PATCH v2 4/4] pinctrl: mvebu: Convert to use devm_ioremap_resource

2013-08-22 Thread Jisheng Zhang
Signed-off-by: Jisheng Zhang 
---
 drivers/pinctrl/mvebu/pinctrl-mvebu.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/mvebu/pinctrl-mvebu.c b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
index bb7ddb1..1caa45f 100644
--- a/drivers/pinctrl/mvebu/pinctrl-mvebu.c
+++ b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
@@ -579,7 +579,7 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
 int mvebu_pinctrl_probe(struct platform_device *pdev)
 {
struct mvebu_pinctrl_soc_info *soc = dev_get_platdata(&pdev->dev);
-   struct device_node *np = pdev->dev.of_node;
+   struct resource *res;
struct mvebu_pinctrl *pctl;
void __iomem *base;
struct pinctrl_pin_desc *pdesc;
@@ -591,11 +591,10 @@ int mvebu_pinctrl_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   base = of_iomap(np, 0);
-   if (!base) {
-   dev_err(&pdev->dev, "unable to get base address\n");
-   return -ENODEV;
-   }
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   base = devm_ioremap_resource(&pdev->dev, res);
+   if (IS_ERR(base))
+   return PTR_ERR(base);
 
pctl = devm_kzalloc(&pdev->dev, sizeof(struct mvebu_pinctrl),
GFP_KERNEL);
-- 
1.8.4.rc3



[PATCH -next] drm/rcar-du: fix return value check in rcar_du_lvdsenc_get_resources()

2013-08-22 Thread Wei Yongjun
From: Wei Yongjun 

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR(). Also remove the dev_err call to avoid redundant
error message.
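
To illustrate why a NULL test misses this kind of failure, here is a small userspace model of the error-pointer convention. It is a sketch only: the model_* helpers are hypothetical stand-ins written for this example (they imitate, but are not, the kernel's ERR_PTR(), PTR_ERR() and IS_ERR()), and model_remap() is an invented function that fails the way the text above describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the errno range reserved at the top of the address space. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the reserved error range. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical mapping helper: returns an encoded error, never NULL. */
static void *model_remap(bool fail)
{
	static int fake_registers[4];

	return fail ? model_err_ptr(-ENOMEM) : (void *)fake_registers;
}

/* Old-style check: only catches NULL, so an encoded error slips through. */
static int check_with_null_test(void *mmio)
{
	return mmio == NULL ? -ENOMEM : 0;
}

/* New-style check: detects the encoded error and propagates its code. */
static int check_with_is_err(void *mmio)
{
	return model_is_err(mmio) ? (int)model_ptr_err(mmio) : 0;
}
```

In this model the NULL test reports success for a failed mapping, while the range test returns -ENOMEM.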

Signed-off-by: Wei Yongjun 
---
 drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.c b/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.c
index a0f6a17..f59cbc4 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.c
@@ -151,11 +151,8 @@ static int rcar_du_lvdsenc_get_resources(struct rcar_du_lvdsenc *lvds,
}
 
lvds->mmio = devm_ioremap_resource(>dev, mem);
-   if (lvds->mmio == NULL) {
-   dev_err(>dev, "failed to remap memory resource for %s\n",
-   name);
-   return -ENOMEM;
-   }
+   if (IS_ERR(lvds->mmio))
+   return PTR_ERR(lvds->mmio);
 
lvds->clock = devm_clk_get(>dev, name);
if (IS_ERR(lvds->clock)) {



[PATCH] iommu: WARN_ON when removing a device with no iommu_group associated

2013-08-22 Thread Wei Yang
When a device is removed from the system, the iommu_group code tries to
disconnect it from its group. In some cases, however, a device may not be
associated with any iommu_group, for example when there is not enough DMA
address space.

The generic bus notifier checks dev->iommu_group before calling
iommu_group_remove_device(). Developers may, however, call
iommu_group_remove_device() from a different code path without that check.
For devices with dev->iommu_group set to NULL, the kernel will then crash.

This patch warns and returns when asked to remove a device whose
dev->iommu_group is NULL. This flags the bad caller and also guards the
kernel against the crash.

Signed-off-by: Wei Yang 
---
 drivers/iommu/iommu.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fbe9ca7..43396f0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -379,6 +379,9 @@ void iommu_group_remove_device(struct device *dev)
struct iommu_group *group = dev->iommu_group;
struct iommu_device *tmp_device, *device = NULL;
 
+   if (WARN_ON(!group))
+   return;
+
/* Pre-notify listeners that a device is being removed. */
blocking_notifier_call_chain(&group->notifier,
 IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
-- 
1.7.5.4



Re: [PATCH 2/2] powerpc/iommu: check dev->iommu_group before remove a device from iommu_group

2013-08-22 Thread Wei Yang
On Thu, Aug 22, 2013 at 10:17:20AM -0600, Alex Williamson wrote:
>On Thu, 2013-08-22 at 23:41 +0800, Wei Yang wrote:
>> >> 
>> >> Alex,
>> >> 
>> >> Sorry for not including you in the very beginning, which may spend you 
>> >> more
>> >> efforts to track previous mails in this thread.
>> >> 
>> >> Do you think it is reasonable to check the dev->iommu_group in
>> >> iommu_group_remove_device()? Or we can count on the bus notifier to check 
>> >> it?
>> >> 
>> >> Welcome your suggestions~
>> >
>> >I don't really see the point of patch 1/2. iommu_group_remove_device()
>> >is specifically to remove a device from an iommu_group, so why would you
>> >call it on a device that's not part of an iommu_group.  If you want to
>> >avoid testing dev->iommu_group, then implement the .remove_device
>> >callback rather than using the notifier.  Thanks,
>> >
>> 
>> You mean the .remove_device like intel_iommu_remove_device()? 
>> 
>> Hmm... this function didn't check the dev->iommu_group and just call
>> iommu_group_remove_device(). I see this guard is put in 
>> iommu_bus_notifier(), 
>> which will check dev->iommu_group before invoke .remove_device.
>> 
>> Let me explain the case to triger the problem a little. 
>> 
>> On some platform, like powernv, we implement another bus notifier when 
>> devices
>> are added or removed in the system. Like Alexey mentioned, he missed the 
>> check
>> for dev->iommu_group in the notifier before removing it from iommu_group. 
>> This
>> trigger the crash.
>> 
>> So do you think it is reasonable to guard the kernel in
>> iommu_group_remove_device(), or we give the platform developers the
>> responsibility to check the dev->iommu_group before calling it?
>
>I don't see it as we need either patch 1/2 or patch 2/2.  We absolutely
>need some form of patch 2/2.  Patch 1/2 isn't necessarily bad, but it
>facilitates sloppy usage.  The iommu driver shouldn't be calling
>iommu_group_remove_device() on arbitrary devices that may or may not be
>part of an iommu_group.  Perhaps patch 1/2 should be:
>
>if (WARN_ON(!group))
>   return;
>

Agree, this one sounds more reasonable. :-)

Since patch 2/2 is merged by Alexey, I will re-send patch 1/2 alone.

Thanks for your comments ~

>Thanks,
>
>Alex
>
>___
>Linuxppc-dev mailing list
>linuxppc-...@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/linuxppc-dev

-- 
Richard Yang
Help you, Help me



Re: [PATCH] DMA: let filter functions of of_dma_simple_xlate possible check of_node

2013-08-22 Thread Richard Zhao
On Fri, Aug 23, 2013 at 04:18:27AM +0800, Stephen Warren wrote:
> On 08/21/2013 11:19 PM, Richard Zhao wrote:
> > On Fri, Aug 02, 2013 at 10:00:00AM +0800, Richard Zhao wrote:
> >> pass of_phandle_args dma_spec to dma_request_channel in 
> >> of_dma_simple_xlate,
> >> so the filter function could access of_node in of_phandle_args.
> >>
> >> It also remove restriction of #dma-cells has to be one.
> >>
> >> Signed-off-by: Richard Zhao 
> >> ---
> >>  drivers/dma/edma.c |  7 +--
> >>  drivers/dma/of-dma.c   | 10 --
> >>  drivers/dma/omap-dma.c |  6 --
> >>  3 files changed, 13 insertions(+), 10 deletions(-)
> >>
> > 
> > Hi Vinod,
> > 
> > Can you please pick up this change?
> > 
> > Hi Stephen,
> > 
> > Can you please give a ack or reviewed-by etc?
> 
> Hmm. Looking at the patch, I'm not sure it's right.
> 
> This patch simply passes all the specfier args to the filter function,
> and the code to check the equality of the of_node to the filter args is
> still duplicated in each DMA driver. Instead, the DMA core should be
> implementing the equality check, and only even calling the
> driver-specific filter function for devices where the client's phandle
> matches the DMA providing device's of_node handle.

The filter function is called from the dmaengine core code, independent of DT.
The reason each driver has to write its own filter function is that it has
to store the slave id there in its own way.

Thanks
Richard


[PATCH 3/3] extcon: arizona: Fix up minor coding style to remove unnecessary braces

2013-08-22 Thread Chanwoo Choi
This fixes a braces coding style issue reported by the checkpatch script.

Cc: Charles Keepax 
Cc: Mark Brown 
Signed-off-by: Chanwoo Choi 
---
 drivers/extcon/extcon-arizona.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/extcon/extcon-arizona.c b/drivers/extcon/extcon-arizona.c
index 08c4590..72fc28e 100644
--- a/drivers/extcon/extcon-arizona.c
+++ b/drivers/extcon/extcon-arizona.c
@@ -564,11 +564,10 @@ static irqreturn_t arizona_hpdet_irq(int irq, void *data)
}
 
ret = arizona_hpdet_read(info);
-   if (ret == -EAGAIN) {
+   if (ret == -EAGAIN)
goto out;
-   } else if (ret < 0) {
+   else if (ret < 0)
goto done;
-   }
reading = ret;
 
/* Reset back to starting range */
@@ -578,11 +577,10 @@ static irqreturn_t arizona_hpdet_irq(int irq, void *data)
   0);
 
ret = arizona_hpdet_do_id(info, &reading, &mic);
-   if (ret == -EAGAIN) {
+   if (ret == -EAGAIN)
goto out;
-   } else if (ret < 0) {
+   else if (ret < 0)
goto done;
-   }
 
/* Report high impedence cables as line outputs */
if (reading >= 5000)
-- 
1.8.0



[PATCH 2/3] extcon: class: Remove unnecessary extern declaration

2013-08-22 Thread Chanwoo Choi
This patch removes an unnecessary extern declaration (extcon_set_state).
checkpatch reported this coding style issue.

Signed-off-by: Chanwoo Choi 
Signed-off-by: Myungjoo Ham 
---
 drivers/extcon/extcon-class.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/extcon/extcon-class.c b/drivers/extcon/extcon-class.c
index 7704a3d..b8589cc 100644
--- a/drivers/extcon/extcon-class.c
+++ b/drivers/extcon/extcon-class.c
@@ -129,7 +129,6 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
return count;
 }
 
-int extcon_set_state(struct extcon_dev *edev, u32 state);
 static ssize_t state_store(struct device *dev, struct device_attribute *attr,
   const char *buf, size_t count)
 {
-- 
1.8.0



[PATCH 1/3] extcon: Fix up 80 column coding style issues

2013-08-22 Thread Chanwoo Choi
This patch fixes 80-column coding style issues reported by the checkpatch script.

Cc: Charles Keepax 
Cc: Mark Brown 
Signed-off-by: Chanwoo Choi 
Signed-off-by: Myungjoo Ham 
---
 drivers/extcon/extcon-arizona.c  |  25 +---
 drivers/extcon/extcon-class.c|   6 +-
 drivers/extcon/extcon-max77693.c | 129 +--
 drivers/extcon/extcon-max8997.c  |   6 +-
 4 files changed, 94 insertions(+), 72 deletions(-)

diff --git a/drivers/extcon/extcon-arizona.c b/drivers/extcon/extcon-arizona.c
index 2064eac..08c4590 100644
--- a/drivers/extcon/extcon-arizona.c
+++ b/drivers/extcon/extcon-arizona.c
@@ -738,8 +738,8 @@ err:
 static void arizona_micd_timeout_work(struct work_struct *work)
 {
struct arizona_extcon_info *info = container_of(work,
-   struct arizona_extcon_info,
-   micd_timeout_work.work);
+   struct arizona_extcon_info,
+   micd_timeout_work.work);
 
mutex_lock(&info->lock);
 
@@ -756,8 +756,8 @@ static void arizona_micd_timeout_work(struct work_struct *work)
 static void arizona_micd_detect(struct work_struct *work)
 {
struct arizona_extcon_info *info = container_of(work,
-   struct arizona_extcon_info,
-   micd_detect_work.work);
+   struct arizona_extcon_info,
+   micd_detect_work.work);
struct arizona *arizona = info->arizona;
unsigned int val = 0, lvl;
int ret, i, key;
@@ -769,7 +769,8 @@ static void arizona_micd_detect(struct work_struct *work)
for (i = 0; i < 10 && !(val & 0x7fc); i++) {
ret = regmap_read(arizona->regmap, ARIZONA_MIC_DETECT_3, &val);
if (ret != 0) {
-   dev_err(arizona->dev, "Failed to read MICDET: %d\n", ret);
+   dev_err(arizona->dev,
+   "Failed to read MICDET: %d\n", ret);
mutex_unlock(&info->lock);
return;
}
@@ -777,7 +778,8 @@ static void arizona_micd_detect(struct work_struct *work)
dev_dbg(arizona->dev, "MICDET: %x\n", val);
 
if (!(val & ARIZONA_MICD_VALID)) {
-   dev_warn(arizona->dev, "Microphone detection state invalid\n");
+   dev_warn(arizona->dev,
+"Microphone detection state invalid\n");
mutex_unlock(&info->lock);
return;
}
@@ -925,8 +927,8 @@ static irqreturn_t arizona_micdet(int irq, void *data)
 static void arizona_hpdet_work(struct work_struct *work)
 {
struct arizona_extcon_info *info = container_of(work,
-   struct arizona_extcon_info,
-   hpdet_work.work);
+   struct arizona_extcon_info,
+   hpdet_work.work);
 
mutex_lock(&info->lock);
arizona_start_hpdet_acc_id(info);
@@ -973,10 +975,13 @@ static irqreturn_t arizona_jackdet(int irq, void *data)
   &info->hpdet_work,
   msecs_to_jiffies(HPDET_DEBOUNCE));
 
-   if (cancelled_mic)
+   if (cancelled_mic) {
+   int micd_timeout = info->micd_timeout;
+
queue_delayed_work(system_power_efficient_wq,
   &info->micd_timeout_work,
-  msecs_to_jiffies(info->micd_timeout));
+  msecs_to_jiffies(micd_timeout));
+   }
 
goto out;
}
diff --git a/drivers/extcon/extcon-class.c b/drivers/extcon/extcon-class.c
index 1446152..7704a3d 100644
--- a/drivers/extcon/extcon-class.c
+++ b/drivers/extcon/extcon-class.c
@@ -450,7 +450,8 @@ int extcon_register_interest(struct extcon_specific_cable_nb *obj,
if (!obj->edev)
return -ENODEV;
 
-   obj->cable_index = extcon_find_cable_index(obj->edev, cable_name);
+   obj->cable_index = extcon_find_cable_index(obj->edev,
+ cable_name);
if (obj->cable_index < 0)
return obj->cable_index;
 
@@ -458,7 +459,8 @@ int extcon_register_interest(struct extcon_specific_cable_nb *obj,
 
obj->internal_nb.notifier_call = _call_per_cable;
 
-   return raw_notifier_chain_register(&obj->edev->nh, &obj->internal_nb);
+   return 

Re: [PATCH v2] DMA: add help function to check whether dma controller registered

2013-08-22 Thread Richard Zhao
On Fri, Aug 23, 2013 at 04:36:53AM +0800, Stephen Warren wrote:
> On 08/22/2013 12:43 AM, Richard Zhao wrote:
> > DMA client device driver usually needs to know at probe time whether
> > dma controller has been registered to deffer probe. So add a help
> > function of_dma_check_controller.
> > 
> > DMA request channel functions can also used to check it, but they
> > are usually called at open() time.
> 
> This new function is almost identical to the existing
> of_dma_request_slave_channel(). Surely the code should be shared?
	ofdma->of_dma_xlate(&dma_spec, ofdma);
The above is called while holding of_dma_lock. If I want to abstract the
common lines, there are two options.

Option 1:
static struct of_dma *of_dma_check_controller_locked(np, name)
{
	parameter check
	get dma-names count and check return value
	for loop to get ofdma
	return ERR_PTR(err) or ofdma
}

struct dma_chan *of_dma_request_slave_channel(struct device_node *np,
					      const char *name)
{
	chan = NULL;
	mutex_lock(&of_dma_lock);
	ofdma = of_dma_check_controller_locked(np, name);
	if (!IS_ERR(ofdma))
		chan = ofdma->of_dma_xlate(&dma_spec, ofdma);
	mutex_unlock(&of_dma_lock);
	return chan;
}

int of_dma_check_controller(struct device *dev, const char *name)
{
	mutex_lock(&of_dma_lock);
	ofdma = of_dma_check_controller_locked(dev->of_node, name);
	mutex_unlock(&of_dma_lock);
	if (IS_ERR(ofdma))
		return PTR_ERR(ofdma);
	else
		return 0;
}

Option 2:
static struct of_dma *of_dma_check_controller_getlock(np, name)
{
	parameter check
	get dma-names count and check return value
	for loop to get ofdma, get lock at old place
	if failed, unlock.
	return ERR_PTR(err) or ofdma
}

struct dma_chan *of_dma_request_slave_channel(struct device_node *np,
					      const char *name)
{
	chan = NULL;
	ofdma = of_dma_check_controller_getlock(np, name);
	if (!IS_ERR(ofdma)) {
		chan = ofdma->of_dma_xlate(&dma_spec, ofdma);
		unlock;
	}
	return chan;
}

int of_dma_check_controller(struct device *dev, const char *name)
{
	ofdma = of_dma_check_controller_getlock(dev->of_node, name);

	if (IS_ERR(ofdma)) {
		return PTR_ERR(ofdma);
	} else {
		unlock;
		return 0;
	}
}

> But that said, I don't see any need for a new function; why can't
> drivers simply call of_dma_request_slave_channel() at probe time;
It would mislead users. A channel is supposed to be requested at open time.

> from
> what I can see, that function doesn't actually request the channel, but
> rather simply looks it up, just like this one. The only difference is
> that of_dma_xlate() is also called, but that's just doing some data
> transformation, not actually recording channel ownership.
The xlate function requests the channel if things go well.

Thanks
Richard


Re: [PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes

2013-08-22 Thread Andy Lutomirski
On 08/22/2013 05:03 PM, Andy Lutomirski wrote:
> Writes via mmap currently update mtime and ctime in ->page_mkwrite.

The subject should be [PATCH v4 0/7]...  Sorry for the cut-and-pasteo.

--Andy


Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer

2013-08-22 Thread Steven Rostedt
On Fri, 23 Aug 2013 07:57:15 +0800
"zhangwei(Jovi)"  wrote:


> > 
> > What about creating a per cpu buffer when uprobes are registered, and
> > delete them when they are finished? Basically what trace_printk() does
> > if it detects that there are users of trace_printk() in the kernel.
> > Note, it does not deallocate them when finished, as it is never
> > finished until reboot ;-)
> > 
> > -- Steve
> >
> I also thought out this approach, but the issue is we cannot fetch user
> memory into per-cpu buffer, because use per-cpu buffer should under
> preempt disabled, and fetching user memory could sleep.

Actually, we could create a per_cpu mutex to match the per_cpu buffers.
This is not unlike what we do in -rt.

int cpu;
struct mutex *mutex;
void *buf;


/*
 * Use per cpu buffers for fastest access, but we might migrate
 * So the mutex makes sure we have sole access to it.
 */

cpu = raw_smp_processor_id();
mutex = per_cpu(uprobe_cpu_mutex, cpu);
buf = per_cpu(uprobe_cpu_buffer, cpu);

mutex_lock(mutex);
store_trace_args(..., buf,...);
mutex_unlock(mutex);

-- Steve


Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)

2013-08-22 Thread Al Viro
On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote:
> On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski  wrote:
> >
> > Sure.  But aren't they always last?
> 
> What do you mean? I'd say that the /proc lookup is always *innermost*.
> Which means that it certainly cannot bail out, since there are many
> levels of nesting outside of it.
> 
> > With the current code structure, trying to enforce some kind of
> > security restriction in the middle of lookup seems really unpleasant.
> 
> If it's conditional (ie "linkat behaves differently from openat"), it
> certainly means that we'd have to pass in that info in annoying ways.

Nope.  All we need to pass is one more LOOKUP_...  Add
if (unlikely(nd->last_type == LAST_BIND)) {
if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) {
terminate_walk(nd);
return -EINVAL;
}
}
in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when
linkat() calls user_path_at().  That will affect *only* the terminal
symlinks and cost nothing in all normal cases.  The same check can
bloody well go into path_init() - take
if (*name) {
if (!can_lookup(dentry->d_inode)) {
fdput(f);
return -ENOTDIR;
}
}
in there and slap
else {
if ((flags & LOOKUP_BLAH) && !may_flink(...)) {
fdput(f);
return -EINVAL;
}
}
after it.


Re: [PATCH] dm: allow error target to replace either bio-based and request-based targets

2013-08-22 Thread Joe Jin
On 08/23/13 08:17, Mike Snitzer wrote:
> Here is a patch that should work for your needs (I tested it to work
> with 'dmsetup wipe_table' on both request-based and bio-based devices):

This is really what I was looking for, thanks!

> 
> From: Mike Snitzer 
> Date: Thu, 22 Aug 2013 18:21:38 -0400
> Subject: [PATCH] dm: allow error target to replace either bio-based and 
> request-based targets
> 
> It may be useful to switch a request-based table to the "error" target.
> Enhance the DM core to allow a single hybrid target to be capable of
> handling either bios or requests.
> 
> Add a request-based (.map_rq) member to the error target_type and train
> dm_table_set_type() to prefer the md's established type (request-based
> or bio-based).  If the md doesn't have an established type default to
> making the hybrid target bio-based.

Signed-off-by: Joe Jin 

Thanks,
Joe


Re: [RFC 17/17] clk: zynq: remove call to of_clk_init

2013-08-22 Thread Sören Brinkmann
On Thu, Aug 22, 2013 at 05:26:47PM -0700, Sören Brinkmann wrote:
> Hi Sebastian,
> 
> On Tue, Aug 20, 2013 at 04:04:31AM +0200, Sebastian Hesselbarth wrote:
> > With arch/arm calling of_clk_init(NULL) from time_init(), we can now
> > remove it from corresponding drivers/clk code.
> 
> I think that would break Zynq.
> If I see this correctly you call of_clk_init() from common code,
> _before_ the SOC specific time init function is called.
> The problem is, that we have code setting up a global pointer which is
> required by zynq_clk_setup() which is triggered when of_clk_init() is
> called.
> 
> Let me try to illustrate the current call graph:
> 
> time_init()
>   zynq_timer_init()           // this machines init_time()
>     zynq_slcr_init()          // setup System Level Control Registers including a global pointer
>       zynq_clock_init()
>         of_clk_init()
>           zynq_clk_setup()    // requires pointer setup in zynq_slcr_init()
>     ...
> 
> IIUC, your series would change this to:
> time_init()
>   of_clk_init()
>     zynq_clk_setup()          // SLCR pointer is not setup/NULL
>   ...
>   zynq_timer_init()
>     zynq_slcr_init()          // now the pointer becomes valid

I guess we could move zynq_slcr_init() into init_irq(). I'll give that a
shot tomorrow.

Sören




Finding out who's holding a lock?

2013-08-22 Thread Andy Lutomirski
My program is occasionally seeing slow page faults.  latencytop says
they're slow because they're waiting for read access to mmap_sem, but
latencytop isn't showing any other thread in the process blocking.

Is there any straightforward way to find out who called down_write on
mmap_sem when down_read is slow?

--Andy


[PATCH -next] f2fs: fix error return code in init_f2fs_fs()

2013-08-22 Thread Wei Yongjun
From: Wei Yongjun 

Fix to return -ENOMEM in the kset create and add error handling
case instead of 0, as done elsewhere in this function.

Introduced by commit b59d0bae6ca30c496f298881616258f9cde0d9c6.
(f2fs: add sysfs support for controlling the gc_thread)
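
The failure mode can be shown with a small userspace sketch of the goto-based error path. The function names here (step_ok, create_object, init_buggy, init_fixed) are hypothetical and exist only for this illustration; they do not correspond to functions in fs/f2fs/super.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical earlier init step that succeeds and returns 0. */
static int step_ok(void)
{
	return 0;
}

/* Hypothetical allocator-style step: reports failure by returning NULL. */
static void *create_object(bool succeed)
{
	static int object;

	return succeed ? (void *)&object : NULL;
}

/* Buggy shape: the NULL branch jumps to fail while err still holds 0. */
static int init_buggy(bool create_succeeds)
{
	int err;

	err = step_ok();
	if (err)
		goto fail;
	if (!create_object(create_succeeds))
		goto fail;	/* err was left at 0 by the previous step */
	return 0;
fail:
	return err;
}

/* Fixed shape: set an explicit error code before taking the error path. */
static int init_fixed(bool create_succeeds)
{
	int err;

	err = step_ok();
	if (err)
		goto fail;
	if (!create_object(create_succeeds)) {
		err = -ENOMEM;
		goto fail;
	}
	return 0;
fail:
	return err;
}
```

In this model init_buggy() reports success even though object creation failed, whereas init_fixed() returns -ENOMEM.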

Signed-off-by: Wei Yongjun 
---
 fs/f2fs/super.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 66d1ec1..33a809f 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1013,8 +1013,10 @@ static int __init init_f2fs_fs(void)
if (err)
goto fail;
f2fs_kset = kset_create_and_add("f2fs", NULL, fs_kobj);
-   if (!f2fs_kset)
+   if (!f2fs_kset) {
+   err = -ENOMEM;
goto fail;
+   }
err = register_filesystem(&f2fs_fs_type);
if (err)
goto fail;



Re: [RFC 17/17] clk: zynq: remove call to of_clk_init

2013-08-22 Thread Sören Brinkmann
Hi Sebastian,

On Tue, Aug 20, 2013 at 04:04:31AM +0200, Sebastian Hesselbarth wrote:
> With arch/arm calling of_clk_init(NULL) from time_init(), we can now
> remove it from corresponding drivers/clk code.

I think that would break Zynq.
If I see this correctly you call of_clk_init() from common code,
_before_ the SOC specific time init function is called.
The problem is, that we have code setting up a global pointer which is
required by zynq_clk_setup() which is triggered when of_clk_init() is
called.

Let me try to illustrate the current call graph:

time_init()
    zynq_timer_init()    // this machine's init_time()
        zynq_slcr_init()    // setup System Level Control Registers including a global pointer
            zynq_clock_init()
                of_clk_init()
                    zynq_clk_setup()    // requires pointer setup in zynq_slcr_init()
        ...

IIUC, your series would change this to:
time_init()
    of_clk_init()
        zynq_clk_setup()    // SLCR pointer is not setup/NULL
    ...
    zynq_timer_init()
        zynq_slcr_init()    // now the pointer becomes valid
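
The ordering problem can be reproduced in a small userspace sketch. All names below are hypothetical stand-ins, not the real Zynq symbols, and the sketch checks the pointer explicitly where the kernel code would simply dereference it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SLCR register block and its global pointer. */
static int slcr_regs[4];
static int *slcr_base;	/* NULL until the SLCR init step has run */

static void slcr_init_sketch(void)
{
	slcr_base = slcr_regs;
}

/* Returns 0 on success, -1 if it runs before the pointer is set up. */
static int clk_setup_sketch(void)
{
	if (!slcr_base)
		return -1;
	slcr_base[0] = 1;
	return 0;
}

/* Current order: SLCR init first, clock setup afterwards. */
int current_order(void)
{
	slcr_base = NULL;
	slcr_init_sketch();
	return clk_setup_sketch();
}

/* Order with the proposed change: clock setup runs before SLCR init. */
int proposed_order(void)
{
	int ret;

	slcr_base = NULL;
	ret = clk_setup_sketch();
	slcr_init_sketch();
	return ret;
}
```

Here current_order() succeeds while proposed_order() fails, which mirrors the two call graphs.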

Sören




[PATCH] dm: allow error target to replace either bio-based and request-based targets

2013-08-22 Thread Mike Snitzer
On Thu, Aug 22 2013 at  4:19pm -0400,
Mike Snitzer  wrote:

> Hi Joe,
> 
> Unfortunately this isn't going to work properly.  Mikulas suggested a
> new "error-rq" target.
> 
> I do like the idea of a single error target that is hybrid (supports
> both bio-based and request-based) but the DM core would need to be
> updated to support this.
> 
> Specifically, we'd need to check if the device (and active table) is
> already bio-based or request-based and select the appropriate type.  If
> it is a new device, default to selecting bio-based.
> 
> There are some wrappers and other logic throughout DM core that will need
> auditing too.

Here is a patch that should work for your needs (I tested it to work
with 'dmsetup wipe_table' on both request-based and bio-based devices):

From: Mike Snitzer 
Date: Thu, 22 Aug 2013 18:21:38 -0400
Subject: [PATCH] dm: allow error target to replace either bio-based and 
request-based targets

It may be useful to switch a request-based table to the "error" target.
Enhance the DM core to allow a single hybrid target to be capable of
handling either bios or requests.

Add a request-based (.map_rq) member to the error target_type and train
dm_table_set_type() to prefer the md's established type (request-based
or bio-based).  If the md doesn't have an established type default to
making the hybrid target bio-based.

Cc: Joe Jin 
Cc: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
---
 drivers/md/dm-table.c  |   18 +-
 drivers/md/dm-target.c |9 -
 drivers/md/dm.h|   11 +++
 3 files changed, 36 insertions(+), 2 deletions(-)

Index: linux/drivers/md/dm-table.c
===
--- linux.orig/drivers/md/dm-table.c
+++ linux/drivers/md/dm-table.c
@@ -864,10 +864,26 @@ static int dm_table_set_type(struct dm_t
struct dm_target *tgt;
struct dm_dev_internal *dd;
struct list_head *devices;
+   unsigned live_md_type;
+
+   dm_lock_md_type(t->md);
+   live_md_type = dm_get_md_type(t->md);
+   dm_unlock_md_type(t->md);
 
for (i = 0; i < t->num_targets; i++) {
tgt = t->targets + i;
-   if (dm_target_request_based(tgt))
+   if (dm_target_hybrid(tgt)) {
+   switch (live_md_type) {
+   case DM_TYPE_NONE:
+   case DM_TYPE_BIO_BASED:
+   bio_based = 1;
+   break;
+   case DM_TYPE_REQUEST_BASED:
+   request_based = 1;
+   break;
+   }
+   }
+   else if (dm_target_request_based(tgt))
request_based = 1;
else
bio_based = 1;
Index: linux/drivers/md/dm-target.c
===
--- linux.orig/drivers/md/dm-target.c
+++ linux/drivers/md/dm-target.c
@@ -131,12 +131,19 @@ static int io_err_map(struct dm_target *
return -EIO;
 }
 
+static int io_err_map_rq(struct dm_target *ti, struct request *clone,
+union map_info *map_context)
+{
+   return -EIO;
+}
+
 static struct target_type error_target = {
.name = "error",
-   .version = {1, 1, 0},
+   .version = {1, 2, 0},
.ctr  = io_err_ctr,
.dtr  = io_err_dtr,
.map  = io_err_map,
+   .map_rq = io_err_map_rq,
 };
 
 int __init dm_target_init(void)
Index: linux/drivers/md/dm.h
===
--- linux.orig/drivers/md/dm.h
+++ linux/drivers/md/dm.h
@@ -91,10 +91,21 @@ int dm_setup_md_queue(struct mapped_devi
 #define dm_target_is_valid(t) ((t)->table)
 
 /*
+ * To check whether the target type is bio-based or not (request-based).
+ */
+#define dm_target_bio_based(t) ((t)->type->map != NULL)
+
+/*
  * To check whether the target type is request-based or not (bio-based).
  */
 #define dm_target_request_based(t) ((t)->type->map_rq != NULL)
 
+/*
+ * To check whether the target type is a hybrid (capable of being
+ * either request-based or bio-based).
+ */
+#define dm_target_hybrid(t) (dm_target_bio_based(t) && dm_target_request_based(t))
+
 /*-
  * A registry of target types.
  *---*/
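
As a rough illustration of the selection rule described in the changelog, the following sketch decides the type for a single target. It is a simplification under stated assumptions: the enum, struct and pick_type() are hypothetical, and the real dm_table_set_type() accumulates bio_based/request_based flags over every target in the table rather than returning a type per target.

```c
#include <assert.h>

/* Hypothetical mirror of the DM_TYPE_* values used in the patch. */
enum md_type_sketch { TYPE_NONE, TYPE_BIO_BASED, TYPE_REQUEST_BASED };

struct target_sketch {
	int has_map;	/* stands in for type->map != NULL */
	int has_map_rq;	/* stands in for type->map_rq != NULL */
};

/* A hybrid target follows the live md type; a new md defaults to bio-based. */
enum md_type_sketch pick_type(const struct target_sketch *t,
			      enum md_type_sketch live)
{
	if (t->has_map && t->has_map_rq)
		return live == TYPE_REQUEST_BASED ? TYPE_REQUEST_BASED
						  : TYPE_BIO_BASED;
	if (t->has_map_rq)
		return TYPE_REQUEST_BASED;
	return TYPE_BIO_BASED;
}
```

A hybrid target on a live request-based device stays request-based; on a new or bio-based device it becomes bio-based.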


[PATCH v4 2/7] fs: Add inode_update_time_writable

2013-08-22 Thread Andy Lutomirski
This is like file_update_time, except that it acts on a struct inode *
instead of a struct file *.

Signed-off-by: Andy Lutomirski 
---
 fs/inode.c | 64 +-
 include/linux/fs.h |  1 +
 2 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index d6dfb09..2bbcb19 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1637,6 +1637,34 @@ int file_remove_suid(struct file *file)
 }
 EXPORT_SYMBOL(file_remove_suid);
 
+/*
+ * This does the work that's common to file_update_time and
+ * inode_update_time_writable.
+ */
+static int prepare_update_cmtime(struct inode *inode, struct timespec *now)
+{
+   int sync_it = 0;
+
+   /* First try to exhaust all avenues to not sync */
+   if (IS_NOCMTIME(inode))
+   return 0;
+
+   *now = current_fs_time(inode->i_sb);
+   if (!timespec_equal(&inode->i_mtime, now))
+   sync_it = S_MTIME;
+
+   if (!timespec_equal(&inode->i_ctime, now))
+   sync_it |= S_CTIME;
+
+   if (IS_I_VERSION(inode))
+   sync_it |= S_VERSION;
+
+   if (!sync_it)
+   return 0;
+
+   return sync_it;
+}
+
 /**
  * file_update_time-   update mtime and ctime time
  * @file: file accessed
@@ -1654,23 +1682,9 @@ int file_update_time(struct file *file)
 {
struct inode *inode = file_inode(file);
struct timespec now;
-   int sync_it = 0;
+   int sync_it = prepare_update_cmtime(inode, &now);
int ret;
 
-   /* First try to exhaust all avenues to not sync */
-   if (IS_NOCMTIME(inode))
-   return 0;
-
-   now = current_fs_time(inode->i_sb);
-   if (!timespec_equal(&inode->i_mtime, &now))
-   sync_it = S_MTIME;
-
-   if (!timespec_equal(&inode->i_ctime, &now))
-   sync_it |= S_CTIME;
-
-   if (IS_I_VERSION(inode))
-   sync_it |= S_VERSION;
-
if (!sync_it)
return 0;
 
@@ -1685,6 +1699,26 @@ int file_update_time(struct file *file)
 }
 EXPORT_SYMBOL(file_update_time);
 
+/**
+ * inode_update_time_writable  -   update mtime and ctime time
+ * @inode: inode accessed
+ *
+ * This is like file_update_time, but it assumes the mnt is
+ * writable and not frozen and takes an inode parameter instead.
+ */
+
+int inode_update_time_writable(struct inode *inode)
+{
+   struct timespec now;
+   int sync_it = prepare_update_cmtime(inode, &now);
+
+   if (!sync_it)
+   return 0;
+
+   return update_time(inode, &now, sync_it);
+}
+EXPORT_SYMBOL(inode_update_time_writable);
+
 int inode_needs_sync(struct inode *inode)
 {
if (IS_SYNC(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9818747..86cf0a4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2590,6 +2590,7 @@ extern int inode_newsize_ok(const struct inode *, loff_t offset);
 extern void setattr_copy(struct inode *inode, const struct iattr *attr);
 
 extern int file_update_time(struct file *file);
+extern int inode_update_time_writable(struct inode *inode);
 
 extern int generic_show_options(struct seq_file *m, struct dentry *root);
 extern void save_mount_options(struct super_block *sb, char *options);
-- 
1.8.3.1
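
For readers who want to see the shared helper's decision logic in isolation, here is a userspace sketch. The flag values, struct sk_inode and sk_prepare_update() are hypothetical; only the structure (skip if timestamps are disabled, then OR together the fields that differ from the current time) follows the patch.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel's S_MTIME and friends differ. */
enum { SK_MTIME = 1, SK_CTIME = 2, SK_VERSION = 4 };

struct sk_time { long sec; long nsec; };

struct sk_inode {
	struct sk_time mtime, ctime;
	int nocmtime;	/* stands in for IS_NOCMTIME() */
	int i_version;	/* stands in for IS_I_VERSION() */
};

static int sk_time_equal(const struct sk_time *a, const struct sk_time *b)
{
	return a->sec == b->sec && a->nsec == b->nsec;
}

/* Returns the set of fields that need updating, or 0 if there is nothing to do. */
int sk_prepare_update(const struct sk_inode *inode, const struct sk_time *now)
{
	int sync_it = 0;	/* starts at 0 because the flags are OR-ed in */

	if (inode->nocmtime)
		return 0;
	if (!sk_time_equal(&inode->mtime, now))
		sync_it |= SK_MTIME;
	if (!sk_time_equal(&inode->ctime, now))
		sync_it |= SK_CTIME;
	if (inode->i_version)
		sync_it |= SK_VERSION;
	return sync_it;
}
```

Both callers can then share the same early exit: a return value of 0 means no timestamp write is needed.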



[PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes

2013-08-22 Thread Andy Lutomirski
Writes via mmap currently update mtime and ctime in ->page_mkwrite.
This hurts both throughput and latency.  In workloads that dirty a
large number of mmapped pages, ->page_mkwrite can be hot and
file_update_time is slow and scales poorly.  Updating timestamps can
also sleep, which hurts latency for real-time workloads.

This is also a correctness issue.  SuS says:

The st_ctime and st_mtime fields of a file that is mapped with
MAP_SHARED and PROT_WRITE, will be marked for update at some point
in the interval between a write reference to the mapped region and
the next call to msync() with MS_ASYNC or MS_SYNC for that portion
of the file by any process. If there is no such call, these fields
may be marked for update at any time after a write reference if
the underlying file is modified as a result.

Currently, if the same mmapped page is written twice, the timestamp
may not be updated at all after the second write, whereas SuS (and
anything using timestamps to invalidate caches, backup data, etc.)
would expect the timestamp to eventually be updated.

This patchset attempts to fix both issues at once.  It adds a new
address_space flag AS_CMTIME that is set atomically whenever the
system transfers a pte dirty bit to a struct page backed by the
address_space.  This can happen with various locks held and when low
on memory.

Later on, a_ops.update_cmtime_deferred is called to tell the FS to
update cmtime due to a previous mmapped write.

The core changes have no effect on unmodified filesystems.  To opt in,
a filesystem should implement .update_cmtime_deferred (most likely by
using generic_update_cmtime_deferred) and must call either
mapping_flush_cmtime or mapping_test_clear_cmtime in .writepages.
Filesystems should avoid updating timestamps in ->page_mkwrite.
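
The opt-in flow can be sketched in userspace with C11 atomics. Everything here is an assumption made for illustration: struct mapping_sketch, SK_AS_CMTIME and the sketch_* functions are hypothetical, and the real helpers operate on address_space flags with the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_AS_CMTIME (1u << 0)	/* hypothetical bit for "cmtime update deferred" */

struct mapping_sketch {
	atomic_uint flags;
	/* Optional hook; NULL models a filesystem that has not opted in. */
	void (*update_cmtime_deferred)(struct mapping_sketch *);
	int updates;	/* counts how often the timestamp hook ran */
};

/* Called when a pte dirty bit is transferred to a page; safe to repeat. */
void sketch_set_cmtime(struct mapping_sketch *m)
{
	atomic_fetch_or(&m->flags, SK_AS_CMTIME);
}

/* Atomically clears the bit and reports whether it was set. */
bool sketch_test_clear_cmtime(struct mapping_sketch *m)
{
	return atomic_fetch_and(&m->flags, ~SK_AS_CMTIME) & SK_AS_CMTIME;
}

/* Called from writeback: run the deferred update at most once per batch. */
void sketch_flush_cmtime(struct mapping_sketch *m)
{
	if (m->update_cmtime_deferred && sketch_test_clear_cmtime(m))
		m->update_cmtime_deferred(m);
}

/* Stand-in for a generic hook that would update mtime and ctime. */
void sketch_generic_update(struct mapping_sketch *m)
{
	m->updates++;
}
```

Any number of mmapped writes between two flushes collapses into one timestamp update, which is where the throughput gain is expected to come from.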

The reason that this is not completely automatic is that filesystems
without backing stores do not really fit into this model.
Eventually, someone can add support.

I've converted ext4, xfs, and btrfs.  Converting most other
filesystems should be straightforward.

I wrote an xfstest for this.  ext4, xfs, and btrfs pass.  It's here:

https://github.com/amluto/xfstests/commit/5fbb72ac799cc44a9c4c6d3919f00a479202c899

This series is pullable from:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=mmap_mtime/patch_v4

Changes from v3:
 - The new address space op is now called update_cmtime_deferred.
   Callers take care of protection from fs freezing and checking
   AS_CMTIME.  I fixed a deadlock in the freezer interaction.
 - Block plugs should be handled better.
 - Fixed an infinite loop in msync(MS_ASYNC).
 - Converted xfs and btrfs.
 - Misc minor cleanups.
 - Fixed a corner case: reclaim or migration could have cleaned all
   pages without updating cmtime.

Changes from v2:
 - The core code now interacts with filesystems only through
   address_space ops, so there should be fewer layering issues.
 - MS_ASYNC is handled correctly.

Changes from v1:
 - inode_update_time_writable now locks against the fs freezer.
 - Minor cleanups.
 - Major changelog improvements.

Andy Lutomirski (7):
  mm: Track mappings that have been written via ptes
  fs: Add inode_update_time_writable
  mm: Allow filesystems to defer cmtime updates
  mm: Scan for dirty ptes and update cmtime on MS_ASYNC
  ext4: Defer mmap cmtime updates
  btrfs: Defer mmap cmtime updates
  xfs: Defer mmap cmtime updates

 fs/btrfs/extent_io.c  |  1 +
 fs/btrfs/inode.c  | 32 +-
 fs/buffer.c   |  7 
 fs/ext4/inode.c   | 11 +--
 fs/inode.c| 64 +++-
 fs/xfs/xfs_aops.c |  1 +
 include/linux/fs.h|  9 +
 include/linux/pagemap.h   | 22 +
 include/linux/writeback.h |  1 +
 mm/memory.c   |  7 +++-
 mm/migrate.c  |  2 ++
 mm/mmap.c |  6 +++-
 mm/msync.c| 84 ---
 mm/page-writeback.c   | 53 +-
 mm/rmap.c | 27 +--
 mm/vmscan.c   |  1 +
 16 files changed, 272 insertions(+), 56 deletions(-)

-- 
1.8.3.1



[PATCH v4 1/7] mm: Track mappings that have been written via ptes

2013-08-22 Thread Andy Lutomirski
This will allow the mm code to figure out when a file has been
changed through a writable mmap.  Future changes will use this
information to update the file timestamp after writes.

This is handled in core mm code for two reasons:

1. Performance.  Setting a bit directly is faster than an indirect
   call to a vma op.

2. Simplicity.  The cmtime bit is set with lots of mm locks held.
   Rather than making filesystems add a new vm operation that needs
   to be aware of locking, it's easier to just get it right in core
   code.

Signed-off-by: Andy Lutomirski 
---
 include/linux/pagemap.h | 16 
 mm/memory.c |  7 ++-
 mm/rmap.c   | 27 +--
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e3dea75..9a461ee 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE  = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
AS_BALLOON_MAP  = __GFP_BITS_SHIFT + 4, /* balloon page special map */
+   AS_CMTIME   = __GFP_BITS_SHIFT + 5, /* cmtime update deferred */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -74,6 +75,21 @@ static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
 }
 
+static inline void mapping_set_cmtime(struct address_space * mapping)
+{
+   set_bit(AS_CMTIME, &mapping->flags);
+}
+
+static inline bool mapping_test_cmtime(struct address_space * mapping)
+{
+   return test_bit(AS_CMTIME, &mapping->flags);
+}
+
+static inline bool mapping_test_clear_cmtime(struct address_space * mapping)
+{
+   return test_and_clear_bit(AS_CMTIME, &mapping->flags);
+}
+
 /*
  * This is non-atomic.  Only to be used before the mapping is activated.
  * Probably needs a barrier...
diff --git a/mm/memory.c b/mm/memory.c
index 4026841..1737a90 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1150,8 +1150,13 @@ again:
if (PageAnon(page))
rss[MM_ANONPAGES]--;
else {
-   if (pte_dirty(ptent))
+   if (pte_dirty(ptent)) {
+   struct address_space *mapping =
+   page_mapping(page);
+   if (mapping)
+   mapping_set_cmtime(mapping);
set_page_dirty(page);
+   }
if (pte_young(ptent) &&
likely(!(vma->vm_flags & VM_SEQ_READ)))
mark_page_accessed(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index b2e29ac..2e3fb27 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -928,6 +928,10 @@ static int page_mkclean_file(struct address_space *mapping, struct page *page)
}
}
mutex_unlock(&mapping->i_mmap_mutex);
+
+   if (ret)
+   mapping_set_cmtime(mapping);
+
return ret;
 }
 
@@ -1179,6 +1183,19 @@ out:
 }
 
 /*
+ * Mark a page's mapping for future cmtime update.  It's safe to call this
+ * on any page, but it only has any effect if the page is backed by a mapping
+ * that uses mapping_test_clear_cmtime to handle file time updates.  This means
+ * that there's no need to call this for non-VM_SHARED vmas.
+ */
+static void page_set_cmtime(struct page *page)
+{
+   struct address_space *mapping = page_mapping(page);
+   if (mapping)
+   mapping_set_cmtime(mapping);
+}
+
+/*
  * Subfunctions of try_to_unmap: try_to_unmap_one called
  * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file.
  */
@@ -1219,8 +1236,11 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
pteval = ptep_clear_flush(vma, address, pte);
 
/* Move the dirty bit to the physical page now the pte is gone. */
-   if (pte_dirty(pteval))
+   if (pte_dirty(pteval)) {
set_page_dirty(page);
+   if (vma->vm_flags & VM_SHARED)
+   page_set_cmtime(page);
+   }
 
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
@@ -1413,8 +1433,11 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
}
 
/* Move the dirty bit to the physical page now the pte is gone. */
-   if (pte_dirty(pteval))
+   if (pte_dirty(pteval)) {
set_page_dirty(page);
+   if (vma->vm_flags & VM_SHARED)
+   page_set_cmtime(page);
+   }
 
page_remove_rmap(page);
  

[PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates

2013-08-22 Thread Andy Lutomirski
Filesystems that defer cmtime updates should update cmtime when any
of these events happen after a write via a mapping:

 - The mapping is written back to disk.  This happens from all kinds
   of places, most of which eventually call ->writepages.  (The
   exceptions are vmscan and migration.)

 - munmap is called or the mapping is removed when the process exits.

 - msync(MS_ASYNC) is called.  Linux currently does nothing for
   msync(MS_ASYNC), but POSIX says that cmtime should be updated some
   time between an mmaped write and the subsequent msync call.
   MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.

Filesystems are responsible for checking for pending deferred cmtime
updates in .writepages (a helper is provided for this purpose) and
for doing the actual update in .update_cmtime_deferred.

These changes have no effect by themselves; filesystems must opt in
by implementing .update_cmtime_deferred and removing any
file_update_time call in .page_mkwrite.

This patch does not implement the MS_ASYNC case; that's in the next
patch.

Signed-off-by: Andy Lutomirski 
---
 include/linux/fs.h|  8 +++
 include/linux/pagemap.h   |  6 ++
 include/linux/writeback.h |  1 +
 mm/migrate.c  |  2 ++
 mm/mmap.c |  6 +-
 mm/page-writeback.c   | 53 ++-
 mm/vmscan.c   |  1 +
 7 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 86cf0a4..f6b0f8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -350,6 +350,14 @@ struct address_space_operations {
/* Write back some dirty pages from this mapping. */
int (*writepages)(struct address_space *, struct writeback_control *);
 
+   /*
+* Called when a deferred cmtime update should be applied.
+* Implementations should update cmtime.  (As an optional
+* optimization, implementations can call mapping_test_clear_cmtime
+* from writepages as well.)
+*/
+   void (*update_cmtime_deferred)(struct address_space *);
+
/* Set a page dirty.  Return true if this dirtied it */
int (*set_page_dirty)(struct page *page);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9a461ee..2647a13 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -90,6 +90,12 @@ static inline bool mapping_test_clear_cmtime(struct address_space * mapping)
return test_and_clear_bit(AS_CMTIME, &mapping->flags);
 }
 
+/* Use this one in writepages, etc. */
+extern void mapping_flush_cmtime(struct address_space * mapping);
+
+/* Use this one outside writeback. */
+extern void mapping_flush_cmtime_nowb(struct address_space * mapping);
+
 /*
  * This is non-atomic.  Only to be used before the mapping is activated.
  * Probably needs a barrier...
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4e198ca..efe4970 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -174,6 +174,7 @@ typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
 
 int generic_writepages(struct address_space *mapping,
   struct writeback_control *wbc);
+void generic_update_cmtime_deferred(struct address_space *mapping);
 void tag_pages_for_writeback(struct address_space *mapping,
 pgoff_t start, pgoff_t end);
 int write_cache_pages(struct address_space *mapping,
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f0c244..e4124e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -627,6 +627,8 @@ static int writeout(struct address_space *mapping, struct page *page)
/* unlocked. Relock */
lock_page(page);
 
+   mapping_flush_cmtime(mapping);
+
return (rc < 0) ? -EIO : -EAGAIN;
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 1edbaa3..189eb7a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1,3 +1,4 @@
+
 /*
  * mm/mmap.c
  *
@@ -249,8 +250,11 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
might_sleep();
if (vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
-   if (vma->vm_file)
+   if (vma->vm_file) {
+   if ((vma->vm_flags & VM_SHARED) && vma->vm_file->f_mapping)
+   mapping_flush_cmtime_nowb(vma->vm_file->f_mapping);
fput(vma->vm_file);
+   }
mpol_put(vma_policy(vma));
kmem_cache_free(vm_area_cachep, vma);
return next;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3f0c895..4ec8c02 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1912,12 +1912,30 @@ int generic_writepages(struct address_space *mapping,
 
blk_start_plug(&plug);
ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+   mapping_flush_cmtime(mapping);
blk_finish_plug(&plug);
return ret;
 }
-
 EXPORT_SYMBOL(generic_writepages);
 
+/**
+ * 

[PATCH v4 4/7] mm: Scan for dirty ptes and update cmtime on MS_ASYNC

2013-08-22 Thread Andy Lutomirski
This is probably unimportant but improves POSIX compliance.

Signed-off-by: Andy Lutomirski 
---
 mm/msync.c | 84 ++
 1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/mm/msync.c b/mm/msync.c
index 632df45..a2ee43c 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -13,13 +13,16 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * MS_SYNC syncs the entire file - including mappings.
  *
  * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
  * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
- * Now it doesn't do anything, since dirty pages are properly tracked.
+ * Now all it does is ensure that file timestamps get updated, since POSIX
+ * requires it.  We track dirty pages correctly without MS_ASYNC.
  *
  * The application may now run fsync() to
  * write out the dirty pages and wait on the writeout and check the result.
@@ -28,6 +31,54 @@
  * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
  * applications.
  */
+
+static int msync_async_range(struct vm_area_struct *vma,
+ unsigned long *start, unsigned long end)
+{
+   struct mm_struct *mm;
+   int iters = 0;
+
+   while (*start < end && *start < vma->vm_end && iters < 128) {
+   unsigned int page_mask, page_increm;
+
+   /*
+* Require that the pte is writable (because otherwise
+* it can't be dirty, so there's nothing to clean).
+*
+* In theory we could check the pte dirty bit, but this is
+* awkward and barely worth it.
+*/
+   struct page *page = follow_page_mask(vma, *start,
+                                        FOLL_GET | FOLL_WRITE,
+                                        &page_mask);
+
+   if (page && !IS_ERR(page)) {
+   if (lock_page_killable(page) == 0) {
+   page_mkclean(page);
+   unlock_page(page);
+   }
+   put_page(page);
+   }
+
+   if (IS_ERR(page))
+   return PTR_ERR(page);
+
+   page_increm = 1 + (~(*start >> PAGE_SHIFT) & page_mask);
+   *start += page_increm * PAGE_SIZE;
+   cond_resched();
+   iters++;
+   }
+
+   /* XXX: try to do this only once? */
+   mapping_flush_cmtime_nowb(vma->vm_file->f_mapping);
+
+   /* Give mmap_sem writers a chance. */
+   mm = current->mm;
+   up_read(&mm->mmap_sem);
+   down_read(&mm->mmap_sem);
+   return 0;
+}
+
 SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 {
unsigned long end;
@@ -77,18 +128,29 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
goto out_unlock;
}
file = vma->vm_file;
-   start = vma->vm_end;
-   if ((flags & MS_SYNC) && file &&
-   (vma->vm_flags & VM_SHARED)) {
-   get_file(file);
-   up_read(&mm->mmap_sem);
-   error = vfs_fsync(file, 0);
-   fput(file);
-   if (error || start >= end)
-   goto out;
-   down_read(&mm->mmap_sem);
+   if (file && vma->vm_flags & VM_SHARED) {
+   if (flags & MS_SYNC) {
+   start = vma->vm_end;
+   get_file(file);
+   up_read(&mm->mmap_sem);
+   error = vfs_fsync(file, 0);
+   fput(file);
+   if (error || start >= end)
+   goto out;
+   down_read(&mm->mmap_sem);
+   } else if ((vma->vm_flags & VM_WRITE) &&
+  file->f_mapping) {
+   error = msync_async_range(vma, &start, end);
+   if (error || start >= end)
+   goto out_unlock;
+   } else {
+   start = vma->vm_end;
+   if (start >= end)
+   goto out_unlock;
+   }
vma = find_vma(mm, start);
} else {
+   start = vma->vm_end;
if (start >= end) {
error = 0;
goto out_unlock;
-- 
1.8.3.1
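
The step size computed in msync_async_range is easier to follow with numbers. The sketch below assumes 4 KiB pages and assumes that page_mask is the number of base pages in the mapping unit minus one (0 for a normal page, 511 for a 2 MiB huge page); SK_PAGE_SHIFT and next_start() are hypothetical names.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/*
 * Advance start to the first address after the page (or huge page) that
 * contains it, using the same expression as the patch.
 */
unsigned long next_start(unsigned long start, unsigned int page_mask)
{
	unsigned long page_increm = 1 + (~(start >> SK_PAGE_SHIFT) & page_mask);

	return start + page_increm * (1UL << SK_PAGE_SHIFT);
}
```

With page_mask = 0 the loop advances one page at a time. With page_mask = 511 and start at page index 3, ~3 & 511 is 508, so page_increm is 509 and the next start is page index 512, the first page after the huge page.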


[PATCH v4 5/7] ext4: Defer mmap cmtime updates

2013-08-22 Thread Andy Lutomirski
Signed-off-by: Andy Lutomirski 
---
 fs/ext4/inode.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index dd32a2e..2cb2961 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2382,8 +2382,11 @@ static int ext4_writepages(struct address_space *mapping,
 * a transaction for special inodes like journal inode on last iput()
 * because that could violate lock ordering on umount
 */
-   if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+   if (!mapping->nrpages ||
+   !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+   mapping_flush_cmtime(mapping);
return 0;
+   }
 
if (ext4_should_journal_data(inode)) {
struct blk_plug plug;
@@ -2391,6 +2394,7 @@ static int ext4_writepages(struct address_space *mapping,
 
blk_start_plug(&plug);
ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+   mapping_flush_cmtime(mapping);
blk_finish_plug(&plug);
return ret;
}
@@ -2525,6 +2529,7 @@ retry:
if (ret)
break;
}
+   mapping_flush_cmtime(mapping);
blk_finish_plug(&plug);
if (!ret && !cycled) {
cycled = 1;
@@ -3238,6 +3243,7 @@ static const struct address_space_operations ext4_aops = {
.writepages = ext4_writepages,
.write_begin= ext4_write_begin,
.write_end  = ext4_write_end,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
.bmap   = ext4_bmap,
.invalidatepage = ext4_invalidatepage,
.releasepage= ext4_releasepage,
@@ -3254,6 +3260,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.writepages = ext4_writepages,
.write_begin= ext4_write_begin,
.write_end  = ext4_journalled_write_end,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
.set_page_dirty = ext4_journalled_set_page_dirty,
.bmap   = ext4_bmap,
.invalidatepage = ext4_journalled_invalidatepage,
@@ -3270,6 +3277,7 @@ static const struct address_space_operations ext4_da_aops = {
.writepages = ext4_writepages,
.write_begin= ext4_da_write_begin,
.write_end  = ext4_da_write_end,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
.bmap   = ext4_bmap,
.invalidatepage = ext4_da_invalidatepage,
.releasepage= ext4_releasepage,
@@ -5025,7 +5033,6 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
int retries = 0;
 
sb_start_pagefault(inode->i_sb);
-   file_update_time(vma->vm_file);
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
-- 
1.8.3.1



[PATCH v4 7/7] xfs: Defer mmap cmtime updates

2013-08-22 Thread Andy Lutomirski
This involves a change to block_page_mkwrite.  xfs is the only user.

Signed-off-by: Andy Lutomirski 
---
 fs/buffer.c   | 7 ---
 fs/xfs/xfs_aops.c | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d74335..408677c 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2431,13 +2431,6 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
struct super_block *sb = file_inode(vma->vm_file)->i_sb;
 
sb_start_pagefault(sb);
-
-   /*
-* Update file times before taking page lock. We may end up failing the
-* fault so this update may be superfluous but who really cares...
-*/
-   file_update_time(vma->vm_file);
-
ret = __block_page_mkwrite(vma, vmf, get_block);
sb_end_pagefault(sb);
return block_page_mkwrite_return(ret);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 596ec71..aa8fbcf 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1668,6 +1668,7 @@ const struct address_space_operations xfs_address_space_operations = {
.readpages  = xfs_vm_readpages,
.writepage  = xfs_vm_writepage,
.writepages = xfs_vm_writepages,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
.releasepage= xfs_vm_releasepage,
.invalidatepage = xfs_vm_invalidatepage,
.write_begin= xfs_vm_write_begin,
-- 
1.8.3.1



[PATCH v4 6/7] btrfs: Defer mmap cmtime updates

2013-08-22 Thread Andy Lutomirski
Signed-off-by: Andy Lutomirski 
---
 fs/btrfs/extent_io.c |  1 +
 fs/btrfs/inode.c | 32 
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fe443fe..dc2f851 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3756,6 +3756,7 @@ int extent_writepages(struct extent_io_tree *tree,
   __extent_writepage, &epd,
   flush_write_bio);
flush_epd_write_bio(&epd);
+   mapping_flush_cmtime(mapping);
return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 021694c..fc51380 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7499,10 +7499,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 
sb_start_pagefault(inode->i_sb);
ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
-   if (!ret) {
-   ret = file_update_time(vma->vm_file);
+   if (!ret)
reserved = 1;
-   }
if (ret) {
if (ret == -ENOMEM)
ret = VM_FAULT_OOM;
@@ -8711,22 +8709,24 @@ static struct extent_io_ops btrfs_extent_io_ops = {
  * For now we're avoiding this by dropping bmap.
  */
 static const struct address_space_operations btrfs_aops = {
-   .readpage   = btrfs_readpage,
-   .writepage  = btrfs_writepage,
-   .writepages = btrfs_writepages,
-   .readpages  = btrfs_readpages,
-   .direct_IO  = btrfs_direct_IO,
-   .invalidatepage = btrfs_invalidatepage,
-   .releasepage= btrfs_releasepage,
-   .set_page_dirty = btrfs_set_page_dirty,
-   .error_remove_page = generic_error_remove_page,
+   .readpage   = btrfs_readpage,
+   .writepage  = btrfs_writepage,
+   .writepages = btrfs_writepages,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
+   .readpages  = btrfs_readpages,
+   .direct_IO  = btrfs_direct_IO,
+   .invalidatepage = btrfs_invalidatepage,
+   .releasepage= btrfs_releasepage,
+   .set_page_dirty = btrfs_set_page_dirty,
+   .error_remove_page  = generic_error_remove_page,
 };
 
 static const struct address_space_operations btrfs_symlink_aops = {
-   .readpage   = btrfs_readpage,
-   .writepage  = btrfs_writepage,
-   .invalidatepage = btrfs_invalidatepage,
-   .releasepage= btrfs_releasepage,
+   .readpage   = btrfs_readpage,
+   .writepage  = btrfs_writepage,
+   .update_cmtime_deferred = generic_update_cmtime_deferred,
+   .invalidatepage = btrfs_invalidatepage,
+   .releasepage= btrfs_releasepage,
 };
 
 static const struct inode_operations btrfs_file_inode_operations = {
-- 
1.8.3.1



Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state

2013-08-22 Thread Bjorn Helgaas
On Thu, Aug 08, 2013 at 03:57:07PM +0200, Radim Krčmář wrote:
> PCIe switch can be connected directly to the PCIe root complex in QEMU;
> ASPM does not expect this topology and dereferences NULL pointer when
> initializing.
> 
> A downstream port can also be connected to the root complex without an
> upstream port, so the code checks for both cases; otherwise either one
> dereferences NULL
> on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
>   parent = pdev->bus->parent->self->link_state;
> "pdev->bus->parent->self == NULL" if upstream port is connected directly
> to the root bus and "pdev->bus->parent == NULL" in the second case.
> 
> v1 -> v2: (https://lkml.org/lkml/2013/6/19/753)
>  - Initialization is aborted in pcie_aspm_init_link_state, where other
>special cases are being handled
>  - pci_is_root_bus is used
>  - Warning is printed
> 
> Reproducer for "downstream -- root" and "downstream -- upstream -- root"
> (used qemu-kvm 1.5, q35 machine type might be missing on older ones)
> 
>   for parent in pcie.0 upstream; do
>qemu-kvm -m 128 -M q35 -nographic -no-reboot \
>  -device x3130-upstream,bus=pcie.0,id=upstream \
>  -device xio3130-downstream,bus=$parent,id=downstream,chassis=1 \
>  -device virtio-blk-pci,bus=downstream,id=virtio-zero,drive=zero \
>  -drive  file=/dev/zero,id=zero,format=raw \
>  -kernel bzImage -append "console=ttyS0 panic=3" # pcie_aspm=off
>   done
> 
> ASPM in QEMU works if we connect upstream through root port
>   -device ioh3420,bus=pcie.0,id=root.0 \
>   -device x3130-upstream,bus=root.0,id=upstream
> 
> Signed-off-by: Radim Krčmář 
> ---
>  drivers/pci/pcie/aspm.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 403a443..209cd7f 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -570,6 +570,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>   pdev->bus->self)
>   return;
>  
> + /* We require at least two ports between downstream and root bus */
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
> + (pci_is_root_bus(pdev->bus) ||
> +  pci_is_root_bus(pdev->bus->parent))) {
> + dev_warn(&pdev->dev, "ASPM disabled"
> +  " (connected directly to root bus)\n");
> + return;
> + }

I don't really want to detect invalid topologies piecemeal -- we will
likely find other areas (MPS, AER, link speed management, etc.) that
have similar dependencies.  I'd rather do it generically, maybe with
something like the following patch.

I tried this with the following qemu invocation:

$ /usr/local/bin/qemu-system-x86_64 -M q35 -enable-kvm -m 512 \
    -device x3130-upstream,bus=pcie.0,id=upstream \
    -device xio3130-downstream,bus=upstream,id=downstream,chassis=1 \
    -drive file=ubuntu.img,if=none,id=mydisk \
    -device ide-drive,drive=mydisk,bus=ide.0 \
    -drive file=scratch.img,id=disk1 \
    -device virtio-blk-pci,bus=downstream,id=virtio-disk1,drive=disk1 \
    -nographic -kernel ~/linux/arch/x86/boot/bzImage \
    -append "console=ttyS0,115200n8 root=/dev/sda1 ignore_loglevel"

With unmodified v3.11-rc2, I see the NULL pointer dereference, but with
the patch below applied, we just ignore the 00:03.0 device and the kernel
boots fine.
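
For reference, the two topologies that trip the dereference, and the
check from the v2 patch quoted above, can be modelled in user space.
This is only a sketch: model_bus, model_dev and the helper names are
made up for illustration and are not the kernel's struct pci_bus or
struct pci_dev.  The only thing taken from the kernel is the idea that
a root bus is a bus without a parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct model_dev;

struct model_bus {
    struct model_bus *parent;   /* NULL on the root bus */
    struct model_dev *self;     /* bridge leading to this bus, NULL on the root bus */
};

struct model_dev {
    struct model_bus *bus;      /* bus the device sits on */
    bool is_downstream_port;
};

/* Same idea as pci_is_root_bus(): a root bus has no parent bus. */
static bool model_is_root_bus(const struct model_bus *bus)
{
    return bus->parent == NULL;
}

/*
 * True when a downstream port has fewer than two ports between itself
 * and the root bus.  The || short-circuits, so bus->parent is only
 * inspected when it is known to be non-NULL.
 */
static bool model_downstream_too_close_to_root(const struct model_dev *pdev)
{
    return pdev->is_downstream_port &&
           (model_is_root_bus(pdev->bus) ||
            model_is_root_bus(pdev->bus->parent));
}
```

With the downstream port directly on the root bus, pdev->bus->parent is
NULL; with the downstream port below an upstream port that sits on the
root bus, pdev->bus->parent->self is NULL.  The helper returns true for
both, and false once a root port is placed above the upstream port.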

Bjorn

---
 arch/powerpc/kernel/pci_of_scan.c |7 ++-
 arch/sparc/kernel/pci.c   |7 ++-
 drivers/pci/probe.c   |   37 +
 include/linux/pci.h   |2 +-
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 6b0ba58..f6ef4dd 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -143,7 +143,6 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
dev->devfn = devfn;
dev->multifunction = 0; /* maybe a lie? */
dev->needs_freset = 0;  /* pcie fundamental reset required */
-   set_pcie_port_type(dev);
 
list_for_each_entry(slot, &dev->bus->slots, list)
if (PCI_SLOT(dev->devfn) == slot->number)
@@ -164,6 +163,12 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
pr_debug("class: 0x%x\n", dev->class);
pr_debug("revision: 0x%x\n", dev->revision);
 
+   if (set_pcie_port_type(dev)) {
+   pci_bus_put(dev->bus);
+   kfree(dev);
+   return NULL;
+   }
+
dev->current_state = PCI_UNKNOWN;   /* unknown power state */
dev->error_state = pci_channel_io_normal;
dev->dma_mask = 0x;
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index bc4d3f5..5600849 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -287,7 +287,6 @@ static struct pci_dev *of_create_pci_dev(struct pci_pbm_info *pbm,
dev->dev.of_node = of_node_get(node);

Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer

2013-08-22 Thread zhangwei(Jovi)
On 2013/8/23 0:42, Steven Rostedt wrote:
> On Fri, 09 Aug 2013 18:56:54 +0900
> Masami Hiramatsu  wrote:
> 
>> (2013/08/09 17:45), Namhyung Kim wrote:
>>> From: Namhyung Kim 
>>>
>>> Fetching from user space should be done in a non-atomic context.  So
>>> use a temporary buffer and copy its content to the ring buffer
>>> atomically.
>>>
>>> While at it, use __get_data_size() and store_trace_args() to reduce
>>> code duplication.
>>
>> I'm just concerned about using kmalloc() in the event handler. The
>> non-atomic requirement does hold when fetching user memory that can be
>> swapped out. But in most cases we can presume that it is present in
>> physical memory.
>>
> 
> 
> What about creating a per cpu buffer when uprobes are registered, and
> delete them when they are finished? Basically what trace_printk() does
> if it detects that there are users of trace_printk() in the kernel.
> Note, it does not deallocate them when finished, as it is never
> finished until reboot ;-)
> 
> -- Steve
>
I also thought about this approach, but the issue is that we cannot fetch
user memory into a per-cpu buffer: a per-cpu buffer must be used with
preemption disabled, and fetching user memory can sleep.

jovi.




RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


> -----Original Message-----
> From: Luck, Tony [mailto:tony.l...@intel.com]
> Sent: Thursday, August 22, 2013 7:17 PM
> To: Seiji Aguchi; Aruna Balakrishnaiah; linuxppc-...@ozlabs.org; 
> linux-kernel@vger.kernel.org; keesc...@chromium.org
> Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
> cbouatmai...@gmail.com;
> mah...@linux.vnet.ibm.com; ccr...@android.com
> Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> 
> <1>[  383.209057] RIP  [] sysrq_handle_crash+0x16/0x20
> <4>[  383.209057]  RSP 
> <4>[  383.209057] CR2: 
> <4>[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
> <3>[  383.209057] pstore: compression failed for Part 2 returned -5
> <3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
> <3>[  383.209057] pstore: compression failed for Part 5 returned -5
> 
> Interesting.  With ERST backend I didn't see these messages.  Traces in
> pstore recovered files go as far as the line before the "---[ end trace 
> 04a1cddad37b4b33 ]---"
> 
> Why the difference depending on which back end is in use?

I think the difference doesn't depend on the back end.
Rather it depends on the environment.

I tested on a kvm guest with OVMF.

Seiji


> 
> But I agree that we shouldn't have these messages.  They use up space
> in the persistent store that could be better used saving some more lines
> from earlier in the console log.
> 
> -Tony


Re: [PATCH 1/2] tick: broadcast: Deny per-cpu clockevents from being broadcast sources

2013-08-22 Thread Sören Brinkmann
On Thu, Aug 22, 2013 at 10:06:40AM -0700, Stephen Boyd wrote:
> On most ARM systems the per-cpu clockevents are truly per-cpu in
> the sense that they can't be controlled on any other CPU besides
> the CPU that they interrupt. If one of these clockevents were to
> become a broadcast source we will run into a lot of trouble
> because the broadcast source is enabled on the first CPU to go
> into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
> could be a different CPU than what the clockevent is interrupting
> (or even worse the CPU that the clockevent interrupts could be
> offline).
> 
> Theoretically it's possible to support per-cpu clockevents as the
> broadcast source but so far we haven't needed this and supporting
> it is rather complicated. Let's just deny the possibility for now
> until this becomes a reality (let's hope it never does!).
> 
> Reported-by: Sören Brinkmann 
> Signed-off-by: Stephen Boyd 
Tested-by: Sören Brinkmann 

This fixes the issue I reported when enabling the global timer on Zynq.
The global timer is prevented from becoming the broadcast device and my
system boots.

Thanks,
Sören




[PATCH v6 03/10] tracing: add 'traceon' and 'traceoff' event trigger commands

2013-08-22 Thread Tom Zanussi
Add 'traceon' and 'traceoff' ftrace_func_command commands.  traceon
and traceoff event triggers are added by the user via these commands
in a similar way and using practically the same syntax as the
analogous 'traceon' and 'traceoff' ftrace function commands, but
instead of writing to the set_ftrace_filter file, the traceon and
traceoff triggers are written to the per-event 'trigger' files:

echo 'traceon' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff' > .../tracing/events/somesys/someevent/trigger

The above commands will turn tracing on or off whenever someevent is
hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above commands will turn tracing on or off whenever someevent
is hit, but only N times.
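
The count handling described above can be modelled in user space as
follows.  This is only a sketch of the logic in the *_count_trigger()
functions below: struct model_trigger, tracing_enabled and the function
names are invented for illustration, and the assumption (taken from the
patch code) is that a count of -1 means "no limit".

```c
#include <stdbool.h>

/* Hypothetical model of the per-trigger state; -1 means "no limit". */
struct model_trigger {
    long count;
};

/* Stand-in for the global tracing on/off state. */
static bool tracing_enabled;

static void model_traceon(void)
{
    tracing_enabled = true;
}

/*
 * Mirrors the shape of traceon_count_trigger(): do nothing once the
 * budget is used up, otherwise spend one unit (unless unlimited) and
 * run the plain trigger.  Returns true if the trigger fired.
 */
static bool model_traceon_count(struct model_trigger *t)
{
    if (!t->count)
        return false;

    if (t->count != -1)
        t->count--;

    model_traceon();
    return true;
}
```

So 'traceon:2' fires on the first two hits of the event and is a no-op
from the third hit on, while a trigger written without ':N' never runs
out.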

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h|   1 +
 kernel/trace/trace_events_trigger.c | 182 
 2 files changed, 183 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 0765d3d..4c8f7c1 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -318,6 +318,7 @@ struct ftrace_event_file {
 
 enum trigger_mode {
TM_NONE = (0),
+   TM_TRACE_ONOFF  = (1 << 0),
 };
 
 extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 7a52109..d5a10ed 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -564,7 +564,189 @@ event_trigger_callback(struct event_command *cmd_ops,
goto out;
 }
 
+static void
+traceon_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (tracing_is_on())
+   return;
+
+   tracing_on();
+}
+
+static void
+traceon_count_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!data->count)
+   return;
+
+   if (data->count != -1)
+   (data->count)--;
+
+   traceon_trigger(_data);
+}
+
+static void
+traceoff_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!tracing_is_on())
+   return;
+
+   tracing_off();
+}
+
+static void
+traceoff_count_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!data->count)
+   return;
+
+   if (data->count != -1)
+   (data->count)--;
+
+   traceoff_trigger(_data);
+}
+
+static int
+traceon_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+   struct event_trigger_data *data = _data;
+
+   return event_trigger_print("traceon", m, (void *)data->count,
+  data->filter_str);
+}
+
+static int
+traceoff_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+  void *_data)
+{
+   struct event_trigger_data *data = _data;
+
+   return event_trigger_print("traceoff", m, (void *)data->count,
+  data->filter_str);
+}
+
+static struct event_trigger_ops traceon_trigger_ops = {
+   .func   = traceon_trigger,
+   .print  = traceon_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops traceon_count_trigger_ops = {
+   .func   = traceon_count_trigger,
+   .print  = traceon_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_trigger_ops = {
+   .func   = traceoff_trigger,
+   .print  = traceoff_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_count_trigger_ops = {
+   .func   = traceoff_count_trigger,
+   .print  = traceoff_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+

[PATCH v6 05/10] tracing: add 'stacktrace' event trigger command

2013-08-22 Thread Tom Zanussi
Add 'stacktrace' ftrace_func_command.  stacktrace event triggers are
added by the user via this command in a similar way and using
practically the same syntax as the analogous 'stacktrace' ftrace
function command, but instead of writing to the set_ftrace_filter
file, the stacktrace event trigger is written to the per-event
'trigger' files:

echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger

The above command will turn on stacktraces for someevent i.e. whenever
someevent is hit, a stacktrace will be logged.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will log a stacktrace for each of the first N hits of
someevent; after that the trigger no longer fires.

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h|   1 +
 kernel/trace/trace_events_trigger.c | 102 
 2 files changed, 103 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c223e3..850f201 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -320,6 +320,7 @@ enum trigger_mode {
TM_NONE = (0),
TM_TRACE_ONOFF  = (1 << 0),
TM_SNAPSHOT = (1 << 1),
+   TM_STACKTRACE   = (1 << 2),
 };
 
 extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 342a6dc..42c96de 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -793,6 +793,97 @@ static struct event_command trigger_snapshot_cmd = {
.get_trigger_ops= snapshot_get_trigger_ops,
 };
 
+/*
+ * Skip 4:
+ *   ftrace_stacktrace()
+ *   function_trace_probe_call()
+ *   ftrace_ops_list_func()
+ *   ftrace_call()
+ */
+#define STACK_SKIP 4
+
+static void
+stacktrace_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   trace_dump_stack(STACK_SKIP);
+}
+
+static void
+stacktrace_count_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!data->count)
+   return;
+
+   if (data->count != -1)
+   (data->count)--;
+
+   stacktrace_trigger(_data);
+}
+
+static int
+stacktrace_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+void *_data)
+{
+   struct event_trigger_data *data = _data;
+
+   return event_trigger_print("stacktrace", m, (void *)data->count,
+  data->filter_str);
+}
+
+static int stacktrace_register_trigger(char *glob,
+  struct event_trigger_ops *ops,
+  void *trigger_data,
+  struct ftrace_event_file *file)
+{
+   struct event_trigger_data *data = trigger_data;
+
+   data->post_trigger = true;
+
+   return register_trigger(glob, ops, trigger_data, file);
+}
+
+static struct event_trigger_ops stacktrace_trigger_ops = {
+   .func   = stacktrace_trigger,
+   .print  = stacktrace_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops stacktrace_count_trigger_ops = {
+   .func   = stacktrace_count_trigger,
+   .print  = stacktrace_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+stacktrace_get_trigger_ops(char *cmd, char *param)
+{
+   return param ? &stacktrace_count_trigger_ops : &stacktrace_trigger_ops;
+}
+
+static struct event_command trigger_stacktrace_cmd = {
+   .name   = "stacktrace",
+   .trigger_mode   = TM_STACKTRACE,
+   .post_trigger   = true,
+   .func   = event_trigger_callback,
+   .reg= register_trigger,
+   .unreg  = unregister_trigger,
+   .get_trigger_ops= stacktrace_get_trigger_ops,
+};
+
 static __init void unregister_trigger_traceon_traceoff_cmds(void)
 {
unregister_event_command(&trigger_traceon_cmd,
@@ -837,5 +928,16 @@ __init int register_trigger_cmds(void)
return ret;
}
 
+   ret = register_event_command(&trigger_stacktrace_cmd,
+                                &trigger_commands,
+                                &trigger_cmd_mutex);
+   if (WARN_ON(ret < 0)) {
+   

[PATCH v6 01/10] tracing: Add support for SOFT_DISABLE to syscall events

2013-08-22 Thread Tom Zanussi
The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it and paves the way for future patches
allowing triggers to be added to syscall events, since triggers are
built on top of SOFT_DISABLE.

Add an array of ftrace_event_file pointers indexed by syscall number
to the trace array and remove the existing enabled bitmaps, which as a
result are now redundant.  The ftrace_event_file structs in turn
contain the soft disable flags we need for per-syscall soft disable
accounting; later patches add additional 'trigger' flags and
per-syscall triggers and filters.
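
The data-structure change can be pictured with a small user-space
model.  This is a sketch only: the model_* names, the table size and
the flag value are invented for illustration, and the RCU accessors
used by the real patch (rcu_dereference_raw(), rcu_assign_pointer())
are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_SYSCALLS       8           /* hypothetical table size */
#define MODEL_FL_SOFT_DISABLED  (1UL << 0)  /* hypothetical flag value */

struct model_event_file {
    unsigned long flags;
};

/* One pointer per syscall replaces one bit per syscall. */
struct model_trace_array {
    struct model_event_file *enter_files[MODEL_NR_SYSCALLS];
};

/*
 * A NULL slot plays the role of a cleared bit in the old bitmap; a
 * non-NULL slot additionally gives access to the per-event flags.
 */
static bool model_should_trace_enter(const struct model_trace_array *tr, int nr)
{
    const struct model_event_file *file;

    if (nr < 0 || nr >= MODEL_NR_SYSCALLS)
        return false;

    file = tr->enter_files[nr];
    if (!file)
        return false;   /* event not registered for this syscall */

    return !(file->flags & MODEL_FL_SOFT_DISABLED);
}
```

Registering an event stores the file pointer in the slot and
unregistering stores NULL, so the bitmap carries no information that
the pointer table does not already have.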

Signed-off-by: Tom Zanussi 
---
 kernel/trace/trace.h  |  4 ++--
 kernel/trace/trace_syscalls.c | 36 ++--
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fe39acd..b1227b9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -192,8 +192,8 @@ struct trace_array {
 #ifdef CONFIG_FTRACE_SYSCALLS
int sys_refcount_enter;
int sys_refcount_exit;
-   DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
-   DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
+   struct ftrace_event_file *enter_syscall_files[NR_syscalls];
+   struct ftrace_event_file *exit_syscall_files[NR_syscalls];
 #endif
int stop_count;
int clock_id;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 559329d..230cdb6 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -302,6 +302,7 @@ static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
struct trace_array *tr = data;
+   struct ftrace_event_file *ftrace_file;
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
@@ -314,7 +315,13 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0)
return;
-   if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
+
+   /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
+   ftrace_file = rcu_dereference_raw(tr->enter_syscall_files[syscall_nr]);
+   if (!ftrace_file)
+   return;
+
+   if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;
 
sys_data = syscall_nr_to_meta(syscall_nr);
@@ -345,6 +352,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 {
struct trace_array *tr = data;
+   struct ftrace_event_file *ftrace_file;
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
@@ -356,7 +364,13 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0)
return;
-   if (!test_bit(syscall_nr, tr->enabled_exit_syscalls))
+
+   /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
+   ftrace_file = rcu_dereference_raw(tr->exit_syscall_files[syscall_nr]);
+   if (!ftrace_file)
+   return;
+
+   if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;
 
sys_data = syscall_nr_to_meta(syscall_nr);
@@ -397,7 +411,7 @@ static int reg_event_syscall_enter(struct ftrace_event_file *file,
if (!tr->sys_refcount_enter)
ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
if (!ret) {
-   set_bit(num, tr->enabled_enter_syscalls);
+   rcu_assign_pointer(tr->enter_syscall_files[num], file);
tr->sys_refcount_enter++;
}
mutex_unlock(&syscall_trace_lock);
@@ -415,9 +429,14 @@ static void unreg_event_syscall_enter(struct ftrace_event_file *file,
return;
mutex_lock(&syscall_trace_lock);
tr->sys_refcount_enter--;
-   clear_bit(num, tr->enabled_enter_syscalls);
+   rcu_assign_pointer(tr->enter_syscall_files[num], NULL);
if (!tr->sys_refcount_enter)
unregister_trace_sys_enter(ftrace_syscall_enter, tr);
+   /*
+* Callers expect the event to be completely disabled on
+* return, so wait for current handlers to finish.
+*/
+   synchronize_sched();
mutex_unlock(&syscall_trace_lock);
 }
 
@@ -435,7 +454,7 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
if (!tr->sys_refcount_exit)
ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
if (!ret) {
-   set_bit(num, 

[PATCH v6 07/10] tracing: add and use generic set_trigger_filter() implementation

2013-08-22 Thread Tom Zanussi
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger.  For example, to add a filter to an
enable/disable_event command:

echo 'enable_event:system:event if common_pid == 999' > \
  .../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

echo 'stacktrace if common_pid == 999' > \
  .../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved - for ftrace_raw_event_calls, trigger invocation now needs
to happen after the { assign; } part of the call.

Also, because triggers need to be invoked even for soft-disabled
events, the SOFT_DISABLED check and return needs to be moved from the
top of the call to a point following the trigger check, which means
that soft-disabled events actually get discarded instead of simply
skipped.  There's still a SOFT_DISABLED-only check at the top of the
function, so when an event is soft disabled but not because of the
presence of a trigger, the original SOFT_DISABLED behavior remains
unchanged.
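
The flag test this paragraph describes can be written out as a small
truth table.  The sketch below is a user-space model under stated
assumptions: the MODEL_FL_* bit values and both function names are
made up, and only the relationship between the two flags is taken from
the description above.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the relationship between them matters. */
#define MODEL_FL_SOFT_DISABLED  (1UL << 0)
#define MODEL_FL_TRIGGER_MODE   (1UL << 1)

/*
 * Check at the top of the handler: return early only when the event is
 * soft disabled AND no trigger is attached.  With a trigger attached
 * the record still has to be built so the trigger's filter can see it.
 */
static bool model_skip_at_top(unsigned long flags)
{
    return (flags & (MODEL_FL_SOFT_DISABLED | MODEL_FL_TRIGGER_MODE)) ==
           MODEL_FL_SOFT_DISABLED;
}

/*
 * Check after the triggers have run: a soft-disabled event that got
 * this far is discarded rather than committed.
 */
static bool model_discard_after_triggers(unsigned long flags)
{
    return (flags & MODEL_FL_SOFT_DISABLED) != 0;
}
```

Of the four flag combinations, only "soft disabled, no trigger" takes
the early return; "soft disabled, trigger attached" goes on to invoke
the trigger and then drops the record.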

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers.  To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger.  Once all commands have been either invoked or set their
return flag, event_triggers_call() returns.  The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
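
The two-phase scheme can be illustrated with a user-space model.  This
is a sketch under assumptions, not the patch's code: struct
model_trigger, the single-character call log and all model_* functions
are invented for illustration, and the per-trigger filter check is left
out to keep the ordering visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trigger description; the real structures differ. */
struct model_trigger {
    unsigned int mode;      /* one bit per command, like enum trigger_mode */
    bool post_trigger;      /* true: must run after the record is closed */
    void (*func)(void);
};

/* Tiny call log so the ordering can be inspected afterwards. */
static char model_log[8];
static size_t model_log_len;

static void model_note(char c)
{
    if (model_log_len < sizeof(model_log))
        model_log[model_log_len++] = c;
}

static void model_immediate(void) { model_note('i'); }
static void model_deferred(void)  { model_note('d'); }
static void model_commit(void)    { model_note('c'); }

/* Phase 1: run immediate triggers, collect deferred ones in a bitmask. */
static unsigned int model_triggers_call(const struct model_trigger *t, size_t n)
{
    unsigned int deferred = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (t[i].post_trigger)
            deferred |= t[i].mode;
        else
            t[i].func();
    }
    return deferred;
}

/* Phase 2: once the record is closed, run whatever phase 1 deferred. */
static void model_triggers_post_call(const struct model_trigger *t, size_t n,
                                     unsigned int deferred)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (t[i].mode & deferred)
            t[i].func();
}

/* One event: phase 1, commit (or discard) the record, phase 2. */
static void model_log_event(const struct model_trigger *t, size_t n)
{
    unsigned int deferred = model_triggers_call(t, n);

    model_commit();
    model_triggers_post_call(t, n, deferred);
}
```

A trigger that writes into the trace buffer itself (stacktrace, for
example) would be marked post_trigger in this model, so that its output
lands after the event that caused it instead of inside a half-written
record.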

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h|   6 ++-
 include/trace/ftrace.h  |  45 +++-
 kernel/trace/trace.h|   4 ++
 kernel/trace/trace_events_filter.c  |  13 +
 kernel/trace/trace_events_trigger.c | 103 ++--
 kernel/trace/trace_syscalls.c   |  36 +
 6 files changed, 180 insertions(+), 27 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4ace984..5f14544 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -330,7 +330,11 @@ extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
-extern void event_triggers_call(struct ftrace_event_file *file);
+extern enum trigger_mode event_triggers_call(struct ftrace_event_file *file,
+void *rec);
+extern void event_triggers_post_call(struct ftrace_event_file *file,
+enum trigger_mode tm);
+
 
 enum {
FILTER_OTHER = 0,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 326ba32..be913f1 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -412,13 +412,15 @@ static inline notrace int ftrace_get_offsets_##call( \
  * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
  * struct ring_buffer_event *event;
  * struct ftrace_raw_<call> *entry; <-- defined in stage 1
+ * enum trigger_mode __tm = TM_NONE;
  * struct ring_buffer *buffer;
  * unsigned long irq_flags;
  * int __data_size;
  * int pc;
  *
- * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
- *  &ftrace_file->flags))
+ * if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED |
+ *  FTRACE_EVENT_FL_TRIGGER_MODE)) ==
+ * FTRACE_EVENT_FL_SOFT_DISABLED)
  * return;
  *
  * local_save_flags(irq_flags);
@@ -437,9 

[PATCH v6 04/10] tracing: add 'snapshot' event trigger command

2013-08-22 Thread Tom Zanussi
Add 'snapshot' ftrace_func_command.  snapshot event triggers are added
by the user via this command in a similar way and using practically
the same syntax as the analogous 'snapshot' ftrace function command,
but instead of writing to the set_ftrace_filter file, the snapshot
event trigger is written to the per-event 'trigger' files:

echo 'snapshot' > .../somesys/someevent/trigger

The above command will turn on snapshots for someevent i.e. whenever
someevent is hit, a snapshot will be done.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'snapshot:N' > .../somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will take a snapshot on each of the first N hits of
someevent; after that the trigger no longer fires.

Also add a new ftrace_alloc_snapshot() function.  The ftrace snapshot
command already has code that allocates a snapshot; this wrapper makes
that code reusable by the snapshot event trigger.

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h|  1 +
 kernel/trace/trace.c|  9 
 kernel/trace/trace.h|  1 +
 kernel/trace/trace_events_trigger.c | 89 +
 4 files changed, 100 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4c8f7c1..5c223e3 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -319,6 +319,7 @@ struct ftrace_event_file {
 enum trigger_mode {
TM_NONE = (0),
TM_TRACE_ONOFF  = (1 << 0),
+   TM_SNAPSHOT = (1 << 1),
 };
 
 extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 496f94d..5a61dbe 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5358,6 +5358,15 @@ static const struct file_operations tracing_dyn_info_fops = {
 };
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
+#if defined(CONFIG_TRACER_SNAPSHOT)
+int ftrace_alloc_snapshot(void)
+{
+   return alloc_snapshot(&global_trace);
+}
+#else
+int ftrace_alloc_snapshot(void) { return -ENOSYS; }
+#endif
+
 #if defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE)
 static void
 ftrace_snapshot(unsigned long ip, unsigned long parent_ip, void **data)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 1733ac9..5aea9e1 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1193,6 +1193,7 @@ struct event_command {
 
 extern int trace_event_enable_disable(struct ftrace_event_file *file,
  int enable, int soft_disable);
+extern int ftrace_alloc_snapshot(void);
 
 extern const char *__start___trace_bprintk_fmt[];
 extern const char *__stop___trace_bprintk_fmt[];
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d5a10ed..342a6dc 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -712,6 +712,87 @@ static struct event_command trigger_traceoff_cmd = {
.get_trigger_ops= onoff_get_trigger_ops,
 };
 
+static void
+snapshot_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   tracing_snapshot();
+}
+
+static void
+snapshot_count_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!data->count)
+   return;
+
+   if (data->count != -1)
+   (data->count)--;
+
+   snapshot_trigger(_data);
+}
+
+static int
+register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
+ void *data, struct ftrace_event_file *file)
+{
+   int ret = register_trigger(glob, ops, data, file);
+
+   if (ret > 0)
+   ftrace_alloc_snapshot();
+
+   return ret;
+}
+
+static int
+snapshot_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+  void *_data)
+{
+   struct event_trigger_data *data = _data;
+
+   return event_trigger_print("snapshot", m, (void *)data->count,
+  data->filter_str);
+}
+
+static struct event_trigger_ops snapshot_trigger_ops = {
+   .func   = snapshot_trigger,
+   .print  = snapshot_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct event_trigger_ops snapshot_count_trigger_ops = {
+   .func   = snapshot_count_trigger,
+   .print  = snapshot_trigger_print,
+   .init   = event_trigger_init,
+   .free   = event_trigger_free,
+};
+
+static struct 

[PATCH v6 10/10] tracing: make register/unregister_ftrace_command __init

2013-08-22 Thread Tom Zanussi
register/unregister_ftrace_command() are only ever called from __init
functions, so can themselves be made __init.

Also make register_snapshot_cmd() __init for the same reason.

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace.h |  4 ++--
 kernel/trace/ftrace.c  | 12 ++--
 kernel/trace/trace.c   |  4 ++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9f15c00..6062491 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -533,11 +533,11 @@ static inline int ftrace_force_update(void) { return 0; }
 static inline void ftrace_disable_daemon(void) { }
 static inline void ftrace_enable_daemon(void) { }
 static inline void ftrace_release_mod(struct module *mod) {}
-static inline int register_ftrace_command(struct ftrace_func_command *cmd)
+static inline __init int register_ftrace_command(struct ftrace_func_command *cmd)
 {
return -EINVAL;
 }
-static inline int unregister_ftrace_command(char *cmd_name)
+static inline __init int unregister_ftrace_command(char *cmd_name)
 {
return -EINVAL;
 }
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a6d098c..64f7f39 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3292,7 +3292,11 @@ void unregister_ftrace_function_probe_all(char *glob)
 static LIST_HEAD(ftrace_commands);
 static DEFINE_MUTEX(ftrace_cmd_mutex);
 
-int register_ftrace_command(struct ftrace_func_command *cmd)
+/*
+ * Currently we only register ftrace commands from __init, so mark this
+ * __init too.
+ */
+__init int register_ftrace_command(struct ftrace_func_command *cmd)
 {
struct ftrace_func_command *p;
int ret = 0;
@@ -3311,7 +3315,11 @@ int register_ftrace_command(struct ftrace_func_command *cmd)
return ret;
 }
 
-int unregister_ftrace_command(struct ftrace_func_command *cmd)
+/*
+ * Currently we only unregister ftrace commands from __init, so mark
+ * this __init too.
+ */
+__init int unregister_ftrace_command(struct ftrace_func_command *cmd)
 {
struct ftrace_func_command *p, *n;
int ret = -ENODEV;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2aabd34..4222c6a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5458,12 +5458,12 @@ static struct ftrace_func_command ftrace_snapshot_cmd = {
.func   = ftrace_trace_snapshot_callback,
 };
 
-static int register_snapshot_cmd(void)
+static __init int register_snapshot_cmd(void)
 {
 return register_ftrace_command(&ftrace_snapshot_cmd);
 }
 #else
-static inline int register_snapshot_cmd(void) { return 0; }
+static inline __init int register_snapshot_cmd(void) { return 0; }
 #endif /* defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE) */
 
 struct dentry *tracing_init_dentry_tr(struct trace_array *tr)
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v6 09/10] tracing: add documentation for trace event triggers

2013-08-22 Thread Tom Zanussi
Provide a basic overview of trace event triggers and document the
available trigger commands, along with a few simple examples.

Signed-off-by: Tom Zanussi 
---
 Documentation/trace/events.txt | 207 +
 1 file changed, 207 insertions(+)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 37732a2..c94435d 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -287,3 +287,210 @@ their old filters):
 prev_pid == 0
 # cat sched_wakeup/filter
 common_pid == 0
+
+6. Event triggers
+=================
+
+Trace events can be made to conditionally invoke trigger 'commands'
+which can take various forms and are described in detail below;
+examples would be enabling or disabling other trace events or invoking
+a stack trace whenever the trace event is hit.  Whenever a trace event
+with attached triggers is invoked, the set of trigger commands
+associated with that event is invoked.  Any given trigger can
+additionally have an event filter of the same form as described in
+section 5 (Event filtering) associated with it - the command will only
+be invoked if the event being invoked passes the associated filter.
+If no filter is associated with the trigger, it always passes.
+
+Triggers are added to and removed from a particular event by writing
+trigger expressions to the 'trigger' file for the given event.
+
+A given event can have any number of triggers associated with it,
+subject to any restrictions that individual commands may have in that
+regard.
+
+Event triggers are implemented on top of "soft" mode, which means that
+whenever a trace event has one or more triggers associated with it,
+the event is activated even if it isn't actually enabled, but is
+disabled in a "soft" mode.  That is, the tracepoint will be called,
+but just will not be traced, unless of course it's actually enabled.
+This scheme allows triggers to be invoked even for events that aren't
+enabled, and also allows the current event filter implementation to be
+used for conditionally invoking triggers.
+
+The syntax for event triggers is roughly based on the syntax for
+set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
+section of Documentation/trace/ftrace.txt), but there are major
+differences and the implementation isn't currently tied to it in any
+way, so beware about making generalizations between the two.
+
+6.1 Expression syntax
+---------------------
+
+Triggers are added by echoing the command to the 'trigger' file:
+
+  # echo 'command[:count] [if filter]' > trigger
+
+Triggers are removed by echoing the same command but starting with '!'
+to the 'trigger' file:
+
+  # echo '!command[:count] [if filter]' > trigger
+
+The [if filter] part isn't used in matching commands when removing, so
+leaving that off in a '!' command will accomplish the same thing as
+having it in.
+
+The filter syntax is the same as that described in the 'Event
+filtering' section above.
+
+For ease of use, writing to the trigger file using '>' currently just
+adds or removes a single trigger and there's no explicit '>>' support
+('>' actually behaves like '>>') or truncation support to remove all
+triggers (you have to use '!' for each one added.)
+
+6.2 Supported trigger commands
+------------------------------
+
+The following commands are supported:
+
+- enable_event/disable_event
+
+  These commands can enable or disable another trace event whenever
+  the triggering event is hit.  When these commands are registered,
+  the other trace event is activated, but disabled in a "soft" mode.
+  That is, the tracepoint will be called, but just will not be traced.
+  The event tracepoint stays in this mode as long as there's a trigger
+  in effect that can trigger it.
+
+  For example, the following trigger causes kmalloc events to be
+  traced when a read system call is entered, and the :1 at the end
+  specifies that this enablement happens only once:
+
+  # echo 'enable_event:kmem:kmalloc:1' > \
+  /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+  The following trigger causes kmalloc events to stop being traced
+  when a read system call exits.  This disablement happens on every
+  read system call exit:
+
+  # echo 'disable_event:kmem:kmalloc' > \
+  /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+  The format is:
+
+  enable_event:<system>:<event>[:count]
+  disable_event:<system>:<event>[:count]
+
+  To remove the above commands:
+
+  # echo '!enable_event:kmem:kmalloc:1' > \
+  /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+  # echo '!disable_event:kmem:kmalloc' > \
+  /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+  Note that there can be any number of enable/disable_event triggers
+  per triggering event, but there can only be one trigger per
+  triggered event. e.g. sys_enter_read can have triggers enabling both
+  kmem:kmalloc and sched:sched_switch, but can't 

[PATCH v6 06/10] tracing: add 'enable_event' and 'disable_event' event trigger commands

2013-08-22 Thread Tom Zanussi
Add 'enable_event' and 'disable_event' event_command commands.

enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analogous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:

echo 'enable_event:system:event' > .../othersys/otherevent/trigger
echo 'disable_event:system:event' > .../othersys/otherevent/trigger

The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger

Where N is the number of times the command will be invoked.

The above commands will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.
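
The count behaviour described above can be sketched in plain userspace C.
This is only a hedged model of the logic in event_enable_count_trigger()
from this patch: the struct and function names below (model_trigger,
model_count_trigger) are hypothetical, and a plain bool stands in for the
SOFT_DISABLED flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_trigger {
	long count;            /* -1 means unlimited, 0 means used up */
	bool enable;           /* true: enable_event, false: disable_event */
	bool *soft_disabled;   /* stand-in for the target event's SOFT_DISABLED bit */
	unsigned long fired;   /* how often the trigger actually acted */
};

static void model_count_trigger(struct model_trigger *t)
{
	assert(t != NULL && t->soft_disabled != NULL);

	/* A used-up count means the trigger no longer does anything. */
	if (!t->count)
		return;

	/* Skip, without consuming a count, if the target is already in the
	 * state this trigger would switch it to. */
	if (t->enable == !*t->soft_disabled)
		return;

	if (t->count != -1)
		t->count--;

	*t->soft_disabled = !t->enable;
	t->fired++;
}
```

With count set to 1 the model flips the target once and then stays
quiet; with -1 it keeps acting whenever the target is in the other
state.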

This also makes the find_event_file() helper function extern, since
it's useful to use from other places, such as the event triggers code,
so make it accessible.

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h|   1 +
 kernel/trace/trace_events.c |   2 +-
 kernel/trace/trace_events_trigger.c | 364 
 3 files changed, 366 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 850f201..4ace984 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -321,6 +321,7 @@ enum trigger_mode {
TM_TRACE_ONOFF  = (1 << 0),
TM_SNAPSHOT = (1 << 1),
TM_STACKTRACE   = (1 << 2),
+   TM_EVENT_ENABLE = (1 << 3),
 };
 
 extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7d8eb8a..25b2c86 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1860,7 +1860,7 @@ struct event_probe_data {
 bool enable;
 };
 
-static struct ftrace_event_file *
+struct ftrace_event_file *
 find_event_file(struct trace_array *tr, const char *system,  const char *event)
 {
struct ftrace_event_file *file;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 42c96de..94074d8 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -894,6 +894,358 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
 _cmd_mutex);
 }
 
+/* Avoid typos */
+#define ENABLE_EVENT_STR   "enable_event"
+#define DISABLE_EVENT_STR  "disable_event"
+
+static void
+event_enable_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (data->enable)
+   clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+   else
+   set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+}
+
+static void
+event_enable_count_trigger(void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (!data)
+   return;
+
+   if (!data->count)
+   return;
+
+   /* Skip if the event is in a state we want to switch to */
+   if (data->enable == !(data->file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+   return;
+
+   if (data->count != -1)
+   (data->count)--;
+
+   event_enable_trigger(_data);
+}
+
+static int
+event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+  void *_data)
+{
+   struct event_trigger_data *data = _data;
+
+   seq_printf(m, "%s:%s:%s",
+  data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR,
+  data->file->event_call->class->system,
+  data->file->event_call->name);
+
+   if (data->count == -1)
+   seq_puts(m, ":unlimited");
+   else
+   seq_printf(m, ":count=%ld", data->count);
+
+   if (data->filter_str)
+   seq_printf(m, " if %s\n", data->filter_str);
+   else
+   seq_puts(m, "\n");
+
+   return 0;
+}
+
+static void
+event_enable_trigger_free(struct event_trigger_ops *ops, void **_data)
+{
+   struct event_trigger_data **p = (struct event_trigger_data **)_data;
+   struct event_trigger_data *data = *p;
+
+   if (WARN_ON_ONCE(data->ref <= 0))
+   return;
+
+   data->ref--;
+   if (!data->ref) {
+   /* Remove the SOFT_MODE 

[PATCH v6 08/10] tracing: update event filters for multibuffer

2013-08-22 Thread Tom Zanussi
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

  # echo 'count > 65536' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  count > 65536
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'count > 4096' > /sys/kernel/debug/tracing/instances/test1/events/syscalls/sys_enter_read/filter
  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  count > 4096

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

  # echo 'count > 65536' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  count > 65536
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'count > 4096' > /sys/kernel/debug/tracing/instances/test1/events/syscalls/sys_enter_read/filter
  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
  count > 65536

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.
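
The difference between call-based and file-based filters can be
illustrated with a small userspace sketch.  Everything here is a
hypothetical model, not the kernel code: model_call, model_file and
model_check_discard() are invented names, and the filter is reduced to
a single "count > N" threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_USE_CALL_FILTER (1 << 0)   /* models the new flag bit */

/* A filter reduced to "count > min_count" for illustration. */
struct model_filter {
	long min_count;
};

struct model_call {
	int flags;
	const struct model_filter *filter;   /* shared by every instance */
};

struct model_file {
	struct model_call *call;
	const struct model_filter *filter;   /* private to one instance */
};

/* Returns true when the record should be discarded. */
static bool model_check_discard(const struct model_file *file, long count)
{
	const struct model_filter *f;

	assert(file != NULL && file->call != NULL);

	/* Events without multibuffer support keep using the shared filter. */
	f = (file->call->flags & MODEL_USE_CALL_FILTER) ?
		file->call->filter : file->filter;

	return f != NULL && !(count > f->min_count);
}
```

Two model_file objects that point at the same model_call can then hold
different thresholds, which is the per-instance behaviour shown in the
"After" transcript above.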

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Signed-off-by: Tom Zanussi 
---
 include/linux/ftrace_event.h |  37 ++--
 include/trace/ftrace.h   |   6 +-
 kernel/trace/trace.c |  18 ++--
 kernel/trace/trace.h |  10 +--
 kernel/trace/trace_branch.c  |   2 +-
 kernel/trace/trace_events.c  |  26 +++---
 kernel/trace/trace_events_filter.c   | 168 +++
 kernel/trace/trace_export.c  |   2 +-
 kernel/trace/trace_functions_graph.c |   4 +-
 kernel/trace/trace_kprobe.c  |   4 +-
 kernel/trace/trace_mmiotrace.c   |   4 +-
 kernel/trace/trace_sched_switch.c|   4 +-
 kernel/trace/trace_syscalls.c|   8 +-
 kernel/trace/trace_uprobe.c  |   3 +-
 14 files changed, 200 insertions(+), 96 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5f14544..908d293 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -202,6 +202,7 @@ enum {
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
TRACE_EVENT_FL_WAS_ENABLED_BIT,
+   TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
 };
 
 /*
@@ -213,6 +214,7 @@ enum {
  *  WAS_ENABLED   - Set and stays set when an event was ever enabled
  *(used for module unloading, if a module event is enabled,
  * it is best to clear the buffers that used it).
+ *  USE_CALL_FILTER - For ftrace internal events, don't use file filter
  */
 enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -220,6 +222,7 @@ enum {
 TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
 TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
 TRACE_EVENT_FL_WAS_ENABLED  = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
+   TRACE_EVENT_FL_USE_CALL_FILTER  = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
 };
 
 struct ftrace_event_call {
@@ -238,6 +241,7 @@ struct ftrace_event_call {
 *   bit 2: failed to apply filter
 *   bit 3: ftrace internal event (do not enable)
 *   bit 4: Event was enabled by module
+*   bit 5: use call filter rather than file filter
 */
int flags; /* static flags of different events */
 
@@ -253,6 +257,8 @@ struct ftrace_subsystem_dir;
 enum {
FTRACE_EVENT_FL_ENABLED_BIT,
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
+   FTRACE_EVENT_FL_FILTERED_BIT,
+   FTRACE_EVENT_FL_NO_SET_FILTER_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
@@ -262,6 +268,8 @@ enum {
  * Ftrace event file flags:
  *  ENABLED  - The event is enabled
  *  RECORDED_CMD  - The comms should be recorded at sched_switch
+ *  FILTERED - The event has a filter attached
+ *  NO_SET_FILTER - Set when filter has error and is to be ignored
  *  SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
  *  

[PATCH v6 02/10] tracing: add basic event trigger framework

2013-08-22 Thread Tom Zanussi
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires.  Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that.  Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function.  Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
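
As a rough illustration of the reference counting described above, here
is a userspace sketch.  It is a simplified model under stated
assumptions: the names (model_event_file,
model_trigger_enable_disable) are hypothetical, plain integers replace
the kernel's atomics and bit operations, and soft mode is folded into
the same counter, which the real code does not necessarily do.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SOFT_MODE    (1UL << 0)
#define MODEL_TRIGGER_MODE (1UL << 1)

/* Hypothetical stand-in for the per-event file state. */
struct model_event_file {
	unsigned long flags;
	int tm_ref;   /* number of triggers currently attached */
};

static void model_trigger_enable_disable(struct model_event_file *file,
					 int enable)
{
	assert(file != NULL);

	if (enable) {
		/* Only the first trigger switches the event into trigger mode. */
		if (++file->tm_ref > 1)
			return;
		file->flags |= MODEL_TRIGGER_MODE | MODEL_SOFT_MODE;
	} else {
		assert(file->tm_ref > 0);
		/* Only the last trigger to go away switches it back. */
		if (--file->tm_ref > 0)
			return;
		file->flags &= ~(MODEL_TRIGGER_MODE | MODEL_SOFT_MODE);
	}
}

/* Checked whenever the event fires, whether or not it is enabled. */
static int model_triggers_should_run(const struct model_event_file *file)
{
	return (file->flags & MODEL_TRIGGER_MODE) != 0;
}
```

The point of the counter is that removing one of several triggers must
not drop the event out of trigger mode while others still depend on it.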

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, write, and release file operations are
implemented here.

The open() implementation sets up for the various open modes of the
'trigger' file.  It creates and attaches the trigger iterator and sets
up the command parser.  If opened for reading, it sets up the trigger
seq_ops.

The write() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.

Also added are a couple of trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.

A couple of structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations.  They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event.  It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.

Every event_command func() implementation essentially does the
same thing for any command:

   - choose ops - use the value of param to choose either a number or
 count version of event_trigger_ops specific to the command
   - do the register or unregister of those ops
   - associate a filter, if specified, with the triggering event
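
The three steps above can be sketched as a userspace model.  This is
not the patch's func() implementation: model_command_func(), model_ops
and model_event are hypothetical names, "registration" is reduced to
filling in a struct, and the count parsing with strtol() is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for event_trigger_ops. */
struct model_ops {
	const char *name;
};

static const struct model_ops model_plain_ops = { "plain" };
static const struct model_ops model_count_ops = { "count" };

/* Hypothetical stand-in for the triggering event. */
struct model_event {
	const struct model_ops *ops;   /* ops of the registered trigger */
	long count;                    /* -1 means unlimited */
	const char *filter;            /* NULL means the trigger always passes */
	int registered;
};

/* param is the optional count text, filter the optional filter text. */
static int model_command_func(struct model_event *ev, const char *param,
			      const char *filter)
{
	long count = -1;
	char *end;

	assert(ev != NULL);

	/* 1. choose ops: a param selects the count version */
	if (param) {
		count = strtol(param, &end, 10);
		if (end == param || *end != '\0' || count < 0)
			return -1;   /* malformed count: register nothing */
	}

	/* 2. register the chosen ops with the event */
	ev->ops = param ? &model_count_ops : &model_plain_ops;
	ev->count = count;
	ev->registered = 1;

	/* 3. associate the filter, if one was given */
	ev->filter = filter;
	return 0;
}
```

Parsing the count before touching the event keeps a malformed
expression from leaving a half-registered trigger behind.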

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized.  When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite.  The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_mode, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions.  This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked.  The other 

[PATCH v6 00/10] tracing: trace event triggers

2013-08-22 Thread Tom Zanussi
Hi,

This is v6 of the trace event triggers patchset.  This is essentially
the same as v5, but rebased to trace/for-next, which had a couple of
new conflicting patches pulled in since I had cut v5.  This version
just fixes up those conflicts.

v6:
 - fixed up the conflicts in trace_events.c related to the actual
   creation of the per-event 'trigger' files.

v5:
 - got rid of the trigger_iterator, a vestige of the first patchset,
   which attempted to abstract the ftrace_iterator for triggers, and
   cleaned up and simplified the related code as a result.
 - replaced the void *cmd_data everywhere with ftrace_event_file *,
   another vestige of the initial patchset.
 - updated the patchset to use event_file_data() to grab the i_private
   ftrace_event_files where appropriate (this was a separate patch in
   the previous patchset, but was merged into the basic framework
   patch as suggested by Masami.  The only interesting part about this
   is that it moved event_file_data() from kernel/trace/trace_events.c
   to kernel/trace/trace.h so it can be used in
   e.g. trace_events_trigger.c as well.)
 - add missing grab of event_mutex in event_trigger_regex_write().
 - realized when making the above changes that the trigger filters
   weren't being freed when the trigger was freed, so added a
   trigger_data_free() to do that.  It also ensures that trigger_data
   won't be freed until nothing is using it.
 - added clear_event_triggers(), which clears all triggers in a trace
   array (and soft-disable associated with event_enable/disable
   events).
 - added a comment to ftrace_syscall_enter/exit to document the use of
   rcu_dereference_raw() there.

v4:
 - made some changes to the soft-disable for syscall patch, according
   to Masami's suggestions.  Actually, since there's now an array of
   ftrace_files for syscalls that can serve the same purpose, the
   enabled_enter/exit_syscalls bit arrays became redundant and were
   removed.
 - moved all the remaining common functions out of the
   traceon/traceoff patch and into the basic trigger framework patch
   and added comments to all the common functions.
 - extensively commented the event_trigger_ops and event_command ops.
 - made the register/unregister_command functions __init.  Since that
   code was originally inspired by similar ftrace code, a new patch
   was added to do the same thing for the register/unregister of the
   ftrace commands (patch 10/11).
 - fixed the event_trigger_regex_open i_private problem noted by
   Masami that's currently being addressed by Oleg Nesterov's fixes
   for this.  Note that that patchset also affects patch 8/11 (update
   filters for multi-buffer, since it touches event filters as well).
   Patch 11/11 depends on that patchset and also moves
   event_file_data() to trace.h.

v3:
 - added a new patch to the series (patch 8/9 - update event filters
   for multibuffer) to bring the event filters up-to-date wrt the
   multibuffer changes - without this patch, the same filter is
   applied to all buffers regardless of which instance sets it; this
   patch allows you to set per-instance filters as you'd expect.  The
   one exception to this is the 'ftrace subsystem' events, which are
   special and retain their current behavior.
 - changed the syscall soft enabling to keep a per-trace-array array
   of trace_event_files alongside the 'enabled' bitmaps there.  This
   keeps them in a place where they're only allocated for tracing
   and which I think addresses all the previous comments for that
   patch.

v2:
 - removed all changes to __ftrace_event_enable_disable() (except
   for patch 04/11 which clears the soft_disabled bit as discussed)
   and created a separate trace_event_trigger_enable_disable() that
   calls it after setting/clearing the TRIGGER_MODE_BIT.
 - added a trigger_mode enum for future patches that break up the
   trigger calls for filtering, but that's also now used as a command
   id for registering/unregistering commands.
 - removed the enter_file/exit_file members that were added to
   syscall_metadata after realizing that they were unnecessary if
   ftrace_syscall_enter/exit() were modified to receive a pointer
   to the ftrace_file instead of the pointer to the trace_array in
   the ftrace_file.
 - broke up the trigger invocation into two parts so that triggers
   like 'stacktrace' that themselves log into the trace buffer can
   defer the actual trigger invocation until after the current
   record is closed, which is needed for the filter check that
   in turn determines whether the trigger gets invoked.
 - other minor cleanup
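
The two-part trigger invocation mentioned in the v2 notes can be
pictured with the following userspace sketch.  It is only a model under
assumptions: the function names, the model_trig struct and the idea of
returning a bitmask of deferred trigger modes are illustrative, with
MODEL_TM_STACKTRACE loosely mirroring the TM_STACKTRACE value from the
trigger_mode enum.

```c
#include <assert.h>
#include <stddef.h>

enum model_trigger_mode {
	MODEL_TM_NONE       = 0,
	MODEL_TM_STACKTRACE = (1 << 2),
};

/* Hypothetical trigger: a non-zero post_mode means "run me after the
 * current record has been closed". */
struct model_trig {
	unsigned int post_mode;
	int runs;
};

/* Part 1: run the immediate triggers, remember which ones were deferred. */
static unsigned int model_triggers_call(struct model_trig *trigs, size_t n)
{
	unsigned int deferred = MODEL_TM_NONE;
	size_t i;

	assert(trigs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (trigs[i].post_mode)
			deferred |= trigs[i].post_mode;
		else
			trigs[i].runs++;
	}
	return deferred;
}

/* Part 2: called once the current record is closed. */
static void model_triggers_post_call(struct model_trig *trigs, size_t n,
				     unsigned int deferred)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (trigs[i].post_mode & deferred)
			trigs[i].runs++;
}
```

A caller would invoke part 1 before writing the record and part 2
afterwards, passing along the mask returned by part 1.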


This patchset implements 'trace event triggers', which are similar to
the function triggers implemented for 'ftrace filter commands' (see
'Filter commands' in Documentation/trace/ftrace.txt), but instead of
being invoked from function calls are invoked by trace events.
Basically the patchset allows 'commands' to be triggered whenever a
given trace event is hit.  The set of commands 

Re: [PATCH v4 00/11] tracing: trace event triggers

2013-08-22 Thread Tom Zanussi
On Thu, 2013-08-22 at 14:48 -0500, Tom Zanussi wrote:
> Hi Masami,
> 
> Just getting back to this after returning from vacation - I'll be
> sending an update to this patchset addressing your comments shortly...
> 
> On Thu, 2013-08-08 at 11:02 +0900, Masami Hiramatsu wrote:
> > Hi,
> > 
> > (2013/07/30 1:40), Tom Zanussi wrote:
> > > Hi,
> > > 
> > > This is v4 of the trace event triggers patchset, addressing more
> > > comments from Masami Hiramatsu (thanks for the review and comments).
> > > 
> > > One of Masami's comments was on event_trigger_regex_open's use of
> > > inode->i_private and that the same problem was being worked on by Oleg
> > > Nesterov in other places.  That still seems to be the case, but in
> > > order to address that, this patchset is built on top of the current
> > > linux-trace/for-next but also including v2 of Oleg Nesterov's tracing:
> > > open/delete fixes (but with v3 of the 6/6 patch). 
> > 
> > > Does this patchset support multibuffer? It seems that setting a
> > trigger in an event of an instance affects the default event, but not
> > the instance's event.
> 
> You're right of course - I went through the trouble of fixing up the
> event filters to better support multibuffer, but neglected the
> triggers. :-(   But as you point out in a later comment, the fix is
> simple and I've updated the patchset to do that..
> 
> > e.g.
> > 
> > # mkdir instances/hoge
> > > # echo 'enable_event:mce:mce_record' > instances/hoge/events/syscalls/sys_enter_symlink/trigger
> > # cat instances/hoge/events/syscalls/sys_enter_symlink/enable
> > 0*
> > # cat instances/hoge/events/mce/mce_record/enable
> > 0
> > # cat events/mce/mce_record/enable
> > 0*
> > # ln -sf /dev/null /tmp
> > # cat instances/hoge/events/mce/mce_record/enable
> > 0
> > # cat events/mce/mce_record/enable
> > 1*
> > 
> > This looks odd, I expected enabling mce/mce_record under instances/hoge.
> > 
> > > And there is a bug in ftrace itself (not introduced by this series)
> > > that I've found.  After the above operation, we can delete the
> > > instance "hoge", but the soft-mode flag of mce_record is not cleared,
> > > even though there is no trigger referring to the event.
> > 
> > # rmdir instances/hoge
> > # cat events/mce/mce_record/enable
> > 1*
> > 
> > > This is because ftrace actually failed to remove (disable) the event
> > > trigger associated with the instance when doing rmdir; it just removed
> > > that interface.
> > 
> > > v4:
> > >  - made some changes to the soft-disable for syscall patch, according
> > >to Masami's suggestions.  Actually, since there's now an array of
> > >ftrace_files for syscalls that can serve the same purpose, the
> > >enabled_enter/exit_syscalls bit arrays became redundant and were
> > >removed.
> > >  - moved all the remaining common functions out of the
> > >traceon/traceoff patch and into the basic trigger framework patch
> > >and added comments to all the common functions.
> > >  - extensively commented the event_trigger_ops and event_command ops.
> > >  - made the register/unregister_command functions __init.  Since that
> > >code was originally inspired by similar ftrace code, a new patch
> > >was added to do the same thing for the register/unregister of the
> > >ftrace commands (patch 10/11).
> > >  - fixed the event_trigger_regex_open i_private problem noted by
> > >Masami that's currently being addressed by Oleg Nesterov's fixes
> > >for this.  Note that that patchset also affects patch 8/11 (update
> > >filters for multi-buffer, since it touches event filters as well).
> > >Patch 11/11 depends on that patchset and also moves
> > > >event_file_data() to trace.h.
> > 
> > OK, but I think the last 2 patches should be merged to 2/11 as updates.
> > 
> 
> I did merge the last patch into the new series, but left 10/11 separate
> because it really is just a cleanup independent of the trigger code.
> 
> > And also, could you rebase your patches on trace/for-next branch?
> > Since that branch includes most of the latest fixes, it is better to
> > review with it.
> > 
> 
> Sure, but since now everything in for-next is in rc6, I've rebased on
> rc6...
> 

Looks like I spoke too soon - in the few hours between testing this
patchset and posting it, some new commits hit for-next.

for-next rebase coming up...

Tom

> Thanks for all your comments,
> 
> Tom
> 
> > Thank you,
> > 
> 


Re: ipvsadm: One-packet scheduling with UDP service is unstable

2013-08-22 Thread Drunkard Zhang
2013/8/22 Julian Anastasov :
>
> Hello,
>
> On Thu, 22 Aug 2013, Drunkard Zhang wrote:
>
>> 2013/8/22 Julian Anastasov :
>> >
>> > No kernel options should be related to OPS. I assume
>> > you are not using the SH scheduler. Make sure the OPS mode
>> > is properly applied to the virtual service, check for "ops"
>> > in the configuration:
>> >
>> > cat /proc/net/ip_vs
>>
>> Still no luck here: ops is set in the running config, but it doesn't
>> behave that way in practice.
>>
>> vs3 ~ # cat /proc/net/ip_vs
>> IP Virtual Server version 1.2.1 (size=1024)
>> Prot LocalAddress:Port Scheduler Flags
>>   -> RemoteAddress:Port Forward Weight ActiveConn InActConn
>> UDP  96A46478:0202 wrr ops
>
>>   -> 96A46450:0202  Route   25 0  1
>
> The OPS connections are accounted in InActConn
> for a very short period; they live up to 1 jiffy, e.g. 10ms.
> Also, WRR should be reliable for OPS, while other
> schedulers (e.g. *LC) are not suitable.

I noticed this too. While ops is working, InActConn keeps changing;
when it stays fixed, ops is not working.

>> And the traffic routed to each realserver didn't follow the weights I
>> set; it's routed pretty much one to one. I have 17 UDP sources sending
>> to 16 different realservers; the others are bound to another VIP.
>>
>> Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
>>   -> RemoteAddress:Port
>> UDP  x.x.x.120:514                       0    67622        0 12339373        0
>>   -> x.x.x.65:514                        0       29        0     2895        0
>>   -> x.x.x.66:514                        0      225        0    21850        0
>
> Do you see the same problem with ipvsadm -Ln --stats ?
> ipvsadm -Z may be needed to zero the stats after restoring all
> rules. "Conns" counter in stats should be according to WRR
> weights, it shows the scheduler decisions.
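
For comparison with the "Conns" column, here is a rough sketch of the classic gcd-based weighted round-robin selection loop. It is a standalone illustration, not the IPVS scheduler code: `struct wrr`, `wrr_init()` and `wrr_next()` are hypothetical names, and the weights are just example values. Counting how often each index comes back shows the distribution one would expect if the scheduler's decisions follow the weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a weighted round-robin picker (illustrative only). */
struct wrr {
	const int *weight;	/* per-server weights, all >= 0 */
	size_t n;		/* number of servers */
	size_t i;		/* index of the last server picked */
	int cw;			/* current weight threshold */
	int gcd;		/* gcd of all weights */
	int max;		/* largest weight */
};

static int gcd_int(int a, int b)
{
	while (b != 0) {
		int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

static void wrr_init(struct wrr *s, const int *weight, size_t n)
{
	assert(n > 0);
	s->weight = weight;
	s->n = n;
	s->i = n - 1;		/* so the first pick starts at index 0 */
	s->cw = 0;
	s->gcd = 0;
	s->max = 0;
	for (size_t k = 0; k < n; k++) {
		assert(weight[k] >= 0);
		s->gcd = gcd_int(s->gcd, weight[k]);
		if (weight[k] > s->max)
			s->max = weight[k];
	}
}

/* Returns the index of the next server, or -1 if every weight is zero. */
static int wrr_next(struct wrr *s)
{
	if (s->max == 0)
		return -1;
	for (;;) {
		s->i = (s->i + 1) % s->n;
		if (s->i == 0) {
			/* One full pass done: lower the threshold, wrap at zero. */
			s->cw -= s->gcd;
			if (s->cw <= 0)
				s->cw = s->max;
		}
		if (s->weight[s->i] >= s->cw)
			return (int)s->i;
	}
}
```

With weights {3, 1}, every four consecutive picks go three times to the first server and once to the second, so per-server connection counts should track the weight ratio over time.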

The stats are also zeroed after every restore, right? Either way, ops is still not working.

vs3 ~/pkgs # ./ipvsadm -Z
vs3 ~/pkgs # ./ipvsadm -ln --stats -u [snipped]
Prot LocalAddress:Port               Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
UDP  x.x.x.120:514                       0 12497040        0    2572M        0
  -> x.x.x.65:514                        0     3975        0   394171        0
  -> x.x.x.66:514                        0    48466        0  4835716        0
  -> x.x.x.67:514                        0   407051        0 58479621        0
  -> x.x.x.68:514                        0   561120        0 85289892        0
  -> x.x.x.69:514                        0    30958        0  3120506        0
  -> x.x.x.70:514                        0   645475        0  100552K        0
  -> x.x.x.71:514                        0   147228        0 14560649        0
  -> x.x.x.72:514                        0   535693        0 84069390        0
  -> x.x.x.73:514                        0   564787        0 88165140        0
  -> x.x.x.74:514                        0   346734        0 53256088        0
  -> x.x.x.75:514                        0    47232        0  4801578        0
  -> x.x.x.76:514                        0  1175288        0  192699K        0
  -> x.x.x.77:514                        0   254915        0 25939720        0
  -> x.x.x.78:514                        0  2701531        0  652417K        0
  -> x.x.x.79:514                        0  2426686        0  573897K        0
  -> x.x.x.80:514                        0  2599901        0  629793K        0
  -> x.x.x.81:514                        0        0        0        0        0
  -> x.x.x.82:514                        0        0        0        0        0
  -> x.x.x.83:514                        0        0        0        0        0
  -> x.x.x.84:514                        0        0        0        0        0
  -> x.x.x.85:514                        0        0        0        0        0
  -> x.x.x.86:514                        0        0        0        0        0
  -> x.x.x.87:514                        0        0        0        0        0
  -> x.x.x.88:514                        0        0        0        0        0
  -> x.x.x.89:514                        0        0        0        0        0

> In your rates listing CPS 0 is confusing, even for OPS.
> Is it from the new ipvsadm?

Yes, the latest git version. When CPS is changing, ops works; otherwise it doesn't.


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Luck, Tony
<1>[  383.209057] RIP  [] sysrq_handle_crash+0x16/0x20
<4>[  383.209057]  RSP 
<4>[  383.209057] CR2: 
<4>[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
<3>[  383.209057] pstore: compression failed for Part 2 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
<3>[  383.209057] pstore: compression failed for Part 5 returned -5

Interesting.  With the ERST backend I didn't see these messages.  Traces in
pstore recovered files go only as far as the line before the
"---[ end trace 04a1cddad37b4b33 ]---" marker.

Why the difference depending on which back end is in use?

But I agree that we shouldn't have these messages.  They use up space
in the persistent store that could be better used saving some more lines
from earlier in the console log.

-Tony


Re: [PATCH RFC v2 2/5] spmi: Linux driver framework for SPMI

2013-08-22 Thread Greg Kroah-Hartman
On Fri, Aug 09, 2013 at 01:37:09PM -0700, Josh Cartwright wrote:
> +static char dbgfs_help[] =
> + "SPMI Debug-FS support\n"
> + "\n"
> + "Hierarchy schema:\n"
> + "/sys/kernel/debug/spmi\n"
> + "   /help                -- Static help text\n"
> + "   /spmi-0              -- Directory for SPMI bus 0\n"
> + "   /spmi-0/0-1          -- Directory for SPMI device '0-1'\n"
> + "   /spmi-0/0-1/address  -- Starting register for reads or writes\n"
> + "   /spmi-0/0-1/count    -- Number of registers to read (only used for reads)\n"
> + "   /spmi-0/0-1/data     -- Initiates the SPMI read (formatted output)\n"
> + "   /spmi-0/0-1/data_raw -- Initiates the SPMI raw read or write\n"
> + "   /spmi-n              -- Directory for SPMI bus n\n"
> + "\n"
> + "To perform SPMI read or write transactions, you need to first write the\n"
> + "address of the slave device register to the 'address' file.  For read\n"
> + "transactions, the number of bytes to be read needs to be written to the\n"
> + "'count' file.\n"
> + "\n"
> + "The 'address' file specifies the 20-bit address of a slave device register.\n"
> + "The upper 4 bits 'address[19..16]' specify the slave identifier (SID) for\n"
> + "the slave device.  The lower 16 bits specify the slave register address.\n"
> + "\n"
> + "Reading from the 'data' file will initiate a SPMI read transaction starting\n"
> + "from slave register 'address' for 'count' number of bytes.\n"
> + "\n"
> + "Writing to the 'data' file will initiate a SPMI write transaction starting\n"
> + "from slave register 'address'.  The number of registers written to will\n"
> + "match the number of bytes written to the 'data' file.\n"
> + "\n"
> + "Example: Read 4 bytes starting at register address 0x1234 for SID 2\n"
> + "\n"
> + "echo 0x21234 > address\n"
> + "echo 4 > count\n"
> + "cat data\n"
> + "\n"
> + "Example: Write 3 bytes starting at register address 0x1008 for SID 1\n"
> + "\n"
> + "echo 0x11008 > address\n"
> + "echo 0x01 0x02 0x03 > data\n"
> + "\n"
> + "Note that the count file is not used for writes.  Since 3 bytes are\n"
> + "written to the 'data' file, then 3 bytes will be written across the\n"
> + "SPMI bus.\n\n";
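
The 20-bit 'address' layout described in the help text (SID in bits 19..16, register address in bits 15..0) can be sketched in plain C as below. This is only an illustration of the encoding as the help text states it; the helper names `spmi_dbg_address()`, `spmi_dbg_sid()` and `spmi_dbg_reg()` are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 4-bit SID and a 16-bit register address
 * into the 20-bit value written to the debugfs 'address' file. */
static uint32_t spmi_dbg_address(uint8_t sid, uint16_t reg)
{
	assert(sid <= 0xF);	/* the SID occupies only 4 bits */
	return ((uint32_t)sid << 16) | reg;
}

/* Hypothetical helper: extract the SID, address[19..16]. */
static uint8_t spmi_dbg_sid(uint32_t address)
{
	return (uint8_t)((address >> 16) & 0xF);
}

/* Hypothetical helper: extract the register address, address[15..0]. */
static uint16_t spmi_dbg_reg(uint32_t address)
{
	return (uint16_t)(address & 0xFFFF);
}
```

Under this layout, SID 2 with register 0x1234 gives 0x21234 and SID 1 with register 0x1008 gives 0x11008, matching the two echo examples in the help text.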

The help file within the kernel is a nice touch :)

I do know the only rule for debugfs is "There are no rules", but this
looks like debugfs is going to be the way to interact with this bus and
its devices, is that correct?

Or is this only for "debugging"?  If so, please document it as such.

> +void spmi_dfs_controller_add(struct spmi_controller *ctrl)
> +{
> + ctrl->dfs_dir = debugfs_create_dir(dev_name(&ctrl->dev),
> +spmi_debug_root);
> + WARN_ON(!ctrl->dfs_dir);

Why?  What is a user going to be able to do with something like this?
You do this in a number of places, please provide "valid" error messages
instead of just kernel stack tracebacks that fail to show the device for
which the error occurred (hint, use dev_err()).

Again, never use WARN_ON() as error handling, it's lazy, and wrong.

thanks,

greg k-h


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


>   * callback from kmsg_dump. (s2,l2) has the most recently
>   * written bytes, older bytes are in (s1,l1). Save as much
> @@ -148,23 +243,56 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>   char *dst;
>   unsigned long size;
>   int hsize;
> + int zipped_len = -1;
>   size_t len;
> - bool compressed = false;
> + bool compressed;
> + size_t total_len;
> 
> - dst = psinfo->buf;
> - hsize = sprintf(dst, "%s#%d Part%d\n", why, oopscount, part);
> - size = psinfo->bufsize - hsize;
> - dst += hsize;
> + if (big_oops_buf) {
> + dst = big_oops_buf;
> + hsize = sprintf(dst, "%s#%d Part%d\n", why,
> + oopscount, part);
> + size = big_oops_buf_sz - hsize;
> 
> - if (!kmsg_dump_get_buffer(dumper, true, dst, size, &len))
> - break;
> + if (!kmsg_dump_get_buffer(dumper, true, dst + hsize,
> + size, &len))
> + break;
> +
> + zipped_len = pstore_compress(dst, psinfo->buf,
> + hsize + len, psinfo->bufsize);
> +
> + if (zipped_len > 0) {
> + compressed = true;
> + total_len = zipped_len;
> + } else {
> + pr_err("pstore: compression failed for Part %d"
> + " returned %d\n", part, zipped_len);
> + pr_err("pstore: Capture uncompressed"
> + " oops/panic report of Part %d\n", part);

Why did you add these messages in pstore_dump()?
In my test case, they are not needed.


# cat dmesg-efi-1
Panic#2 Part1
<4>[  383.209057] RBP: 88006f551e80 R08: 0002 R09: 
<4>[  383.209057] R10: 0382 R11:  R12: 0063
<4>[  383.209057] R13: 0286 R14: 000f R15: 
<4>[  383.209057] FS:  7f53317cc740() GS:88007c40() knlGS:
<4>[  383.209057] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  383.209057] CR2:  CR3: 78a73000 CR4: 06f0
<4>[  383.209057] Stack:
<4>[  383.209057]  88006f551eb8 813d40a2 0002 7f53317db000
<4>[  383.209057]  88006f551f50 0002  88006f551ed8
<4>[  383.209057]  813d45aa 7f53317db000 88003f8c2b00 88006f551ef8
<4>[  383.209057] Call Trace:
<4>[  383.209057]  [] __handle_sysrq+0xa2/0x170
<4>[  383.209057]  [] write_sysrq_trigger+0x4a/0x50
<4>[  383.209057]  [] proc_reg_write+0x3d/0x80
<4>[  383.209057]  [] vfs_write+0xc0/0x1f0
<4>[  383.209057]  [] SyS_write+0x4c/0xa0
<4>[  383.209057]  [] system_call_fastpath+0x16/0x1b
<4>[  383.209057] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8  
04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 
<1>[  383.209057] RIP  [] sysrq_handle_crash+0x16/0x20
<4>[  383.209057]  RSP 
<4>[  383.209057] CR2: 
<4>[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
<3>[  383.209057] pstore: compression failed for Part 2 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
<3>[  383.209057] pstore: compression failed for Part 5 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 5
<3>[  383.209057] pstore: compression failed for Part 12 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 12
<0>[  383.209057] Kernel panic - not syncing: Fatal exception
<3>[  383.209057] drm_kms_helper: panic occurred, switching back to text console



In the efi-pstore case, after rebooting the system we can tell which entries
failed to compress from the 'C' (compressed) or 'D' (not compressed) flag in
the variable name, as below.

#ls /sys/firmware/efi/vars/ |grep dump
dump-type0-10-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-10-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-2-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-2-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
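
As a rough sketch of how that flag could be picked out of such a name in userspace: the parser below assumes the layout seen in the listing above (type, part, count, timestamp, then the 'C'/'D' flag, followed by the GUID). The struct, its field names and both function names are my own labels for this example, not taken from the efi-pstore code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical view of an efi-pstore dump variable name; the field
 * names are a reading of the listing, not kernel definitions. */
struct efi_dump_name {
	unsigned int type;
	unsigned int part;
	int count;
	unsigned long timestamp;
	char flag;		/* 'C' = compressed, 'D' = not compressed */
};

/* Returns 0 on success, -1 if the name does not match the layout. */
static int parse_dump_name(const char *name, struct efi_dump_name *out)
{
	int n;

	assert(name != NULL && out != NULL);
	n = sscanf(name, "dump-type%u-%u-%d-%lu-%c",
		   &out->type, &out->part, &out->count,
		   &out->timestamp, &out->flag);
	return n == 5 ? 0 : -1;
}

static int dump_is_compressed(const struct efi_dump_name *d)
{
	return d->flag == 'C';
}
```

Run over the names above, this reports parts 2 and 12 as not compressed, which lines up with the "compression failed for Part 2" and "Part 12" messages in the log.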

RE: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore

2013-08-22 Thread Seiji Aguchi


> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Aruna Balakrishnaiah
> Sent: Friday, August 16, 2013 9:18 AM
> To: linuxppc-...@ozlabs.org; tony.l...@intel.com; 
> linux-kernel@vger.kernel.org; keesc...@chromium.org
> Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
> cbouatmai...@gmail.com;
> mah...@linux.vnet.ibm.com; ccr...@android.com
> Subject: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore
> 
> Based on the flag 'compressed' set or not, pstore will decompress the
> data returning a plain text file. If decompression fails for a particular
> record it will have the compressed data in the file which can be
> decompressed with 'openssl' command line tool.

If the decompression fails and openssl doesn't work either, the worst case is
that users can't read the entry at all.
In that case, pstore is useless.

Also, for users who only want a single panic message, compression is not
needed.

So, I think we still have to support a non-compression mode.
(IMO, pstore can take kdump as a model. Kdump supports both compression and
non-compression modes.)

But if you think my comment is outside the scope of this patchset, that's OK.
We can do it in a separate patch.

Seiji


Re: [GIT PULL v2] DT/core: cpu_ofnode updates for v3.12

2013-08-22 Thread Rafael J. Wysocki
On Thursday, August 22, 2013 02:57:56 PM Sudeep KarkadaNagesha wrote:
> Hi Rafael,
> 
> Here is v2 of the pull request for the cpu of_node updates for v3.12.
> It includes ACKs for all the new changes since v1 (mainly from Ben for
> PPC). Currently there's a trivial conflict with today's linux-next in 3
> files. Let me know if you need me to rebase this on any particular
> branch.
> 
> Regards,
> Sudeep
> 
> The following changes since commit b36f4be3de1b123d8601de062e7dbfc904f305fb:
> 
>   Linux 3.11-rc6 (2013-08-18 14:36:53 -0700)
> 
> are available in the git repository at:
> 
>   git://linux-arm.org/linux-skn.git cpu_of_node
> 
> for you to fetch changes up to 1037b2752345cc5666e90b711a913ab2ae6c5920:
> 
>   cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
> (2013-08-21 10:29:56 +0100)
> 
> 
> Sudeep KarkadaNagesha (19):
>microblaze: remove undefined of_get_cpu_node declaration
>openrisc: remove undefined of_get_cpu_node declaration
>powerpc: refactor of_get_cpu_node to support other architectures
>of: move of_get_cpu_node implementation to DT core library
>ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
>driver/core: cpu: initialize of_node in cpu's device struture
>of/device: add helper to get cpu device node from logical cpu index
>ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
>ARM: mvebu: remove device tree parsing for cpu nodes
>drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
>cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
>cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: arm_big_little: remove device tree parsing for cpu nodes
>cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
>cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
> 
>  arch/arm/kernel/devtree.c           |  5 ++
>  arch/arm/kernel/topology.c          | 61 +---
>  arch/arm/mach-imx/mach-imx6q.c      |  3 +-
>  arch/arm/mach-mvebu/platsmp.c       | 51 ++---
>  arch/microblaze/include/asm/prom.h  |  3 -
>  arch/openrisc/include/asm/prom.h    |  3 -
>  arch/powerpc/include/asm/prom.h     |  3 -
>  arch/powerpc/kernel/prom.c          | 43 +---
>  drivers/base/cpu.c                  |  2 +
>  drivers/bus/arm-cci.c               | 28 ++--
>  drivers/cpufreq/arm_big_little_dt.c | 40 ---
>  drivers/cpufreq/cpufreq-cpu0.c      | 23 +-
>  drivers/cpufreq/highbank-cpufreq.c  | 18 ++---
>  drivers/cpufreq/imx6q-cpufreq.c     |  4 +-
>  drivers/cpufreq/kirkwood-cpufreq.c  |  8 ++-
>  drivers/cpufreq/maple-cpufreq.c     | 23 +-
>  drivers/cpufreq/pmac32-cpufreq.c    |  5 +-
>  drivers/cpufreq/pmac64-cpufreq.c    | 47 +++-
>  drivers/cpufreq/spear-cpufreq.c     |  4 +-
>  drivers/of/base.c                   | 95
>  include/linux/cpu.h                 |  1 +
>  include/linux/of.h                  |  7 ++
>  include/linux/of_device.h           | 15
>  23 files changed, 226 insertions(+), 266 deletions(-)

Pulled, thanks!


