[PATCH 2/3] panic: improve panic_timeout calculation

2013-11-07 Thread Felipe Contreras
We want to calculate the blinks per second, and instead of making it 5
(1000 / (3600 / 18)), let's make it 4, so the user can see two blinks
per second.

Signed-off-by: Felipe Contreras 
---
 kernel/panic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 46e37ff..5a0bdaa 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -25,7 +25,7 @@
 #include 
 
 #define PANIC_TIMER_STEP 100
-#define PANIC_BLINK_SPD 18
+#define PANIC_BLINK_SPD 4
 
 int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE;
 static unsigned long tainted_mask;
@@ -147,7 +147,7 @@ void panic(const char *fmt, ...)
touch_nmi_watchdog();
if (i >= i_next) {
i += panic_blink(state ^= 1);
-   i_next = i + 3600 / PANIC_BLINK_SPD;
+   i_next = i + 1000 / PANIC_BLINK_SPD;
}
mdelay(PANIC_TIMER_STEP);
}
@@ -181,7 +181,7 @@ void panic(const char *fmt, ...)
touch_softlockup_watchdog();
if (i >= i_next) {
i += panic_blink(state ^= 1);
-   i_next = i + 3600 / PANIC_BLINK_SPD;
+   i_next = i + 1000 / PANIC_BLINK_SPD;
}
mdelay(PANIC_TIMER_STEP);
}
-- 
1.8.4.2+fc1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH RESEND 1/2] clk: add clk accuracy retrieval support

2013-11-07 Thread Mike Turquette
Quoting Boris BREZILLON (2013-10-13 10:17:10)
> The clock accuracy is expressed in ppb (parts per billion) and represents
> the possible clock drift.
> Say you have a clock (e.g. an oscillator) which provides a fixed clock of
> 20MHz with an accuracy of +- 20Hz. This accuracy expressed in ppb is
> 20Hz/20MHz = 1000 ppb (or 1 ppm).
> 
> Clock users may need the clock accuracy information in order to choose
> the best clock (the one with the best accuracy) across several available
> clocks.
> 
> This patch adds clk accuracy retrieval support for common clk framework by
> means of a new function called clk_get_accuracy.
> This function returns the given clock accuracy expressed in ppb.
> 
> In order to get the clock accuracy, this implementation adds one callback
> called recalc_accuracy to the clk_ops structure.
> This callback is given the parent clock accuracy (if the clock is not a
> root clock) and should recalculate the given clock accuracy.
> 
> This callback is optional and may be implemented if the clock is not
> a perfect clock (accuracy != 0 ppb).
> 
> Signed-off-by: Boris BREZILLON 
> ---
>  Documentation/clk.txt|4 ++
>  drivers/clk/Kconfig  |4 ++
>  drivers/clk/clk.c|   92 
> --
>  include/linux/clk-private.h  |1 +
>  include/linux/clk-provider.h |   11 +
>  include/linux/clk.h  |   17 
>  6 files changed, 125 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/clk.txt b/Documentation/clk.txt
> index 3aeb5c4..dc52da1 100644
> --- a/Documentation/clk.txt
> +++ b/Documentation/clk.txt
> @@ -77,6 +77,8 @@ the operations defined in clk.h:
> int (*set_parent)(struct clk_hw *hw, u8 index);
> u8  (*get_parent)(struct clk_hw *hw);
> int (*set_rate)(struct clk_hw *hw, unsigned long);
> +   unsigned long   (*recalc_accuracy)(struct clk_hw *hw,
> +  unsigned long 
> parent_accuracy);
> void(*init)(struct clk_hw *hw);
> };
>  
> @@ -202,6 +204,8 @@ optional or must be evaluated on a case-by-case basis.
>  .set_parent |  | | n | y   | n|
>  .get_parent |  | | n | y   | n|
>  |  | |   | |  |
> +.recalc_rate|  | |   | |  |

s/recalc_rate/recalc_accuracy/

> +|  | |   | |  |
>  .init   |  | |   | |  |
>  ---
>  [1] either one of round_rate or determine_rate is required.
> diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> index 279407a..4d12ae7 100644
> --- a/drivers/clk/Kconfig
> +++ b/drivers/clk/Kconfig
> @@ -6,12 +6,16 @@ config CLKDEV_LOOKUP
>  config HAVE_CLK_PREPARE
> bool
>  
> +config HAVE_CLK_GET_ACCURACY
> +   bool
> +

This sort of thing gets messy. For platforms converted to common struct
clk we select HAVE_CLK_PREPARE and we let legacy platforms select it on
a case-by-case basis.

For something like HAVE_CLK_GET_ACCURACY I am inclined to only add it
for platforms converted to the common struct clk and not even expose it
to legacy clock framework implementations. In those cases the call to
clk_get_accuracy would return -EINVAL or -EPERM or something.

Russell, any thoughts on that approach?

>  config HAVE_MACH_CLKDEV
> bool
>  
>  config COMMON_CLK
> bool
> select HAVE_CLK_PREPARE
> +   select HAVE_CLK_GET_ACCURACY
> select CLKDEV_LOOKUP
> ---help---
>   The common clock framework is a single definition of struct
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index a004769..6a8f3ef 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -104,10 +104,11 @@ static void clk_summary_show_one(struct seq_file *s, 
> struct clk *c, int level)
> if (!c)
> return;
>  
> -   seq_printf(s, "%*s%-*s %-11d %-12d %-10lu",
> +   seq_printf(s, "%*s%-*s %-11d %-12d %-10lu %-11lu",
>level * 3 + 1, "",
>30 - level * 3, c->name,
> -  c->enable_count, c->prepare_count, clk_get_rate(c));
> +  c->enable_count, c->prepare_count, clk_get_rate(c),
> +  clk_get_accuracy(c));
> seq_printf(s, "\n");
>  }
>  
> @@ -129,8 +130,8 @@ static int clk_summary_show(struct seq_file *s, void 
> *data)
>  {
> struct clk *c;
>  
> -   seq_printf(s, "   clockenable_cnt  
> prepare_cnt  rate\n");
> -   seq_printf(s, 
> "-\n");
> +   seq_printf(s, "   clock

Re: kernel BUG at kernel/kallsyms.c:222!

2013-11-07 Thread Ming Lei
On Fri, Nov 8, 2013 at 7:44 AM, Rusty Russell  wrote:
> Axel Lin  writes:
>> 2013/11/7 Ming Lei :
>>> Hi,
>>>
>>> On Thu, Nov 7, 2013 at 10:47 AM, Axel Lin  wrote:

 hi Ming,
 Seems CONFIG_PAGE_OFFSET is not configurabe in "make menuconfig".
 And I found CONFIG_PAGE_OFFSET=0xC000 for all below configs...
 $ make at91_dt_defconfig; grep  CONFIG_PAGE_OFFSET .config
 $ make ep93xx_defconfig; grep  CONFIG_PAGE_OFFSET .config
 $ make imx_v4_v5_defconfig; grep CONFIG_PAGE_OFFSET .config
 $ make mxs_defconfig; grep CONFIG_PAGE_OFFSET .config
 $ make omap2plus_defconfig; grep CONFIG_PAGE_OFFSET .config
 $ make s3c6400_defconfig; grep CONFIG_PAGE_OFFSET .config
 $ make at91x40_defconfig; grep CONFIG_PAGE_OFFSET .config
 ( at91x40_defconfig is also arm7tdmi )
>>>
>>> Firstly it can be configured via VMSPLIT_3GVMSPLIT_2G/VMSPLIT_1G.
>>>
>>> Secondly, configurable or not isn't the point, and maybe some uclinux
>>> platforms do not use CONFIG_PAGE_OFFSET at all, but they should
>>> set it as a reasonable value or at least be below than the start link
>>> address of vmlinux.
>>
>> Hi Ming,
>>
>> I found in arch/arm/include/asm/memory.h:
>> CONFIG_PAGE_OFFSET is not used if !CONFIG_MMU.
>> So looks like setting CONFIG_PAGE_OFFSET to other value still won't work.
>
> This seems like the simplest solution, but it may mean you still have
> crap in /proc/kallsyms.
>
> Does it work for you?

Yes, it should work, but I am wondering if perf can work on uClinux.

Axel, could you post your .config?


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] doc: devicetree: Add bindings documentation for omap-des driver

2013-11-07 Thread Santosh Shilimkar
On Thursday 07 November 2013 07:37 PM, Joel Fernandes wrote:
> Add documentation for the generic OMAP DES crypto module describing the device
> tree bindings.
> 
> Signed-off-by: Joel Fernandes 
> ---
Thanks Joel for picking this up.
FWIW, Acked-by: Santosh Shilimkar 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM: OMAP2+: omap_device: maintain sane runtime pm status around suspend/resume

2013-11-07 Thread Kevin Hilman
Nishanth Menon  writes:

> On 11/07/2013 05:44 PM, Kevin Hilman wrote:
>> Nishanth Menon  writes:
>>
>>> OMAP device hooks around suspend|resume_noirq ensures that hwmod
>>> devices are forced to idle using omap_device_idle/enable as part of
>>> the last stage of suspend activity.
>>>
>>> For a device such as i2c who uses autosuspend, it is possible to enter
>>> the suspend path with dev->power.runtime_status = RPM_ACTIVE.
>>>
>>> As part of the suspend flow, the generic runtime logic would increment
>>> it's dev->power.disable_depth to 1. This should prevent further
>>> pm_runtime_get_sync from succeeding once the runtime_status has been
>>> set to RPM_SUSPENDED.
>>>
>>> Now, as part of the suspend_noirq handler in omap_device, we force the
>>> following: if the device status is !suspended, we force the device
>>> to idle using omap_device_idle (clocks are cut etc..). This ensures
>>> that from a hardware perspective, the device is "suspended". However,
>>> runtime_status is left to be active.
>>>
>>> *if* an operation is attempted after this point to
>>> pm_runtime_get_sync, runtime framework depends on runtime_status to
>>> indicate accurately the device status, and since it sees it to be
>>> ACTIVE, it assumes the module is functional and returns a non-error
>>> value. As a result the user will see pm_runtime_get succeed, however a
>>> register access will crash due to the lack of clocks.
>>
>> Ouch.
>>
>> Dumb Q: who is requesting an i2c transaction after ->suspend_noirq().
>> The i2c driver itself should be able to detect that it's being accessed
>> after this point and return an error.
>
> i2c dropped it generic suspend handlers at
> commit 584b408d37af4e0b38ad5b60f236381bcdf396bc
> Author: Kevin Hilman 

sheesh, who is that crazy TI guy.  They should probably fire him.

> Date:   Thu Aug 4 07:53:02 2011 -0700
>
> Revert "i2c-omap: fix static suspend vs. runtime suspend"
>
> Also, as of v3.1, the PM domain level code for OMAP handles device
> power state transistions automatically for devices, so drivers no
> longer need to specifically call the bus/pm_domain methods themselves.
>
>
> - So it is rightly in the mercy of runtime PM to adequately represent
> it's state. I disagree that i2c driver should in addition have to
> remember what it's generic suspend state is.

That's debatable I guess.  The ideal world is that runtime PM hides all
of this, but I'm not sure it's achievable in all cases.

>  - See the link to the cpufreq and the logs to see the call stack
> where it fails.
>
> Now, this is fine, since omap_i2c_xfer should ideally fail with a
> pm_runtime_get_sync returning -EACCESS when device is really suspended
> (remember, we are doing autosuspend - so, in the case I caught, there
> was regulator voltage set prior to entering suspend, but timeout was
> not yet hit for autosuspend of i2c to kick in yet).

Ah, I see.  Makes sense.

>>
>> That being said, I agree that omap_device should still be catching this
>> case in order to find/fix driver races like this.
>
>>
>>> To prevent this from happening, we should ensure that runtime_status
>>> exactly indicates the device status. As a result of this change
>>> any further calls to pm_runtime_get* would return -EACCES (since
>>> disable_depth is 1). On resume, we restore the clocks and runtime
>>> status exactly as we suspended with.
>>>
>>> Reported-by: J Keerthy 
>>> Signed-off-by: Nishanth Menon 
>>> Acked-by: Rajendra Nayak 
>>> ---
>>> patch baseline: V3.12 tag (also applies on linux-next next-20131107 tag)
>>>
>>> Logs from 3.12 based vendor kernel:
>>> Before: http://pastebin.com/m5KxnB7a
>>> After: http://pastebin.com/8AfX4e7r
>>>
>>> The discussion on cpufreq side of the story which triggered this scenario:
>>> http://marc.info/?l=linux-pm=138263811321921=2
>>>
>>> Tested on TI vendor kernel (with dt boot):
>>> AM335x: evm, BBB, sk, BBW
>>> OMAP5uEVM, DRA7-evm
>>>
>>>   arch/arm/mach-omap2/omap_device.c |   16 ++--
>>>   arch/arm/mach-omap2/omap_device.h |1 +
>>>   2 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm/mach-omap2/omap_device.c 
>>> b/arch/arm/mach-omap2/omap_device.c
>>> index b69dd9a..87ecbb0 100644
>>> --- a/arch/arm/mach-omap2/omap_device.c
>>> +++ b/arch/arm/mach-omap2/oma

[PATCH] doc: devicetree: Add bindings documentation for omap-des driver

2013-11-07 Thread Joel Fernandes
Add documentation for the generic OMAP DES crypto module describing the device
tree bindings.

Signed-off-by: Joel Fernandes 
---
 .../devicetree/bindings/crypto/omap-des.txt| 28 ++
 1 file changed, 28 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/omap-des.txt

diff --git a/Documentation/devicetree/bindings/crypto/omap-des.txt 
b/Documentation/devicetree/bindings/crypto/omap-des.txt
new file mode 100644
index 000..0637647
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/omap-des.txt
@@ -0,0 +1,28 @@
+OMAP SoC DES crypto Module
+
+Required properties:
+
+- compatible : Should be "ti,omap4-des"
+- ti,hwmods: Name of the hwmod associated with the DES module
+- reg : Offset and length of the register set for the module
+- interrupts : the interrupt-specifier for the DES module
+- clocks : phandle to the functional clock node of the DES module
+- clock-names : Name of the functional clock
+
+Optional properties:
+- dmas: DMA specifiers for tx and rx dma. See the DMA client binding,
+   Documentation/devicetree/bindings/dma/dma.txt
+- dma-names: DMA request names should include "tx" and "rx" if present
+
+Example:
+   /* DRA7xx SoC */
+   des: des@480a5000 {
+   compatible = "ti,omap4-des";
+   ti,hwmods = "des";
+   reg = <0x480a5000 0xa0>;
+   interrupts = ;
+   dmas = < 117>, < 116>;
+   dma-names = "tx", "rx";
+   clocks = <_iclk_div>;
+   clock-names = "fck";
+   };
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] ARM: pinctrl: Add Broadcom Capri pinctrl driver

2013-11-07 Thread Sherman Yin

On 13-11-06 01:40 AM, Linus Walleij wrote:

On Wed, Nov 6, 2013 at 3:02 AM, Sherman Yin  wrote:

When Linus asked me to try using generic pinconf instead, I ran into
problems with this feature due to how the generic pinconf properties are
defined differently than my properties - perhaps this feature just doesn't
work for generic pinconf-based drivers with the (Unless we are ok with using
1/0 for boolean properties, but it has already been pointed out that these
should be valueless.).


Well it seems you would need a way to pass an array of the same
boolean property and that seems a bit more complex and hard to
read than the generic boolean bindings.

You would have to patch the OF core to do something like that:

bias-pull-up = ;

1/0 isn't so good I think, what should the parser do with e.g. 2?
This is more to the point.


I would imagine that the platform-specific device tree bindings would 
need to clearly explain what the valid values are, as they should.  If 
we're using integers, we could either have a) !0 and 0, or b) just 1 and 
0, and everything else is an error.  Or c) the platform could decide 
that the value provides addition info like pull-up-strength, so 0 = no 
pull up, >0 = pull up enabled and the number is the pull up strength in 
Ohm (bindings should indicate which values are valid), and everything 
else is an error.



While I'd love to be able define my pin config nodes this way, if I have to
use generic pinconf for the driver to be upstreamed, then I'm fine with it.


Well you need to use generic pin config because all the custom
stuff - i.MX comes to mind - is creating a mess. I prefer that we
share bindings and code, as any programmer would...


Ok


That said, if you can patch the OF core and the generic pin config
parser to do what you want with lists like that, it's your pick.
It may take some time though.


I don't mind patching the generic pin config, and I don't think the core 
needs to change, but first there needs to be an agreement on the changes 
since other drivers will need to use pinconf-generic. Am I the only one 
who would like to see this feature, or do others see value in being able 
to group more pins together - resulting in fewer sub-nodes?  Do the pros 
out-weight the cons?



"input disable"
This setting disconnects the input (DIN) to the internal logic from >the
pin pad. However, the output (DOUT) can still drive the pad.  It
seems to match PIN_CONFIG_OUTPUT, but the current generic option is
either "output-low" or "output-high" - are these referring to a static
output of 0 and 1?


What's the best property to use in this case?


Seems like a new case.

What about you patch include/linux/pinctrl/pinconf-generic.h
to add PIN_CONFIG_INPUT_DISABLE with this semantic
and also patch the generic pinconf parser in drivers/pinctrl/pinconf-generic.c
to handle this?


Sure, I can add PIN_CONFIG_INPUT_DISABLE.  However I suspect people 
might be confused by this and PIN_CONFIG_OUTPUT.



"mode"
This controls several aspect of the pin (slew rate, pull up strength,
etc) to meet I2C specs for Standard/Fast mode vs High Speed mode.  I
think the best way is to map this to slew rate, which would require
some explanation because the meaning of slew rate differs depending on
what pin function is selected:
- When I2C (*_SCL or *_SDA) function is selected for the pin: 0:
  Standard (100kbps)
  & Fast mode (400kbps), 1: High Speed mode (3.4Mbps)
- When IC_DM or IC_DP function is selected, 0: normal slew rate, 1:
  fast slew rate
- Else: 0: fast slew rate, 1: normal slew rate


Do we agree that the "slew rate" is the best property to use for "mode"?


It seems to be indeed mostly related to slew rate.

However if you want a custom brcm,mode binding that should
be possible too, as well as augmenting the driver to use
*both* generic pinconf *and* some custom config options
on top. That has not been done so far though I think, so you
might need a bit of hacking to do that.


I think I'll stick with slew rate.  :)

Regards,
Sherman


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo

2013-11-07 Thread Minchan Kim
Hi Hannes, Rik.

On Thu, Nov 07, 2013 at 04:21:32PM -0500, Johannes Weiner wrote:
> On Thu, Nov 07, 2013 at 10:13:45AM -0500, Rik van Riel wrote:
> > 
> > > >  fs/proc/meminfo.c | 36 
> > > >  1 file changed, 36 insertions(+)
> > > 
> > > Documentation/filesystems/proc.txt told me it's feeling all offended.
> > 
> > You're right, of course.  Here is version 2 :)
> > 
> > ---8<---
> > 
> > Subject: provide estimated available memory in /proc/meminfo
> > 
> > Many load balancing and workload placing programs check /proc/meminfo
> > to estimate how much free memory is available. They generally do this
> > by adding up "free" and "cached", which was fine ten years ago, but
> > is pretty much guaranteed to be wrong today.
> > 
> > It is wrong because Cached includes memory that is not freeable as
> > page cache, for example shared memory segments, tmpfs, and ramfs,
> > and it does not include reclaimable slab memory, which can take up
> > a large fraction of system memory on mostly idle systems with lots
> > of files.
> > 
> > Currently, the amount of memory that is available for a new workload,
> > without pushing the system into swap, can be estimated from MemFree,
> > Active(file), Inactive(file), and SReclaimable, as well as the "low"
> > watermarks from /proc/zoneinfo.
> > 
> > However, this may change in the future, and user space really should
> > not be expected to know kernel internals to come up with an estimate
> > for the amount of free memory.
> > 
> > It is more convenient to provide such an estimate in /proc/meminfo.
> > If things change in the future, we only have to change it in one place.
> > 
> > Signed-off-by: Rik van Riel 
> > Reported-by: Erik Mouw 
> 
> Acked-by: Johannes Weiner 
> 
> I have a suspicion that people will end up relying on this number to
> start new workloads in situations where lots of the page cache is
> actually heavily used.  We might not swap, but there will still be IO
> from thrashing cache.
> 
> Maybe we'll have to subtract mapped cache pages in the future to
> mitigate this risk somehow...

It might be huge false positive if there was mmaped used-once stream so
that userlevel could free some objects or kill someone to get a free memory.

And shouldn't we consider dirty + writeback, either?

Anyway, this feature is very handy. Swapping/LMK/OOM is very sensivie
subject for embedded people these days so we have been used some matrix
to get a ballpark estimate like

"buffers + cached + Sreclaimable - (SHMEM + dirty + writeback +
 workingset)"

We included workingset to prevent thrashing page cache.
So, my point is we could include some tunable value in the expression
like workingset and default might be half of page cache like this patch
but admin can control it if the platform is aware of his workingset size.

> 
> Anyway, we can defer this to when it's proven to be an actual problem.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Congratulation!!! Confirm Mail Receipt!!

2013-11-07 Thread Qatar Foundation
This is to inform you that you were among the lucky beneficiary selected to 
receive this donations award sum of $ 1Million US DOLLAR each as charity 
donations/aid from the Qatar Foundation to promote your business and personal 
Interest. Kindly revert back for claims processing via email: 
qatarfoundati...@yahoo.com.hk

On behalf of the foundation, we say congratulations to you.

Yours Sincerely,
President of Qatar Foundation:
Dr. Mohammad Fathy Saoud


IMPORTANT: If you receive this message in your spam or junk its due to your 
network provider.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: usb-serial lockdep trace in linus' current tree.

2013-11-07 Thread Dave Jones
On Fri, Nov 08, 2013 at 01:09:03AM +0100, Johan Hovold wrote:
 
 > A recent change in usb-serial used the wrong memory-allocation flag in
 > write(), which results in a
 > 
 >  [5.979005] BUG: sleeping function called from invalid context at 
 > /home/johan/src/linux/linux-nu/mm/dmapool.c:310
 > 
 > with usb-next when using a usb-serial console as you seem to do be doing
 > as well.
 > 
 > Could be related. Care to give the fix below a try?

Seems to work. Feel free to add

Tested-by: Dave Jones 

if this is the agreed upon fix for this.

thanks,

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] net: make ndev->irq signed for error handling

2013-11-07 Thread David Miller
From: Dan Carpenter 
Date: Thu, 7 Nov 2013 10:48:49 +0300

> There is a bug in cpsw_probe() where we do:
> 
>   ndev->irq = platform_get_irq(pdev, 0);
>   if (ndev->irq < 0) {
> 
> The problem is that "ndev->irq" is unsigned so the error handling
> doesn't work.  I have changed it to a regular int.
> 
> Signed-off-by: Dan Carpenter 

Even though some simplifications have been suggested wrt. how
this irq value is obtained in the one place where it is used,
I am applying Dan's patch for now.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] Add Legacy PM OPS usage checks to class, bus, and driver register functions

2013-11-07 Thread Shuah Khan
Add Legacy PM OPS usage checks to class, bus, and driver register functions.
If Legacy PM OPS usage is found, print warning message to indicate the driver
code needs updating to use Dev PM OPS interfaces. This will help serve as a way
to track drivers that still use Legacy PM OPS and fix them.

This patch set adds Legacy PM OPS usage check and warning to bus_register(),
__class_register(), and driver_register() functions.

Individual patches in this series are not dependent on each other. The only
reason this is a series is for context and facilitating discussion on the
overall change as opposed individual patches.

Shuah Khan (3):
  drivers/bus: Add Legacy PM OPS usage check and warning to
bus_register()
  drivers/class: Add Legacy PM OPS usage check and warning to
__class_register()
  driver: Add Legacy PM OPS usage check and warning to driver_register()

 drivers/base/bus.c| 3 +++
 drivers/base/class.c  | 4 
 drivers/base/driver.c | 4 
 3 files changed, 11 insertions(+)

-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] drivers/class: Add Legacy PM OPS usage check and warning to __class_register()

2013-11-07 Thread Shuah Khan
Add Legacy PM OPS usage checks to __class_register() function. If Legacy PM OPS
usage is found, print warning message to indicate the driver code needs
updating to use Dev PM OPS interfaces. This will help serve as a way to track
drivers that still use Legacy PM OPS and fix them.

The Legacy PM OPS check looks for suspend(struct device *, pm_message_t) or
resume(struct device *) class level interfaces.

Signed-off-by: Shuah Khan 
---
 drivers/base/class.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/base/class.c b/drivers/base/class.c
index 8b7818b..f26a3f2 100644
--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -180,6 +180,10 @@ int __class_register(struct class *cls, struct 
lock_class_key *key)
 
pr_debug("device class '%s': registering\n", cls->name);
 
+   if (cls->suspend || cls->resume)
+   pr_warn("device class '%s': Please update to use dev pm ops\n",
+   cls->name);
+
cp = kzalloc(sizeof(*cp), GFP_KERNEL);
if (!cp)
return -ENOMEM;
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] driver: Add Legacy PM OPS usage check and warning to driver_register()

2013-11-07 Thread Shuah Khan
Add Legacy PM OPS usage checks to driver_register() function. If Legacy PM OPS
usage is found, print warning message to indicate the driver code needs
updating to use Dev PM OPS interfaces. This will help serve as a way to track
drivers that still use Legacy PM OPS and fix them.

The Legacy PM OPS check looks for suspend(struct device *, pm_message_t) or
resume(struct device *) struct device_driver interfaces.

Signed-off-by: Shuah Khan 
---
 drivers/base/driver.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 9e29943..10ff280 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -157,6 +157,10 @@ int driver_register(struct device_driver *drv)
printk(KERN_WARNING "Driver '%s' needs updating - please use "
"bus_type methods\n", drv->name);
 
+   if (drv->suspend || drv->resume)
+   pr_warn("Please update driver '%s' to use dev pm ops.\n",
+   drv->name);
+
other = driver_find(drv->name, drv->bus);
if (other) {
printk(KERN_ERR "Error: Driver '%s' is already registered, "
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] drivers/bus: Add Legacy PM OPS usage check and warning to bus_register()

2013-11-07 Thread Shuah Khan
Add Legacy PM OPS usage checks to bus_register() function. If Legacy PM OPS
usage is found, print warning message to indicate the driver code needs
updating to use Dev PM OPS interfaces. This will help serve as a way to track
drivers that still use Legacy PM OPS and fix them.

The Legacy PM OPS check looks for suspend(struct device *, pm_message_t) or
resume(struct device *) bus level interfaces.

Signed-off-by: Shuah Khan 
---
 drivers/base/bus.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 4c289ab..9e68453 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -991,6 +991,9 @@ int bus_register(struct bus_type *bus)
goto bus_groups_fail;
 
pr_debug("bus: '%s': registered\n", bus->name);
+   if (bus->suspend || bus->resume)
+   pr_warn("bus '%s': Please update to use dev pm ops.\n",
+   bus->name);
return 0;
 
 bus_groups_fail:
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: usb-serial lockdep trace in linus' current tree.

2013-11-07 Thread Johan Hovold
On Thu, Nov 07, 2013 at 05:37:28PM -0500, Dave Jones wrote:
> Seeing this since todays USB merge.
>
> WARNING: CPU: 0 PID: 226 at kernel/lockdep.c:2740 
> lockdep_trace_alloc+0xc5/0xd0()
> DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> Modules linked in: usb_debug(+) kvm_intel kvm crct10dif_pclmul crc32c_intel 
> ghash_clmulni_intel microcode(+) pcspkr serio_raw
> CPU: 0 PID: 226 Comm: systemd-udevd Not tainted 3.12.0+ #112
>  81a22d3d 88023cde5670 8171a8e8 88023cde56b8
>  88023cde56a8 8105430d 0046 80d0
>  0010 0001 880244407a80 88023cde5708
> Call Trace:
>  [] dump_stack+0x4e/0x82
>  [] warn_slowpath_common+0x7d/0xa0
>  [] warn_slowpath_fmt+0x4c/0x50
>  [] lockdep_trace_alloc+0xc5/0xd0
>  [] __kmalloc+0x53/0x350
>  [] ? xhci_urb_enqueue+0xb7/0x610
>  [] ? debug_dma_mapping_error+0x7c/0x90
>  [] xhci_urb_enqueue+0xb7/0x610
>  [] usb_hcd_submit_urb+0xa6/0xae0
>  [] ? native_sched_clock+0x24/0x80
>  [] ? trace_hardirqs_off_caller+0x1f/0xc0
>  [] ? native_sched_clock+0x24/0x80
>  [] ? trace_hardirqs_off_caller+0x1f/0xc0
>  [] ? get_lock_stats+0x19/0x60
>  [] ? put_lock_stats.isra.28+0xe/0x40
>  [] usb_submit_urb+0x1f9/0x470
>  [] usb_serial_generic_write_start+0xf5/0x210
>  [] usb_serial_generic_write+0x70/0x90
>  [] usb_console_write+0xc7/0x220
>  [] call_console_drivers.constprop.23+0xa5/0x1e0
>  [] console_unlock+0x40c/0x460
>  [] register_console+0x12c/0x390
>  [] usb_serial_console_init+0x22/0x40
>  [] usb_serial_probe+0xfea/0x10e0
>  [] ? native_sched_clock+0x24/0x80
>  [] ? trace_hardirqs_off_caller+0x1f/0xc0
>  [] ? get_lock_stats+0x19/0x60
>  [] ? __mutex_unlock_slowpath+0xed/0x1a0
>  [] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [] ? trace_hardirqs_on+0xd/0x10
>  [] usb_probe_interface+0x1cf/0x300
>  [] driver_probe_device+0x87/0x390
>  [] __driver_attach+0x93/0xa0
>  [] ? __device_attach+0x40/0x40
>  [] bus_for_each_dev+0x6b/0xb0
>  [] driver_attach+0x1e/0x20
>  [] usb_serial_register_drivers+0x29e/0x580
>  [] ? 0xa0004fff
>  [] usb_serial_module_init+0x1e/0x1000 [usb_debug]
>  [] do_one_initcall+0xf2/0x1a0
>  [] ? set_memory_nx+0x43/0x50
>  [] load_module+0x1fd2/0x26a0
>  [] ? store_uevent+0x40/0x40
>  [] SyS_finit_module+0x86/0xb0
>  [] tracesys+0xdd/0xe2
> ---[ end trace ee033a3c9fd6263b ]---

A recent change in usb-serial used the wrong memory-allocation flag in
write(), which results in a

[5.979005] BUG: sleeping function called from invalid context at 
/home/johan/src/linux/linux-nu/mm/dmapool.c:310

with usb-next when using a usb-serial console as you seem to do be doing
as well.

Could be related. Care to give the fix below a try?

Thanks,
Johan

>From 7caaef75ebba3cfa3916b53ce1aee95291802ac4 Mon Sep 17 00:00:00 2001
From: Johan Hovold 
Date: Fri, 8 Nov 2013 00:44:31 +0100
Subject: [PATCH] USB: serial: fix write memory-allocation flag

Fix regression introduced by commit 818f60365a29 ("USB: serial: add
memory flags to usb_serial_generic_write_start"), which used GFP_KERNEL
in write, which must not not sleep.
---
 drivers/usb/serial/generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/serial/generic.c b/drivers/usb/serial/generic.c
index 2b01ec8651c2..538498646b06 100644
--- a/drivers/usb/serial/generic.c
+++ b/drivers/usb/serial/generic.c
@@ -208,7 +208,7 @@ int usb_serial_generic_write(struct tty_struct *tty,
return 0;
 
count = kfifo_in_locked(>write_fifo, buf, count, >lock);
-   result = usb_serial_generic_write_start(port, GFP_KERNEL);
+   result = usb_serial_generic_write_start(port, GFP_ATOMIC);
if (result)
return result;
 
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM: OMAP2+: omap_device: maintain sane runtime pm status around suspend/resume

2013-11-07 Thread Nishanth Menon

On 11/07/2013 05:44 PM, Kevin Hilman wrote:

Nishanth Menon  writes:


OMAP device hooks around suspend|resume_noirq ensures that hwmod
devices are forced to idle using omap_device_idle/enable as part of
the last stage of suspend activity.

For a device such as i2c who uses autosuspend, it is possible to enter
the suspend path with dev->power.runtime_status = RPM_ACTIVE.

As part of the suspend flow, the generic runtime logic would increment
it's dev->power.disable_depth to 1. This should prevent further
pm_runtime_get_sync from succeeding once the runtime_status has been
set to RPM_SUSPENDED.

Now, as part of the suspend_noirq handler in omap_device, we force the
following: if the device status is !suspended, we force the device
to idle using omap_device_idle (clocks are cut etc..). This ensures
that from a hardware perspective, the device is "suspended". However,
runtime_status is left to be active.

*if* an operation is attempted after this point to
pm_runtime_get_sync, runtime framework depends on runtime_status to
indicate accurately the device status, and since it sees it to be
ACTIVE, it assumes the module is functional and returns a non-error
value. As a result the user will see pm_runtime_get succeed, however a
register access will crash due to the lack of clocks.


Ouch.

Dumb Q: who is requesting an i2c transaction after ->suspend_noirq().
The i2c driver itself should be able to detect that it's being accessed
after this point and return an error.


i2c dropped it generic suspend handlers at
commit 584b408d37af4e0b38ad5b60f236381bcdf396bc
Author: Kevin Hilman 
Date:   Thu Aug 4 07:53:02 2011 -0700

Revert "i2c-omap: fix static suspend vs. runtime suspend"

Also, as of v3.1, the PM domain level code for OMAP handles device
power state transistions automatically for devices, so drivers no
longer need to specifically call the bus/pm_domain methods themselves.


- So it is rightly in the mercy of runtime PM to adequately represent 
it's state. I disagree that i2c driver should in addition have to 
remember what it's generic suspend state is.
 - See the link to the cpufreq and the logs to see the call stack where 
it fails.


Now, this is fine, since omap_i2c_xfer should ideally fail with a 
pm_runtime_get_sync returning -EACCESS when device is really suspended 
(remember, we are doing autosuspend - so, in the case I caught, there 
was regulator voltage set prior to entering suspend, but timeout was not 
yet hit for autosuspend of i2c to kick in yet).




That being said, I agree that omap_device should still be catching this
case in order to find/fix driver races like this.





To prevent this from happening, we should ensure that runtime_status
exactly indicates the device status. As a result of this change
any further calls to pm_runtime_get* would return -EACCES (since
disable_depth is 1). On resume, we restore the clocks and runtime
status exactly as we suspended with.

Reported-by: J Keerthy 
Signed-off-by: Nishanth Menon 
Acked-by: Rajendra Nayak 
---
patch baseline: V3.12 tag (also applies on linux-next next-20131107 tag)

Logs from 3.12 based vendor kernel:
Before: http://pastebin.com/m5KxnB7a
After: http://pastebin.com/8AfX4e7r

The discussion on cpufreq side of the story which triggered this scenario:
http://marc.info/?l=linux-pm=138263811321921=2

Tested on TI vendor kernel (with dt boot):
AM335x: evm, BBB, sk, BBW
OMAP5uEVM, DRA7-evm

  arch/arm/mach-omap2/omap_device.c |   16 ++--
  arch/arm/mach-omap2/omap_device.h |1 +
  2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_device.c 
b/arch/arm/mach-omap2/omap_device.c
index b69dd9a..87ecbb0 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -621,6 +621,13 @@ static int _od_suspend_noirq(struct device *dev)

if (!ret && !pm_runtime_status_suspended(dev)) {
if (pm_generic_runtime_suspend(dev) == 0) {
+   if (!pm_runtime_suspended(dev)) {


Why the addition check for supended here?


pm_runtime_status_suspended checks only the dev->power.runtime_status
but that may not truely indicate device was in previous use - in the 
case of devices like i2c who depend on autosuspend.


pm_runtime_suspended checks for dev->power.runtime_status == 
RPM_SUSPENDED and disable_depth




This version (as opposed to the _status_suspended() version above will
fail if runtime PM has been disabled from userspace (via
/sys/devices/.../power/control), and will thus prevent low power states
from being hit in suspend if runtime suspend has been disabled from
userspace.  That's a bug.


yes, and so it should fail to achieve low power state. we explicitly 
stated disable suspend, yet we go behind the runtime PM's back and force 
disable the module clocks. now drivers should NOT know what the state of 
the omap_device meddling is and should o

Re: [PATCH] net: x86: bpf: don't forget to free sk_filter (v2)

2013-11-07 Thread David Miller
From: Andrey Vagin 
Date: Thu,  7 Nov 2013 08:35:12 +0400

> sk_filter isn't freed if bpf_func is equal to sk_run_filter.
> 
> This memory leak was introduced by v3.12-rc3-224-gd45ed4a4
> "net: fix unsafe set_memory_rw from softirq".
> 
> Before this patch sk_filter was freed in sk_filter_release_rcu,
> now it should be freed in bpf_jit_free.
> 
> Here is output of kmemleak:
 ...
> v2: add extra { } after else
> 
> Fixes: d45ed4a4e33a ("net: fix unsafe set_memory_rw from softirq")
> Acked-by: Daniel Borkmann 
> Cc: Alexei Starovoitov 
> Cc: Eric Dumazet 
> Cc: "David S. Miller" 
> Signed-off-by: Andrey Vagin 

Applied and queued up for -stable, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Bench for testing scheduler

2013-11-07 Thread Rowand, Frank
Hi Vincent,

Thanks for creating some benchmark numbers!


On Thursday, November 07, 2013 5:33 AM, Vincent Guittot 
[vincent.guit...@linaro.org] wrote:
> 
> On 7 November 2013 12:32, Catalin Marinas  wrote:
> > Hi Vincent,
> >
> > (for whatever reason, the text is wrapped and results hard to read)
> 
> Yes, i have just seen that. It looks like gmail has wrapped the lines.
> I have added the results which should not be wrapped, at the end of this email
> 
> >
> >
> > On Thu, Nov 07, 2013 at 10:54:30AM +, Vincent Guittot wrote:
> >> During the Energy-aware scheduling mini-summit, we spoke about benches
> >> that should be used to evaluate the modifications of the scheduler.
> >> I’d like to propose a bench that uses cyclictest to measure the wake
> >> up latency and the power consumption. The goal of this bench is to
> >> exercise the scheduler with various sleeping period and get the
> >> average wakeup latency. The range of the sleeping period must cover
> >> all residency times of the idle state table of the platform. I have
> >> run such tests on a tc2 platform with the packing tasks patchset.
> >> I have use the following command:
> >> #cyclictest -t  -q -e 1000 -i <500-12000> -d 150 -l 
> >> 2000

The number of loops ("-l 2000") should be much larger to create useful
results.  I don't have a specific number that is large enough, I just
know from experience that 2000 is way too small.  For example, running
cyclictest several times with the same values on my laptop gives values
that are not consistent:

   $ sudo ./cyclictest -t -q -e 1000 -i 500 -d 150 -l 2000
   # /dev/cpu_dma_latency set to 1000us
   T: 0 ( 9703) P: 0 I:500 C:   2000 Min:  2 Act:   90 Avg:   77 Max: 
243
   T: 1 ( 9704) P: 0 I:650 C:   1557 Min:  2 Act:   58 Avg:   68 Max: 
226
   T: 2 ( 9705) P: 0 I:800 C:   1264 Min:  2 Act:   54 Avg:   81 Max:
1017
   T: 3 ( 9706) P: 0 I:950 C:   1065 Min:  2 Act:   11 Avg:   80 Max: 
260
   
   $ sudo ./cyclictest -t -q -e 1000 -i 500 -d 150 -l 2000
   # /dev/cpu_dma_latency set to 1000us
   T: 0 ( 9709) P: 0 I:500 C:   2000 Min:  2 Act:   45 Avg:   74 Max: 
390
   T: 1 ( 9710) P: 0 I:650 C:   1554 Min:  2 Act:   82 Avg:   61 Max: 
810
   T: 2 ( 9711) P: 0 I:800 C:   1263 Min:  2 Act:   83 Avg:   74 Max: 
287
   T: 3 ( 9712) P: 0 I:950 C:   1064 Min:  2 Act:  103 Avg:   79 Max: 
551
   
   $ sudo ./cyclictest -t -q -e 1000 -i 500 -d 150 -l 2000
   # /dev/cpu_dma_latency set to 1000us
   T: 0 ( 9716) P: 0 I:500 C:   2000 Min:  2 Act:   82 Avg:   72 Max: 
252
   T: 1 ( 9717) P: 0 I:650 C:   1556 Min:  2 Act:  115 Avg:   77 Max: 
354
   T: 2 ( 9718) P: 0 I:800 C:   1264 Min:  2 Act:   59 Avg:   78 Max:
1143
   T: 3 ( 9719) P: 0 I:950 C:   1065 Min:  2 Act:  104 Avg:   70 Max: 
238
   
   $ sudo ./cyclictest -t -q -e 1000 -i 500 -d 150 -l 2000
   # /dev/cpu_dma_latency set to 1000us
   T: 0 ( 9722) P: 0 I:500 C:   2000 Min:  2 Act:   82 Avg:   68 Max: 
213
   T: 1 ( 9723) P: 0 I:650 C:   1555 Min:  2 Act:   65 Avg:   65 Max:
1279
   T: 2 ( 9724) P: 0 I:800 C:   1264 Min:  2 Act:   91 Avg:   69 Max: 
244
   T: 3 ( 9725) P: 0 I:950 C:   1065 Min:  2 Act:   58 Avg:   76 Max: 
242


> >
> > cyclictest could be a good starting point but we need to improve it to
> > allow threads of different loads, possibly starting multiple processes
> > (can be done with a script), randomly varying load threads. These
> > parameters should be loaded from a file so that we can have multiple
> > configurations (per SoC and per use-case). But the big risk is that we
> > try to optimise the scheduler for something which is not realistic.
> 
> The goal of this simple bench is to measure the wake up latency and the 
> reachable value of the scheduler on a platform but not to emulate a "real" 
> use case. In the same way than sched-pipe tests a specific behavior of the 
> scheduler, this bench tests the wake up latency of a system.
> 
> Starting multi processes and adding some loads can also be useful but the 
> target will be a bit different from wake up latency. I have one concern with 
> randomness because it prevents from having repeatable and comparable tests 
> and results.
> 
> I agree that we have to test "real" use cases but it doesn't prevent from 
> testing the limit of a characteristic on a system
> 
> >
> >
> > We are working on describing some basic scenarios (plain English for
> > now) and one of them could be video playing with threads for audio and
> > video decoding with random change in the workload.
> >
> > So I think the first step should be a set of tools/scripts to analyse
> > the scheduler behaviour, both in terms of latency and power, and these
> > can use perf sched. We can then run some real life scenarios (e.g.
> > Android video playback) and build a benchmark that matches such
> > behaviour as close as possible. We can probably use 

Re: linux-next: manual merge of the block tree with the tree

2013-11-07 Thread Dave Kleikamp
On 11/07/2013 01:25 PM, Kent Overstreet wrote:
> On Thu, Nov 07, 2013 at 01:20:26PM -0600, Dave Kleikamp wrote:

>> I ended up replacing a call to bio_iovec_idx() with __bvec_iter_bvec()
>> since the former was removed. It's not very elegant, but it works. I'm
>> open to suggestions on a cleaner fix, but it can wait until one or both
>> of these trees is merged.
> 
> No, that's definitely wrong. Read Documentation/block/biovecs.txt - you
> need to use either the new bio_iovec() or bio_iovec() iter. I can do the
> conversion later today.

Stephen,
Can you please drop the aio-direct tree for the time being?

My stuff is not aligning very well with the immutable biovecs and Kent
has a different approach in mind. I'll be working with him on a
replacement that will hopefully be simpler.

Thanks,
Shaggy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 08/11] random: simplify accounting logic

2013-11-07 Thread Greg Price
This logic is exactly equivalent to the old logic, but it should
be easier to see what it's doing.

The equivalence depends on one fact from outside this function:
when 'r->limit' is false, 'reserved' is zero.  (Well, two facts;
the other is that 'reserved' is never negative.)

Which is a good thing, too, because the "entropy_count = reserved"
line would have had a bug otherwise -- 'reserved' counts bytes and
'entropy_count' counts bits.  Fortunately that line could only be
reached if 'r->limit' was false, and zero bytes is zero bits.

Cc: "Theodore Ts'o" 
Cc: Jiri Kosina 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8824e8d..9e4645e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -860,12 +860,7 @@ retry:
/* If limited, never pull more than available */
if (r->limit)
nbytes = min_t(size_t, nbytes, entropy_count/8 - 
reserved);
-
-   if (entropy_count / 8 >= nbytes + reserved) {
-   entropy_count -= nbytes*8;
-   } else {
-   entropy_count = reserved;
-   }
+   entropy_count = max_t(int, entropy_count - nbytes*8, 0);
if (cmpxchg(>entropy_count, orig, entropy_count) != orig)
goto retry;
 
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 07/11] random: simplify accounting code slightly

2013-11-07 Thread Greg Price
This commit makes the very boring simplifications so that the next
commit, which is a little trickier, is isolated and easy to review.
No change in behavior here at all.

Cc: "Theodore Ts'o" 
Cc: Jiri Kosina 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8ec4a9a..8824e8d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -858,18 +858,16 @@ static size_t account(struct entropy_store *r, size_t 
nbytes, int min,
 retry:
entropy_count = orig = ACCESS_ONCE(r->entropy_count);
/* If limited, never pull more than available */
-   if (r->limit && nbytes + reserved >= entropy_count / 8)
-   nbytes = entropy_count/8 - reserved;
+   if (r->limit)
+   nbytes = min_t(size_t, nbytes, entropy_count/8 - 
reserved);
 
if (entropy_count / 8 >= nbytes + reserved) {
entropy_count -= nbytes*8;
-   if (cmpxchg(>entropy_count, orig, entropy_count) != 
orig)
-   goto retry;
} else {
entropy_count = reserved;
-   if (cmpxchg(>entropy_count, orig, entropy_count) != 
orig)
-   goto retry;
}
+   if (cmpxchg(>entropy_count, orig, entropy_count) != orig)
+   goto retry;
 
if (entropy_count < random_write_wakeup_thresh)
wakeup_write = 1;
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 11/11] random: simplify accounting code

2013-11-07 Thread Greg Price
With this we handle "reserved" in just one place.  As a bonus the
code becomes less nested, and the "wakeup_write" flag variable
becomes unnecessary.  The variable "flags" was already unused.

This code behaves identically to the previous version except in
two pathological cases that don't occur.  If the argument "nbytes"
is already less than "min", then we didn't previously enforce
"min".  If r->limit is false while "reserved" is nonzero, then we
previously applied "reserved" in checking whether we had enough
bits, even though we don't apply it to actually limit how many we
take.  The callers of account() never exercise either of these cases.

Cc: "Theodore Ts'o" 
Cc: Jiri Kosina 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 87d3728..3f343ff 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -840,8 +840,6 @@ static void xfer_secondary_pool(struct entropy_store *r, 
size_t nbytes)
 static size_t account(struct entropy_store *r, size_t nbytes, int min,
  int reserved)
 {
-   unsigned long flags;
-   int wakeup_write = 0;
int entropy_count, orig;
 
BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);
@@ -850,24 +848,19 @@ static size_t account(struct entropy_store *r, size_t 
nbytes, int min,
 
 retry:
entropy_count = orig = ACCESS_ONCE(r->entropy_count);
-   if (entropy_count / 8 < min + reserved) {
+   /* If limited, never pull more than available */
+   if (r->limit)
+   nbytes = min_t(size_t, nbytes, entropy_count/8 - reserved);
+   if (nbytes < min)
nbytes = 0;
-   } else {
-   /* If limited, never pull more than available */
-   if (r->limit)
-   nbytes = min_t(size_t, nbytes, entropy_count/8 - 
reserved);
-   entropy_count = max_t(int, entropy_count - nbytes*8, 0);
-   if (cmpxchg(>entropy_count, orig, entropy_count) != orig)
-   goto retry;
-
-   if (entropy_count < random_write_wakeup_thresh)
-   wakeup_write = 1;
-   }
+   entropy_count = max_t(int, entropy_count - nbytes*8, 0);
+   if (nbytes && cmpxchg(>entropy_count, orig, entropy_count) != orig)
+   goto retry;
 
DEBUG_ENT("debiting %zu entropy credits from %s%s\n",
  nbytes * 8, r->name, r->limit ? "" : " (unlimited)");
 
-   if (wakeup_write) {
+   if (nbytes && entropy_count < random_write_wakeup_thresh) {
wake_up_interruptible(_write_wait);
kill_fasync(, SIGIO, POLL_OUT);
}
-- 
1.8.3.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 04/11] random: simplify loop in random_read

2013-11-07 Thread Greg Price
The loop condition never changes until just before a break, so we
might as well write it as a constant.  Also since v2.6.33-rc7~40^2~2
("random: drop weird m_time/a_time manipulation") we don't do
anything after the loop finishes, so the 'break's might as well
return directly.  Some other simplifications.

The behavior should be identical except that I've changed a debug
message.

Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 68 +--
 1 file changed, 22 insertions(+), 46 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d1466c9..85f5fce 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1155,58 +1155,34 @@ void rand_initialize_disk(struct gendisk *disk)
 static ssize_t
 random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 {
-   ssize_t n, retval = 0, count = 0;
+   ssize_t n;
 
if (nbytes == 0)
return 0;
 
-   while (nbytes > 0) {
-   n = nbytes;
-   if (n > SEC_XFER_SIZE)
-   n = SEC_XFER_SIZE;
-
-   DEBUG_ENT("reading %zu bits\n", n*8);
-
-   n = extract_entropy_user(_pool, buf, n);
-
-   if (n < 0) {
-   retval = n;
-   break;
-   }
-
-   DEBUG_ENT("read got %zd bits (%zd still needed)\n",
+   nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
+   while (1) {
+   DEBUG_ENT("reading %zu bits\n", nbytes*8);
+   n = extract_entropy_user(_pool, buf, nbytes);
+   if (n < 0)
+   return n;
+   DEBUG_ENT("read got %zd bits (%zd short)\n",
  n*8, (nbytes-n)*8);
-
-   if (n == 0) {
-   if (file->f_flags & O_NONBLOCK) {
-   retval = -EAGAIN;
-   break;
-   }
-
-   DEBUG_ENT("sleeping?\n");
-
-   wait_event_interruptible(random_read_wait,
-   input_pool.entropy_count >=
-random_read_wakeup_thresh);
-
-   DEBUG_ENT("awake\n");
-
-   if (signal_pending(current)) {
-   retval = -ERESTARTSYS;
-   break;
-   }
-
-   continue;
-   }
-
-   count += n;
-   buf += n;
-   nbytes -= n;
-   break;  /* This break makes the device work */
-   /* like a named pipe */
+   if (n > 0)
+   return n;
+   /* Pool is (near) empty.  Maybe wait and retry. */
+
+   if (file->f_flags & O_NONBLOCK)
+   return -EAGAIN;
+
+   DEBUG_ENT("sleeping?\n");
+   wait_event_interruptible(random_read_wait,
+   input_pool.entropy_count >=
+random_read_wakeup_thresh);
+   DEBUG_ENT("awake\n");
+   if (signal_pending(current))
+   return -ERESTARTSYS;
}
-
-   return (count ? count : retval);
 }
 
 static ssize_t
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/11] random: fix description of get_random_bytes

2013-11-07 Thread Greg Price
After this remark was written, commit d2e7c96af added a use of
arch_get_random_long() inside the get_random_bytes codepath.
The main point stands, but it needs to be reworded.

Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a89ba74..d1466c9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1048,8 +1048,9 @@ static ssize_t extract_entropy_user(struct entropy_store 
*r, void __user *buf,
 /*
  * This function is the exported kernel interface.  It returns some
  * number of good random numbers, suitable for key generation, seeding
- * TCP sequence numbers, etc.  It does not use the hw random number
- * generator, if available; use get_random_bytes_arch() for that.
+ * TCP sequence numbers, etc.  It does not rely on the hardware random
+ * number generator.  For random bytes direct from the hardware RNG
+ * (when available), use get_random_bytes_arch().
  */
 void get_random_bytes(void *buf, int nbytes)
 {
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 06/11] random: fix comment on "account"

2013-11-07 Thread Greg Price
This comment didn't quite keep up as extract_entropy() was split into
four functions.  Put each bit by the function it describes.

Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 04a117a..8ec4a9a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -834,17 +834,9 @@ static void xfer_secondary_pool(struct entropy_store *r, 
size_t nbytes)
 }
 
 /*
- * These functions extracts randomness from the "entropy pool", and
- * returns it in a buffer.
- *
- * The min parameter specifies the minimum amount we can pull before
- * failing to avoid races that defeat catastrophic reseeding while the
- * reserved parameter indicates how much entropy we must leave in the
- * pool after each pull to avoid starving other readers.
- *
- * Note: extract_entropy() assumes that .poolwords is a multiple of 16 words.
+ * This function decides how many bytes to actually take from the
+ * given pool, and also debits the entropy count accordingly.
  */
-
 static size_t account(struct entropy_store *r, size_t nbytes, int min,
  int reserved)
 {
@@ -896,6 +888,12 @@ retry:
return nbytes;
 }
 
+/*
+ * This function does the actual extraction for extract_entropy and
+ * extract_entropy_user.
+ *
+ * Note: we assume that .poolwords is a multiple of 16 words.
+ */
 static void extract_buf(struct entropy_store *r, __u8 *out)
 {
int i;
@@ -957,6 +955,15 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
memset(, 0, sizeof(hash));
 }
 
+/*
+ * This function extracts randomness from the "entropy pool", and
+ * returns it in a buffer.
+ *
+ * The min parameter specifies the minimum amount we can pull before
+ * failing to avoid races that defeat catastrophic reseeding while the
+ * reserved parameter indicates how much entropy we must leave in the
+ * pool after each pull to avoid starving other readers.
+ */
 static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 size_t nbytes, int min, int reserved)
 {
@@ -1007,6 +1014,10 @@ static ssize_t extract_entropy(struct entropy_store *r, 
void *buf,
return ret;
 }
 
+/*
+ * This function extracts randomness from the "entropy pool", and
+ * returns it in a userspace buffer.
+ */
 static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
size_t nbytes)
 {
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 05/11] random: declare trickle_count unsigned

2013-11-07 Thread Greg Price
We cheerfully overflow these variables (at least where "int"
is 32 bits wide), so pedantically they should be unsigned.

Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 85f5fce..04a117a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -296,7 +296,7 @@ static int random_write_wakeup_thresh = 128;
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count);
+static DEFINE_PER_CPU(unsigned int, trickle_count);
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/11] random: pull 'min' check in accounting to inside lockless update

2013-11-07 Thread Greg Price
Since 902c098a3 ("random: use lockless techniques in the interrupt
path") we have protected entropy_count only with cmpxchg, not the
lock.  In 10b3a32d2 ("random: fix accounting race condition") we
put a cmpxchg-retry loop around most of the logic in account(),
but the enforcement of the 'min' parameter stayed outside.

Under concurrent accesses to /dev/urandom this can suffer a race
that defeats our "catastrophic reseed" measures so that our
reseeding isn't effective.  If accesses to /dev/urandom and
/dev/random are interleaved in the wrong pattern, we could go
indefinitely long without a successful catastrophic reseed of
/dev/urandom, even while consuming an indefinite amount of entropy
in ineffective smaller reseeds.

NB that with or without this bug, if /dev/random is simply
accessed constantly, we can go indefinitely long without any
reseed of /dev/urandom.  So the effect of the bug is not that big.

The race comes when two tasks are in account() on input_pool
and access ->entropy_count in the following order:

 task A   task B
   if (r->entropy_count / 8
   < min + reserved)
   ACCESS_ONCE(r->entropy_count)
   cmpxchg
   ACCESS_ONCE(r->entropy_count)
   cmpxchg

where B causes r->entropy_count / 8 to fall below A's
min + reserved but above reserved.  Task A will cheerfully
take r->entropy_count / 8 - reserved bytes from the pool,
even though this is less than min.

Move the "min" check inside the ACCESS_ONCE/cmpxchg loop to
prevent the race.

Cc: Jiri Kosina 
Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1bf6bf8..87d3728 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -842,18 +842,17 @@ static size_t account(struct entropy_store *r, size_t 
nbytes, int min,
 {
unsigned long flags;
int wakeup_write = 0;
+   int entropy_count, orig;
 
BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);
DEBUG_ENT("trying to extract %zu bits from %s\n",
  nbytes * 8, r->name);
 
-   /* Can we pull enough? */
-   if (r->entropy_count / 8 < min + reserved) {
+retry:
+   entropy_count = orig = ACCESS_ONCE(r->entropy_count);
+   if (entropy_count / 8 < min + reserved) {
nbytes = 0;
} else {
-   int entropy_count, orig;
-retry:
-   entropy_count = orig = ACCESS_ONCE(r->entropy_count);
/* If limited, never pull more than available */
if (r->limit)
nbytes = min_t(size_t, nbytes, entropy_count/8 - 
reserved);
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/11] random: forget lock in lockless accounting

2013-11-07 Thread Greg Price
The only mutable data accessed here is ->entropy_count, but since
902c098a3 ("random: use lockless techniques in the interrupt path")
that member isn't protected by the lock anyway.  Since 10b3a32d2
("random: fix accounting race condition") we use cmpxchg to protect
our accesses to ->entropy_count here.  Drop the use of the lock.

Cc: Jiri Kosina 
Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9e4645e..1bf6bf8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -843,9 +843,6 @@ static size_t account(struct entropy_store *r, size_t 
nbytes, int min,
unsigned long flags;
int wakeup_write = 0;
 
-   /* Hold lock while accounting */
-   spin_lock_irqsave(>lock, flags);
-
BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);
DEBUG_ENT("trying to extract %zu bits from %s\n",
  nbytes * 8, r->name);
@@ -871,8 +868,6 @@ retry:
DEBUG_ENT("debiting %zu entropy credits from %s%s\n",
  nbytes * 8, r->name, r->limit ? "" : " (unlimited)");
 
-   spin_unlock_irqrestore(>lock, flags);
-
if (wakeup_write) {
wake_up_interruptible(_write_wait);
kill_fasync(, SIGIO, POLL_OUT);
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 02/11] random: fix comment on proc_do_uuid

2013-11-07 Thread Greg Price
There's only one function here now, as uuid_strategy is long gone.
Also make the bit about "If accesses via ..." clearer.

Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index ef3e15b..a89ba74 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1370,13 +1370,13 @@ static int max_write_thresh = INPUT_POOL_WORDS * 32;
 static char sysctl_bootid[16];
 
 /*
- * These functions is used to return both the bootid UUID, and random
+ * This function is used to return both the bootid UUID, and random
  * UUID.  The difference is in whether table->data is NULL; if it is,
  * then a new UUID is generated and returned to the user.
  *
- * If the user accesses this via the proc interface, it will be returned
- * as an ASCII string in the standard UUID format.  If accesses via the
- * sysctl system call, it is returned as 16 bytes of binary data.
+ * If the user accesses this via the proc interface, the UUID will be
+ * returned as an ASCII string in the standard UUID format; if via the
+ * sysctl system call, as 16 bytes of binary data.
  */
 static int proc_do_uuid(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 00/11] random: code cleanups

2013-11-07 Thread Greg Price
Hi Ted, hi all,

I recently read through the random number generator's code.
This series has fixes for some minor things I spotted.

Four of the patches touch comments only.  Four simplify code without
changing its behavior (total diffstat: 35 insertions, 73 deletions),
and one is a trivial signedness fix.  Two patches change the locking
and lockless concurrency control in account(), including one bugfix.
The bug is related to the one Jiri found and fixed in May.

Cheers,
Greg


Greg Price (11):
  random: fix typos / spelling errors in comments
  random: fix comment on proc_do_uuid
  random: fix description of get_random_bytes
  random: simplify loop in random_read
  random: declare trickle_count unsigned
  random: fix comment on "account"
  random: simplify accounting code slightly
  random: simplify accounting logic
  random: forget lock in lockless accounting
  random: pull 'min' check in accounting to inside lockless update
  random: simplify accounting code

 drivers/char/random.c | 164 --
 1 file changed, 66 insertions(+), 98 deletions(-)

-- 
1.8.3.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [trivial PATCH] module.h: Remove unnecessary semicolon

2013-11-07 Thread Joe Perches
On Fri, 2013-11-08 at 09:41 +1030, Rusty Russell wrote:
> Joe Perches  writes:
> > On Thu, 2013-11-07 at 12:32 +1030, Rusty Russell wrote:
> >> Joe Perches  writes:
> >> > This semicolon isn't necessary, remove it.
> >> >
> >> > Signed-off-by: Joe Perches 
> >> 
> >> This is a terrible description.  Really bad.
> >
> > I'd've preferred no description.
> 
> Me too.
> 
> >> First, it just repeats the subject, with more words.
> >
> > Which others have demanded.
> 
> They're wrong.

shrug.  Can't please everyone.

> >> Second, it gives me no indication that you've done a grep to make sure
> >> noone is abusing the macro, so I can't apply it without doing that check
> >> myself.
> >
> > That's a trust issue.
> > I've done it.  It isn't necessary.
> 
> WTF?  Now you just said it's not necessary, I *know* I can't trust you.

"It" in this case is the grep that I did
prior to sending the patch.

> > The other #define module_put_and_exit in a
> > different #if #else already doesn't have one.
> 
> You didn't mention this, and even if you did, it's not sufficient.  Some
> code only ever gets compiled as a module, so it'd never hit the
> !CONFIG_MODULES case.

Which would only show there was a latent bug.

> > Trust it or not, apply it or not.
> 
> Now I know when I receive a patch from you I have to check it carefully,
> because you can't be bothered.

Sure.  If you think so.
Nonetheless, I told you I'd checked it.

> I've fixed your patch, you can find it below.

No, not really fixed.  You modified the commit log to
suit your taste.

cheers, Joe


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 01/11] random: fix typos / spelling errors in comments

2013-11-07 Thread Greg Price
Cc: "Theodore Ts'o" 
Signed-off-by: Greg Price 
---
 drivers/char/random.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7a744d3..ef3e15b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -348,12 +348,12 @@ static struct poolinfo {
 
 /*
  * For the purposes of better mixing, we use the CRC-32 polynomial as
- * well to make a twisted Generalized Feedback Shift Reigster
+ * well to make a twisted Generalized Feedback Shift Register
  *
  * (See M. Matsumoto & Y. Kurita, 1992.  Twisted GFSR generators.  ACM
  * Transactions on Modeling and Computer Simulation 2(3):179-194.
  * Also see M. Matsumoto & Y. Kurita, 1994.  Twisted GFSR generators
- * II.  ACM Transactions on Mdeling and Computer Simulation 4:254-266)
+ * II.  ACM Transactions on Modeling and Computer Simulation 4:254-266)
  *
  * Thanks to Colin Plumb for suggesting this.
  *
@@ -382,7 +382,7 @@ static struct poolinfo {
  * modulo the generator polymnomial.  Now, for random primitive polynomials,
  * this is a universal class of hash functions, meaning that the chance
  * of a collision is limited by the attacker's knowledge of the generator
- * polynomail, so if it is chosen at random, an attacker can never force
+ * polynomial, so if it is chosen at random, an attacker can never force
  * a collision.  Here, we use a fixed polynomial, but we *can* assume that
  * ###--> it is unknown to the processes generating the input entropy. <-###
  * Because of this important property, this is a good, collision-resistant
@@ -943,7 +943,7 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
hash.w[2] ^= rol32(hash.w[2], 16);
 
/*
-* If we have a architectural hardware random number
+* If we have an architectural hardware random number
 * generator, mix that in, too.
 */
for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [trivial PATCH] module.h: Remove unnecessary semicolon

2013-11-07 Thread Rusty Russell
Joe Perches  writes:
> On Thu, 2013-11-07 at 12:32 +1030, Rusty Russell wrote:
>> Joe Perches  writes:
>> > This semicolon isn't necessary, remove it.
>> >
>> > Signed-off-by: Joe Perches 
>> 
>> This is a terrible description.  Really bad.
>
> I'd've preferred no description.

Me too.

>> First, it just repeats the subject, with more words.
>
> Which others have demanded.

They're wrong.

>> Second, it gives me no indication that you've done a grep to make sure
>> noone is abusing the macro, so I can't apply it without doing that check
>> myself.
>
> That's a trust issue.
> I've done it.  It isn't necessary.

WTF?  Now you just said it's not necessary, I *know* I can't trust you.

> The other #define module_put_and_exit in a
> different #if #else already doesn't have one.

You didn't mention this, and even if you did, it's not sufficient.  Some
code only ever gets compiled as a module, so it'd never hit the
!CONFIG_MODULES case.

> Trust it or not, apply it or not.

Now I know when I receive a patch from you I have to check it carefully,
because you can't be bothered.

I've fixed your patch, you can find it below.

From: Joe Perches 
Subject: module.h: Remove unnecessary semicolon

[All 8 callers already have semicolons. -- RR]

Signed-off-by: Joe Perches 
Signed-off-by: Rusty Russell 
---
 include/linux/module.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 15cd6b1b211e..46e548fd502a 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -451,7 +451,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const 
char *,
 
 extern void __module_put_and_exit(struct module *mod, long code)
__attribute__((noreturn));
-#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code);
+#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
 
 #ifdef CONFIG_MODULE_UNLOAD
 unsigned long module_refcount(struct module *mod);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at kernel/kallsyms.c:222!

2013-11-07 Thread Rusty Russell
Axel Lin  writes:
> 2013/11/7 Ming Lei :
>> Hi,
>>
>> On Thu, Nov 7, 2013 at 10:47 AM, Axel Lin  wrote:
>>>
>>> hi Ming,
>>> Seems CONFIG_PAGE_OFFSET is not configurabe in "make menuconfig".
>>> And I found CONFIG_PAGE_OFFSET=0xC000 for all below configs...
>>> $ make at91_dt_defconfig; grep  CONFIG_PAGE_OFFSET .config
>>> $ make ep93xx_defconfig; grep  CONFIG_PAGE_OFFSET .config
>>> $ make imx_v4_v5_defconfig; grep CONFIG_PAGE_OFFSET .config
>>> $ make mxs_defconfig; grep CONFIG_PAGE_OFFSET .config
>>> $ make omap2plus_defconfig; grep CONFIG_PAGE_OFFSET .config
>>> $ make s3c6400_defconfig; grep CONFIG_PAGE_OFFSET .config
>>> $ make at91x40_defconfig; grep CONFIG_PAGE_OFFSET .config
>>> ( at91x40_defconfig is also arm7tdmi )
>>
>> Firstly it can be configured via VMSPLIT_3GVMSPLIT_2G/VMSPLIT_1G.
>>
>> Secondly, configurable or not isn't the point, and maybe some uclinux
>> platforms do not use CONFIG_PAGE_OFFSET at all, but they should
>> set it as a reasonable value or at least be below than the start link
>> address of vmlinux.
>
> Hi Ming,
>
> I found in arch/arm/include/asm/memory.h:
> CONFIG_PAGE_OFFSET is not used if !CONFIG_MMU.
> So looks like setting CONFIG_PAGE_OFFSET to other value still won't work.

This seems like the simplest solution, but it may mean you still have
crap in /proc/kallsyms.

Does it work for you?

Thanks,
Rusty.


diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 32b10f53d0b4..c3bd3efec4cc 100644
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -82,7 +82,9 @@ kallsyms()
kallsymopt="${kallsymopt} --all-symbols"
fi
 
-   kallsymopt="${kallsymopt} --page-offset=$CONFIG_PAGE_OFFSET"
+   if [ -n "${CONFIG_MMU}" ]; then
+   kallsymopt="${kallsymopt} --page-offset=$CONFIG_PAGE_OFFSET"
+   fi
 
local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}   \
  ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.

2013-11-07 Thread NeilBrown
On Thu, 7 Nov 2013 15:39:00 -0800 Bryan Wu  wrote:

> On Thu, Oct 31, 2013 at 7:41 PM, NeilBrown  wrote:
> >
> 
> [PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.
> 
> I think it should be tca6507, right? Typo?

Obviously I'm still dreaming of the Apple IIe.  Yes, a typo.

> 
> For other parts of this patch, I'm OK for them.
> 
> And just a quick scan of the leds-tca6507, I found bunch of typos in
> the comments:
> 
> * An led-tca6507 device must be provided with platform data.  This data
>  * lists for each output: the name, default trigger, and whether the signal
>  * is being used as a GPiO rather than an led.  'struct led_plaform_data'
> 
> Can we unify to use GPIO and gpio?
> 
>  * is used for this.  If 'name' is NULL, the output isn't used.  If 'flags'
>  * is TCA6507_MAKE_CPIO, the output is a GPO.
> 
> Here should be TCA6507_MAKE_GPIO and GPIO instead of GPO
> 
>  * The "struct led_platform_data" can be embedded in a
>  * "struct tca6507_platform_data" which adds a 'gpio_base' for the GPiOs,
>  * and a 'setup' callback which is called once the GPiOs are available.
> 
> Don't use GPiO, please.
> 
> Could you please review the code again and submit a cleanup patch to
> fix those typos?

Yes, I'll send a patch shortly.

Thanks,
NeilBrown




signature.asc
Description: PGP signature


Re: [PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.

2013-11-07 Thread Bryan Wu
On Thu, Nov 7, 2013 at 3:39 PM, Bryan Wu  wrote:
> On Thu, Oct 31, 2013 at 7:41 PM, NeilBrown  wrote:
>>
>
> [PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.
>
> I think it should be tca6507, right? Typo?
>

I fixed this typo and queue up this patch in my devel branch.

-Bryan

> For other parts of this patch, I'm OK for them.
>
> And just a quick scan of the leds-tca6507, I found bunch of typos in
> the comments:
>
> * An led-tca6507 device must be provided with platform data.  This data
>  * lists for each output: the name, default trigger, and whether the signal
>  * is being used as a GPiO rather than an led.  'struct led_plaform_data'
>
> Can we unify to use GPIO and gpio?
>
>  * is used for this.  If 'name' is NULL, the output isn't used.  If 'flags'
>  * is TCA6507_MAKE_CPIO, the output is a GPO.
>
> Here should be TCA6507_MAKE_GPIO and GPIO instead of GPO
>
>  * The "struct led_platform_data" can be embedded in a
>  * "struct tca6507_platform_data" which adds a 'gpio_base' for the GPiOs,
>  * and a 'setup' callback which is called once the GPiOs are available.
>
> Don't use GPiO, please.
>
> Could you please review the code again and submit a cleanup patch to
> fix those typos?
>
> Thanks,
> -Bryan
>
>
>> The 7 lines driven by the TCA6507 can either drive LEDs or act as output-only
>> GPIOs.
>>
>> To make this distinction in devicetree we use the "compatible" property.
>>
>> If the device attached to a line is "compatible" with "gpio", we treat it
>> like a GPIO.  If it is "compatible" with "led" (or if no "compatible" value
>> is set) we treat it like an LED.
>>
>> Signed-off-by: NeilBrown 
>>
>> diff --git a/Documentation/devicetree/bindings/leds/tca6507.txt 
>> b/Documentation/devicetree/bindings/leds/tca6507.txt
>> index 80ff3dfb1f32..d7221b84987c 100644
>> --- a/Documentation/devicetree/bindings/leds/tca6507.txt
>> +++ b/Documentation/devicetree/bindings/leds/tca6507.txt
>> @@ -2,6 +2,13 @@ LEDs connected to tca6507
>>
>>  Required properties:
>>  - compatible : should be : "ti,tca6507".
>> +- #address-cells: must be 1
>> +- #size-cells: must be 0
>> +- reg: typically 0x45.
>> +
>> +Optional properties:
>> +- gpio-controller: allows lines to be used as output-only GPIOs.
>> +- #gpio-cells: if present, must be 0.
>>
>>  Each led is represented as a sub-node of the ti,tca6507 device.
>>
>> @@ -10,6 +17,7 @@ LED sub-node properties:
>>  - reg : number of LED line (could be from 0 to 6)
>>  - linux,default-trigger : (optional)
>> see Documentation/devicetree/bindings/leds/common.txt
>> +- compatible: either "led" (the default) or "gpio".
>>
>>  Examples:
>>
>> @@ -19,6 +27,9 @@ tca6507@45 {
>> #size-cells = <0>;
>> reg = <0x45>;
>>
>> +   gpio-controller;
>> +   #gpio-cells = <2>;
>> +
>> led0: red-aux@0 {
>> label = "red:aux";
>> reg = <0x0>;
>> @@ -29,5 +40,10 @@ tca6507@45 {
>> reg = <0x5>;
>> linux,default-trigger = "default-on";
>> };
>> +
>> +   wifi-reset@6 {
>> +   reg = <0x6>;
>> +   compatible = "gpio";
>> +   };
>>  };
>>
>> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
>> index f5063f447463..93a2b1759054 100644
>> --- a/drivers/leds/leds-tca6507.c
>> +++ b/drivers/leds/leds-tca6507.c
>> @@ -638,6 +638,9 @@ static int tca6507_probe_gpios(struct i2c_client *client,
>> tca->gpio.direction_output = tca6507_gpio_direction_output;
>> tca->gpio.set = tca6507_gpio_set_value;
>> tca->gpio.dev = >dev;
>> +#ifdef CONFIG_OF_GPIO
>> +   tca->gpio.of_node = of_node_get(client->dev.of_node);
>> +#endif
>> err = gpiochip_add(>gpio);
>> if (err) {
>> tca->gpio.ngpio = 0;
>> @@ -696,6 +699,8 @@ tca6507_led_dt_init(struct i2c_client *client)
>> led.default_trigger =
>> of_get_property(child, "linux,default-trigger", 
>> NULL);
>> led.flags = 0;
>> +   if (of_property_match_string(child, "compatible", "gpio") >= 
>> 0)
>> +   led.flags |= TCA6507_MAKE_GPIO;
>> ret = of_property_read_u32(child, "reg", );
>> if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
>> continue;
>> @@ -709,6 +714,7 @@ tca6507_led_dt_init(struct i2c_client *client)
>>
>> pdata->leds.leds = tca_leds;
>> pdata->leds.num_leds = NUM_LEDS;
>> +   pdata->gpio_base = -1;
>>
>> return pdata;
>>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker  wrote:
>  
> > Ok I see now.
> > 
> > But then this irq_work based solution won't work if, say, you run in full 
> > dynticks
> > mode. Also the hook on the timer interrupt is something that I wish we get 
> > rid
> > of on archs that can trigger self-IPIs.
> 
> Do we really want that? What about users that use LAZY? That is, the
> work isn't that important to trigger right now (added interrupt
> expense).

Ah right, I forgot that :)

> 
> > 
> > Notwithstanding it's going to have scalibility issues as irq work then 
> > converges
> > to a single list for unbound works.
> 
> Well, it doesn't seem to be something that would be called often. All
> CPUs checking a variable that is seldom changed should not have any
> scalability issues. Unless of course it happens to share a cache line
> with a variable that does change often.

Right, if it was printk specific code I wouldn't mind much in fact. But having 
that in
such a globally available API like irq work make me feel uncomfortable.

> 
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, 
> > etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

True, in fact I was thinking about raising an irq work that enqueues a 
workqueue, circumvoluted right? ;)

But it would make it safe, as irq work can be enqueued everywhere, and as it 
seems that
we can have high bandwidth of stuffs to print to the serial device, a process 
context looks much saner to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker  wrote:
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, 
> > etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

But yeah that's scary, that means workqueues itself can't printk that safely.
So, you're right after all.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM: OMAP2+: omap_device: maintain sane runtime pm status around suspend/resume

2013-11-07 Thread Kevin Hilman
Nishanth Menon  writes:

> OMAP device hooks around suspend|resume_noirq ensures that hwmod
> devices are forced to idle using omap_device_idle/enable as part of
> the last stage of suspend activity.
>
> For a device such as i2c who uses autosuspend, it is possible to enter
> the suspend path with dev->power.runtime_status = RPM_ACTIVE.
>
> As part of the suspend flow, the generic runtime logic would increment
> it's dev->power.disable_depth to 1. This should prevent further
> pm_runtime_get_sync from succeeding once the runtime_status has been
> set to RPM_SUSPENDED.
>
> Now, as part of the suspend_noirq handler in omap_device, we force the
> following: if the device status is !suspended, we force the device
> to idle using omap_device_idle (clocks are cut etc..). This ensures
> that from a hardware perspective, the device is "suspended". However,
> runtime_status is left to be active.
>
> *if* an operation is attempted after this point to
> pm_runtime_get_sync, runtime framework depends on runtime_status to
> indicate accurately the device status, and since it sees it to be
> ACTIVE, it assumes the module is functional and returns a non-error
> value. As a result the user will see pm_runtime_get succeed, however a
> register access will crash due to the lack of clocks.

Ouch.

Dumb Q: who is requesting an i2c transaction after ->suspend_noirq().
The i2c driver itself should be able to detect that it's being accessed
after this point and return an error.

That being said, I agree that omap_device should still be catching this
case in order to find/fix driver races like this.

> To prevent this from happening, we should ensure that runtime_status
> exactly indicates the device status. As a result of this change
> any further calls to pm_runtime_get* would return -EACCES (since
> disable_depth is 1). On resume, we restore the clocks and runtime
> status exactly as we suspended with.
>
> Reported-by: J Keerthy 
> Signed-off-by: Nishanth Menon 
> Acked-by: Rajendra Nayak 
> ---
> patch baseline: V3.12 tag (also applies on linux-next next-20131107 tag)
>
> Logs from 3.12 based vendor kernel:
> Before: http://pastebin.com/m5KxnB7a
> After: http://pastebin.com/8AfX4e7r
>
> The discussion on cpufreq side of the story which triggered this scenario:
>   http://marc.info/?l=linux-pm=138263811321921=2
>
> Tested on TI vendor kernel (with dt boot):
>   AM335x: evm, BBB, sk, BBW
>   OMAP5uEVM, DRA7-evm
>
>  arch/arm/mach-omap2/omap_device.c |   16 ++--
>  arch/arm/mach-omap2/omap_device.h |1 +
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/omap_device.c 
> b/arch/arm/mach-omap2/omap_device.c
> index b69dd9a..87ecbb0 100644
> --- a/arch/arm/mach-omap2/omap_device.c
> +++ b/arch/arm/mach-omap2/omap_device.c
> @@ -621,6 +621,13 @@ static int _od_suspend_noirq(struct device *dev)
>  
>   if (!ret && !pm_runtime_status_suspended(dev)) {
>   if (pm_generic_runtime_suspend(dev) == 0) {
> + if (!pm_runtime_suspended(dev)) {

Why the addition check for supended here?

This version (as opposed to the _status_suspended() version above will
fail if runtime PM has been disabled from userspace (via
/sys/devices/.../power/control), and will thus prevent low power states
from being hit in suspend if runtime suspend has been disabled from
userspace.  That's a bug.

> + /* NOTE: *might* indicate driver race */

Yes, a driver race which should then be fixed in the driver.

> + dev_dbg(dev, "%s: Force suspending\n",
> + __func__);
> + pm_runtime_set_suspended(dev);
> + od->flags |= OMAP_DEVICE_SUSPEND_FORCED;

Not sure why you need an additonal flag.  Why not just always do this
and use the existin OMAP_DEVICE_SUSPENDED flag.

Kevin

> + }
>   omap_device_idle(pdev);
>   od->flags |= OMAP_DEVICE_SUSPENDED;
>   }
> @@ -634,10 +641,15 @@ static int _od_resume_noirq(struct device *dev)
>   struct platform_device *pdev = to_platform_device(dev);
>   struct omap_device *od = to_omap_device(pdev);
>  
> - if ((od->flags & OMAP_DEVICE_SUSPENDED) &&
> - !pm_runtime_status_suspended(dev)) {
> + if (od->flags & OMAP_DEVICE_SUSPENDED) {
>   od->flags &= ~OMAP_DEVICE_SUSPENDED;
>   omap_device_enable(pdev);
> +
> + if (od->flags & OMAP_DEVICE_SUSPEND_FORCED) {
> + pm_runtime_set_active(dev);
> + od->flags

[PATCH linux-next] cifs: Use data structures to compute NTLMv2 response offsets

2013-11-07 Thread Tim Gardner
A bit of cleanup plus some gratuitous variable renaming. I think using
structures instead of numeric offsets makes this code much more
understandable.

Also added a comment about current time range expected by
the server.

Cc: Jeff Layton 
Cc: Steve French 
Signed-off-by: Tim Gardner 
---

The comment about time of day needing to be within 5 minutes is important (to me
at least). I spent the best part of a week thinking I had endian issues on 
powerpc
when in truth I was just too stupid to notice that the clock
was not updated. Danged embedded platforms...

checkpatch has some problems with this patch regarding attribute packed, but I 
chose
to remain consistent with existing code.

WARNING: __packed is preferred over __attribute__((packed))
#141: FILE: fs/cifs/cifspdu.h:705:
+   } __attribute__((packed)) challenge;

WARNING: __packed is preferred over __attribute__((packed))
#142: FILE: fs/cifs/cifspdu.h:706:
+   } __attribute__((packed));

total: 0 errors, 2 warnings, 99 lines checked

Tested on cifs-2.6 for-linus (c481e9feee78c6ce1ba0a1c8c892049f6514f6cf) by 
mounting
to iOS 10.8 and Win 8.0 Pro.

rtg

 fs/cifs/cifsencrypt.c |   40 
 fs/cifs/cifspdu.h |8 +++-
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index fc6f4f3..4934347 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -548,7 +548,13 @@ static int
 CalcNTLMv2_response(const struct cifs_ses *ses, char *ntlmv2_hash)
 {
int rc;
-   unsigned int offset = CIFS_SESS_KEY_SIZE + 8;
+   struct ntlmv2_resp *ntlmv2 = (struct ntlmv2_resp *)
+   (ses->auth_key.response + CIFS_SESS_KEY_SIZE);
+   unsigned int hash_len;
+
+   /* The MD5 hash starts at challenge_key.key */
+   hash_len = ses->auth_key.len - (CIFS_SESS_KEY_SIZE +
+   offsetof(struct ntlmv2_resp, challenge.key[0]));
 
if (!ses->server->secmech.sdeschmacmd5) {
cifs_dbg(VFS, "%s: can't generate ntlmv2 hash\n", __func__);
@@ -556,7 +562,7 @@ CalcNTLMv2_response(const struct cifs_ses *ses, char 
*ntlmv2_hash)
}
 
rc = crypto_shash_setkey(ses->server->secmech.hmacmd5,
-   ntlmv2_hash, CIFS_HMAC_MD5_HASH_SIZE);
+ntlmv2_hash, CIFS_HMAC_MD5_HASH_SIZE);
if (rc) {
cifs_dbg(VFS, "%s: Could not set NTLMV2 Hash as a key\n",
 __func__);
@@ -570,20 +576,21 @@ CalcNTLMv2_response(const struct cifs_ses *ses, char 
*ntlmv2_hash)
}
 
if (ses->server->negflavor == CIFS_NEGFLAVOR_EXTENDED)
-   memcpy(ses->auth_key.response + offset,
-   ses->ntlmssp->cryptkey, CIFS_SERVER_CHALLENGE_SIZE);
+   memcpy(ntlmv2->challenge.key,
+  ses->ntlmssp->cryptkey, CIFS_SERVER_CHALLENGE_SIZE);
else
-   memcpy(ses->auth_key.response + offset,
-   ses->server->cryptkey, CIFS_SERVER_CHALLENGE_SIZE);
+   memcpy(ntlmv2->challenge.key,
+  ses->server->cryptkey, CIFS_SERVER_CHALLENGE_SIZE);
rc = crypto_shash_update(>server->secmech.sdeschmacmd5->shash,
-   ses->auth_key.response + offset, ses->auth_key.len - offset);
+ntlmv2->challenge.key, hash_len);
if (rc) {
cifs_dbg(VFS, "%s: Could not update with response\n", __func__);
return rc;
}
 
+   /* Note that the MD5 digest over writes anon.challenge_key.key */
rc = crypto_shash_final(>server->secmech.sdeschmacmd5->shash,
-   ses->auth_key.response + CIFS_SESS_KEY_SIZE);
+   ntlmv2->ntlmv2_hash);
if (rc)
cifs_dbg(VFS, "%s: Could not generate md5 hash\n", __func__);
 
@@ -627,7 +634,7 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct 
nls_table *nls_cp)
int rc;
int baselen;
unsigned int tilen;
-   struct ntlmv2_resp *buf;
+   struct ntlmv2_resp *ntlmv2;
char ntlmv2_hash[16];
unsigned char *tiblob = NULL; /* target info blob */
 
@@ -660,13 +667,14 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct 
nls_table *nls_cp)
}
ses->auth_key.len += baselen;
 
-   buf = (struct ntlmv2_resp *)
+   ntlmv2 = (struct ntlmv2_resp *)
(ses->auth_key.response + CIFS_SESS_KEY_SIZE);
-   buf->blob_signature = cpu_to_le32(0x0101);
-   buf->reserved = 0;
-   buf->time = cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
-   get_random_bytes(>client_chal, sizeof(buf->client_chal));
-   buf->reserved2 = 0;
+   ntlmv2->blob_signature = cpu_to_le32(0x0101);
+   ntlmv2->reserved = 0;
+   /* Must be within 5 minutes of the server */
+   ntlmv2->time = cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
+   

Re: [PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.

2013-11-07 Thread Bryan Wu
On Thu, Oct 31, 2013 at 7:41 PM, NeilBrown  wrote:
>

[PATCH 2/2] LEDS: tca6502: add device-tree support for GPIO configuration.

I think it should be tca6507, right? Typo?

For other parts of this patch, I'm OK for them.

And just a quick scan of the leds-tca6507, I found bunch of typos in
the comments:

* An led-tca6507 device must be provided with platform data.  This data
 * lists for each output: the name, default trigger, and whether the signal
 * is being used as a GPiO rather than an led.  'struct led_plaform_data'

Can we unify to use GPIO and gpio?

 * is used for this.  If 'name' is NULL, the output isn't used.  If 'flags'
 * is TCA6507_MAKE_CPIO, the output is a GPO.

Here should be TCA6507_MAKE_GPIO and GPIO instead of GPO

 * The "struct led_platform_data" can be embedded in a
 * "struct tca6507_platform_data" which adds a 'gpio_base' for the GPiOs,
 * and a 'setup' callback which is called once the GPiOs are available.

Don't use GPiO, please.

Could you please review the code again and submit a cleanup patch to
fix those typos?

Thanks,
-Bryan


> The 7 lines driven by the TCA6507 can either drive LEDs or act as output-only
> GPIOs.
>
> To make this distinction in devicetree we use the "compatible" property.
>
> If the device attached to a line is "compatible" with "gpio", we treat it
> like a GPIO.  If it is "compatible" with "led" (or if no "compatible" value
> is set) we treat it like an LED.
>
> Signed-off-by: NeilBrown 
>
> diff --git a/Documentation/devicetree/bindings/leds/tca6507.txt 
> b/Documentation/devicetree/bindings/leds/tca6507.txt
> index 80ff3dfb1f32..d7221b84987c 100644
> --- a/Documentation/devicetree/bindings/leds/tca6507.txt
> +++ b/Documentation/devicetree/bindings/leds/tca6507.txt
> @@ -2,6 +2,13 @@ LEDs connected to tca6507
>
>  Required properties:
>  - compatible : should be : "ti,tca6507".
> +- #address-cells: must be 1
> +- #size-cells: must be 0
> +- reg: typically 0x45.
> +
> +Optional properties:
> +- gpio-controller: allows lines to be used as output-only GPIOs.
> +- #gpio-cells: if present, must be 0.
>
>  Each led is represented as a sub-node of the ti,tca6507 device.
>
> @@ -10,6 +17,7 @@ LED sub-node properties:
>  - reg : number of LED line (could be from 0 to 6)
>  - linux,default-trigger : (optional)
> see Documentation/devicetree/bindings/leds/common.txt
> +- compatible: either "led" (the default) or "gpio".
>
>  Examples:
>
> @@ -19,6 +27,9 @@ tca6507@45 {
> #size-cells = <0>;
> reg = <0x45>;
>
> +   gpio-controller;
> +   #gpio-cells = <2>;
> +
> led0: red-aux@0 {
> label = "red:aux";
> reg = <0x0>;
> @@ -29,5 +40,10 @@ tca6507@45 {
> reg = <0x5>;
> linux,default-trigger = "default-on";
> };
> +
> +   wifi-reset@6 {
> +   reg = <0x6>;
> +   compatible = "gpio";
> +   };
>  };
>
> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
> index f5063f447463..93a2b1759054 100644
> --- a/drivers/leds/leds-tca6507.c
> +++ b/drivers/leds/leds-tca6507.c
> @@ -638,6 +638,9 @@ static int tca6507_probe_gpios(struct i2c_client *client,
> tca->gpio.direction_output = tca6507_gpio_direction_output;
> tca->gpio.set = tca6507_gpio_set_value;
> tca->gpio.dev = >dev;
> +#ifdef CONFIG_OF_GPIO
> +   tca->gpio.of_node = of_node_get(client->dev.of_node);
> +#endif
> err = gpiochip_add(>gpio);
> if (err) {
> tca->gpio.ngpio = 0;
> @@ -696,6 +699,8 @@ tca6507_led_dt_init(struct i2c_client *client)
> led.default_trigger =
> of_get_property(child, "linux,default-trigger", NULL);
> led.flags = 0;
> +   if (of_property_match_string(child, "compatible", "gpio") >= 
> 0)
> +   led.flags |= TCA6507_MAKE_GPIO;
> ret = of_property_read_u32(child, "reg", );
> if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
> continue;
> @@ -709,6 +714,7 @@ tca6507_led_dt_init(struct i2c_client *client)
>
> pdata->leds.leds = tca_leds;
> pdata->leds.num_leds = NUM_LEDS;
> +   pdata->gpio_base = -1;
>
> return pdata;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] DT: proc: Add runtime overlay interface in /proc

2013-11-07 Thread delicious quinoa
On Tue, Nov 5, 2013 at 12:41 PM, Pantelis Antoniou
 wrote:
> +
> +   pr_info("%s: Applied #%d overlay segments @%d\n", __func__,
> +   od->ovinfo_cnt, od->id);
> +

This could be pr_debug so that we get normally get silence unless
something fails, like insmod.

Also please take out the '#'.

We see it working with our fpga ip driver and it looks really great.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: manual merge of the gpio tree with the arm-soc and cortex trees

2013-11-07 Thread Stephen Rothwell
Hi Uwe,

On Wed, 6 Nov 2013 09:41:17 +0100 Uwe Kleine-König 
 wrote:
>
> BTW, "cortex" is a bit too general. My tree is only about Cortex-M. Also
> you can drop -2.6 from the URL.

Updated both, thanks.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgpDNpGVBonys.pgp
Description: PGP signature


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Steven Rostedt
On Fri, 8 Nov 2013 00:21:51 +0100
Frederic Weisbecker  wrote:
 
> Ok I see now.
> 
> But then this irq_work based solution won't work if, say, you run in full 
> dynticks
> mode. Also the hook on the timer interrupt is something that I wish we get rid
> of on archs that can trigger self-IPIs.

Do we really want that? What about users that use LAZY? That is, the
work isn't that important to trigger right now (added interrupt
expense).

> 
> Notwithstanding it's going to have scalibility issues as irq work then 
> converges
> to a single list for unbound works.

Well, it doesn't seem to be something that would be called often. All
CPUs checking a variable that is seldom changed should not have any
scalability issues. Unless of course it happens to share a cache line
with a variable that does change often.

> 
> Offloading to a workqueue would be perhaps better, and writing to the serial
> console could then be done with interrupts enabled, preemptible context, 
> etc...

Oh God no ;-)  Adding workqueue logic into printk just spells a
nightmare of much more complexity for a critical kernel infrastructure.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Steven Rostedt
On Fri, 8 Nov 2013 00:01:11 +0100
Jan Kara  wrote:

> On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:

> > So if the current CPU can handle it, what is the problem?
>   I hope this gets cleared out in my other email. But to make sure: If
> other CPUs are idle (i.e. not appending to the printk buffer), we can well
> handle the printing on the current CPU (with some breaks to allow
> interrupts to be served etc.). If other CPUs are also appending to printk
> buffer, that's where we really want to push the work to other CPUs.

I guess the question is, how does it migrate? I guess that's not so
clear. Or do you just hope that the timer tick on another CPU will come
in first and finish the job for you? As the list is global and all CPUs
get to see it.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/11] Embeddable Position Independent Executable

2013-11-07 Thread Kevin Hilman
On Tue, Sep 17, 2013 at 5:43 AM, Russ Dill  wrote:
> This patch adds support for and demonstrates the usage of an embedded
> position independent executable (PIE). The goal is to allow the use of C
> code in situations where carefully written position independent assembly
> was previously required.

Hmm, I thought I had already replied to this but it appears that I haven't.

I really like this approach and think it's a major improvement on what
we're already doing on OMAP and a few other ARM platforms.  This
approach is done in a way that's significantly easier to maintain and
extend IMO.

As a next step, I think you should drop the RFC and split it up into a
few series that can go via the right trees:

- asm-generic and lib/devres (patches 1, 2, 4)
- drivers/sram (patch 3)
- generic PIE: (patch 5)
- core PIE: (patch 5-8)
- AM33xx support (patch 9-11)

Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 00/17] ARM: at91: move to common clk framework

2013-11-07 Thread Mike Turquette
Quoting Nicolas Ferre (2013-10-18 01:18:04)
> On 11/10/2013 09:37, Boris BREZILLON :
> > Hello,
> >
> > This patch series is the 4th version of the new at91 clock implementation
> > (using common clk framework).
> 
> Mike, DT maintainers,
> 
> Can you have a look at this series converting the AT91 family to common 
> clock framework and associated Device Tree description?
> 
> Here is the thread on gmane:
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/272550
> 
> Mike, I think that you already reviewed a previous version of this 
> series and I guess that we have finalized it.
> 
> So gentlemen, It can be good for us if you can give us your "review" or 
> "acknowledgement" tags...

I guess you want to take this through your tree? If so please add:

Acked-by: Mike Turquette 

Regards,
Mike

> 
> Thanks, best regards,
> 
> 
> > Most of the clock provided by the PMC (Power Management Controller) are
> > implemented :
> > - main clock (main oscillator)
> > - pll clocks
> > - master clock
> > - programmable clocks
> > - utmi clock
> > - peripheral clocks
> > - system clocks
> >
> > This implementation is only compatible with device tree definition.
> > The goal is to define the whole clock tree in the device tree.
> >
> > Could a dt maintainer take a look at these dt bindinds.
> >
> > This patch series is based on linus tree (but should apply correctly on
> > linux-next) and has been tested on sama5d31ek.
> >
> > Best Regards,
> > Boris
> >
> > Changes since v3:
> >   - simplify master clk implementation (drop set_rate/parent support)
> >   - fix bug in set_rate function of pll driver
> >   - fix coding style issues
> >   - define macros and constants where needed
> >   - remove peripheral id macro references
> >   - remove sam9g35 specific handling (sam9g35 = sam9x5)
> >   - rework main clk prepare function to handle automatic rate calculation
> >
> > Changes since v2:
> >   - fix several bugs in clk implementations
> >   - drop non-dt boards support
> >   - split the series to ease review and tests:
> > * 1 patch series for new clk implementations (this series)
> > * 1 patch series to move each at91 SoC to common clk framework (coming 
> > soon)
> >   - modify dt-bindings (add atmel,clk- prefix to atmel specific properties)
> >   - add clk macros for dt-bindings
> >   - add pmc framework (helper function to access pmc registers)
> >   - add interrupt support to enable passive wait in clk_prepare functions
> >
> > Changes since v1:
> >   - fix bugs in pll, programmable and system clock implementations
> > (wrong bit position).
> >   - add usb clock configuration support (ohci and udc drivers +
> > clk_lookup for non dt boards)
> >   - rework of the system clock interfaces (no need to define a parent clock,
> > system clock is a gate with no rate info)
> >   - change system, peripheral and programmable clk dt bindings (1 master 
> > node
> > and multiple child nodes each defining a system/peripheral or prog 
> > clock)
> >   - fix bugs in sama5 dt definition
> >
> > Boris BREZILLON (17):
> >ARM: at91: move at91_pmc.h to include/linux/clk/at91_pmc.h
> >ARM: at91: add Kconfig options for common clk support
> >clk: at91: add PMC base support
> >clk: at91: add PMC macro file for dt definitions
> >clk: at91: add PMC main clock
> >clk: at91: add PMC pll clocks
> >clk: at91: add PMC master clock
> >clk: at91: add PMC system clocks
> >clk: at91: add PMC peripheral clocks
> >clk: at91: add peripheral clk macros for peripheral clk dt bindings
> >clk: at91: add PMC programmable clocks
> >clk: at91: add PMC utmi clock
> >clk: at91: add PMC usb clock
> >clk: at91: add PMC smd clock
> >clk: at91: add PMC clk device tree binding doc.
> >ARM: at91: move pit timer to common clk framework
> >ARM: at91: add new compatible strings for pmc driver
> >
> >   .../devicetree/bindings/clock/at91-clock.txt   |  328 
> >   arch/arm/mach-at91/Kconfig |   44 ++
> >   arch/arm/mach-at91/Kconfig.non_dt  |6 +
> >   arch/arm/mach-at91/Makefile|2 +-
> >   arch/arm/mach-at91/at91rm9200.c|2 +-
> >   arch/arm/mach-at91/at91sam9260.c   |2 +-
> >   arch/arm/mach-at91/at91sam9261.c   |2 +-
> >   arch/arm/mach-at91/at91sam9263.c   |2 +-
> >   arch/arm/mach-at91/at91sam926x_time.c  |   14 +-
> >   arch/arm/mach-at91/at91sam9g45.c   |2 +-
> >   arch/arm/mach-at91/at91sam9n12.c   |2 +-
> >   arch/arm/mach-at91/at91sam9rl.c|2 +-
> >   arch/arm/mach-at91/at91sam9x5.c|2 +-
> >   arch/arm/mach-at91/clock.c |7 +-
> >   arch/arm/mach-at91/generic.h   |3 +-
> >   arch/arm/mach-at91/pm.c|2 +-
> >   

[PATCH v3] ext4: Fix reading of extended tv_sec (bug 23732)

2013-11-07 Thread David Turner
On Fri, 2013-11-08 at 00:14 +0100, Jan Kara wrote:
> Still unnecessary type cast here (but that's a cosmetic issue).
...
> Otherwise the patch looks good. You can add:
> Reviewed-by: Jan Kara 

Thanks.  A version with this correction and the reviewed-by follows.

--
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.  The representation (which this patch does not alter) is a bit
hackish, in that the most-significant bit is no longer (alone)
sufficient to indicate the sign.  That's because we're representing an
asymmetric range, with seven times as many positive values as
negative.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Signed-off-by: David Turner 
Reported-by: Mark Harris 
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Reviewed-by: Jan Kara 
---
 fs/ext4/ext4.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index af815ea..3c2d0b3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -722,10 +722,15 @@ static inline __le32 ext4_encode_extra_time(struct 
timespec *time)
 
 static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
 {
-   if (sizeof(time->tv_sec) > 4)
-  time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
-  << 32;
-   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> 
EXT4_EPOCH_BITS;
+   if (sizeof(time->tv_sec) > 4) {
+   u64 extra_bits = le32_to_cpu(extra) & EXT4_EPOCH_MASK;
+   if (time->tv_sec > 0 || extra_bits != EXT4_EPOCH_MASK) {
+   time->tv_sec &= 0x;
+   time->tv_sec |= extra_bits << 32;
+   }
+   }
+   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >>
+   EXT4_EPOCH_BITS;
 }
 
 #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \
-- 
1.8.1.2



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: reminder

2013-11-07 Thread Stephen Rothwell
Hi all,

Just a reminder to *not* add any v3.14 material to linux-next until after
v3.13-rc1 has been released.

Also, please resist the temptation to rebase your trees before sending
them upstream.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgpYb_OjfDphf.pgp
Description: PGP signature


common location for devicetree files

2013-11-07 Thread Kumar Gala
As we start having more sharing of device trees between architectures (arm & 
arm64, arm & powerpc, guessing maybe mips & arm) we need dts to live in 
location that 

I was wondering what people felt about doing:

arch/dts//

as a common location that could be shared.  I'm up for other suggestions.

- k

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by 
The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:57:33PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> > 2013/11/7 Jan Kara :
> > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > are reported by our customers) when other CPUs are using printk heavily
> > > and serial console makes printing slow. Despite serial console drivers
> > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > because interrupts are disabled for the whole time console_unlock() runs
> > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > >
> > > In my artifical testing I can also easily trigger a situation when disk
> > > disappears from the system apparently because interrupt from it wasn't
> > > served for too long. This is why just silencing watchdogs isn't a
> > > reliable solution to the problem and we simply have to avoid spending
> > > too long in console_unlock() with interrupts disabled.
> > >
> > > The solution this patch works toward is to postpone printing to a later
> > > moment / different CPU when we already printed over X characters in
> > > current console_unlock() invocation. This is a crude heuristic but
> > > measuring time we spent printing doesn't seem to be really viable - we
> > > cannot rely on high resolution time being available and with interrupts
> > > disabled jiffies are not updated. User can tune the value X via
> > > printk.offload_chars kernel parameter.
> > >
> > > Reviewed-by: Steven Rostedt 
> > > Signed-off-by: Jan Kara 
> > 
> > When a message takes tens of seconds to be printed, it usually means
> > we are in trouble somehow :)
> > I wonder what printk source can trigger such a high volume.
>   Machines with tens of processors and thousands of scsi devices. When
> device discovery happens on boot, all processors are busily reporting new
> scsi devices and one poor looser is bound to do the printing for ever and
> ever until the machine dies...
> 
> Or try running sysrq-t on a large machine with serial console connected. The
> machine will die because of lockups (although in this case I agree it is more
> of a problem of sysrq-t doing lots of printing in interrupt-disabled
> context).
> 
> > May be cutting some huge message into smaller chunks could help? That
> > would re enable interrupts between each call.
> > 
> > It's hard to tell without the context, but using other CPUs for
> > rescuing doesn't look like a good solution. What if the issue happens
> > in UP to begin with?
>   The real trouble in practice is that while one cpu is doing printing,
> other cpus are appending to the printk buffer. So the cpu can be printing
> for a *long* time. So offloading the work to other cpus which are also
> appending messages seems as a fair thing to do.

Ok I see now.

But then this irq_work based solution won't work if, say, you run in full 
dynticks
mode. Also the hook on the timer interrupt is something that I wish we get rid
of on archs that can trigger self-IPIs.

Notwithstanding it's going to have scalibility issues as irq work then converges
to a single list for unbound works.

Offloading to a workqueue would be perhaps better, and writing to the serial
console could then be done with interrupts enabled, preemptible context, etc...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ext4: Fix reading of extended tv_sec (bug 23732)

2013-11-07 Thread Jan Kara
On Thu 07-11-13 17:54:24, David Turner wrote:
> On Thu, 2013-11-07 at 17:03 +0100, Jan Kara wrote:
> >   So I'm somewhat wondering: Previously we decoded tv_nsec regardless of
> > tv_sec size. After your patch we do it only if sizeof(time->tv_sec) > 4. Is
> > this an intended change? Why is it OK?
> 
> This is an error.  Here is a corrected version of the patch. 
> 
> 
> --
> 
> In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
> the {a,c,m}time fields, deferring the year 2038 problem to the year
> 2446.  The representation (which this patch does not alter) is a bit
> hackish, in that the most-significant bit is no longer (alone)
> sufficient to indicate the sign.  That's because we're representing an
> asymmetric range, with seven times as many positive values as
> negative.
> 
> When decoding these extended fields, for times whose bottom 32 bits
> would represent a negative number, sign extension causes the 64-bit
> extended timestamp to be negative as well, which is not what's
> intended.  This patch corrects that issue, so that the only negative
> {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
> timestamps).
> 
> Signed-off-by: David Turner 
> Reported-by: Mark Harris 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
> ---
>  fs/ext4/ext4.h | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index af815ea..3c2d0b3 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -722,10 +722,15 @@ static inline __le32 ext4_encode_extra_time(struct 
> timespec *time)
>  
>  static inline void ext4_decode_extra_time(struct timespec *time, __le32 
> extra)
>  {
> -   if (sizeof(time->tv_sec) > 4)
> -time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
> -<< 32;
> -   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> 
> EXT4_EPOCH_BITS;
> + if (sizeof(time->tv_sec) > 4) {
> + u64 extra_bits = (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK);
  
Still unnecessary type cast here (but that's a cosmetic issue).

Otherwise the patch looks good. You can add:
Reviewed-by: Jan Kara 

Honza

> + if (time->tv_sec > 0 || extra_bits != EXT4_EPOCH_MASK) {
> + time->tv_sec &= 0x;
> + time->tv_sec |= extra_bits << 32;
> + }
> + }
> + time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >>
> + EXT4_EPOCH_BITS;
>  }
>  
>  #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)
>\
> -- 
> 1.8.1.2
> 
> 
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] dmaengine: add msm bam dma driver

2013-11-07 Thread Andy Gross
On Thu, Oct 31, 2013 at 04:46:21PM -0500, Andy Gross wrote:
> On Thu, Oct 31, 2013 at 10:29:48PM +0530, Vinod Koul wrote:
> > On Fri, Oct 25, 2013 at 03:24:02PM -0500, Andy Gross wrote:
> > 
> > This should be sent to dmaeng...@vger.kernel.org
>  
> I'll add the list when I send the second iteration or should I send it over 
> mid
> stream?
> 
> > > Add the DMA engine driver for the MSM Bus Access Manager (BAM) DMA 
> > > controller
> > > found in the MSM 8x74 platforms.
> > > 
> > > Each BAM DMA device is associated with a specific on-chip peripheral.  
> > > Each
> > > channel provides a uni-directional data transfer engine that is capable of
> > > transferring data between the peripheral and system memory (System mode) 
> > > or
> > > between two peripherals (BAM2BAM).
> > > 
> > > The initial release of this driver only supports slave transfers between
> > > peripherals and system memory.
> > > 
> > > Signed-off-by: Andy Gross 
> > > +/*
> > > + * bam_alloc_chan - Allocate channel resources for DMA channel.
> > > + * @chan: specified channel
> > > + *
> > > + * This function allocates the FIFO descriptor memory and resets the 
> > > channel
> > > + */
> > > +static int bam_alloc_chan(struct dma_chan *chan)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + u32 val;
> > > + union bam_pipe_ctrl pctrl;
> > > +
> > > + /* check for channel activity */
> > > + pctrl.value = ioread32(bdev->regs + BAM_P_CTRL(bchan->id));
> > > + if (pctrl.bits.p_en) {
> > > + dev_err(bdev->dev, "channel already active\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + /* allocate FIFO descriptor space */
> > > + bchan->fifo_virt = (struct bam_desc_hw *)dma_alloc_coherent(bdev->dev,
> > > + BAM_DESC_FIFO_SIZE, >fifo_phys,
> > > + GFP_KERNEL);
> > > +
> > > + if (!bchan->fifo_virt) {
> > > + dev_err(bdev->dev, "Failed to allocate descriptor fifo\n");
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + /* reset channel */
> > > + iowrite32(1, bdev->regs + BAM_P_RST(bchan->id));
> > > + iowrite32(0, bdev->regs + BAM_P_RST(bchan->id));
> > > +
> > > + /* configure fifo address/size in bam channel registers */
> > > + iowrite32(bchan->fifo_phys, bdev->regs +
> > > + BAM_P_DESC_FIFO_ADDR(bchan->id));
> > > + iowrite32(BAM_DESC_FIFO_SIZE, bdev->regs +
> > > + BAM_P_FIFO_SIZES(bchan->id));
> > > +
> > > + /* unmask and enable interrupts for defined EE, bam and error irqs */
> > > + iowrite32(BAM_IRQ_MSK, bdev->regs + BAM_IRQ_SRCS_EE(bchan->ee));
> > > +
> > > + /* enable the per pipe interrupts, enable EOT and INT irqs */
> > > + iowrite32(P_DEFAULT_IRQS_EN, bdev->regs + BAM_P_IRQ_EN(bchan->id));
> > > +
> > > + /* unmask the specific pipe and EE combo */
> > > + val = ioread32(bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > + val |= 1 << bchan->id;
> > > + iowrite32(val, bdev->regs + BAM_IRQ_SRCS_MSK_EE(bchan->ee));
> > > +
> > > + /* set fixed direction and mode, then enable channel */
> > I was going to question why you are doing hw specfic stuff in alloc channel 
> > but
> > now why do you enable seems to be a bigger question in mind?
> 
> The fifo_virt is used to store the hardware descriptors that are used directly
> by the dma controller.  I have to at least fill in the descriptor FIFO address
> and size and reset the channel to clear the fifo offset inside the hardware.
> After reset the internal fifo offset is 0.  And every subsequent transaction
> increments this.  That is how it knows which descriptors to work on inside the
> descriptor fifo memory.
> 
> I can definitely defer the rest of hte h/w interactions until the point that I
> need to actually kick off the dma controller.
> 
> 
> > > + pctrl.value = 0;
> > > + pctrl.bits.p_direction =
> > > + (bchan->bam_slave.slave.direction == DMA_DEV_TO_MEM) ?
> > > + BAM_PIPE_PRODUCER : BAM_PIPE_CONSUMER;
> > > + pctrl.bits.p_sys_mode = BAM_PIPE_MODE_SYSTEM;
> > > + pctrl.bits.p_en = 1;
> > > +
> > > + iowrite32(pctrl.value, bdev->regs + BAM_P_CTRL(bchan->id));
> > > +
> > > + /* set desc threshold */
> > > + /* do bookkeeping for tracking used EEs, used during IRQ handling */
> > > + set_bit(bchan->ee, >enabled_ees);
> > > +
> > > + bchan->head = 0;
> > > + bchan->tail = 0;
> > > +
> > > + return 0;
> > You said you are going to allocate descriptors so right thing would be 
> > return
> > number of allocated desc here!
> 
> OK, I missed that.
> 
> > > +}
> > > +
> > > +/*
> > > + * bam_free_chan - Frees dma resources associated with specific channel
> > > + * @chan: specified channel
> > > + *
> > > + * Free the allocated fifo descriptor memory and channel resources
> > > + *
> > > + */
> > > +static void bam_free_chan(struct dma_chan *chan)
> > > +{
> > > + struct bam_chan *bchan = to_bam_chan(chan);
> > > + struct bam_device *bdev = bchan->device;
> > > + u32 val;
> > > 

Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Jan Kara
On Thu 07-11-13 23:54:10, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > > But then, who's going to process that work if every CPUs is idle?
> > > >   Have a look into irq_work_queue(). There is:
> > > > /*
> > > >  * If the work is not "lazy" or the tick is stopped, raise the 
> > > > irq
> > > >  * work interrupt (if supported by the arch), otherwise, just 
> > > > wait
> > > >  * for the next tick. We do this even for unbound work to make 
> > > > sure
> > > >  * *some* CPU will be doing the work.
> > > >  */
> > > > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) 
> > > > {
> > > > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > > > arch_irq_work_raise();
> > > > }
> > > > 
> > > >   So we raise an interrupt if there would be no timer ticking (which is
> > > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > > patches...
> > > 
> > > Ok but we raise that interrupt locally, not to the other CPUs.
> >   True, but that doesn't really matter in this case. Any CPU (including the
> > local one) can handle the unbound work. So from the definition of the
> > unbound work things are OK.
> 
> I don't see how that can be ok. You want to offline a work because the
> local CPU can't handle it, right? If the local CPU can handle it you can
> just use local irq works.
> 
> > Regarding my use for printk - if all (other) CPUs are idle then we can
> > easily afford making the current cpu busy printing, that's not a problem.
> > There's nothing else to do than to print what's remaining in the printk
> > buffer...
> 
> So if the current CPU can handle it, what is the problem?
  I hope this gets cleared out in my other email. But to make sure: If
other CPUs are idle (i.e. not appending to the printk buffer), we can well
handle the printing on the current CPU (with some breaks to allow
interrupts to be served etc.). If other CPUs are also appending to printk
buffer, that's where we really want to push the work to other CPUs.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3 - V2] Introducing Device Tree Overlays

2013-11-07 Thread Guenter Roeck
On Thu, Nov 07, 2013 at 09:46:26PM +0100, Sebastian Andrzej Siewior wrote:
> On 07.11.13, Pantelis Antoniou wrote:
> > Hi Sebastian,
> Hi Pantelis,
> 
> > FWIW DT has been ported to x86. And is present on arm/powerpc/mips/arc and 
> > possibly
> > others.
> 
> Yes, I know. I am the one that did the work for CE4100, the first one
> that boots with DT on x86.
> 
> > So what are we talking about again? If you care about the non-DT case, why
> > don't you make a patch about how you could support Guenter's use case on
> > the x86.
> 
> I am only saying that this "hot-plug a device at a non hot-plugagle bus at
> runtime" is not limited to DT but this solution is. X86 + ACPI is not
> the only limitation. ARM is (forced) going to ACPI as well as far I
> know. And this solution is limited to DT. This is what I am pointing
> out.
> 
I can't tell about ARM, but I am not entirely sure how ACPI support on ARM
is going to help us on powerpc.

> > His use case is not uncommon, believe it or not, and x86 would benefit from
> > something this flexible.
> 
> I *think* a more flexible solution would be something like bus_type which is
> exposed via configfs. It would be attached behind a certain device/bus where
> the "physical" hotplug interface is. The user would then be able to read the
> configuration based on whatever information he has and could then create
> devices he likes at runtime. This wouldn't depend much on the firmware that is
> used but would require a little more work I think.
> 
Quite frankly, I am interested at a solution that works and solves our problems.
I am not looking for something that is 100% perfect and may never be delivered.

Fortunately, the Linux kernel was willing to adopt multiple different file
systems, and still accepts new ones on a regular basis. If a new file system
is better, it will start getting used, and old file systems are being phased out
as fewer people use them. I would hope the same should be possible with DT
overlays and possible other future solutions for the same problem, and that
we won't have to wait for the perfect solution from day 1.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Steven Rostedt
On Thu, 7 Nov 2013 23:43:52 +0100
Frederic Weisbecker  wrote:


> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.

The only ones that I'm aware of is the prints from sysrq. Which
includes a ftrace_dump as well. These have caused issues in the past,
but usually I do them on boxes that I reboot afterward anyway.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Jan Kara
On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara :
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artifical testing I can also easily trigger a situation when disk
> > disappears from the system apparently because interrupt from it wasn't
> > served for too long. This is why just silencing watchdogs isn't a
> > reliable solution to the problem and we simply have to avoid spending
> > too long in console_unlock() with interrupts disabled.
> >
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over X characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. User can tune the value X via
> > printk.offload_chars kernel parameter.
> >
> > Reviewed-by: Steven Rostedt 
> > Signed-off-by: Jan Kara 
> 
> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.
  Machines with tens of processors and thousands of scsi devices. When
device discovery happens on boot, all processors are busily reporting new
scsi devices and one poor looser is bound to do the printing for ever and
ever until the machine dies...

Or try running sysrq-t on a large machine with serial console connected. The
machine will die because of lockups (although in this case I agree it is more
of a problem of sysrq-t doing lots of printing in interrupt-disabled
context).

> May be cutting some huge message into smaller chunks could help? That
> would re enable interrupts between each call.
> 
> It's hard to tell without the context, but using other CPUs for
> rescuing doesn't look like a good solution. What if the issue happens
> in UP to begin with?
  The real trouble in practice is that while one cpu is doing printing,
other cpus are appending to the printk buffer. So the cpu can be printing
for a *long* time. So offloading the work to other cpus which are also
appending messages seems as a fair thing to do.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] staging: zsmalloc: Ensure handle is never 0 on success

2013-11-07 Thread Olav Haugan
On 11/6/2013 7:06 PM, Greg KH wrote:
> On Wed, Nov 06, 2013 at 04:00:02PM -0800, Olav Haugan wrote:
>> On 11/5/2013 5:56 PM, Greg KH wrote:
>>> On Tue, Nov 05, 2013 at 04:54:12PM -0800, Olav Haugan wrote:
 zsmalloc encodes a handle using the page pfn and an object
 index. On some hardware platforms the pfn could be 0 and this
 causes the encoded handle to be 0 which is interpreted as an
 allocation failure.
>>>
>>> What platforms specifically have this issue?
>>
>> Currently some of Qualcomm SoC's have physical memory that starts at
>> address 0x0 which causes this problem.
> 
> Then say this, and list the exact SoC's that can have this problem so
> people know how to evaluate the bugfix and see if it is relevant for
> their systems.
> 
>> I believe this could be a problem
>> on any platforms if memory is configured to be starting at physical
>> address 0x0 for these platforms.
> 
> Have you seen this be a problem?  So it's just a theoretical issue at
> this point in time?

Yes, I can consistently reproduce it. It is not just theoretical.

Thanks,

Olav Haugan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] LEDS: tca6507 - fix bugs in parsing of device-tree configuration.

2013-11-07 Thread Bryan Wu
On Thu, Oct 31, 2013 at 7:33 PM, NeilBrown  wrote:
>
>
> 1/ The led_info array must be allocated to allow the full number
>   of LEDs even if not all are present.  The array maybe be sparsely
>   filled but it is indexed by device address so we must at least
>   allocate as many slots as the highest address used.  It is easiest
>   just to allocate all 7.
>
> 2/ range check the 'reg' value properly.
>
> 3/ led.flags must be initialised to zero, else all leds could
>be treated as GPIOs (depending on what happens to be on the stack).
>
> Signed-off-by: NeilBrown 
>

Thanks, queued into devel branch of my tree for 3.14 kernel.

-Bryan

> diff --git a/drivers/leds/leds-tca6507.c b/drivers/leds/leds-tca6507.c
> index 8cc304f36728..f5063f447463 100644
> --- a/drivers/leds/leds-tca6507.c
> +++ b/drivers/leds/leds-tca6507.c
> @@ -682,7 +682,7 @@ tca6507_led_dt_init(struct i2c_client *client)
> return ERR_PTR(-ENODEV);
>
> tca_leds = devm_kzalloc(>dev,
> -   sizeof(struct led_info) * count, GFP_KERNEL);
> +   sizeof(struct led_info) * NUM_LEDS, GFP_KERNEL);
> if (!tca_leds)
> return ERR_PTR(-ENOMEM);
>
> @@ -695,9 +695,9 @@ tca6507_led_dt_init(struct i2c_client *client)
> of_get_property(child, "label", NULL) ? : child->name;
> led.default_trigger =
> of_get_property(child, "linux,default-trigger", NULL);
> -
> +   led.flags = 0;
> ret = of_property_read_u32(child, "reg", );
> -   if (ret != 0)
> +   if (ret != 0 || reg < 0 || reg >= NUM_LEDS)
> continue;
>
> tca_leds[reg] = led;
> @@ -708,7 +708,7 @@ tca6507_led_dt_init(struct i2c_client *client)
> return ERR_PTR(-ENOMEM);
>
> pdata->leds.leds = tca_leds;
> -   pdata->leds.num_leds = count;
> +   pdata->leds.num_leds = NUM_LEDS;
>
> return pdata;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] ext4: Fix reading of extended tv_sec (bug 23732)

2013-11-07 Thread David Turner
On Thu, 2013-11-07 at 17:03 +0100, Jan Kara wrote:
>   So I'm somewhat wondering: Previously we decoded tv_nsec regardless of
> tv_sec size. After your patch we do it only if sizeof(time->tv_sec) > 4. Is
> this an intended change? Why is it OK?

This is an error.  Here is a corrected version of the patch. 


--

In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.  The representation (which this patch does not alter) is a bit
hackish, in that the most-significant bit is no longer (alone)
sufficient to indicate the sign.  That's because we're representing an
asymmetric range, with seven times as many positive values as
negative.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Signed-off-by: David Turner 
Reported-by: Mark Harris 
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
---
 fs/ext4/ext4.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index af815ea..3c2d0b3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -722,10 +722,15 @@ static inline __le32 ext4_encode_extra_time(struct 
timespec *time)
 
 static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
 {
-   if (sizeof(time->tv_sec) > 4)
-  time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
-  << 32;
-   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> 
EXT4_EPOCH_BITS;
+   if (sizeof(time->tv_sec) > 4) {
+   u64 extra_bits = (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK);
+   if (time->tv_sec > 0 || extra_bits != EXT4_EPOCH_MASK) {
+   time->tv_sec &= 0x;
+   time->tv_sec |= extra_bits << 32;
+   }
+   }
+   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >>
+   EXT4_EPOCH_BITS;
 }
 
 #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \
-- 
1.8.1.2



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:50:34PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > > But then, who's going to process that work if every CPUs is idle?
> > >   Have a look into irq_work_queue(). There is:
> > > /*
> > >  * If the work is not "lazy" or the tick is stopped, raise the irq
> > >  * work interrupt (if supported by the arch), otherwise, just wait
> > >  * for the next tick. We do this even for unbound work to make 
> > > sure
> > >  * *some* CPU will be doing the work.
> > >  */
> > > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > > arch_irq_work_raise();
> > > }
> > > 
> > >   So we raise an interrupt if there would be no timer ticking (which is
> > > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > > patches...
> > 
> > Ok but we raise that interrupt locally, not to the other CPUs.
>   True, but that doesn't really matter in this case. Any CPU (including the
> local one) can handle the unbound work. So from the definition of the
> unbound work things are OK.

I don't see how that can be ok. You want to offline a work because the local CPU
can't handle it, right? If the local CPU can handle it you can just use local
irq works.

> 
> Regarding my use for printk - if all (other) CPUs are idle then we can
> easily afford making the current cpu busy printing, that's not a problem.
> There's nothing else to do than to print what's remaining in the printk
> buffer...

So if the current CPU can handle it, what is the problem?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kmod: Run usermodehelpers only on cpus allowed for kthreadd V2

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 04:43:11PM +, Christoph Lameter wrote:
> usermodehelper() threads can currently run on all processors.
> This is an issue for low latency cores. Spawnig a new thread causes
> cpu holdoffs in the range of hundreds of microseconds to a few
> milliseconds. Not good for cores on which processes run that need
> to react as fast as possible.
> 
> kthreadd threads can be restricted using taskset to a limited set
> of processors. Then the kernel thread pool will not fork processes
> on those anymore thereby protecting those processors from additional
> latencies.
> 
> Make usermodehelper() threads obey the limitations that kthreadd is
> restricted to. Kthreadd is not the parent of usermodehelper threads
> so we need to explicitly get the allowed processors for kthreadd.
> 
> Before this patch there is no way to limit the cpus that usermodehelper
> can run on since the affinity is set when the thread is spawned to
> all processors.
> 
> Signed-off-by: Christoph Lameter 
> 
> Index: linux/include/linux/kthread.h
> ===
> --- linux.orig/include/linux/kthread.h2013-11-07 10:31:46.667807582 
> -0600
> +++ linux/include/linux/kthread.h 2013-11-07 10:31:46.663807693 -0600
> @@ -51,6 +51,7 @@ void kthread_parkme(void);
>  int kthreadd(void *unused);
>  extern struct task_struct *kthreadd_task;
>  extern int tsk_fork_get_node(struct task_struct *tsk);
> +void set_kthreadd_affinity(void);
> 
>  /*
>   * Simple work processor based on kthread.
> Index: linux/kernel/kmod.c
> ===
> --- linux.orig/kernel/kmod.c  2013-11-07 10:31:46.667807582 -0600
> +++ linux/kernel/kmod.c   2013-11-07 10:35:28.825645008 -0600
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
> 
> @@ -209,8 +210,13 @@ static int call_usermodehelper(void
>   flush_signal_handlers(current, 1);
>   spin_unlock_irq(>sighand->siglock);
> 
> - /* We can run anywhere, unlike our parent keventd(). */
> - set_cpus_allowed_ptr(current, cpu_all_mask);
> + /*
> +  * Kthreadd can be restricted to a set of processors if the user wants 
> to
> +  * protect other processors from OS latencies. If that has happened then
> +  * we do not want to disturb the other processors here either so we 
> start
> +  * the usermode helper threads only on the processors allowed for 
> kthreadd.
> +  */
> + set_kthreadd_affinity();

I'm sorry to burden again on this but this looks too odd.

usermodehelper works are created via workqueues, right? And workqueues are an 
issue as
well for those who want CPU isolation.

So this looks like a more general problem than just call_usermodehelper.

Last time I checked, it seemed to me that this is an unbound workqueue? If so 
can't we tune
the affinity of this workqueue? If not perhaps that's something we want to 
solve and which
could be useful for all the users who don't want their CPU to be disturbed.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/4] wire up CPU features to udev based module loading

2013-11-07 Thread H. Peter Anvin
On 11/07/2013 02:15 PM, Ard Biesheuvel wrote:
> 
> That would involve repurposing/generalizing a bit more of the existing
> x86-only code than I did the first time around, but if you (as x86
> maintainers) are happy with that, I'm all for it.
> 
> I do have a couple of questions then
> - the module aliases host tool has no arch specific dependencies at
> all except having x86cpu as one of the entries: would you mind
> dropping the x86 prefix there? Or rather add dependencies on $ARCH?
> (If we drop it there, we basically end up with 'cpu:' everywhere)

I think it makes sense to indicate what kind of CPU the string refers
to, as the top-level indicator of what is going on.  This might be
possible to macroize the generation of this prefix, though.

> - in the vendor/family/model case, it may be preferable to drop these
> fields entirely from certain modules' aliases if they match on 'any'
> (provided that the module tools permit this) rather than add
> architecture, variant, revision, etc  fields for all architectures if
> they can only ever match on one

I think that can be CPU dependent.

> - some of the X86_ macros would probable be redefined in terms of the
> generic macros rather than the other way around, which would result in
> some changes under arch/x86 as well, is that acceptable for you?

If you are talking about X86_FEATURE_* then almost certainly no,
although I'm willing to listen to what you have in mind.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3 - V2] Introducing Device Tree Overlays

2013-11-07 Thread Guenter Roeck
On Thu, Nov 07, 2013 at 10:06:31PM +0200, Pantelis Antoniou wrote:
> Hi Sebastian,
> 
> On Nov 7, 2013, at 9:25 PM, Sebastian Andrzej Siewior wrote:
> 
> > On 06.11.13, Guenter Roeck wrote:
> > |…
> > thanks for the explanation.
> > 
> >> We use DT overlays to describe the hardware on those boards and, if 
> >> necessary,
> >> its configuration. For example, if there is a PCIe switch, the overlay 
> >> would
> >> describe its memory and bus number configuration.
> > 
> > So have your "fix" configuration and a few overlays you switch at
> > runtime. The problem you have is that you want to switch a specific part
> > if your configuration at runtime. I assume you run DT on ARM. What

powerpc

> > happens if you swtich from ARM to x86 and you "keep" your FPGA
> > configuration requirement? You can't use both, DT and ACPI, right? So
> > what happens then?
> > 
> 
> FWIW DT has been ported to x86. And is present on arm/powerpc/mips/arc and 
> possibly
> others.
> 
> So what are we talking about again? If you care about the non-DT case, why
> don't you make a patch about how you could support Guenter's use case on
> the x86.
> 
> His use case is not uncommon, believe it or not, and x86 would benefit from
> something this flexible.
> 
Together with the work Thierry has done a couple of years ago, using DT
to augment ACPI data on x86 platforms, I don't really see a major problem
with using DT overlays on x86. Sure, it will require some work, and the
resulting patches may not be accepted for upstream integration,
but the concept is already there, and we plan to make good use of it.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Jan Kara
On Thu 07-11-13 23:23:14, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> > On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > > But then, who's going to process that work if every CPUs is idle?
> >   Have a look into irq_work_queue(). There is:
> > /*
> >  * If the work is not "lazy" or the tick is stopped, raise the irq
> >  * work interrupt (if supported by the arch), otherwise, just wait
> >  * for the next tick. We do this even for unbound work to make sure
> >  * *some* CPU will be doing the work.
> >  */
> > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > arch_irq_work_raise();
> > }
> > 
> >   So we raise an interrupt if there would be no timer ticking (which is
> > what I suppose you mean by "CPU is idle"). That is nothing changed by my
> > patches...
> 
> Ok but we raise that interrupt locally, not to the other CPUs.
  True, but that doesn't really matter in this case. Any CPU (including the
local one) can handle the unbound work. So from the definition of the
unbound work things are OK.

Regarding my use for printk - if all (other) CPUs are idle then we can
easily afford making the current cpu busy printing, that's not a problem.
There's nothing else to do than to print what's remaining in the printk
buffer...

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG][ext2] XIP does not work on ext2

2013-11-07 Thread Andiry Xu
On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara  wrote:
> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara  wrote:
>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara  wrote:
>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> >> Do you know the reason why write() outperforms mmap() in some 
>> >> >> >> cases? I
>> >> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> >> answer my question.
>> >> >> >   Well, I'm not completely sure. mmap()ed memory always works on 
>> >> >> > page-by-page
>> >> >> > basis - you first access the page, it gets faulted in and you can 
>> >> >> > further
>> >> >> > access it. So for small (sub page size) accesses this is a win 
>> >> >> > because you
>> >> >> > don't have an overhead of syscall and fs write path. For accesses 
>> >> >> > larger
>> >> >> > than page size the overhead of syscall and some initial checks is 
>> >> >> > well
>> >> >> > hidden by other things. I guess write() ends up being more efficient
>> >> >> > because write path taken for each page is somewhat lighter than full 
>> >> >> > page
>> >> >> > fault. But you'd need to look into perf data to get some hard 
>> >> >> > numbers on
>> >> >> > where the time is spent.
>> >> >> >
>> >> >>
>> >> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> >> pages initially.
>> >> >   Well, pages in ramdisk are always present, that's not an issue. But 
>> >> > you
>> >> > will get a page fault to map a particular physical page in process'
>> >> > virtual address space when you first access that virtual address in the
>> >> > mapping from the process. The cost of setting up this virtual->physical
>> >> > mapping is what I'm talking about.
>> >> >
>> >>
>> >> Yes, you are right, there are page faults observed with perf. I
>> >> misunderstood page fault as copying pages between backing store and
>> >> physical memory.
>> >>
>> >> > If you had a process which first mmaps the file and writes to all pages 
>> >> > in
>> >> > the mapping and *then* measure the cost of another round of writing to 
>> >> > the
>> >> > mapping, I would expect you should see speeds close to those of memory 
>> >> > bus.
>> >> >
>> >>
>> >> I've tried this as well. mmap() performance improves but still not as
>> >> good as write().
>> >> I used the perf report to compare write() and mmap() applications. For
>> >> write() version, top of perf report shows as:
>> >> 33.33%  __copy_user_nocache
>> >> 4.72%ext2_get_blocks
>> >> 4.42%mutex_unlock
>> >> 3.59%__find_get_block
>> >>
>> >> which looks reasonable.
>> >>
>> >> However, for mmap() version, the perf report looks strange:
>> >> 94.98% libc-2.15.so   [.] 0x0014698d
>> >> 2.25%   page_fault
>> >> 0.18%   handle_mm_fault
>> >>
>> >> I don't know what the first item is but it took the majority of cycles.
>> >   The first item means that it's some userspace code in libc. My guess
>> > would be that it's libc's memcpy() function (or whatever you use to write
>> > to mmap). How do you access the mmap?
>> >
>>
>> Like this:
>>
>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
>> for (i = 0; i < count; i++)
>> {
>>memcpy(dest, src, request_size);
>>dest += request_size;
>> }
>   OK, maybe libc memcpy isn't very well optimized for you cpu? Not sure how
> to tune that though...
>

Hmm, I will try some different kinds of memcpy to see if there is a
difference. Just want to make sure I do not make some stupid mistakes
before trying that.
Thanks a lot for your help!

Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
2013/11/7 Jan Kara :
> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artifical testing I can also easily trigger a situation when disk
> disappears from the system apparently because interrupt from it wasn't
> served for too long. This is why just silencing watchdogs isn't a
> reliable solution to the problem and we simply have to avoid spending
> too long in console_unlock() with interrupts disabled.
>
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over X characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. User can tune the value X via
> printk.offload_chars kernel parameter.
>
> Reviewed-by: Steven Rostedt 
> Signed-off-by: Jan Kara 

When a message takes tens of seconds to be printed, it usually means
we are in trouble somehow :)
I wonder what printk source can trigger such a high volume.

May be cutting some huge message into smaller chunks could help? That
would re enable interrupts between each call.

It's hard to tell without the context, but using other CPUs for
rescuing doesn't look like a good solution. What if the issue happens
in UP to begin with?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/5] MCS Lock: Barrier corrections

2013-11-07 Thread Michel Lespinasse
On Thu, Nov 7, 2013 at 2:21 PM, Peter Zijlstra  wrote:
> On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
>> Michel, are you planning to do an implementation of
>> load-acquire/store-release functions of various architectures?
>
> A little something like this:
> http://marc.info/?l=linux-arch=138386254111507
>
> It so happens we were working on that the past week or so due to another
> issue ;-)

Haha, awesome, I wasn't aware of this effort.

Tim: my approach would be to provide the acquire/release operations in
arch-specific include files, and have a default implementation using
barriers for arches who don't provide these new ops. That way you make
it work on all arches at once (using the default implementation) and
make it fast on any arch that cares.

>> Or is the approach of arch specific memory barrier for MCS
>> an acceptable one before load-acquire and store-release
>> are available?  Are there any technical issues remaining with
>> the patchset after including including Waiman's arch specific barrier?

I don't want to stand in the way of Waiman's change, and I had
actually taken the same approach with arch-specific barriers when
proposing some queue spinlocks in the past; however I do feel that
this comes back regularly enough that having acquire/release
primitives available would help, hence my proposal.

That said, earlier in the thread Linus said we should probably get all
our ducks in a row before going forward with this, so...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


usb-serial lockdep trace in linus' current tree.

2013-11-07 Thread Dave Jones
Seeing this since todays USB merge.

WARNING: CPU: 0 PID: 226 at kernel/lockdep.c:2740 
lockdep_trace_alloc+0xc5/0xd0()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
Modules linked in: usb_debug(+) kvm_intel kvm crct10dif_pclmul crc32c_intel 
ghash_clmulni_intel microcode(+) pcspkr serio_raw
CPU: 0 PID: 226 Comm: systemd-udevd Not tainted 3.12.0+ #112
 81a22d3d 88023cde5670 8171a8e8 88023cde56b8
 88023cde56a8 8105430d 0046 80d0
 0010 0001 880244407a80 88023cde5708
Call Trace:
 [] dump_stack+0x4e/0x82
 [] warn_slowpath_common+0x7d/0xa0
 [] warn_slowpath_fmt+0x4c/0x50
 [] lockdep_trace_alloc+0xc5/0xd0
 [] __kmalloc+0x53/0x350
 [] ? xhci_urb_enqueue+0xb7/0x610
 [] ? debug_dma_mapping_error+0x7c/0x90
 [] xhci_urb_enqueue+0xb7/0x610
 [] usb_hcd_submit_urb+0xa6/0xae0
 [] ? native_sched_clock+0x24/0x80
 [] ? trace_hardirqs_off_caller+0x1f/0xc0
 [] ? native_sched_clock+0x24/0x80
 [] ? trace_hardirqs_off_caller+0x1f/0xc0
 [] ? get_lock_stats+0x19/0x60
 [] ? put_lock_stats.isra.28+0xe/0x40
 [] usb_submit_urb+0x1f9/0x470
 [] usb_serial_generic_write_start+0xf5/0x210
 [] usb_serial_generic_write+0x70/0x90
 [] usb_console_write+0xc7/0x220
 [] call_console_drivers.constprop.23+0xa5/0x1e0
 [] console_unlock+0x40c/0x460
 [] register_console+0x12c/0x390
 [] usb_serial_console_init+0x22/0x40
 [] usb_serial_probe+0xfea/0x10e0
 [] ? native_sched_clock+0x24/0x80
 [] ? trace_hardirqs_off_caller+0x1f/0xc0
 [] ? get_lock_stats+0x19/0x60
 [] ? __mutex_unlock_slowpath+0xed/0x1a0
 [] ? trace_hardirqs_on_caller+0x115/0x1e0
 [] ? trace_hardirqs_on+0xd/0x10
 [] usb_probe_interface+0x1cf/0x300
 [] driver_probe_device+0x87/0x390
 [] __driver_attach+0x93/0xa0
 [] ? __device_attach+0x40/0x40
 [] bus_for_each_dev+0x6b/0xb0
 [] driver_attach+0x1e/0x20
 [] usb_serial_register_drivers+0x29e/0x580
 [] ? 0xa0004fff
 [] usb_serial_module_init+0x1e/0x1000 [usb_debug]
 [] do_one_initcall+0xf2/0x1a0
 [] ? set_memory_nx+0x43/0x50
 [] load_module+0x1fd2/0x26a0
 [] ? store_uevent+0x40/0x40
 [] SyS_finit_module+0x86/0xb0
 [] tracesys+0xdd/0xe2
---[ end trace ee033a3c9fd6263b ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] platform: add chrome platform directory

2013-11-07 Thread Olof Johansson
It makes sense to split out the Chromebook/Chromebox hardware platform
drivers to a separate subdirectory, since some of it will be shared
between ARM and x86.

This moves over the existing chromeos_laptop driver without making
any other changes, and adds appropriate Kconfig entries for the new
directory. It also adds a MAINTAINERS entry for the new subdir.

Signed-off-by: Olof Johansson 
---
 MAINTAINERS|  5 
 drivers/platform/Kconfig   |  1 +
 drivers/platform/Makefile  |  1 +
 drivers/platform/chrome/Kconfig| 28 ++
 drivers/platform/chrome/Makefile   |  2 ++
 drivers/platform/{x86 => chrome}/chromeos_laptop.c |  0
 drivers/platform/x86/Kconfig   | 11 -
 drivers/platform/x86/Makefile  |  1 -
 8 files changed, 37 insertions(+), 12 deletions(-)
 create mode 100644 drivers/platform/chrome/Kconfig
 create mode 100644 drivers/platform/chrome/Makefile
 rename drivers/platform/{x86 => chrome}/chromeos_laptop.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 831b8690cf13..07e312a3377b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2129,6 +2129,11 @@ L:   linux-...@vger.kernel.org
 S: Maintained
 F: drivers/usb/chipidea/
 
+CHROME HARDWARE PLATFORM SUPPORT
+M: Olof Johansson 
+S: Maintained
+F: drivers/platform/chrome/
+
 CISCO VIC ETHERNET NIC DRIVER
 M: Christian Benvenuti 
 M: Sujith Sankar 
diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
index 69616aeaa966..09fde58b12e0 100644
--- a/drivers/platform/Kconfig
+++ b/drivers/platform/Kconfig
@@ -5,3 +5,4 @@ if GOLDFISH
 source "drivers/platform/goldfish/Kconfig"
 endif
 
+source "drivers/platform/chrome/Kconfig"
diff --git a/drivers/platform/Makefile b/drivers/platform/Makefile
index 8a44a4cd6d1e..3656b7b17b99 100644
--- a/drivers/platform/Makefile
+++ b/drivers/platform/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_X86)  += x86/
 obj-$(CONFIG_OLPC) += olpc/
 obj-$(CONFIG_GOLDFISH) += goldfish/
+obj-$(CONFIG_CHROME_PLATFORMS) += chrome/
diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
new file mode 100644
index ..b13303e75a34
--- /dev/null
+++ b/drivers/platform/chrome/Kconfig
@@ -0,0 +1,28 @@
+#
+# Platform support for Chrome OS hardware (Chromebooks and Chromeboxes)
+#
+
+menuconfig CHROME_PLATFORMS
+   bool "Platform support for Chrome hardware"
+   depends on X86
+   ---help---
+ Say Y here to get to see options for platform support for
+ various Chromebooks and Chromeboxes. This option alone does
+ not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and 
disabled.
+
+if CHROME_PLATFORMS
+
+config CHROMEOS_LAPTOP
+   tristate "Chrome OS Laptop"
+   depends on I2C
+   depends on DMI
+   ---help---
+ This driver instantiates i2c and smbus devices such as
+ light sensors and touchpads.
+
+ If you have a supported Chromebook, choose Y or M here.
+ The module will be called chromeos_laptop.
+
+endif # CHROMEOS_PLATFORMS
diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
new file mode 100644
index ..015e9195e226
--- /dev/null
+++ b/drivers/platform/chrome/Makefile
@@ -0,0 +1,2 @@
+
+obj-$(CONFIG_CHROMEOS_LAPTOP)  += chromeos_laptop.o
diff --git a/drivers/platform/x86/chromeos_laptop.c 
b/drivers/platform/chrome/chromeos_laptop.c
similarity index 100%
rename from drivers/platform/x86/chromeos_laptop.c
rename to drivers/platform/chrome/chromeos_laptop.c
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index b51a7460cc49..d9dcd37b5a52 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -79,17 +79,6 @@ config ASUS_LAPTOP
 
  If you have an ACPI-compatible ASUS laptop, say Y or M here.
 
-config CHROMEOS_LAPTOP
-   tristate "Chrome OS Laptop"
-   depends on I2C
-   depends on DMI
-   ---help---
- This driver instantiates i2c and smbus devices such as
- light sensors and touchpads.
-
- If you have a supported Chromebook, choose Y or M here.
- The module will be called chromeos_laptop.
-
 config DELL_LAPTOP
tristate "Dell Laptop Extras"
depends on X86
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index 5dbe19324351..f0e6aa407ffb 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -50,7 +50,6 @@ obj-$(CONFIG_INTEL_MID_POWER_BUTTON)  += intel_mid_powerbtn.o
 obj-$(CONFIG_INTEL_OAKTRAIL)   += intel_oaktrail.o
 obj-$(CONFIG_SAMSUNG_Q10)  += samsung-q10.o
 obj-$(CONFIG_APPLE_GMUX)   += apple-gmux.o
-obj-$(CONFIG_CHROMEOS_LAPTOP)  += chromeos_laptop.o
 obj-$(CONFIG_INTEL_RST)+= intel-rst.o
 

Re: [PATCH 05/11] ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp

2013-11-07 Thread Stephen Boyd
On 11/06/13 17:50, Josh Cartwright wrote:
> On Fri, Nov 01, 2013 at 03:08:53PM -0700, Stephen Boyd wrote:
>> diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
>> index f35906b..71a8592 100644
>> --- a/arch/arm/kernel/devtree.c
>> +++ b/arch/arm/kernel/devtree.c
>> @@ -25,6 +25,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>>  {
>> @@ -63,6 +64,36 @@ void __init arm_dt_memblock_reserve(void)
>>  }
>>  }
>>  
>> +#ifdef CONFIG_SMP
>> +extern struct of_cpu_method __cpu_method_of_table[];
>> +
>> +static const struct of_cpu_method __cpu_method_of_table_sentinel
>> +__used __section(__cpu_method_of_table_end);
> Having a sentinel allocated into the linked image makes a lot of sense
> in other cases (IRQCHIP/CLOCKSOURCE_OF_DECLARE, etc), where it's used to
> terminate an of_device_id table (as is expected by of_match_table and
> friends).
>
> In this case, however, you aren't building a match table, so having a
> sentinel allocated isn't necessary.  I'd suggest bookending the table
> with a VMLINUX_SYMBOL(__cpu_method_of_table_end) instead.
>
> A whole 2 pointers worth of savings!
>

Yes, will do. Thanks.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPUs is idle?
>   Have a look into irq_work_queue(). There is:
> /*
>  * If the work is not "lazy" or the tick is stopped, raise the irq
>  * work interrupt (if supported by the arch), otherwise, just wait
>  * for the next tick. We do this even for unbound work to make sure
>  * *some* CPU will be doing the work.
>  */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
> 
>   So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

That said I agree that it would be nice to have smp_call_function_many() support
non waiting calls, something based on llist, that would be less deadlock prone
to begin with.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/4] wire up CPU features to udev based module loading

2013-11-07 Thread Andi Kleen
> - the module aliases host tool has no arch specific dependencies at
> all except having x86cpu as one of the entries: would you mind
> dropping the x86 prefix there? Or rather add dependencies on $ARCH?
> (If we drop it there, we basically end up with 'cpu:' everywhere)

Should be fine.

> - in the vendor/family/model case, it may be preferable to drop these
> fields entirely from certain modules' aliases if they match on 'any'
> (provided that the module tools permit this) rather than add
> architecture, variant, revision, etc  fields for all architectures if
> they can only ever match on one

The module tools require everything matching with the same wild cards.

So I don't know how "any" would work.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo

2013-11-07 Thread Andrew Morton
On Thu, 7 Nov 2013 16:21:32 -0500 Johannes Weiner  wrote:

> > Subject: provide estimated available memory in /proc/meminfo
> > 
> > Many load balancing and workload placing programs check /proc/meminfo
> > to estimate how much free memory is available. They generally do this
> > by adding up "free" and "cached", which was fine ten years ago, but
> > is pretty much guaranteed to be wrong today.
> > 
> > It is wrong because Cached includes memory that is not freeable as
> > page cache, for example shared memory segments, tmpfs, and ramfs,
> > and it does not include reclaimable slab memory, which can take up
> > a large fraction of system memory on mostly idle systems with lots
> > of files.
> > 
> > Currently, the amount of memory that is available for a new workload,
> > without pushing the system into swap, can be estimated from MemFree,
> > Active(file), Inactive(file), and SReclaimable, as well as the "low"
> > watermarks from /proc/zoneinfo.
> > 
> > However, this may change in the future, and user space really should
> > not be expected to know kernel internals to come up with an estimate
> > for the amount of free memory.
> > 
> > It is more convenient to provide such an estimate in /proc/meminfo.
> > If things change in the future, we only have to change it in one place.
> > 
> > Signed-off-by: Rik van Riel 
> > Reported-by: Erik Mouw 
> 
> Acked-by: Johannes Weiner 
> 
> I have a suspicion that people will end up relying on this number to
> start new workloads in situations where lots of the page cache is
> actually heavily used.  We might not swap, but there will still be IO
> from thrashing cache.
> 
> Maybe we'll have to subtract mapped cache pages in the future to
> mitigate this risk somehow...
> 
> Anyway, we can defer this to when it's proven to be an actual problem.

Well not really.  Once we release this thing with a particular
implementation, we are constrained in making any later changes.  If we
change it to produce larger numbers, someone's workload will start
swapping.  If we change it to produce smaller numbers, someone's
workload will refuse to start.

It all needs a bit of thought, and even some testing!  I labelled this
one for-3.14.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:19:04PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> > But then, who's going to process that work if every CPUs is idle?
>   Have a look into irq_work_queue(). There is:
> /*
>  * If the work is not "lazy" or the tick is stopped, raise the irq
>  * work interrupt (if supported by the arch), otherwise, just wait
>  * for the next tick. We do this even for unbound work to make sure
>  * *some* CPU will be doing the work.
>  */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> arch_irq_work_raise();
> }
> 
>   So we raise an interrupt if there would be no timer ticking (which is
> what I suppose you mean by "CPU is idle"). That is nothing changed by my
> patches...

Ok but we raise that interrupt locally, not to the other CPUs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/5] MCS Lock: Barrier corrections

2013-11-07 Thread Peter Zijlstra
On Thu, Nov 07, 2013 at 01:15:51PM -0800, Tim Chen wrote:
> Michel, are you planning to do an implementation of
> load-acquire/store-release functions of various architectures?

A little something like this:
http://marc.info/?l=linux-arch=138386254111507

It so happens we were working on that the past week or so due to another
issue ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3 - V2] Introducing Device Tree Overlays

2013-11-07 Thread Guenter Roeck
On Thu, Nov 07, 2013 at 08:25:58PM +0100, Sebastian Andrzej Siewior wrote:
> On 06.11.13, Guenter Roeck wrote:
> |…
> thanks for the explanation.
> 
> > We use DT overlays to describe the hardware on those boards and, if 
> > necessary,
> > its configuration. For example, if there is a PCIe switch, the overlay would
> > describe its memory and bus number configuration.
> 
> So have your "fix" configuration and a few overlays you switch at
> runtime. The problem you have is that you want to switch a specific part
> if your configuration at runtime. I assume you run DT on ARM. What
> happens if you swtich from ARM to x86 and you "keep" your FPGA
> configuration requirement? You can't use both, DT and ACPI, right? So
> what happens then?
> 
We intend to use DT on x86 to augment ACPI data. There is a variety of reasons
why we can not use ACPI, nor do we want to as we prefer a single method for
handling OIR on all platforms. FWIW, the non-x86 platform is powerpc, not arm.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG][ext2] XIP does not work on ext2

2013-11-07 Thread Jan Kara
On Thu 07-11-13 13:50:09, Andiry Xu wrote:
> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara  wrote:
> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara  wrote:
> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? 
> >> >> >> I
> >> >> >> know it's not related the thread but I really appreciate if you can
> >> >> >> answer my question.
> >> >> >   Well, I'm not completely sure. mmap()ed memory always works on 
> >> >> > page-by-page
> >> >> > basis - you first access the page, it gets faulted in and you can 
> >> >> > further
> >> >> > access it. So for small (sub page size) accesses this is a win 
> >> >> > because you
> >> >> > don't have an overhead of syscall and fs write path. For accesses 
> >> >> > larger
> >> >> > than page size the overhead of syscall and some initial checks is well
> >> >> > hidden by other things. I guess write() ends up being more efficient
> >> >> > because write path taken for each page is somewhat lighter than full 
> >> >> > page
> >> >> > fault. But you'd need to look into perf data to get some hard numbers 
> >> >> > on
> >> >> > where the time is spent.
> >> >> >
> >> >>
> >> >> Thanks for the reply. However I have filled up the whole RAM disk
> >> >> before doing the test, i.e. asked the brd driver to allocate all the
> >> >> pages initially.
> >> >   Well, pages in ramdisk are always present, that's not an issue. But you
> >> > will get a page fault to map a particular physical page in process'
> >> > virtual address space when you first access that virtual address in the
> >> > mapping from the process. The cost of setting up this virtual->physical
> >> > mapping is what I'm talking about.
> >> >
> >>
> >> Yes, you are right, there are page faults observed with perf. I
> >> misunderstood page fault as copying pages between backing store and
> >> physical memory.
> >>
> >> > If you had a process which first mmaps the file and writes to all pages 
> >> > in
> >> > the mapping and *then* measure the cost of another round of writing to 
> >> > the
> >> > mapping, I would expect you should see speeds close to those of memory 
> >> > bus.
> >> >
> >>
> >> I've tried this as well. mmap() performance improves but still not as
> >> good as write().
> >> I used the perf report to compare write() and mmap() applications. For
> >> write() version, top of perf report shows as:
> >> 33.33%  __copy_user_nocache
> >> 4.72%ext2_get_blocks
> >> 4.42%mutex_unlock
> >> 3.59%__find_get_block
> >>
> >> which looks reasonable.
> >>
> >> However, for mmap() version, the perf report looks strange:
> >> 94.98% libc-2.15.so   [.] 0x0014698d
> >> 2.25%   page_fault
> >> 0.18%   handle_mm_fault
> >>
> >> I don't know what the first item is but it took the majority of cycles.
> >   The first item means that it's some userspace code in libc. My guess
> > would be that it's libc's memcpy() function (or whatever you use to write
> > to mmap). How do you access the mmap?
> >
> 
> Like this:
> 
> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
> for (i = 0; i < count; i++)
> {
>memcpy(dest, src, request_size);
>dest += request_size;
> }
  OK, maybe libc memcpy isn't very well optimized for you cpu? Not sure how
to tune that though...

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Jan Kara
On Thu 07-11-13 23:13:39, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara :
> > Provide new irq work flag - IRQ_WORK_UNBOUND - meaning that can be
> > processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> > simple and we don't have to pick any particular cpu to do the work. We
> > just do the work from a timer tick on whichever cpu it happens first.
> > This is useful as a lightweight and simple code path without locking or
> > other dependencies to offload work to other cpu if possible.
> >
> > We will use this type of irq work to make a guarantee of forward
> > progress of printing to a (serial) console when printing on one cpu
> > would cause interrupts to be disabled for too long.
> >
> > Signed-off-by: Jan Kara 
> > ---
> >  include/linux/irq_work.h |  2 ++
> >  kernel/irq_work.c| 41 +
> >  2 files changed, 27 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> > index 66017028dcb3..ca07a16355ed 100644
> > --- a/include/linux/irq_work.h
> > +++ b/include/linux/irq_work.h
> > @@ -16,6 +16,8 @@
> >  #define IRQ_WORK_BUSY  2UL
> >  #define IRQ_WORK_FLAGS 3UL
> >  #define IRQ_WORK_LAZY  4UL /* Doesn't want IPI, wait for tick */
> > +#define __IRQ_WORK_UNBOUND  8UL /* Use IRQ_WORK_UNBOUND instead! */
> > +#define IRQ_WORK_UNBOUND   (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any 
> > cpu can process this work */
> >
> >  struct irq_work {
> > unsigned long flags;
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index 55fcce6065cf..b06350b63c67 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -22,6 +22,9 @@
> >  static DEFINE_PER_CPU(struct llist_head, irq_work_list);
> >  static DEFINE_PER_CPU(int, irq_work_raised);
> >
> > +/* List of irq-work any CPU can pick up */
> > +static LLIST_HEAD(unbound_irq_work_list);
> > +
> >  /*
> >   * Claim the entry so that no one else will poke at it.
> >   */
> > @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
> > /* Queue the entry and raise the IPI if needed. */
> > preempt_disable();
> >
> > -   llist_add(>llnode, &__get_cpu_var(irq_work_list));
> > +   if (work->flags & __IRQ_WORK_UNBOUND)
> > +   llist_add(>llnode, _irq_work_list);
> > +   else
> > +   llist_add(>llnode, &__get_cpu_var(irq_work_list));
> >
> > /*
> >  * If the work is not "lazy" or the tick is stopped, raise the irq
> >  * work interrupt (if supported by the arch), otherwise, just wait
> > -* for the next tick.
> > +* for the next tick. We do this even for unbound work to make sure
> > +* *some* CPU will be doing the work.
> >  */
> > if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> > if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> > @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
> > return true;
> >  }
> >
> > -static void __irq_work_run(void)
> > +static void process_irq_work_list(struct llist_head *llhead)
> >  {
> > unsigned long flags;
> > struct irq_work *work;
> > -   struct llist_head *this_list;
> > struct llist_node *llnode;
> >
> > -
> > -   /*
> > -* Reset the "raised" state right before we check the list because
> > -* an NMI may enqueue after we find the list empty from the runner.
> > -*/
> > -   __this_cpu_write(irq_work_raised, 0);
> > -   barrier();
> > -
> > -   this_list = &__get_cpu_var(irq_work_list);
> > -   if (llist_empty(this_list))
> > +   if (llist_empty(llhead))
> > return;
> >
> > BUG_ON(!irqs_disabled());
> > -
> > -   llnode = llist_del_all(this_list);
> > +   llnode = llist_del_all(llhead);
> > while (llnode != NULL) {
> > work = llist_entry(llnode, struct irq_work, llnode);
> >
> > @@ -146,6 +142,19 @@ static void __irq_work_run(void)
> > }
> >  }
> >
> > +static void __irq_work_run(void)
> > +{
> > +   /*
> > +* Reset the "raised" state right before we check the list because
> > +* an NMI may enqueue after we find the list empty from the runner.
> > +*/
> > +   __this_cpu_write(irq_work_raised, 0);
> > +   barrier();
> > +
> > +   process_irq_work_list(&__get_cpu_var(irq_work_list));
> > +   process_irq_work_list(_irq_work_list);
> > +}
> > +
> 
> But then, who's going to process that work if every CPUs is idle?
  Have a look into irq_work_queue(). There is:
/*
 * If the work is not "lazy" or the tick is stopped, raise the irq
 * work interrupt (if supported by the arch), otherwise, just wait
 * for the next tick. We do this even for unbound work to make sure
 * *some* CPU will be doing the work.
 */
if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
 

Re: [RFC PATCH 0/4] wire up CPU features to udev based module loading

2013-11-07 Thread Ard Biesheuvel
On 7 November 2013 22:39, Andi Kleen  wrote:
> On Thu, Nov 07, 2013 at 01:09:41PM -0800, H. Peter Anvin wrote:
>> On 11/07/2013 09:17 AM, Ard Biesheuvel wrote:
>> > This series implements automatic module loading based on optional CPU 
>> > features,
>> > and tries to do so in a generic way. Currently, 32 feature bits are 
>> > supported,
>> > and how they map to actual CPU features is entirely up to the architecture.
>>
>> NAK.
>>
>> We in the x86 world already left 32 bits way behind; we currently have
>> 320 bit feature masks.
>>
>> If you're aiming at doing this in a generic way, it needs to be able to
>> accommodate the current x86cpu feature stuff as a subset, which this
>> doesn't.
>
> They can just use the exact same code/macros as x86cpu, just need a different
> prefix and use wildcards if they miss something (e.g. family)
>

That would involve repurposing/generalizing a bit more of the existing
x86-only code than I did the first time around, but if you (as x86
maintainers) are happy with that, I'm all for it.

I do have a couple of questions then
- the module aliases host tool has no arch specific dependencies at
all except having x86cpu as one of the entries: would you mind
dropping the x86 prefix there? Or rather add dependencies on $ARCH?
(If we drop it there, we basically end up with 'cpu:' everywhere)
- in the vendor/family/model case, it may be preferable to drop these
fields entirely from certain modules' aliases if they match on 'any'
(provided that the module tools permit this) rather than add
architecture, variant, revision, etc  fields for all architectures if
they can only ever match on one
- some of the X86_ macros would probable be redefined in terms of the
generic macros rather than the other way around, which would result in
some changes under arch/x86 as well, is that acceptable for you?

Thanks,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] rbtree: Fix rbtree_postorder_for_each_entry_safe() iterator

2013-11-07 Thread Jan Kara
On Thu 07-11-13 13:38:00, Andrew Morton wrote:
> On Wed,  6 Nov 2013 17:42:30 -0800 Cody P Schafer  
> wrote:
> 
> > The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
> > underflow behavior when testing for loop termination. In particular
> > it expects that
> >   _entry(NULL, type, field)->field
> > is NULL. But the result of this expression is not defined by a C standard
> > and some gcc versions (e.g. 4.3.4) assume the above expression can never
> > be equal to NULL. The net result is an oops because the iteration is not
> > properly terminated.
> > 
> > Fix the problem by modifying the iterator to avoid pointer underflows.
> 
> So the sole caller is in zswap.c.  Is that code actually generating oopses?
  Oh, I didn't know there is any user of that iterator already in tree. Let
me check... Umm, looking at the disassembly of
zswap_frontswap_invalidate_aread:
   0x8112c9a5 <+37>:mov%r13,%rdi
   0x8112c9a8 <+40>:callq  0x81227620 
   0x8112c9ad <+45>:mov%rax,%rdi
   0x8112c9b0 <+48>:mov%rax,%rbx
   0x8112c9b3 <+51>:callq  0x812275d0 
   0x8112c9b8 <+56>:mov%rax,%r12
   0x8112c9bb <+59>:nopl   0x0(%rax,%rax,1)
   0x8112c9c0 <+64>:mov0x28(%rbx),%rsi
   0x8112c9c4 <+68>:mov0x40(%r13),%rdi
   0x8112c9c8 <+72>:callq  0x811352b0 
   0x8112c9cd <+77>:mov0x1105504(%rip),%rdi
   0x8112c9d4 <+84>:mov%rbx,%rsi
   0x8112c9d7 <+87>:callq  0x81130b80 
   0x8112c9dc <+92>:lock decl 0x110539d(%rip)
   0x8112c9e3 <+99>:mov%r12,%rdi
   0x8112c9e6 <+102>:   mov%r12,%rbx
   0x8112c9e9 <+105>:   callq  0x812275d0 
   0x8112c9ee <+110>:   mov%rax,%r12
   0x8112c9f1 <+113>:   jmp 0x8112c9c0 


So my gcc helpfully compiled that iterator into an endless loop as well,
although now it is a perfectly valid C code.  According to our gcc guys
that was a bug in some gcc versions which is already fixed.  But anyway
pushing my patch to 3.12 or anything that actually uses that iterator will
probably save us some bug reports.

Honza

-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Frederic Weisbecker
2013/11/7 Jan Kara :
> Provide new irq work flag - IRQ_WORK_UNBOUND - meaning that can be
> processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
> simple and we don't have to pick any particular cpu to do the work. We
> just do the work from a timer tick on whichever cpu it happens first.
> This is useful as a lightweight and simple code path without locking or
> other dependencies to offload work to other cpu if possible.
>
> We will use this type of irq work to make a guarantee of forward
> progress of printing to a (serial) console when printing on one cpu
> would cause interrupts to be disabled for too long.
>
> Signed-off-by: Jan Kara 
> ---
>  include/linux/irq_work.h |  2 ++
>  kernel/irq_work.c| 41 +
>  2 files changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 66017028dcb3..ca07a16355ed 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -16,6 +16,8 @@
>  #define IRQ_WORK_BUSY  2UL
>  #define IRQ_WORK_FLAGS 3UL
>  #define IRQ_WORK_LAZY  4UL /* Doesn't want IPI, wait for tick */
> +#define __IRQ_WORK_UNBOUND  8UL /* Use IRQ_WORK_UNBOUND instead! */
> +#define IRQ_WORK_UNBOUND   (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any 
> cpu can process this work */
>
>  struct irq_work {
> unsigned long flags;
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 55fcce6065cf..b06350b63c67 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -22,6 +22,9 @@
>  static DEFINE_PER_CPU(struct llist_head, irq_work_list);
>  static DEFINE_PER_CPU(int, irq_work_raised);
>
> +/* List of irq-work any CPU can pick up */
> +static LLIST_HEAD(unbound_irq_work_list);
> +
>  /*
>   * Claim the entry so that no one else will poke at it.
>   */
> @@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
> /* Queue the entry and raise the IPI if needed. */
> preempt_disable();
>
> -   llist_add(>llnode, &__get_cpu_var(irq_work_list));
> +   if (work->flags & __IRQ_WORK_UNBOUND)
> +   llist_add(>llnode, _irq_work_list);
> +   else
> +   llist_add(>llnode, &__get_cpu_var(irq_work_list));
>
> /*
>  * If the work is not "lazy" or the tick is stopped, raise the irq
>  * work interrupt (if supported by the arch), otherwise, just wait
> -* for the next tick.
> +* for the next tick. We do this even for unbound work to make sure
> +* *some* CPU will be doing the work.
>  */
> if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> @@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
> return true;
>  }
>
> -static void __irq_work_run(void)
> +static void process_irq_work_list(struct llist_head *llhead)
>  {
> unsigned long flags;
> struct irq_work *work;
> -   struct llist_head *this_list;
> struct llist_node *llnode;
>
> -
> -   /*
> -* Reset the "raised" state right before we check the list because
> -* an NMI may enqueue after we find the list empty from the runner.
> -*/
> -   __this_cpu_write(irq_work_raised, 0);
> -   barrier();
> -
> -   this_list = &__get_cpu_var(irq_work_list);
> -   if (llist_empty(this_list))
> +   if (llist_empty(llhead))
> return;
>
> BUG_ON(!irqs_disabled());
> -
> -   llnode = llist_del_all(this_list);
> +   llnode = llist_del_all(llhead);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> @@ -146,6 +142,19 @@ static void __irq_work_run(void)
> }
>  }
>
> +static void __irq_work_run(void)
> +{
> +   /*
> +* Reset the "raised" state right before we check the list because
> +* an NMI may enqueue after we find the list empty from the runner.
> +*/
> +   __this_cpu_write(irq_work_raised, 0);
> +   barrier();
> +
> +   process_irq_work_list(&__get_cpu_var(irq_work_list));
> +   process_irq_work_list(_irq_work_list);
> +}
> +

But then, who's going to process that work if every CPUs is idle?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] of: irq: Fix interrupt-map entry matching

2013-11-07 Thread Tomasz Figa
On Thursday 07 of November 2013 10:40:16 Rob Herring wrote:
> On Thu, Nov 7, 2013 at 5:32 AM, Tomasz Figa  wrote:
> > Hi Grant,
> > 
> > Could you pick this patch up? It fixes boot-up at least on several
> > Exynos based platforms, which use interrupt-map nodes with
> > #interrupt-cells higher than 1.
> > 
> > Also please disregard patch 2/2, as your fix that has been merged
> > seems
> > to be fine.
> 
> I've applied the 1st patch.

Thanks Rob.

Best regards,
Tomasz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 9/9] ARM: add initial support for Marvell Berlin SoCs

2013-11-07 Thread Arnd Bergmann
On Thursday 07 November 2013, Sebastian Hesselbarth wrote:
> I haven't looked deeper into this, but I guess it will not be hard
> to make ARM_TWD independent of SMP.

Yes, I agree. Just make sure to look at the list archives to see if someone
already did that work.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] ARM: pinctrl: Add Broadcom Capri pinctrl driver

2013-11-07 Thread Sherman Yin

On 13-11-06 09:00 AM, Stephen Warren wrote:

You probably don't want to reference the individual xxx1/2/3 nodes in
the client pinctrl properties, but instead wrap them in a higher-level
node that represents the whole pinctrl state. Then, the client pinctrl
properties can reference just that single parent node, instead of many
small nodes. In other words:

pinctrl@... {
...
sx: state_xxx {
xxx1 { ... };
xxx2 { ... };
xxx3 { ... };
};
sy: state_yyy {
yyy1 { ... };
yyy2 { ... };
};
}

some_client@... {
...
pinctrl-names = "default";
pinctrl-0 = <>;
};

other_client@... {
...
pinctrl-names = "default";
pinctrl-0 = <>;
};

rather than:

pinctrl@... {
...
sx1: xxx1 { ... };
sx2: xxx2 { ... };
sx3: xxx3 { ... };
sy1: yyy1 { ... };
sy2: yyy2 { ... };
}

some_client@... {
...
pinctrl-names = "default";
pinctrl-0 = <  >;
};

other_client@... {
...
pinctrl-names = "default";
pinctrl-0 = < >;
};

This is exactly how the Tegra pinctrl bindings work for example.


Ok, right, I mistakenly thought the "xxx1" nodes are pin config nodes. 
Actually that's the way my original driver works as well, other than the 
fact that I don't have as many "xxx1" type nodes as decribed in the 
"xxx" node below.



This works fine.  However, I'm just thinking that
it would have been easier if we could specify just one node:

xxx {
 pins = , , ;
 function = <...>;
 pull-up = <1 1 0>;
}

This "feature" seems a bit more concise to me and is what I did for my
original pinctrl driver.  The only downside is that with this method,
one cannot specify "don't touch this option for this pin" if the same
property must provide values for other pins.


The other downside is that if the lists get even slightly long, it get
really hard to match up the entries in the t properties.


Agree that it would start to get difficult to read if a subnode has too 
many pins.  I guess the solution would be to somehow split up the pins 
to more subnodes with fewer pins each.


Regards,
Sherman



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] rbtree: Fix rbtree_postorder_for_each_entry_safe() iterator

2013-11-07 Thread Cody P Schafer

On 11/07/2013 01:38 PM, Andrew Morton wrote:

On Wed,  6 Nov 2013 17:42:30 -0800 Cody P Schafer  
wrote:


The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
underflow behavior when testing for loop termination. In particular
it expects that
   _entry(NULL, type, field)->field
is NULL. But the result of this expression is not defined by a C standard
and some gcc versions (e.g. 4.3.4) assume the above expression can never
be equal to NULL. The net result is an oops because the iteration is not
properly terminated.

Fix the problem by modifying the iterator to avoid pointer underflows.


So the sole caller is in zswap.c.  Is that code actually generating oopses?


I can't reproduce the oopses (at all) with my build/gcc version, but Jan 
has reported seeing them (not in zswap, however).




IOW, is there any need to fix this in 3.12 or earlier?



The zswap usage change showed up in 3.12.
In my opinion, it is probably a good idea to apply the fix to 3.12.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

2013-11-07 Thread Alex Thorlton
On Wed, Oct 16, 2013 at 10:54:29AM -0500, Alex Thorlton wrote:
> Hi guys,
> 
> I ran into a bug a week or so ago, that I believe has something to do
> with NUMA balancing, but I'm having a tough time tracking down exactly
> what is causing it.  When running with the following configuration
> options set:
> 
> CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
> CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
> CONFIG_NUMA_BALANCING=y
> # CONFIG_HUGETLBFS is not set
> # CONFIG_HUGETLB_PAGE is not set
> 
> I get intermittent segfaults when running the memscale test that we've
> been using to test some of the THP changes.  Here's a link to the test:
> 
> ftp://shell.sgi.com/collect/memscale/

For anyone who's interested, this test has been moved to:

http://oss.sgi.com/projects/memtests/thp_memscale.tar.gz

It should remain there permanently.

> 
> I typically run the test with a line similar to this:
> 
> ./thp_memscale -C 0 -m 0 -c  -b 
> 
> Where  is the number of cores to spawn threads on, and 
> is the amount of memory to reserve from each core.  The  field
> can accept values like 512m or 1g, etc.  I typically run 256 cores and
> 512m, though I think the problem should be reproducable on anything with
> 128+ cores.
> 
> The test never seems to have any problems when running with hugetlbfs
> on and NUMA balancing off, but it segfaults every once in a while with
> the config options above.  It seems to occur more frequently, the more
> cores you run on.  It segfaults on about 50% of the runs at 256 cores,
> and on almost every run at 512 cores.  The fewest number of cores I've
> seen a segfault on has been 128, though it seems to be rare on this many
> cores.
> 
> At this point, I'm not familiar enough with NUMA balancing code to know
> what could be causing this, and we don't typically run with NUMA
> balancing on, so I don't see this in my everyday testing, but I felt
> that it was definitely worth bringing up.
> 
> If anybody has any ideas of where I could poke around to find a
> solution, please let me know.
> 
> - Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/4] printk: Use unbound irq work for printing and waking

2013-11-07 Thread Jan Kara
Now when per-cpu printk buffers are gone, there's no need to have printk
flags or printk irq_work per cpu. Just make printk_pending a single
variable operated by atomic operations and have single unbound irq work
doing the waking and printing. This has an advantage that any cpu can do the
printing / wakeup work thus lowering the latency of printing and better
distributing the printing work over cpus.

Reviewed-by: Steven Rostedt 
Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 47 ++-
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 35bb70ea6427..dc314f074e8f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2499,36 +2499,41 @@ late_initcall(printk_late_init);
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_PENDING_WAKEUP  0x01
-#define PRINTK_PENDING_OUTPUT  0x02
+#define PRINTK_PENDING_WAKEUP  0
+#define PRINTK_PENDING_OUTPUT  1
 
-static DEFINE_PER_CPU(int, printk_pending);
+static unsigned long printk_pending;
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_irq_work_func(struct irq_work *irq_work)
 {
-   int pending = __this_cpu_xchg(printk_pending, 0);
+   if (printk_pending) {
+   unsigned long pending = xchg(_pending, 0);
 
-   if (pending & PRINTK_PENDING_OUTPUT) {
-   /* If trylock fails, someone else is doing the printing */
-   if (console_trylock())
-   console_unlock();
-   }
+   if (test_bit(PRINTK_PENDING_OUTPUT, )) {
+   /*
+* If trylock fails, someone else is doing the
+* printing
+*/
+   if (console_trylock())
+   console_unlock();
+   }
 
-   if (pending & PRINTK_PENDING_WAKEUP)
-   wake_up_interruptible(_wait);
+   if (test_bit(PRINTK_PENDING_WAKEUP, ))
+   wake_up_interruptible(_wait);
+   }
 }
 
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-   .func = wake_up_klogd_work_func,
-   .flags = IRQ_WORK_LAZY,
+static struct irq_work printk_irq_work = {
+   .func = printk_irq_work_func,
+   .flags = IRQ_WORK_UNBOUND,
 };
 
 void wake_up_klogd(void)
 {
preempt_disable();
if (waitqueue_active(_wait)) {
-   this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-   irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+   set_bit(PRINTK_PENDING_WAKEUP, _pending);
+   irq_work_queue(_irq_work);
}
preempt_enable();
 }
@@ -2542,8 +2547,8 @@ int printk_sched(const char *fmt, ...)
r = vprintk_emit(0, -2, NULL, 0, fmt, args);
va_end(args);
 
-   __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-   irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+   set_bit(PRINTK_PENDING_OUTPUT, _pending);
+   irq_work_queue(_irq_work);
 
return r;
 }
@@ -2556,8 +2561,8 @@ void console_unlock(void)
 {
if (__console_unlock()) {
/* Leave the rest of printing for a timer tick */
-   __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-   irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+   set_bit(PRINTK_PENDING_OUTPUT, _pending);
+   irq_work_queue(_irq_work);
}
 }
 EXPORT_SYMBOL(console_unlock);
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Jan Kara
A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and serial console makes printing slow. Despite serial console drivers
are calling touch_nmi_watchdog() this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artifical testing I can also easily trigger a situation when disk
disappears from the system apparently because interrupt from it wasn't
served for too long. This is why just silencing watchdogs isn't a
reliable solution to the problem and we simply have to avoid spending
too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we already printed over X characters in
current console_unlock() invocation. This is a crude heuristic but
measuring time we spent printing doesn't seem to be really viable - we
cannot rely on high resolution time being available and with interrupts
disabled jiffies are not updated. User can tune the value X via
printk.offload_chars kernel parameter.

Reviewed-by: Steven Rostedt 
Signed-off-by: Jan Kara 
---
 Documentation/kernel-parameters.txt | 17 ++
 kernel/printk/printk.c  | 68 +++--
 2 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index fcbb736d55fe..579815bd90eb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2540,6 +2540,23 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
Format:   (1/Y/y=enable, 0/N/n=disable)
default: disabled
 
+   printk.offload_chars=
+   Printing to console can be relatively slow especially
+   in case of serial console. When there is intensive
+   printing happening from several cpus (as is the case
+   during boot), a cpu can be spending significant time
+   (seconds or more) doing printing. To avoid softlockups,
+   lost interrupts, and similar problems a cpu stops
+   printing to console after printing
+   'printk.offload_chars' characters.  Another cpu will
+   then continue printing. Higher value means possibly
+   longer interrupt and other latencies but lower latency
+   of printing and lower chance something goes wrong
+   during system crash and important message is not
+   printed.
+   Format: 
+   default: 0 (disabled)
+
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 572fb1f438b6..35bb70ea6427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -253,6 +253,18 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How much characters can we print in one call of printk before offloading
+ * printing work (0 means infinity). Tunable via
+ * /proc/sys/kernel/printk_offload_chars.
+ */
+static unsigned int __read_mostly printk_offload_chars = 0;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+  S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+   " cpu after this number of characters");
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -2026,31 +2038,43 @@ out:
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+   /* Oops? Print everything now to maximize chances user will see it */
+   if (oops_in_progress)
+   return false;
+   return printk_offload_chars && printed_chars > printk_offload_chars;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is 

Re: [BUG][ext2] XIP does not work on ext2

2013-11-07 Thread Andiry Xu
On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara  wrote:
> On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara  wrote:
>> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
>> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> answer my question.
>> >> >   Well, I'm not completely sure. mmap()ed memory always works on 
>> >> > page-by-page
>> >> > basis - you first access the page, it gets faulted in and you can 
>> >> > further
>> >> > access it. So for small (sub page size) accesses this is a win because 
>> >> > you
>> >> > don't have an overhead of syscall and fs write path. For accesses larger
>> >> > than page size the overhead of syscall and some initial checks is well
>> >> > hidden by other things. I guess write() ends up being more efficient
>> >> > because write path taken for each page is somewhat lighter than full 
>> >> > page
>> >> > fault. But you'd need to look into perf data to get some hard numbers on
>> >> > where the time is spent.
>> >> >
>> >>
>> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> pages initially.
>> >   Well, pages in ramdisk are always present, that's not an issue. But you
>> > will get a page fault to map a particular physical page in process'
>> > virtual address space when you first access that virtual address in the
>> > mapping from the process. The cost of setting up this virtual->physical
>> > mapping is what I'm talking about.
>> >
>>
>> Yes, you are right, there are page faults observed with perf. I
>> misunderstood page fault as copying pages between backing store and
>> physical memory.
>>
>> > If you had a process which first mmaps the file and writes to all pages in
>> > the mapping and *then* measure the cost of another round of writing to the
>> > mapping, I would expect you should see speeds close to those of memory bus.
>> >
>>
>> I've tried this as well. mmap() performance improves but still not as
>> good as write().
>> I used the perf report to compare write() and mmap() applications. For
>> write() version, top of perf report shows as:
>> 33.33%  __copy_user_nocache
>> 4.72%ext2_get_blocks
>> 4.42%mutex_unlock
>> 3.59%__find_get_block
>>
>> which looks reasonable.
>>
>> However, for mmap() version, the perf report looks strange:
>> 94.98% libc-2.15.so   [.] 0x0014698d
>> 2.25%   page_fault
>> 0.18%   handle_mm_fault
>>
>> I don't know what the first item is but it took the majority of cycles.
>   The first item means that it's some userspace code in libc. My guess
> would be that it's libc's memcpy() function (or whatever you use to write
> to mmap). How do you access the mmap?
>

Like this:

fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
for (i = 0; i < count; i++)
{
   memcpy(dest, src, request_size);
   dest += request_size;
}

Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/4] irq_work: Provide a irq work that can be processed on any cpu

2013-11-07 Thread Jan Kara
Provide new irq work flag - IRQ_WORK_UNBOUND - meaning that can be
processed on any cpu. This flag implies IRQ_WORK_LAZY so that things are
simple and we don't have to pick any particular cpu to do the work. We
just do the work from a timer tick on whichever cpu it happens first.
This is useful as a lightweight and simple code path without locking or
other dependencies to offload work to other cpu if possible.

We will use this type of irq work to make a guarantee of forward
progress of printing to a (serial) console when printing on one cpu
would cause interrupts to be disabled for too long.

Signed-off-by: Jan Kara 
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c| 41 +
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 66017028dcb3..ca07a16355ed 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -16,6 +16,8 @@
 #define IRQ_WORK_BUSY  2UL
 #define IRQ_WORK_FLAGS 3UL
 #define IRQ_WORK_LAZY  4UL /* Doesn't want IPI, wait for tick */
+#define __IRQ_WORK_UNBOUND  8UL /* Use IRQ_WORK_UNBOUND instead! */
+#define IRQ_WORK_UNBOUND   (__IRQ_WORK_UNBOUND | IRQ_WORK_LAZY) /* Any cpu 
can process this work */
 
 struct irq_work {
unsigned long flags;
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 55fcce6065cf..b06350b63c67 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -22,6 +22,9 @@
 static DEFINE_PER_CPU(struct llist_head, irq_work_list);
 static DEFINE_PER_CPU(int, irq_work_raised);
 
+/* List of irq-work any CPU can pick up */
+static LLIST_HEAD(unbound_irq_work_list);
+
 /*
  * Claim the entry so that no one else will poke at it.
  */
@@ -70,12 +73,16 @@ void irq_work_queue(struct irq_work *work)
/* Queue the entry and raise the IPI if needed. */
preempt_disable();
 
-   llist_add(>llnode, &__get_cpu_var(irq_work_list));
+   if (work->flags & __IRQ_WORK_UNBOUND)
+   llist_add(>llnode, _irq_work_list);
+   else
+   llist_add(>llnode, &__get_cpu_var(irq_work_list));
 
/*
 * If the work is not "lazy" or the tick is stopped, raise the irq
 * work interrupt (if supported by the arch), otherwise, just wait
-* for the next tick.
+* for the next tick. We do this even for unbound work to make sure
+* *some* CPU will be doing the work.
 */
if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
@@ -100,28 +107,17 @@ bool irq_work_needs_cpu(void)
return true;
 }
 
-static void __irq_work_run(void)
+static void process_irq_work_list(struct llist_head *llhead)
 {
unsigned long flags;
struct irq_work *work;
-   struct llist_head *this_list;
struct llist_node *llnode;
 
-
-   /*
-* Reset the "raised" state right before we check the list because
-* an NMI may enqueue after we find the list empty from the runner.
-*/
-   __this_cpu_write(irq_work_raised, 0);
-   barrier();
-
-   this_list = &__get_cpu_var(irq_work_list);
-   if (llist_empty(this_list))
+   if (llist_empty(llhead))
return;
 
BUG_ON(!irqs_disabled());
-
-   llnode = llist_del_all(this_list);
+   llnode = llist_del_all(llhead);
while (llnode != NULL) {
work = llist_entry(llnode, struct irq_work, llnode);
 
@@ -146,6 +142,19 @@ static void __irq_work_run(void)
}
 }
 
+static void __irq_work_run(void)
+{
+   /*
+* Reset the "raised" state right before we check the list because
+* an NMI may enqueue after we find the list empty from the runner.
+*/
+   __this_cpu_write(irq_work_raised, 0);
+   barrier();
+
+   process_irq_work_list(&__get_cpu_var(irq_work_list));
+   process_irq_work_list(_irq_work_list);
+}
+
 /*
  * Run the irq_work entries on this cpu. Requires to be ran from hardirq
  * context with local IRQs disabled.
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/4] printk: Remove separate printk_sched buffers and use printk buf instead

2013-11-07 Thread Jan Kara
From: Steven Rostedt 

To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.

What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.

There's a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu. This is a quite a
bit of space wasted for something that is seldom used.

In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the
up(_sem) may do a wake up of any pending waiters. This must be
avoided while holding the logbuf_lock.

Luckily, there's not many places that do the unlock, or hold the
logbuf_lock. By moving things around a little, the console_sem can be
released without ever holding the logbuf_lock, and we can safely have
printk_sched() use the printk buffer directly.

Signed-off-by: Steven Rostedt 
Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 86 +++---
 1 file changed, 54 insertions(+), 32 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b4e8500afdb3..572fb1f438b6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -208,6 +208,9 @@ struct printk_log {
 /*
  * The logbuf_lock protects kmsg buffer, indices, counters. It is also
  * used in interesting ways to provide interlocking in console_unlock();
+ * This can be taken within the scheduler's rq lock. It must be released
+ * before calling console_unlock() or anything else that might wake up
+ * a process.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
@@ -1343,27 +1346,43 @@ static inline int can_use_console(unsigned int cpu)
  * interrupts disabled. It should return with 'lockbuf_lock'
  * released but interrupts still disabled.
  */
-static int console_trylock_for_printk(unsigned int cpu)
+static int console_trylock_for_printk(unsigned int cpu, bool in_sched)
__releases(_lock)
 {
int retval = 0, wake = 0;
 
-   if (console_trylock()) {
-   retval = 1;
+   /* if called from the scheduler, we can not call up() */
+   if (in_sched)
+   goto out;
 
-   /*
-* If we can't use the console, we need to release
-* the console semaphore by hand to avoid flushing
-* the buffer. We need to hold the console semaphore
-* in order to do this test safely.
-*/
-   if (!can_use_console(cpu)) {
-   console_locked = 0;
-   wake = 1;
-   retval = 0;
-   }
+   if (down_trylock(_sem))
+   goto out;
+
+   /*
+* If we can't use the console, we need to release
+* the console semaphore by hand to avoid flushing
+* the buffer. We need to hold the console semaphore
+* in order to do this test safely.
+*/
+   if (console_suspended || !can_use_console(cpu)) {
+   wake = 1;
+   goto out;
}
+
+   /* console is now locked */
+
+   console_locked = 1;
+   console_may_schedule = 0;
+   mutex_acquire(_lock_dep_map, 0, 1, _RET_IP_);
+
+   retval = 1;
+
+out:
logbuf_cpu = UINT_MAX;
+   /*
+* The logbuf_lock must not be held when doing a wake up,
+* which the up(_sem) can do.
+*/
raw_spin_unlock(_lock);
if (wake)
up(_sem);
@@ -1495,11 +1514,17 @@ asmlinkage int vprintk_emit(int facility, int level,
static int recursion_bug;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
-   size_t text_len;
+   size_t text_len = 0;
enum log_flags lflags = 0;
unsigned long flags;
int this_cpu;
int printed_len = 0;
+   bool in_sched = false;
+
+   if (level == -2) {
+   level = -1;
+   in_sched = true;
+   }
 
boot_delay_msec(level);
printk_delay();
@@ -1545,7 +1570,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 * The printf needs to come first; we need the syslog
 * prefix which 

[PATCH 0/4 v6] Avoid softlockups in console_unlock()

2013-11-07 Thread Jan Kara
  Hello,

  This is the next iteration of my printk patchset. Since v5 I've made the
limit for printing configurable via kernel parameter and let it default to 0.
So unless user sets printk.offload_chars on kernel command line, there will
be no difference to current printk behavior.

Summary:

These patches avoid softlockups when a CPU gets caught in console_unlock() for
a long time during heavy printing from other CPU. As is discussed in patch 3/4
it isn't enough to just silence the watchdog because if CPU spends too long in
console_unlock() also RCU will complain, other CPUs can be blocked waiting for
printing CPU to process IPI, and even disk can be offlined because interrupts
would be disabled for too long.

This patch series solves the problem by stopping printing in console_unlock()
after X (tunable) characters and the printing is postponed to irq work. To
avoid hogging a single CPU (irq work gets processed on the same CPU where it
was queued so it doesn't really help to reduce the printing load on that CPU)
we introduce a new type of lazy irq work - IRQ_WORK_UNBOUND - which can be
processed by any CPU.

The patch series has survived my testing without any softlockup reports.  I've
tested running sysrq-t (lots of printk output) while inserting modules (to
generate IPIs and also some printk traffic) and also running two delayed works
printing 10 KB of text each. All this was with simulated 9600 baud serial
console.

Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

2013-11-07 Thread Alex Thorlton
On Wed, Nov 06, 2013 at 01:10:48PM +, Mel Gorman wrote:
> On Mon, Nov 04, 2013 at 02:03:46PM -0600, Alex Thorlton wrote:
> > On Mon, Nov 04, 2013 at 02:58:28PM +, Mel Gorman wrote:
> > > On Wed, Oct 16, 2013 at 10:54:29AM -0500, Alex Thorlton wrote:
> > > > Hi guys,
> > > > 
> > > > I ran into a bug a week or so ago, that I believe has something to do
> > > > with NUMA balancing, but I'm having a tough time tracking down exactly
> > > > what is causing it.  When running with the following configuration
> > > > options set:
> > > > 
> > > 
> > > Can you test with patches
> > > cd65718712469ad844467250e8fad20a5838baae..0255d491848032f6c601b6410c3b8ebded3a37b1
> > > applied? They fix some known memory corruption problems, were merged for
> > > 3.12 (so alternatively just test 3.12) and have been tagged for -stable.
> > 
> > I just finished testing with 3.12, and I'm still seeing the same issue.
> 
> Ok, I plugged your test into mmtests and ran it a few times but was not
> able to reproduce the same issue. It's a much smaller machine which
> might be a factor.
> 
> > I'll poke around a bit more on this in the next few days and see if I
> > can come up with any more information.  In the meantime, let me know if
> > you have any other suggestions.
> > 
> 
> Try the following patch on top of 3.12. It's a patch that is expected to
> be merged for 3.13. On its own it'll hurt automatic NUMA balancing in
> -stable but corruption trumps performance and the full series is not
> going to be considered acceptable for -stable

I gave this patch a shot, and it didn't seem to solve the problem.
Actually I'm running into what appear to be *worse* problems on the 3.12
kernel.  Here're a couple stack traces of what I get when I run the test
on 3.12, 512 cores:

(These are just two of the CPUs, obviously, but most of the memscale
processes appeared to be in one of these two spots)

Nov  7 13:54:39 uvpsw1 kernel: NMI backtrace for cpu 6
Nov  7 13:54:39 uvpsw1 kernel: CPU: 6 PID: 17759 Comm: thp_memscale Not tainted 
3.12.0-rc7-medusa-6-g0255d49 #381
Nov  7 13:54:39 uvpsw1 kernel: Hardware name: Intel Corp. Stoutland Platform, 
BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
Nov  7 13:54:39 uvpsw1 kernel: task: 8810647e0300 ti: 88106413e000 
task.ti: 88106413e000
Nov  7 13:54:39 uvpsw1 kernel: RIP: 0010:[]  
[] _raw_spin_lock+0x1a/0x25
Nov  7 13:54:39 uvpsw1 kernel: RSP: 0018:88106413fd38  EFLAGS: 0283
Nov  7 13:54:39 uvpsw1 kernel: RAX: a1a9a0fe RBX: 0206 RCX: 
8800
Nov  7 13:54:41 uvpsw1 kernel: RDX: a1a9 RSI: 3000 RDI: 
8907ded35494
Nov  7 13:54:41 uvpsw1 kernel: RBP: 88106413fd38 R08: 0006 R09: 
0002
Nov  7 13:54:41 uvpsw1 kernel: R10: 0007 R11: 88106413ff40 R12: 
8907ded35494
Nov  7 13:54:42 uvpsw1 kernel: R13: 88106413fe1c R14: 8810637a05f0 R15: 
0206
Nov  7 13:54:42 uvpsw1 kernel: FS:  7fffd5def700() 
GS:88107d98() knlGS:
Nov  7 13:54:42 uvpsw1 kernel: CS:  0010 DS:  ES:  CR0: 8005003b
Nov  7 13:54:42 uvpsw1 kernel: CR2: 7fffd5ded000 CR3: 0107dfbcf000 CR4: 
07e0
Nov  7 13:54:42 uvpsw1 kernel: Stack:
Nov  7 13:54:42 uvpsw1 kernel:  88106413fda8 810d670a 
0002 0006
Nov  7 13:54:42 uvpsw1 kernel:  7fff57dde000 8810640e1cc0 
02006413fe10 8907ded35440
Nov  7 13:54:45 uvpsw1 kernel:  88106413fda8 0206 
0002 
Nov  7 13:54:45 uvpsw1 kernel: Call Trace:
Nov  7 13:54:45 uvpsw1 kernel:  [] 
follow_page_mask+0x123/0x3f1
Nov  7 13:54:45 uvpsw1 kernel:  [] 
__get_user_pages+0x3e3/0x488
Nov  7 13:54:45 uvpsw1 kernel:  [] get_user_pages+0x4d/0x4f
Nov  7 13:54:45 uvpsw1 kernel:  [] 
SyS_get_mempolicy+0x1a9/0x3e0
Nov  7 13:54:45 uvpsw1 kernel:  [] 
system_call_fastpath+0x16/0x1b
Nov  7 13:54:46 uvpsw1 kernel: Code: b1 17 39 c8 ba 01 00 00 00 74 02 31 d2 89 
d0 c9 c3 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 d0 74 0c 
66 8b 07 <66> 39 d0 74 04 f3 90 eb f4 c9 c3 55 48 89 e5 9c 59 fa b8 00 00

Nov  7 13:55:59 uvpsw1 kernel: NMI backtrace for cpu 8
Nov  7 13:55:59 uvpsw1 kernel: INFO: NMI handler 
(arch_trigger_all_cpu_backtrace_handler) took too long to run: 1.099 msecs
Nov  7 13:56:04 uvpsw1 kernel: CPU: 8 PID: 17761 Comm: thp_memscale Not tainted 
3.12.0-rc7-medusa-6-g0255d49 #381
Nov  7 13:56:04 uvpsw1 kernel: Hardware name: Intel Corp. Stoutland Platform, 
BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
Nov  7 13:56:04 uvpsw1 kernel: task: 881063c56380 ti: 8810621b8000 
task.ti: 8810621b8000
Nov  7 13:56:04 uvpsw1 kernel: RIP: 0010:[]  
[] _raw_spin_lock+0x1a/0x25
Nov  7 13:56:04 uvpsw1 kernel: RSP: 0018:8810621b9c98  EFLAGS: 0283
Nov  7 13:56:04 uvpsw1 kernel: RAX: a20aa0ff RBX: 8810621002b0 RCX: 
8025
Nov  7 13:56:04 uvpsw1 kernel: RDX: a20a RSI: 

Re: [Xen-devel] [PATCH] PCI: Introduce two new MSI infrastructure calls for masking/unmasking.

2013-11-07 Thread Bjorn Helgaas
On Wed, Nov 06, 2013 at 09:42:15PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 06, 2013 at 04:51:52PM -0700, Bjorn Helgaas wrote:
> > [+cc Thomas, Ingo, Peter, x86 list]
> > 
> > On Wed, Nov 6, 2013 at 2:16 PM, Konrad Rzeszutek Wilk
> >  wrote:
> > > Certain platforms do not allow writes in the MSI-X bars
> > > to setup or tear down vector values. To combat against
> > > the generic code trying to write to that and either silently
> > > being ignored or crashing due to the pagetables being marked r/o
> > > this patch introduces a platform over-write.
> > >
> > > Note that we keep two separate, non-weak, functions
> > > default_mask_msi_irqs() and default_mask_msix_irqs() for the
> > > behavior of the arch_mask_msi_irqs() and arch_mask_msix_irqs(),
> > > as the default behavior is needed by x86 PCI code.
> > >
> > > For Xen, which does not allow the guest to write to MSI-X
> > > tables - as the hypervisor is solely responsible for setting
> > > the vector values - we implement two nops.
> > >
> > > CC: Bjorn Helgaas 
> > > CC: Sucheta Chakraborty 
> > > CC: Zhenzhong Duan 
> > > Signed-off-by: Konrad Rzeszutek Wilk 
> > 
> > I think this is safe, and I'd like to squeeze it into the v3.13 merge
> > window next week, since it supersedes three patches Zhenzhong has been
> > trying to get in since July [1], and this patch is much simpler to
> > understand.
> > 
> > I *think* this also fixes an actual bug on Xen.  Konrad, is there a
> > bugzilla or any kind of email problem description that we can include
> > here as a reference?  I think there's a lost interrupt with qlcnic,
> > but I don't know the details or what the failure looks like to a user.
> 
> It is pretty catastrophic. Here is the console log when I pass in
> a PCI device (with MSI-X) to the guest:

Perfect, thanks!  I put the log info in
https://bugzilla.kernel.org/show_bug.cgi?id=64581 and put this in
my pci/misc branch for v3.13.  I added the following to the changelog:

This fixes a Xen guest crash when passing a PCI device with MSI-X to the
guest.  See the bugzilla for more details.

[bhelgaas: add bugzilla info]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581

Let me know if this needs any tweaking.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] sg: O_EXCL and other lock handling

2013-11-07 Thread Christoph Hellwig
On Wed, Nov 06, 2013 at 02:30:40PM -0500, Douglas Gilbert wrote:
> >during sg_open and sg_release, which are guranteed not to migrate
> >to a different process during their run time.
> 
> True. What I stated would be a problem if a mutex tried
> to do something similar to Vaughan's patch that was
> reverted just before lk 3.12.0. That held a read or write
> semaphore between a sg_open() and sg_release().
> 
> But your point begs the question why should any driver
> pay the extra cost and usability restrictions of a kernel
> mutex over a semaphore without good reason:

It allows features like spinning before going to sleep, as well
as helping with RT patches.  But this really isn't the place to
discuss this - the ship has sailed and just about any semaphore
used for basic mutal exclusion has been converted to a mutex.

Complaints about that should be directed to Linus and/or the locking
primitive maintainers.

> So what contexts can mutexes be used in?

They can be used in process context only.  Please make sure to also
run any patches changing locking behaviour with CONFIG_PROVE_LOCKING
enabled, as that will verify all this.

> Also, which one in the above list tells me that sg_open()
> cannot return to the caller with a mutex held? Given
> the special relationship guaranteed by the OS (POSIX)
> between sg_open() and sg_release() *** for the same file
> handle, if this is not allowed, why?

Because you always release it before returning.  Note that the
sparse static checker can verify that for you, if the functions
get to big, which sg_open might get close to.  Sparse is another
tool that I wish every patch author would run before submission.

Read Documentation/sparse.txt for details.

> >It always is called through the class interface remove_dev
> >method, which always is called under a lock.
> 
> This is an important piece of extra information. Does that
> mean sg_add() and sg_remove() can't wait on anything? I note
> that sg_add() calls sg_alloc() which immediately does:
>   sdp = kzalloc(sizeof(Sg_device), GFP_KERNEL);

You can sleep under mutexes just fine, although in general it
is preferable to sleep too long with mutexes held.

> Where are the preconditions from the scsi subsystem to a ULD
> stating:
>   - that sg_remove() and sg_add() are called from a non-interrupt,
> non tasklet and non timer context?

The driver core uses sleeping calls like mutexes in all these code
pathes, so a call from these contexts will blow up immediately.

>   - invocations of sg_add() and sg_remove() are paired in a
> similar fashion as sg_open() and sg_release()

That's how the class_interfaces work, easily verified by code
inspection.  Unfortunately these interfaces aren't really documented,
but a patch to Greg who is in Cc now will take care of that.

> By the second point I mean, for example, that sg_remove()
> cannot be called twice for the same device without a sg_add()
> between them to add (back) that device.

That is correct.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/4] wire up CPU features to udev based module loading

2013-11-07 Thread Andi Kleen
On Thu, Nov 07, 2013 at 01:09:41PM -0800, H. Peter Anvin wrote:
> On 11/07/2013 09:17 AM, Ard Biesheuvel wrote:
> > This series implements automatic module loading based on optional CPU 
> > features,
> > and tries to do so in a generic way. Currently, 32 feature bits are 
> > supported,
> > and how they map to actual CPU features is entirely up to the architecture.
> 
> NAK.
> 
> We in the x86 world already left 32 bits way behind; we currently have
> 320 bit feature masks.
> 
> If you're aiming at doing this in a generic way, it needs to be able to
> accommodate the current x86cpu feature stuff as a subset, which this
> doesn't.

They can just use the exact same code/macros as x86cpu, just need a different 
prefix and use wildcards if they miss something (e.g. family)

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] rbtree: Fix rbtree_postorder_for_each_entry_safe() iterator

2013-11-07 Thread Andrew Morton
On Wed,  6 Nov 2013 17:42:30 -0800 Cody P Schafer  
wrote:

> The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
> underflow behavior when testing for loop termination. In particular
> it expects that
>   _entry(NULL, type, field)->field
> is NULL. But the result of this expression is not defined by a C standard
> and some gcc versions (e.g. 4.3.4) assume the above expression can never
> be equal to NULL. The net result is an oops because the iteration is not
> properly terminated.
> 
> Fix the problem by modifying the iterator to avoid pointer underflows.

So the sole caller is in zswap.c.  Is that code actually generating oopses?

IOW, is there any need to fix this in 3.12 or earlier?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2 -mm] provide estimated available memory in /proc/meminfo

2013-11-07 Thread Johannes Weiner
On Thu, Nov 07, 2013 at 10:13:45AM -0500, Rik van Riel wrote:
> 
> > >  fs/proc/meminfo.c | 36 
> > >  1 file changed, 36 insertions(+)
> > 
> > Documentation/filesystems/proc.txt told me it's feeling all offended.
> 
> You're right, of course.  Here is version 2 :)
> 
> ---8<---
> 
> Subject: provide estimated available memory in /proc/meminfo
> 
> Many load balancing and workload placing programs check /proc/meminfo
> to estimate how much free memory is available. They generally do this
> by adding up "free" and "cached", which was fine ten years ago, but
> is pretty much guaranteed to be wrong today.
> 
> It is wrong because Cached includes memory that is not freeable as
> page cache, for example shared memory segments, tmpfs, and ramfs,
> and it does not include reclaimable slab memory, which can take up
> a large fraction of system memory on mostly idle systems with lots
> of files.
> 
> Currently, the amount of memory that is available for a new workload,
> without pushing the system into swap, can be estimated from MemFree,
> Active(file), Inactive(file), and SReclaimable, as well as the "low"
> watermarks from /proc/zoneinfo.
> 
> However, this may change in the future, and user space really should
> not be expected to know kernel internals to come up with an estimate
> for the amount of free memory.
> 
> It is more convenient to provide such an estimate in /proc/meminfo.
> If things change in the future, we only have to change it in one place.
> 
> Signed-off-by: Rik van Riel 
> Reported-by: Erik Mouw 

Acked-by: Johannes Weiner 

I have a suspicion that people will end up relying on this number to
start new workloads in situations where lots of the page cache is
actually heavily used.  We might not swap, but there will still be IO
from thrashing cache.

Maybe we'll have to subtract mapped cache pages in the future to
mitigate this risk somehow...

Anyway, we can defer this to when it's proven to be an actual problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


<    1   2   3   4   5   6   7   8   9   10   >