Re: [PATCH] Make math_state_restore() save and restore the interrupt flag

2014-01-30 Thread Suresh Siddha
hi,

On Thu, Jan 30, 2014 at 2:24 PM, Linus Torvalds
 wrote:
> I'm adding in some people here, because I think in the end this bug
> was introduced by commit 304bceda6a18 ("x86, fpu: use non-lazy fpu
> restore for processors supporting xsave") that introduced that
> math_state_restore() in kernel_fpu_end(), but we have other commits
> (like 5187b28ff08: "x86: Allow FPU to be used at interrupt time even
> with eagerfpu") that seem tangential too and might be part of why it
> actually *triggers* now.
>
> Comments?

I haven't been following the recent changes closely, so before I get a
chance to review the current bug and the relevant commits, I wanted to
add that:

a. delayed dynamic allocation of FPU state area was not a good idea
(from me). Given most of the future cases will be anyway using eager
FPU (because of processor features like xsaveopt etc, applications
implicitly using FPU because of optimizations in commonly used
libraries etc), we should probably go back to allocation of FPU state
area during thread creation for everyone (including non-eager cases).
Memory savings will be small anyway, and the code complexity
introducing subtle bugs like this is not worth it.

b. with the above change, kernel_fpu_begin() will just save any user
live math state and be ready for kernel math operations. And
kernel_fpu_end() will drop the kernel math state and, for the eager-fpu
case, restore the user math state.

We will avoid worrying about any memory allocations in the
math_state_restore() with interrupts disabled etc.
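
To make (a) and (b) concrete, here is a toy userspace model, not kernel
code. Every name with a toy_ prefix and the FPU_STATE_SIZE value are
assumptions made up for this sketch; it only shows that once the save area
exists from thread creation, neither begin nor end needs to allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define FPU_STATE_SIZE 64	/* placeholder size, not a real xsave area */

/* Hypothetical stand-in for a task; not the real kernel structure. */
struct toy_task {
	unsigned char *fpu_state;	/* allocated once, at "thread creation" */
	bool eager;			/* models the eager-FPU case */
};

/* Models the live FPU registers of the CPU. */
static unsigned char toy_fpu_regs[FPU_STATE_SIZE];

/* Allocate the save area up front, so later paths never allocate. */
static int toy_task_create(struct toy_task *t, bool eager)
{
	t->fpu_state = calloc(1, FPU_STATE_SIZE);
	t->eager = eager;
	return t->fpu_state ? 0 : -1;
}

/* Save any live user state; the registers are then free for kernel use. */
static void toy_kernel_fpu_begin(struct toy_task *t)
{
	assert(t->fpu_state);	/* guaranteed by toy_task_create() */
	memcpy(t->fpu_state, toy_fpu_regs, FPU_STATE_SIZE);
}

/* Drop the kernel state; in the eager case, restore user state at once. */
static void toy_kernel_fpu_end(struct toy_task *t)
{
	memset(toy_fpu_regs, 0, FPU_STATE_SIZE);
	if (t->eager)
		memcpy(toy_fpu_regs, t->fpu_state, FPU_STATE_SIZE);
}

/* Release the save area when the task goes away. */
static void toy_task_destroy(struct toy_task *t)
{
	free(t->fpu_state);
	t->fpu_state = NULL;
}
```

In this model the only allocation is in toy_task_create(), which is the
point of going back to allocation at thread creation.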

If there are no objections, I will see if I can come up with a quick
patch, or I will ask HPA to help fill me in.

thanks,
suresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] drivers:staging:octeon-usb: Fixed Bitfields coding style errors

2014-01-30 Thread Surendra Patil
Fixed the coding style errors below:
octeon-hcd.h:146: ERROR: spaces prohibited around that ':' (ctx:WxW)
octeon-hcd.h:147: ERROR: spaces prohibited around that ':' (ctx:WxW)
total: 243 errors, 0 warnings, 1819 lines checked - fixed all errors

Signed-off-by: Surendra Patil 
---
 drivers/staging/octeon-usb/octeon-hcd.h | 486 
 1 file changed, 243 insertions(+), 243 deletions(-)

diff --git a/drivers/staging/octeon-usb/octeon-hcd.h 
b/drivers/staging/octeon-usb/octeon-hcd.h
index 42fe4fe..c534e317 100644
--- a/drivers/staging/octeon-usb/octeon-hcd.h
+++ b/drivers/staging/octeon-usb/octeon-hcd.h
@@ -143,13 +143,13 @@ union cvmx_usbcx_gahbcfg {
 *  * 1'b1: Unmask the interrupt assertion to the application.
 */
struct cvmx_usbcx_gahbcfg_s {
-   uint32_t reserved_9_31  : 23;
-   uint32_t ptxfemplvl : 1;
-   uint32_t nptxfemplvl: 1;
-   uint32_t reserved_6_6   : 1;
-   uint32_t dmaen  : 1;
-   uint32_t hbstlen: 4;
-   uint32_t glblintrmsk: 1;
+   uint32_t reserved_9_31:23;
+   uint32_t ptxfemplvl:1;
+   uint32_t nptxfemplvl:1;
+   uint32_t reserved_6_6:1;
+   uint32_t dmaen:1;
+   uint32_t hbstlen:4;
+   uint32_t glblintrmsk:1;
} s;
 };
 
@@ -209,16 +209,16 @@ union cvmx_usbcx_ghwcfg3 {
 *  * Others: Reserved
 */
struct cvmx_usbcx_ghwcfg3_s {
-   uint32_t dfifodepth : 16;
-   uint32_t reserved_13_15 : 3;
-   uint32_t ahbphysync : 1;
-   uint32_t rsttype: 1;
-   uint32_t optfeature : 1;
-   uint32_t vendor_control_interface_support   : 1;
-   uint32_t i2c_selection  : 1;
-   uint32_t otgen  : 1;
-   uint32_t pktsizewidth   : 3;
-   uint32_t xfersizewidth  : 4;
+   uint32_t dfifodepth:16;
+   uint32_t reserved_13_15:3;
+   uint32_t ahbphysync:1;
+   uint32_t rsttype:1;
+   uint32_t optfeature:1;
+   uint32_t vendor_control_interface_support:1;
+   uint32_t i2c_selection:1;
+   uint32_t otgen:1;
+   uint32_t pktsizewidth:3;
+   uint32_t xfersizewidth:4;
} s;
 };
 
@@ -275,38 +275,38 @@ union cvmx_usbcx_gintmsk {
 * @modemismsk: Mode Mismatch Interrupt Mask (ModeMisMsk)
 */
struct cvmx_usbcx_gintmsk_s {
-   uint32_t wkupintmsk : 1;
-   uint32_t sessreqintmsk  : 1;
-   uint32_t disconnintmsk  : 1;
-   uint32_t conidstschngmsk: 1;
-   uint32_t reserved_27_27 : 1;
-   uint32_t ptxfempmsk : 1;
-   uint32_t hchintmsk  : 1;
-   uint32_t prtintmsk  : 1;
-   uint32_t reserved_23_23 : 1;
-   uint32_t fetsuspmsk : 1;
-   uint32_t incomplpmsk: 1;
-   uint32_t incompisoinmsk : 1;
-   uint32_t oepintmsk  : 1;
-   uint32_t inepintmsk : 1;
-   uint32_t epmismsk   : 1;
-   uint32_t reserved_16_16 : 1;
-   uint32_t eopfmsk: 1;
-   uint32_t isooutdropmsk  : 1;
-   uint32_t enumdonemsk: 1;
-   uint32_t usbrstmsk  : 1;
-   uint32_t usbsuspmsk : 1;
-   uint32_t erlysuspmsk: 1;
-   uint32_t i2cint : 1;
-   uint32_t ulpickintmsk   : 1;
-   uint32_t goutnakeffmsk  : 1;
-   uint32_t ginnakeffmsk   : 1;
-   uint32_t nptxfempmsk: 1;
-   uint32_t rxflvlmsk  : 1;
-   uint32_t sofmsk : 1;
-   uint32_t otgintmsk  : 1;
-   uint32_t modemismsk : 1;
-   uint32_t reserved_0_0   : 1;
+   uint32_t wkupintmsk:1;
+   uint32_t sessreqintmsk:1;
+   uint32_t disconnintmsk:1;
+   uint32_t conidstschngmsk:1;
+   uint32_t reserved_27_27:1;
+   uint32_t ptxfempmsk:1;
+   uint32_t hchintmsk:1;
+   uint32_t prtintmsk:1;
+   uint32_t reserved_23_23:1;
+   uint32_t fetsuspmsk:1;
+   uint32_t incomplpmsk:1;
+   ui

Re: [PATCH] kconfig: consolidate arch-specific seccomp options

2014-01-30 Thread Ingo Molnar

* Dave Hansen  wrote:

> On 01/30/2014 12:55 AM, Ingo Molnar wrote:
> >> > +  This kernel feature is useful for number crunching 
> >> > applications
> >> > +  that may need to compute untrusted bytecode during their
> >> > +  execution. By using pipes or other transports made available 
> >> > to
> > I'd change and simplify the first sentence to:
> > 
> >> > +  This kernel feature is useful to sandbox runtimes that need
> >> > +  to execute untrusted machine code.
> > Seccomp isn't primarily about number crunching anymore, and it's 
> > definitely not about 'bytecode' in the classical sense either.
> 
> I'll change that if I need to send it again.  Otherwise, I'll leave 
> it to the folks who actually know something about the feature, which 
> isn't me.

Ok, consider the x86 bits NAK-ed, which is lifted if the text is 
updated as well.

Thanks,

Ingo


Re: [PATCH] memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path

2014-01-30 Thread Vladimir Davydov
Hi, David

Thank you for taking a look at this and adding the missing patch
description. WRT your patch, please see the comment inline.

On 01/31/2014 02:39 AM, David Rientjes wrote:
> On Thu, 30 Jan 2014, Andrew Morton wrote:
>
>>> It always was.
>> eh?  kmem_cache_create_memcg()'s kstrdup() will allocate the minimum
>> needed amount of memory.
>>
> Ah, good point.  We could do this incrementally on my patch:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -637,6 +637,9 @@ int memcg_limited_groups_array_size;
>   * better kept as an internal representation in cgroup.c. In any case, the
>   * cgrp_id space is not getting any smaller, and we don't have to necessarily
>   * increase ours as well if it increases.
> + *
> + * Updates to MAX_SIZE should update the space for the memcg name in
> + * memcg_create_kmem_cache().
>   */
>  #define MEMCG_CACHES_MIN_SIZE 4
>  #define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
> @@ -3400,8 +3403,10 @@ void mem_cgroup_destroy_cache(struct kmem_cache 
> *cachep)
>  static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> struct kmem_cache *s)
>  {
> - char *name = NULL;
>   struct kmem_cache *new;
> + const char *cgrp_name;
> + char *name = NULL;
> + size_t len;
>  
>   BUG_ON(!memcg_can_account_kmem(memcg));
>  
> @@ -3409,9 +3414,22 @@ static struct kmem_cache 
> *memcg_create_kmem_cache(struct mem_cgroup *memcg,
>   if (unlikely(!name))
>   return NULL;
>  
> + /*
> +  * Format of a memcg's kmem cache name:
> +  * (:)
> +  */
> + len = strlen(s->name);
> + /* Space for parentheses, colon, terminator */
> + len += 4;
> + /* MEMCG_CACHES_MAX_SIZE is USHRT_MAX */
> + len += 5;
> + BUILD_BUG_ON(MEMCG_CACHES_MAX_SIZE > USHRT_MAX);
> +

This looks cumbersome, IMO. Let's leave it as is for now. AFAIK,
cgroup_name() will be reworked soon so that it won't require RCU-context
(https://lkml.org/lkml/2014/1/28/530). Therefore, it will be possible to
get rid of this pointless tmp_name allocation by making
kmem_cache_create_memcg() take not just name, but printf-like format +
vargs.
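
As a rough illustration of the "printf-like format + vargs" idea, here is a
userspace sketch only. toy_name_vasprintf() and toy_cache_name() are
hypothetical names invented for this example, and kernel code would use the
kernel's own allocation and formatting helpers instead of malloc() and
vsnprintf(). The point is that the name is measured first and then allocated
at exactly the needed size, so no temporary buffer or hand-computed length
is required.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userspace analogue only: build a name from a printf-like format,
 * allocating exactly as many bytes as the formatted string needs.
 * Hypothetical helper, not a kernel function.
 */
static char *toy_name_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	char *buf;
	int len;

	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);	/* first pass: measure */
	va_end(aq);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	len = vsnprintf(buf, (size_t)len + 1, fmt, ap);	/* second pass: format */
	assert(len >= 0);
	return buf;
}

/* Variadic front end; the caller owns the result and must free() it. */
static char *toy_cache_name(const char *fmt, ...)
{
	va_list ap;
	char *name;

	va_start(ap, fmt);
	name = toy_name_vasprintf(fmt, ap);
	va_end(ap);
	return name;
}
```

A caller could then pass the format and its arguments directly, for example
toy_cache_name("%s(%d:%s)", ...), where that format string is only an
illustration.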

Thanks.


Re: [PATCH v2] dma: Add Xilinx AXI Video Direct Memory Access Engine driver support

2014-01-30 Thread Srikanth Thokala
Hi Vinod,

On Mon, Jan 27, 2014 at 4:36 PM, Srikanth Thokala  wrote:
> Hi Vinod,
>
> On Sun, Jan 26, 2014 at 7:29 PM, Vinod Koul  wrote:
>> On Fri, Jan 24, 2014 at 02:24:27PM +0100, Lars-Peter Clausen wrote:
>>> On 01/24/2014 12:16 PM, Srikanth Thokala wrote:
>>> > Hi Lars,
>>> >
>>> > On Thu, Jan 23, 2014 at 4:55 PM, Lars-Peter Clausen  
>>> > wrote:
>>> >> On 01/22/2014 05:52 PM, Srikanth Thokala wrote:
>>> >> [...]
>>> >>> +/**
>>> >>> + * xilinx_vdma_device_control - Configure DMA channel of the device
>>> >>> + * @dchan: DMA Channel pointer
>>> >>> + * @cmd: DMA control command
>>> >>> + * @arg: Channel configuration
>>> >>> + *
>>> >>> + * Return: '0' on success and failure value on error
>>> >>> + */
>>> >>> +static int xilinx_vdma_device_control(struct dma_chan *dchan,
>>> >>> +   enum dma_ctrl_cmd cmd, unsigned 
>>> >>> long arg)
>>> >>> +{
>>> >>> + struct xilinx_vdma_chan *chan = to_xilinx_chan(dchan);
>>> >>> +
>>> >>> + switch (cmd) {
>>> >>> + case DMA_TERMINATE_ALL:
>>> >>> + xilinx_vdma_terminate_all(chan);
>>> >>> + return 0;
>>> >>> + case DMA_SLAVE_CONFIG:
>>> >>> + return xilinx_vdma_slave_config(chan,
>>> >>> + (struct xilinx_vdma_config *)arg);
>>> >>
>>> >> You really shouldn't be overloading the generic API with your own 
>>> >> semantics.
>>> >> DMA_SLAVE_CONFIG should take a dma_slave_config and nothing else.
>>> >
>>> > Ok.  The driver needs a few additional configuration parameters from
>>> > the slave device, like Vertical Size, Horizontal Size, Stride etc.,
>>> > for the DMA transfers. In that case, do you suggest that I define a
>>> > separate dma_ctrl_cmd, like the FSLDMA_EXTERNAL_START one defined for
>>> > Freescale drivers?
>>>
>>> In my opinion it is not a good idea to have a driver implement a generic API,
>>> but at the same time let the driver have custom semantics for those API
>>> calls. It's a bit like having a gpio driver that expects 23 and 42 as the
>>> values passed to gpio_set_value instead of 0 and 1. It completely defeats
>>> the purpose of a generic API, namely that you are able to write generic code
>>> that makes use of the API without having to know about which implementation
>>> API it is talking to. The dmaengine framework provides the
>>> dmaengine_prep_interleaved_dma() function to setup two dimensional
>>> transfers, e.g. take a look at sirf-dma.c or imx-dma.c.
>>
>> The question here, I think, is what this device supports. If the hardware is
>> capable of doing interleaved transfers, then that would make sense.
>>
>> While we do try to get users to use dma_slave_config, there will always be
>> someone who has specific params. If we can generalize, then we might want to
>> add them to the dma_slave_config as well.
>
> There are many configuration parameters which are specific to the IP, and I
> would like to give an overview of some of the parameters here:
>
> 1) Park Mode ('cfg->park'): In Park mode, the engine will park on the frame
>    referenced by 'cfg->park_frm', so the user will have control of each
>    frame in this mode.
>
> 2) Interrupt Coalesce ('cfg->coalesce'): Used for setting the interrupt
>    threshold. This value determines the number of frame buffers to process.
>    To use this feature, 'cfg->frm_cnt_en' should be set.
>
> 3) Frame Synchronization Source ('cfg->ext_fsync'): Can be an
>    external/internal frame synchronization source. Used to synchronize one
>    channel (MM2S/S2MM) with another (S2MM/MM2S) channel.
>
> 4) Genlock Synchronization ('cfg->genlock'): Used to avoid a rate mismatch
>    between master and slave. In master mode (cfg->master), frames are not
>    dropped, and the slave can drop frames to adjust to the master frame rate.
>
> In future, this engine being a soft IP, we could expect some more additional
> parameters. Isn't it a good idea to have a private member in dma_slave_config
> for sharing additional configuration between the slave device and the DMA
> engine? Or a new dma_ctrl_cmd like FSLDMA_EXTERNAL_START?


Ping?

>
> Srikanth
>
>>
>> --
>> ~Vinod


Re: [PATCH v2] dma: Add Xilinx AXI Video Direct Memory Access Engine driver support

2014-01-30 Thread Srikanth Thokala
Hi Vinod,

On Tue, Jan 28, 2014 at 8:43 AM, Vinod Koul  wrote:
> On Mon, Jan 27, 2014 at 06:42:36PM +0530, Srikanth Thokala wrote:
>> Hi Lars/Vinod,
>> >> The question here, I think, is what this device supports. If the hardware
>> >> is capable of doing interleaved transfers, then that would make sense.
>> >
>> > The hardware does 2D transfers. The parameters for a transfer are height,
>> > width and stride. That's only a subset of what interleaved transfers can be
>> > (xt->num_frames must be one for 2d transfers). But if I remember correctly
>> > there has been some discussion on this in the past and the result of that
>> > discussion was that using interleaved transfers for 2D transfers is
>> > preferred over adding a custom API for 2D transfers.
>>
>> I went through the prep_interleaved_dma API and I see only one descriptor
>> is prepared per API call (i.e. per frame).  As our IP supports up to 16 frame
>> buffers (can be more in future), isn't it less efficient compared to
>> prep_slave_sg, where we get a single sg list and can prepare all the
>> descriptors (of non-contiguous buffers) in one go?  Correct me if I am wrong,
>> and let me know your opinions.
> Well, the descriptor may be one, but that can represent multiple frames, for
> example 16 as in your case. Can you read up on the documentation of how
> multiple frames are passed? Please see include/linux/dmaengine.h
>
> /**
>  * Interleaved Transfer Request
>  * 
>  * A chunk is collection of contiguous bytes to be transfered.
>  * The gap(in bytes) between two chunks is called inter-chunk-gap(ICG).
>  * ICGs may or maynot change between chunks.
>  * A FRAME is the smallest series of contiguous {chunk,icg} pairs,
>  *  that when repeated an integral number of times, specifies the transfer.
>  * A transfer template is specification of a Frame, the number of times
>  *  it is to be repeated and other per-transfer attributes.
>  *
>  * Practically, a client driver would have ready a template for each
>  *  type of transfer it is going to need during its lifetime and
>  *  set only 'src_start' and 'dst_start' before submitting the requests.
>  *
>  *
>  *  |  Frame-1|   Frame-2   | ~ |   Frame-'numf'  |
>  *  |==.===...=...|==.===...=...| ~ |==.===...=...|
>  *
>  *==  Chunk size
>  *... ICG
>  */

Yes, it can handle multiple frames specified by 'numf', each of size
'frame_size * sgl[0].size'. But I see it only works if all the frames' memory
is contiguous; in that case we can just increment 'src_start' by the total
frame size 'numf' number of times to fill in each HW descriptor (each frame
is one HW descriptor).  So, there is no issue when the memory is contiguous.
If the frames are non-contiguous, we have to call this API for each frame
(hence for each descriptor), as the src_start for each frame is different.
Is that correct?

FYI: This hardware has an inbuilt Scatter-Gather engine.
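
For reference, one possible reading of the template for a single 2D buffer,
following the height/width/stride description earlier in the thread, can be
modelled in userspace as below. The toy_ types are simplified, hypothetical
mirrors of the dmaengine structures (they are not the kernel definitions),
and the sketch assumes one chunk per template frame, so that each template
"frame" is one line of the buffer. Under that assumption a separate buffer
at an unrelated address needs its own start address, which is the concern
raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirrors of the dmaengine types quoted above. */
struct toy_chunk {
	size_t size;	/* contiguous bytes to transfer */
	size_t icg;	/* inter-chunk gap, in bytes */
};

struct toy_template {
	uint64_t src_start;		/* address of the first chunk */
	size_t numf;			/* number of times the frame repeats */
	size_t frame_size;		/* number of chunks per frame */
	struct toy_chunk sgl[1];	/* one chunk per frame is enough for 2D */
};

/* Describe a 2D buffer: 'height' lines of 'width' bytes, 'stride' apart. */
static void toy_fill_2d(struct toy_template *xt, uint64_t start,
			size_t width, size_t height, size_t stride)
{
	assert(stride >= width);
	xt->src_start = start;
	xt->numf = height;
	xt->frame_size = 1;
	xt->sgl[0].size = width;
	xt->sgl[0].icg = stride - width;
}

/* Address of line 'n': each repetition advances by size + icg bytes. */
static uint64_t toy_line_addr(const struct toy_template *xt, size_t n)
{
	assert(n < xt->numf);
	return xt->src_start + n * (xt->sgl[0].size + xt->sgl[0].icg);
}
```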

Srikanth

>
> --
> ~Vinod


[PATCH] tile: remove compat_sys_lookup_dcookie declaration to fix compile error

2014-01-30 Thread Heiko Carstens
With d8d14bd09cdd "fs/compat: fix lookup_dcookie() parameter handling" I
changed the type of the len parameter of the lookup_dcookie() syscall.

However I missed that there was still a stale declaration in arch/tile/..
which now causes a compile error on tile:

In file included from fs/dcookies.c:28:0:
include/linux/compat.h:425:17: error: conflicting types for 
'compat_sys_lookup_dcookie'
fs/dcookies.c:207:1: error: conflicting types for 'compat_sys_lookup_dcookie'

Simply remove the declaration in the tile architecture, which is only a
leftover from before the different compat lookup_dcookie() versions were
merged.
The declaration is now in include/linux/compat.h.

The build error was reported by Fengguang's build bot.

Signed-off-by: Heiko Carstens 
---
 arch/tile/include/asm/compat.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 78f1f2ded86c..ffd4493efc78 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -281,7 +281,6 @@ long compat_sys_pread64(unsigned int fd, char __user *ubuf, 
size_t count,
u32 dummy, u32 low, u32 high);
 long compat_sys_pwrite64(unsigned int fd, char __user *ubuf, size_t count,
 u32 dummy, u32 low, u32 high);
-long compat_sys_lookup_dcookie(u32 low, u32 high, char __user *buf, size_t 
len);
 long compat_sys_sync_file_range2(int fd, unsigned int flags,
 u32 offset_lo, u32 offset_hi,
 u32 nbytes_lo, u32 nbytes_hi);
-- 
1.8.4.5



Re: exynos_hdmi.c fails to build with v3.13-10094-g9b0cd30

2014-01-30 Thread Sachin Kamat
Hi Josh,

On 30 January 2014 22:17, Josh Boyer  wrote:
> Hi All,
>
> After the DRM merge, the exynos_hdmi.c file fails to build with our
> ARM config.  The error is:
>
> drivers/gpu/drm/exynos/exynos_hdmi.c:382:8: error: 'hdmi_infoframe'
> defined as wrong kind of tag
>  struct hdmi_infoframe {
> ^
> make[4]: *** [drivers/gpu/drm/exynos/exynos_hdmi.o] Error 1
> make[3]: *** [drivers/gpu/drm/exynos] Error 2
> make[2]: *** [drivers/gpu/drm] Error 2
>
> which to me was a somewhat confusing error message.  After digging
> further, I believe it means that there is a conflict with the
> definition in exynos_hdmi.c and the one found in include/linux/hdmi.h
> for what hdmi_infoframe is supposed to be.
>
> exynos_hdmi.c:
>
> struct hdmi_infoframe {
> enum HDMI_PACKET_TYPE type;
> u8 ver;
> u8 len;
> };
>
>
> include/linux/hdmi.h:
>
> union hdmi_infoframe {
> struct hdmi_any_infoframe any;
> struct hdmi_avi_infoframe avi;
> struct hdmi_spd_infoframe spd;
> union hdmi_vendor_any_infoframe vendor;
> struct hdmi_audio_infoframe audio;
> };
>
>
> Could someone take a look at this?  I have no idea how this wasn't
> caught before being merged.

Thank you for reporting this issue. I have just posted a fix for this
(CC'd you) and hence the link is not yet available. Please test the
patch.
This issue surfaced because of commit 985e5dc207e1 ("drm/edid:
Populate picture aspect ratio for CEA modes") which includes
linux/hdmi.h and which got merged last month and hence the build issue
did not appear when the failing patch (commit a144c2e9f17b "
drm/exynos: sending AVI and AUI info frames") initially got merged
about a year earlier. In fact, "union hdmi_infoframe" itself was added
to include/linux/hdmi.h in Aug. 2013. Now that we have something
available in global header, the next step would be to utilize that
definition instead of the local one in exynos_hdmi.c.
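
For anyone puzzled by the "wrong kind of tag" message: struct and union
tags share one namespace in C, so the two definitions cannot coexist once
both are visible in the same file. A minimal sketch of the situation
follows. The union members here are placeholders, and the name
toy_exynos_infoframe is hypothetical, chosen only to show a local type that
does not collide; it is not necessarily what the posted fix does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the shared union; the members here are placeholders. */
union hdmi_infoframe {
	uint8_t raw[4];
	struct { uint8_t type, ver, len; } any;
};

/*
 * Driver-local header description under a distinct, hypothetical tag.
 * Declaring it as "struct hdmi_infoframe" after the union above would
 * fail to compile, because struct and union tags share one namespace.
 */
struct toy_exynos_infoframe {
	uint8_t type;
	uint8_t ver;
	uint8_t len;
};

/* Trivial accessor, present only so the local type is used. */
static uint8_t toy_infoframe_len(const struct toy_exynos_infoframe *f)
{
	assert(f);
	return f->len;
}
```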

-- 
With warm regards,
Sachin


Re: [PATCH 1/2] rtl8192ce is disabling for too long the irqs

2014-01-30 Thread Larry Finger

On 01/30/2014 11:16 PM, Olivier Langlois wrote:

rtl8192ce is disabling the local interrupts for too long during hw
initialisation when performing scans

The observable symptoms in dmesg can be:

- underruns from ALSA playback
- clock freezes (tstamps do not change for several dmesg entries until irqs are
finally reenabled):

[  250.817669] rtlwifi:rtl_op_config():<0-0-0> 0x100
[  250.817685] rtl8192ce:_rtl92ce_phy_set_rf_power_state():<0-1-0> IPS Set eRf 
nic enable
[  250.817732] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.817796] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.817910] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818024] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818139] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818253] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818367] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:98053f15:10
[  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[  250.818472] rtl8192c_common:rtl92c_download_fw():<0-1-0> Firmware 
Version(49), Signature(0x88c1),Size(32)
[  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> 
PairwiseEncAlgorithm = 0 GroupEncAlgorithm = 0
[  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> The 
SECR-value cc
[  250.818472] 
rtl8192c_common:rtl92c_dm_check_txpower_tracking_thermal_meter():<0-1-0> 
Schedule TxPowerTracking direct call!!
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
rtl92c_dm_txpower_tracking_callback_thermalmeter
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Initial pathA ele_d reg0xc80 = 0x4000, ofdm_index=0xc
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Initial reg0xa24 = 0x90e1317, cck_index=0xc, ch14 0
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf delta 0x1 
delta_lck 0x0 delta_iqk 0x0
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> <===
[  250.818472] 
rtl8192c_common:rtl92c_dm_initialize_txpower_tracking_thermalmeter():<0-1-0> 
pMgntInfo->txpower_tracking = 1
[  250.818472] rtl8192ce:rtl92ce_led_control():<0-1-0> ledaction 3
[  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[  250.818472] rtlwifi:rtl_ips_nic_on():<0-1-0> before spin_unlock_irqrestore
[  251.154656] PCM: Lost interrupts? [Q]-0 (stream=0, delta=15903, 
new_hw_ptr=293408, old_hw_ptr=277505)

The exact code flow that causes that is:

1. wpa_supplicant sends a start_scan request to the nl80211 driver
2. the mac80211 module calls rtl_op_config with IEEE80211_CONF_CHANGE_IDLE
3.   rtl_ips_nic_on is called, which disables local irqs
4. rtl92c_phy_set_rf_power_state() is called
5.   rtl_ps_enable_nic() is called, hw_init() is executed, and then the
interrupts on the device are enabled

A good solution could be to refactor the code to avoid calling 
rtl92ce_hw_init() with the irqs disabled
but a quick and dirty solution that has proven to work is
to reenable the irqs during the function rtl92ce_hw_init().

I think that it is safe to do so, since the device interrupt will only be
enabled after the init function succeeds.

Signed-off-by: Olivier Langlois 


Sorry, I missed that your subject was a little wrong. This E-mail should have 
the title of "[PATCH 1/2] rtlwifi: rtl8192ce: Fix too long disable of IRQs". 
This way the entity that is being patched is identified in the subject in the 
git commit message. Using "fix" helps in getting it incorporated immediately. 
For the second patch, I recommend "[PATCH 2/2] rtlwifi: Fix incorrect return 
from rtl_ps_enable_nic()".


The reason GregKH sent you the message is due to a misunderstanding of the 
business about Cc to stable. Do not send the E-mail to that mailing list but 
include a line that says "Cc: Stable " immediately after 
your "Signed-off-by:" line in the body of the E-mail. When the patch gets added 
to Linus' tree, that will trigger its inclusion in the stable kernels.


Larry


[PATCH] Drivers:staging:octeon-usb: Fixed Few coding style errors

2014-01-30 Thread Surendra Patil
Fixed the errors below (only a few are listed):
octeon-hcd.c:162: ERROR: spaces prohibited around that ':' (ctx:WxW)
octeon-hcd.c:249: ERROR: Macros with complex values should be enclosed in
parenthesis
octeon-hcd.c:992: WARNING: braces {} are not necessary for single statement 
blocks
octeon-hcd.c:3228: ERROR: return is not a function, parentheses are not required

Signed-off-by: Surendra Patil 
---
 drivers/staging/octeon-usb/octeon-hcd.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/octeon-usb/octeon-hcd.c 
b/drivers/staging/octeon-usb/octeon-hcd.c
index 47e0a91..b25e0f8 100644
--- a/drivers/staging/octeon-usb/octeon-hcd.c
+++ b/drivers/staging/octeon-usb/octeon-hcd.c
@@ -159,13 +159,13 @@ enum cvmx_usb_complete {
  * status call.
  */
 struct cvmx_usb_port_status {
-   uint32_t reserved   : 25;
-   uint32_t port_enabled   : 1;
-   uint32_t port_over_current  : 1;
-   uint32_t port_powered   : 1;
-   enum cvmx_usb_speed port_speed  : 2;
-   uint32_t connected  : 1;
-   uint32_t connect_change : 1;
+   uint32_t reserved:25;
+   uint32_t port_enabled:1;
+   uint32_t port_over_current:1;
+   uint32_t port_powered:1;
+   enum cvmx_usb_speed port_speed:2;
+   uint32_t connected:1;
+   uint32_t connect_change:1;
 };
 
 /**
@@ -181,11 +181,11 @@ struct cvmx_usb_port_status {
 union cvmx_usb_control_header {
uint64_t u64;
struct {
-   uint64_t request_type   : 8;
-   uint64_t request: 8;
-   uint64_t value  : 16;
-   uint64_t index  : 16;
-   uint64_t length : 16;
+   uint64_t request_type:8;
+   uint64_t request:8;
+   uint64_t value:16;
+   uint64_t index:16;
+   uint64_t length:16;
} s;
 };
 
@@ -246,7 +246,7 @@ enum cvmx_usb_pipe_flags {
 };
 
 /* Normal prefetch that use the pref instruction. */
-#define CVMX_PREFETCH(address, offset) asm volatile ("pref %[type], 
%[off](%[rbase])" : : [rbase] "d" (address), [off] "I" (offset), [type] "n" (0))
+#define CVMX_PREFETCH(address, offset) (asm volatile ("pref %[type], 
%[off](%[rbase])" : : [rbase] "d" (address), [off] "I" (offset), [type] "n" 
(0)))
 
 /* Maximum number of times to retry failed transactions */
 #define MAX_RETRIES3
@@ -989,9 +989,8 @@ static int cvmx_usb_enable(struct cvmx_usb_state *usb)
return 0;
 
/* If there is nothing plugged into the port then fail immediately */
-   if (!usb->usbcx_hprt.s.prtconnsts) {
+   if (!usb->usbcx_hprt.s.prtconnsts)
return -ETIMEDOUT;
-   }
 
/* Program the port reset bit to start the reset process */
USB_SET_FIELD32(CVMX_USBCX_HPRT(usb->index), union cvmx_usbcx_hprt, 
prtrst, 1);
@@ -3225,7 +3224,7 @@ static int octeon_usb_hub_status_data(struct usb_hcd 
*hcd, char *buf)
buf[0] = 0;
buf[0] = port_status.connect_change << 1;
 
-   return (buf[0] != 0);
+   return buf[0] != 0;
 }
 
 static int octeon_usb_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 
wValue, u16 wIndex, char *buf, u16 wLength)
-- 
1.8.3.2



AW: [PATCH 1/1] iMX gpio: Allow reading back of pin status if configured as gpio output

2014-01-30 Thread Waibel Georg
On Thu, Jan 23, 2014 at 08:40 PM, Linus Walleij  
wrote:

> I'm holding this off until you've made up your mind about whether it's needed 
> or not...

By setting the SION bit the actual status of a gpio pin can be read back 
regardless of its driver configuration. 
You can drop my patch as there is no need for it.

Regards
Georg



RE: [PATCH] usb: hub: Avoid tight loop holding hdev lock

2014-01-30 Thread Manoj Chourasia
Hi Greg,

Sorry, I forgot to mention the base kernel version. The following patch is
based on linux-3.13.y to date.
I can't find a better way to avoid the infinite loop. Moreover, in the worst
case I find that the device lock is held forever. Usually usb_open is able to
get this lock in less than 5 seconds, but in rare cases the issue reproduces
and it does not get the lock for more than 2 minutes.

--- drivers/usb/core/hub.c.orig 2014-01-30 12:49:39.797839231 +0530
+++ drivers/usb/core/hub.c  2014-01-30 12:49:56.189839229 +0530
@@ -4895,6 +4895,9 @@ static void hub_events(void)
usb_unlock_device(hdev);
kref_put(&hub->kref, hub_release);

+   /* preventing tight loop holding hdev lock */
+   msleep(20);
+
} /* end while (1) */
 }

-Manoj


-Original Message-
From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] 
Sent: Tuesday, January 14, 2014 12:21 AM
To: Manoj Chourasia
Cc: linux-kernel@vger.kernel.org; sta...@kernel.org
Subject: Re: [PATCH] usb: hub: Avoid tight loop holding hdev lock

On Tue, Dec 31, 2013 at 02:26:41PM +0530, Manoj Chourasia wrote:
> Hi All,
> 
> I was facing an issue with a bad usb device which was affecting other system.
> The kernel was trying to enumerate the device but it was failing continuously.
> Unfortunately that device is mounted onboard in the platform so I cannot
> remove it.
> The continuous re-enumeration of the device was causing other applications
> which were using libusb to malfunction. I found that the usb_find_devices()
> call was stuck in usb_open for a very long time. It was stuck getting the
> device lock, which was taken in the hub_event thread.
> The solution was to add msleep in the loop to prevent it spinning tightly.
> 
> --
> -
> usb: hub: Avoid tight loop holding hdev lock
> 
> Other system calls (like usb_open) on the root port device are
> starved of the device lock when the while loop in hub_event
> loops tightly because of a misbehaving device.
> 
> Adding a small msleep gives those system calls a chance to be
> scheduled.
> 
> The failing device was returning -EPROTO and being re-enumerated
> with continuous hub_events. That makes the while loop in
> hub_event spin tightly.
> 
> The usb_find_devices call from libusb tries to open all devices
> including the root hub. The call to usb_open is stuck for a very
> long time (sometimes forever) because the priority of the
> kernel thread is higher than that of the system call in this case.
> 
> Signed-off-by: Manoj Chourasia 

Please don't make me hand-edit patches in order to be able to apply them.  I 
don't scale at all, so if you want this applied, please resend it.

Also, this isn't how you submit patches to the stable kernel tree, please read 
Documentation/stable_kernel_rules.txt for how to do that.

> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index c5c3667..b968fd5 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -4899,6 +4899,9 @@ static void hub_events(void)
> usb_unlock_device(hdev);
> kref_put(&hub->kref, hub_release);
> 
> +   /* preventing tight loop holding hdev lock */
> +   msleep(20);

This feels like a horrible hack for some seriously broken hardware.  Now I know 
we work around broken hardware all the time, but is this really the only way the 
system can recover?

thanks,

greg k-h


Re: [PATCH v10 00/16] Volatile Ranges v10

2014-01-30 Thread Johannes Weiner
On Thu, Jan 30, 2014 at 05:27:18PM -0800, John Stultz wrote:
> On 01/29/2014 10:30 AM, Johannes Weiner wrote:
> > On Tue, Jan 28, 2014 at 05:43:54PM -0800, John Stultz wrote:
> >> On 01/28/2014 04:03 PM, Johannes Weiner wrote:
> >>> On Thu, Jan 02, 2014 at 04:12:08PM +0900, Minchan Kim wrote:
>  o Syscall interface
> >>> Why do we need another syscall for this?  Can't we extend madvise to
> >>> take MADV_VOLATILE, MADV_NONVOLATILE, and return -ENOMEM if something
> >>> in the range was purged?
> >> So the madvise interface is insufficient to provide the semantics
> >> needed. Not so much for MADV_VOLATILE, but MADV_NONVOLATILE. For the
> >> NONVOLATILE call, we have to atomically unmark the volatility status of
> >> the byte range and provide the purge status, which informs the caller if
> >> any of the data in the specified range was discarded (and thus needs to
> >> be regenerated).
> >>
> >> The problem is that by clearing the range, we may need to allocate
> >> memory (possibly by splitting in an existing range segment into two),
> >> which possibly could fail. Unfortunately this could happen after we've
> >> modified the volatile state of part of that range.  At this point we
> >> can't just fail, because we've modified state and we also need to return
> >> the purge status of the modified state.
> > munmap() can theoretically fail for the same reason (splitting has to
> > allocate a new vma) but it's not even documented.  The allocator does
> > not fail allocations of that order.
> >
> > I'm not sure this is good enough, but to me it sounds a bit overkill
> > to design a new system call around a non-existent problem.
> 
> I still think it's a problematic design issue. With munmap, I think
> re-calling on failure should be fine. But with _NONVOLATILE we could
> possibly lose the purge status on a second call (for instance if only
> the first page of memory was purged, but we errored out mid-call w/
> ENOMEM, on the second call it will seem like the range was successfully
> set non-volatile with no memory purged).
> 
> And even if the current allocator never ever fails, I worry at some
> point in the future that rule might change and then we'd have a broken
> interface.

Fair enough, we don't have to paint ourselves into a corner.
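
The concern about losing the purge status on a retry can be made concrete with a toy userspace model. This is only a sketch under stated assumptions: `struct toy_page`, `toy_unmark()` and the `fail_at` parameter are hypothetical names made up for illustration, not part of the vrange patches, and the single int return value merely imitates a madvise-style interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state in a toy volatile range. */
struct toy_page {
	bool is_volatile;
	bool purged;
};

/*
 * Mark pages [start, end) non-volatile, madvise-style: the only result is
 * one int, either a negative errno or 0/1 for "nothing purged"/"purged".
 * fail_at simulates an allocation failure on reaching that page index
 * (pass -1 for "never fail").
 */
int toy_unmark(struct toy_page *pages, size_t start, size_t end, long fail_at)
{
	int purged = 0;

	assert(pages != NULL && start <= end);
	for (size_t i = start; i < end; i++) {
		if (fail_at >= 0 && (size_t)fail_at == i)
			return -ENOMEM; /* pages [start, i) are already modified */
		if (pages[i].purged)
			purged = 1;
		pages[i].is_volatile = false;
		pages[i].purged = false;
	}
	return purged;
}
```

If page 0 of an eight-page range was purged and the first call fails at page 4, the caller only sees -ENOMEM. A second call over the same range then returns 0, because the purged bit of page 0 was already cleared by the first, failed call.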

> >>> 2. If page reclaim discards a page from the upper end of a range,
> >>>you mark the whole range as purged.  If the user later marks the
> >>>lower half of the range as non-volatile, the syscall will report
> >>>purged=1 even though all requested pages are still there.
> >> To me this aspect is a non-ideal but acceptable result of the usage 
> >> pattern.
> >>
> >> Semantically, the hard rule would be we never report non-purged if pages
> >> in a range were purged.  Reporting purged when pages technically weren't
> >> is not optimal but acceptable side effect of unmarking a sub-range. And
> >> could be avoided by applications marking and unmarking objects 
> >> consistently.
> >>
> >>
> >>>The only way to make these semantics clean is either
> >>>
> >>>  a) have vrange() return a range ID so that only full ranges can
> >>>  later be marked non-volatile, or
> >>>
> >>>  b) remember individual page purges so that sub-range changes can
> >>>  properly report them
> >>>
> >>>I don't like a) much because it's somewhat arbitrarily more
> >>>restrictive than madvise, mprotect, mmap/munmap etc.  
> >> Agreed on A.
> >>
> >>> And for b),
> >>>the straight-forward solution would be to put purge-cookies into
> >>>the page tables to properly report purges in subrange changes, but
> >>>that would be even more coordination between vmas, page tables, and
> >>>the ad-hoc vranges.
> >> And for B this would cause way too much overhead for the mark/unmark
> >> operations, which have to be lightweight.
> > Yes, and allocators/message passers truly don't need this because at
> > the time they set a region to volatile the contents are invalidated
> > and the non-volatile declaration doesn't give a hoot if content has
> > been destroyed.
> >
> > But caches certainly would have to know if they should regenerate the
> > contents.  And bigger areas should be using huge pages, so we'd check
> > in 2MB steps.  Is this really more expensive than regenerating the
> > contents on a false positive?
> 
> So you make a good argument. I'd counter that the false-positives are
> only caused when unmarking subranges of larger marked volatile range,
> and for use cases that would care about regenerating the contents,
> that's not a likely usage model (as they're probably going to be
> marking objects in memory volatile/nonvolatile, not just arbitrary
> ranges of pages).

I can imagine that applications have contiguous areas of same-sized
objects and want to mark a whole range of them volatile in one go,
then later come back for individual objects.
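
The false-positive case discussed above can be sketched with a toy model that keeps both a single per-range flag and per-page purge records (option b). All names here (`struct toy_vrange`, `toy_purge_page()`, and so on) are hypothetical and only illustrate the reporting difference; they say nothing about how the real patches store this state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8

/* Hypothetical volatile range covering TOY_PAGES pages. */
struct toy_vrange {
	bool range_purged;           /* one flag for the whole range */
	bool page_purged[TOY_PAGES]; /* option b): remember each purge */
};

/* Reclaim discards one page of the range. */
void toy_purge_page(struct toy_vrange *r, size_t page)
{
	assert(page < TOY_PAGES);
	r->range_purged = true;
	r->page_purged[page] = true;
}

/* Range-granular report: any purge taints every sub-range. */
bool toy_report_range(const struct toy_vrange *r)
{
	return r->range_purged;
}

/* Page-granular report: only purges inside [start, end) count. */
bool toy_report_pages(const struct toy_vrange *r, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_PAGES);
	for (size_t i = start; i < end; i++)
		if (r->page_purged[i])
			return true;
	return false;
}
```

After purging page 7, unmarking pages 0 to 3 reports "purged" under the range-granular scheme even though those pages are intact, while the page-granular scheme reports them as not purged.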

Otherwise we'd require N adjacent objects to be marked individually
through N syscalls to create N sep

Re: [PATCH v2 2/4] net: ethoc: don't advertise gigabit speed on attached PHY

2014-01-30 Thread Max Filippov
On Wed, Jan 29, 2014 at 10:32 PM, Max Filippov  wrote:
> On Wed, Jan 29, 2014 at 9:12 PM, Florian Fainelli  
> wrote:
>> On Jan 28, 2014 11:01 PM, "Max Filippov"  wrote:
>>>
>>> On Wed, Jan 29, 2014 at 10:47 AM, Florian Fainelli 
>>> wrote:
>>> > Hi Max,
>>> >
>>> > Le 28/01/2014 22:00, Max Filippov a écrit :
>>> >
>>> >> OpenCores 10/100 Mbps MAC does not support speeds above 100 Mbps, but does
>>> >> not disable advertisement when PHY supports them. This results in
>>> >> non-functioning network when the MAC is connected to a gigabit PHY connected
>>> >> to a gigabit switch.
>>> >>
>>> >> The fix is to disable gigabit speed advertisement on attached PHY
>>> >> unconditionally.
>>> >>
>>> >> Signed-off-by: Max Filippov 
>>> >> ---
>>> >> Changes v1->v2:
>>> >> - disable both gigabit advertisement and support.
>>> >>
>>> >>   drivers/net/ethernet/ethoc.c | 8 
>>> >>   1 file changed, 8 insertions(+)
>>> >>
>>> >> diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
>>> >> index 4de8cfd..5643b2d 100644
>>> >> --- a/drivers/net/ethernet/ethoc.c
>>> >> +++ b/drivers/net/ethernet/ethoc.c
>>> >> @@ -688,6 +688,14 @@ static int ethoc_mdio_probe(struct net_device *dev)
>>> >> }
>>> >>
>>> >> priv->phy = phy;
>>> >> +   phy_update_advert(phy,
>>> >> + ADVERTISED_1000baseT_Full |
>>> >> + ADVERTISED_1000baseT_Half, 0);
>>> >> +   phy_start_aneg(phy);
>>> >
>>> >
>>> > This does not look necessary, you should not have to call phy_start_aneg()
>>> > because the PHY state machine is not yet started, at best this call does
>>> > nothing.
>>>
>>> This call actually makes the whole thing work, because otherwise once gigabit
>>> support is cleared from the supported mask genphy_config_advert does not
>>> update gigabit advertisement register, leaving it enabled.
>>
>> OK, then we need to figure out what is wrong with ethoc since this is
>> unusual.
>
> Maybe they boot up with gigabit advertisement disabled in their PHY
> and thus they don't see the problem?
>
>> Other drivers do the following:
>>
>> - connect to the PHY
>> - phydev->supported = PHY_BASIC_FEATURES
>> - phydev->advertising &= phydev->supported
>> - start the PHY state machine
>>
>> And they work just fine. Is the PHY driver you are bound to the "Generic
>> PHY" or something else which does something funky in config_aneg()?
>
> It's marvell 88E from the KC-705 board, but the behaviour doesn't
> change if I disable it and the generic phy is used.

Florian,

I don't see how the generic genphy_config_advert can ever change
gigabit advertisement if phydev->supported has gigabit speeds masked
off. So I'm pretty sure that other 10/100 cards would exhibit the same
issue if their PHY started with gigabit advertisement enabled. Maybe
we need to fix those other drivers? Or maybe we need to track what
PHY really supports vs. what we report it supports, so that gigabit
advertisement could be changed even when the PHY no longer
appears to support gigabit?
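
The behaviour described above can be illustrated with a small userspace model. This is a simplified sketch and not the kernel's genphy_config_advert: `struct toy_phy`, the `TOY_*` bit values and `toy_config_advert()` are hypothetical, and the only assumption carried over from the discussion is that the gigabit register is rewritten only while gigabit modes remain in the supported mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch (not the kernel's values). */
#define TOY_100_FULL  (1u << 0)
#define TOY_1000_FULL (1u << 1)

/* Toy PHY: software masks plus two modelled advertisement registers. */
struct toy_phy {
	uint32_t supported;   /* what the driver says the PHY supports */
	uint32_t advertising; /* what the driver wants advertised */
	uint32_t reg_adv100;  /* models the 10/100 advertisement register */
	uint32_t reg_adv1000; /* models the gigabit advertisement register */
};

/*
 * Push the advertising mask to the modelled registers. The gigabit
 * register is only touched when gigabit is still marked as supported.
 */
void toy_config_advert(struct toy_phy *phy)
{
	assert(phy != NULL);
	phy->advertising &= phy->supported;
	phy->reg_adv100 = phy->advertising & TOY_100_FULL;
	if (phy->supported & TOY_1000_FULL)
		phy->reg_adv1000 = phy->advertising & TOY_1000_FULL;
	/* otherwise reg_adv1000 keeps its power-on value */
}
```

In this model, a PHY that powers on with gigabit advertised keeps `reg_adv1000` set after the driver masks gigabit out of `supported`, which is the situation Max describes for a 10/100 MAC attached to a gigabit PHY.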

-- 
Thanks.
-- Max


[PATCH v2 2/4] net: ethoc: implement ethtool get/set settings

2014-01-30 Thread Max Filippov
Signed-off-by: Max Filippov 
---
Changes v1->v2:
- fix {get,set}_settings return code in case there's no PHY.

 drivers/net/ethernet/ethoc.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 0623c20..779d3c3 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -890,7 +890,31 @@ out:
return NETDEV_TX_OK;
 }
 
+static int ethoc_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   struct ethoc *priv = netdev_priv(dev);
+   struct phy_device *phydev = priv->phy;
+
+   if (!phydev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_gset(phydev, cmd);
+}
+
+static int ethoc_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   struct ethoc *priv = netdev_priv(dev);
+   struct phy_device *phydev = priv->phy;
+
+   if (!phydev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_sset(phydev, cmd);
+}
+
 const struct ethtool_ops ethoc_ethtool_ops = {
+   .get_settings = ethoc_get_settings,
+   .set_settings = ethoc_set_settings,
.get_link = ethtool_op_get_link,
.get_ts_info = ethtool_op_get_ts_info,
 };
-- 
1.8.1.4



[PATCH v2 0/4] OpenCores 10/100 MAC ethtool operations

2014-01-30 Thread Max Filippov
Hello David, Ben, Florian and everybody,

this series implements ethtool callbacks for the ethoc driver as was
requested by Florian.

Changes v1->v2:
- fix {get,set}_settings return code in case there's no PHY;
- fix set_ringparam: check ring sizes, change ring sizes on the fly.

Max Filippov (4):
  net: ethoc: implement basic ethtool operations
  net: ethoc: implement ethtool get/set settings
  net: ethoc: implement ethtool get registers
  net: ethoc: implement ethtool get/set ring parameters

 drivers/net/ethernet/ethoc.c | 101 +++
 1 file changed, 101 insertions(+)

-- 
1.8.1.4



[PATCH v2 3/4] net: ethoc: implement ethtool get registers

2014-01-30 Thread Max Filippov
Signed-off-by: Max Filippov 
Reviewed-by: Florian Fainelli 
Reviewed-by: Ben Hutchings 
---
 drivers/net/ethernet/ethoc.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 779d3c3..5da32a7 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -51,6 +51,7 @@ MODULE_PARM_DESC(buffer_size, "DMA buffer allocation size");
 #define ETH_HASH0   0x48
 #define ETH_HASH1   0x4c
 #define ETH_TXCTRL  0x50
+#define ETH_END 0x54
 
 /* mode register */
 #define MODER_RXEN  (1 <<  0) /* receive enable */
@@ -912,9 +913,28 @@ static int ethoc_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
return phy_ethtool_sset(phydev, cmd);
 }
 
+static int ethoc_get_regs_len(struct net_device *netdev)
+{
+   return ETH_END;
+}
+
+static void ethoc_get_regs(struct net_device *dev, struct ethtool_regs *regs,
+  void *p)
+{
+   struct ethoc *priv = netdev_priv(dev);
+   u32 *regs_buff = p;
+   unsigned i;
+
+   regs->version = 0;
+   for (i = 0; i < ETH_END / sizeof(u32); ++i)
+   regs_buff[i] = ethoc_read(priv, i * sizeof(u32));
+}
+
 const struct ethtool_ops ethoc_ethtool_ops = {
.get_settings = ethoc_get_settings,
.set_settings = ethoc_set_settings,
+   .get_regs_len = ethoc_get_regs_len,
+   .get_regs = ethoc_get_regs,
.get_link = ethtool_op_get_link,
.get_ts_info = ethtool_op_get_ts_info,
 };
-- 
1.8.1.4



[PATCH v2 1/4] net: ethoc: implement basic ethtool operations

2014-01-30 Thread Max Filippov
The following methods are implemented:
- get link state (standard implementation);
- get timestamping info (standard implementation).

Signed-off-by: Max Filippov 
Reviewed-by: Florian Fainelli 
Reviewed-by: Ben Hutchings 
---
 drivers/net/ethernet/ethoc.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 4de8cfd..0623c20 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -890,6 +890,11 @@ out:
return NETDEV_TX_OK;
 }
 
+const struct ethtool_ops ethoc_ethtool_ops = {
+   .get_link = ethtool_op_get_link,
+   .get_ts_info = ethtool_op_get_ts_info,
+};
+
 static const struct net_device_ops ethoc_netdev_ops = {
.ndo_open = ethoc_open,
.ndo_stop = ethoc_stop,
@@ -,6 +1116,7 @@ static int ethoc_probe(struct platform_device *pdev)
netdev->netdev_ops = &ethoc_netdev_ops;
netdev->watchdog_timeo = ETHOC_TIMEOUT;
netdev->features |= 0;
+   netdev->ethtool_ops = &ethoc_ethtool_ops;
 
/* setup NAPI */
netif_napi_add(netdev, &priv->napi, ethoc_poll, 64);
-- 
1.8.1.4



[PATCH v2 4/4] net: ethoc: implement ethtool get/set ring parameters

2014-01-30 Thread Max Filippov
TX and RX rings share memory and descriptors. The maximal number of
descriptors reported for each ring is one less than the total number of
available descriptors. For the set operation, the requested number of TX
descriptors is rounded down to the nearest power of two (driver logic
requirement).

Signed-off-by: Max Filippov 
---
Changes v1->v2:
- fix set_ringparam: check ring sizes, change ring sizes on the fly.

 drivers/net/ethernet/ethoc.c | 51 
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 5da32a7..f9c1cf5 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -180,6 +180,7 @@ MODULE_PARM_DESC(buffer_size, "DMA buffer allocation size");
  * @membase:   pointer to buffer memory region
  * @dma_alloc: dma allocated buffer size
  * @io_region_size: I/O memory region size
+ * @num_bd: number of buffer descriptors
  * @num_tx: number of send buffers
  * @cur_tx: last send buffer written
  * @dty_tx: last buffer actually sent
@@ -200,6 +201,7 @@ struct ethoc {
int dma_alloc;
resource_size_t io_region_size;
 
+   unsigned int num_bd;
unsigned int num_tx;
unsigned int cur_tx;
unsigned int dty_tx;
@@ -930,12 +932,60 @@ static void ethoc_get_regs(struct net_device *dev, struct ethtool_regs *regs,
regs_buff[i] = ethoc_read(priv, i * sizeof(u32));
 }
 
+static void ethoc_get_ringparam(struct net_device *dev,
+   struct ethtool_ringparam *ring)
+{
+   struct ethoc *priv = netdev_priv(dev);
+
+   ring->rx_max_pending = priv->num_bd - 1;
+   ring->rx_mini_max_pending = 0;
+   ring->rx_jumbo_max_pending = 0;
+   ring->tx_max_pending = priv->num_bd - 1;
+
+   ring->rx_pending = priv->num_rx;
+   ring->rx_mini_pending = 0;
+   ring->rx_jumbo_pending = 0;
+   ring->tx_pending = priv->num_tx;
+}
+
+static int ethoc_set_ringparam(struct net_device *dev,
+  struct ethtool_ringparam *ring)
+{
+   struct ethoc *priv = netdev_priv(dev);
+
+   if (ring->tx_pending < 1 || ring->rx_pending < 1 ||
+   ring->tx_pending + ring->rx_pending > priv->num_bd)
+   return -EINVAL;
+   if (ring->rx_mini_pending || ring->rx_jumbo_pending)
+   return -EINVAL;
+
+   if (netif_running(dev)) {
+   netif_tx_disable(dev);
+   ethoc_disable_rx_and_tx(priv);
+   ethoc_disable_irq(priv, INT_MASK_TX | INT_MASK_RX);
+   synchronize_irq(dev->irq);
+   }
+
+   priv->num_tx = rounddown_pow_of_two(ring->tx_pending);
+   priv->num_rx = ring->rx_pending;
+   ethoc_init_ring(priv, dev->mem_start);
+
+   if (netif_running(dev)) {
+   ethoc_enable_irq(priv, INT_MASK_TX | INT_MASK_RX);
+   ethoc_enable_rx_and_tx(priv);
+   netif_wake_queue(dev);
+   }
+   return 0;
+}
+
 const struct ethtool_ops ethoc_ethtool_ops = {
.get_settings = ethoc_get_settings,
.set_settings = ethoc_set_settings,
.get_regs_len = ethoc_get_regs_len,
.get_regs = ethoc_get_regs,
.get_link = ethtool_op_get_link,
+   .get_ringparam = ethoc_get_ringparam,
+   .set_ringparam = ethoc_set_ringparam,
.get_ts_info = ethtool_op_get_ts_info,
 };
 
@@ -1065,6 +1115,7 @@ static int ethoc_probe(struct platform_device *pdev)
ret = -ENODEV;
goto error;
}
+   priv->num_bd = num_bd;
/* num_tx must be a power of two */
priv->num_tx = rounddown_pow_of_two(num_bd >> 1);
priv->num_rx = num_bd - priv->num_tx;
-- 
1.8.1.4
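
The ring-size rules from the commit message can be tried out in userspace with a minimal sketch. It assumes nothing beyond what the patch states; `struct toy_ring`, `toy_rounddown_pow2()` and `toy_set_ringparam()` are hypothetical stand-ins, and the overflow-safe form of the size check is a choice made for this sketch rather than what the driver does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the three ring counters used by the patch. */
struct toy_ring {
	unsigned int num_bd; /* total buffer descriptors */
	unsigned int num_tx; /* descriptors used for TX */
	unsigned int num_rx; /* descriptors used for RX */
};

/* Round n (n >= 1) down to the nearest power of two. */
unsigned int toy_rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Validate the request, then apply it with TX rounded down. */
int toy_set_ringparam(struct toy_ring *ring, unsigned int tx, unsigned int rx)
{
	assert(ring != NULL);
	if (tx < 1 || rx < 1)
		return -EINVAL;
	/* same rule as tx + rx > num_bd, written to avoid wrap-around */
	if (tx > ring->num_bd || rx > ring->num_bd - tx)
		return -EINVAL;
	ring->num_tx = toy_rounddown_pow2(tx);
	ring->num_rx = rx;
	return 0;
}
```

With 128 descriptors, a request for 100 TX and 28 RX is accepted and leaves 64 TX descriptors in use, while 100 TX and 29 RX is rejected because the two rings share one descriptor pool.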



Re: [PATCH v5 00/22] Rewrite XIP code and add XIP support to ext4

2014-01-30 Thread Ross Zwisler
On Fri, 31 Jan 2014, Dave Chinner wrote:
> The read/write path is broken, Willy. We can't map arbitrary byte
> ranges to the DIO subsystem. I'm now certain that the data
> corruptions I'm seeing are in sub-sector regions from unaligned IOs
> from userspace. We still need to use the buffered IO path for non
> O_DIRECT IO to avoid these problems. I think I've worked out a way
> to short-circuit page cache lookups for the buffered IO path, so
> stay tuned

Hi Dave,

I found an issue that would cause reads to return bad data earlier this week,
and sent a response to "[PATCH v5 22/22] XIP: Add support for unwritten
extents".  Just wanted to make sure you're not running into that issue.  

I'm also currently chasing a write corruption where we lose the data that we
had just written because ext4 thinks the portion of the extent we had just
written needs to be converted from an unwritten extent to a written extent, so
it clears the data to all zeros via:

xip_clear_blocks+0x53/0xd7
ext4_map_blocks+0x306/0x3d9 [ext4]
jbd2__journal_start+0xbd/0x188 [jbd2]
ext4_convert_unwritten_extents+0xf9/0x1ac [ext4]
ext4_direct_IO+0x2ca/0x3a5 [ext4]

This bug can be easily reproduced by fallocating an empty file up to a page,
and then writing into that page.  The first write is essentially lost, and the
page remains all zeros.  Subsequent writes succeed.

I'm still in the process of figuring out exactly why this is happening, but
unfortunately I won't be able to look at it again until next week.  I don't know
if it's related to the corruption that you're seeing or not, just wanted to
let you know.

- Ross


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 21:43 -0500, Mikulas Patocka wrote:
> 
> On Thu, 30 Jan 2014, James Bottomley wrote:
> 
> > > A device may be accessed directly (by opening /dev/sdX) and it creates a 
> > > mapping too - thus, the size of a mapping limits the size of a block 
> > > device.
> > 
> > Right, that's what I suspected below.  We can't damage large block
> > support on filesystems just because of this corner case.
> 
> Devices larger than 16TiB never worked on 32-bit kernel, so this patch 
> isn't damaging anything.

expectations: 32 bit with CONFIG_LBDAF is supposed to be able to do
almost everything 64 bits can

> Note that if you attach a 16TiB block device, don't open it and mount it, 
> it still won't work, because the buffer cache uses the page cache (see the 
> function __find_get_block_slow and the variable "pgoff_t index" - that 
> variable would overflow if the filesystem accessed a buffer beyond 16TiB).

That depends on the layout of the fs metadata.

> > > The main problem is that pgoff_t has 4 bytes - changing it to 8 bytes may 
> > > fix it - but there may be some hidden places where pgoff is converted to 
> > > unsigned long - who knows, if they exist or not?
> > 
> > I don't think we want to do that ... it will make struct page fatter and
> > have knock on impacts in the radix tree code.  To fix this, we need to
> > make the corner case (i.e. opening large block devices without a
> > filesystem) bear the pain.  It sort of looks like we want to do a linear
> > array of mappings of 64TB for the device so the page cache calculations
> > don't overflow.
> 
> The code that reads and writes data to block devices and files is shared - 
> the functions in mm/filemap.c work for both files and block devices.

Yes.

> So, if you want 64-bit page offsets, you need to increase pgoff_t size, 
> and that will increase the limit for both files and block devices.

No.  The point is the page cache mapping of the device uses a
manufactured inode saved in the backing device. It looks fixable in the
buffer code before the page cache gets involved.

> You shouldn't have separate functions for managing pages on files and 
> separate functions for managing pages on block devices - that would 
> increase code size and cause maintenance problems.

It wouldn't; it would add structure to the buffer cache for large
devices.

> > > Though, we need to know if the people who designed memory management 
> > > agree with changing pgoff_t to 64 bits.
> > 
> > I don't think we can change the size of pgoff_t ... because it won't
> > just be that, it will be other problems like the radix tree.
> 
> If we can't change it, then we must stay with the current 16TiB limit. 
> There's no other way.
> 
> > However, you also have to bear in mind that truncating large block
> > device support to 64TB on 32 bits is a technical ABI break.  Hopefully
> > it is only technical because I don't know of any current consumer block
> > device that is 64TB yet, but anyone who'd created a filesystem >64TB
> > would find it no-longer mounted on 32 bits.
> > James
> 
> It is not ABI break, because block devices larger than 16TiB never worked 
> on 32-bit architectures. So it's better to refuse them outright, than to 
> cause subtle lockups or data corruption.

An ABI is a contract between the userspace and the kernel.  Saying we
can remove a clause in the contract because no-one ever exercised it and
not call it changing the contract is sophistry.  The correct thing to do
would be to call it a bug and fix it.

In a couple of short years we'll be over 16TB for hard drives.  I don't
really want to be the one explaining to the personal storage people that
the only way to install a 16+TB drive in their arm (or quark) based
Linux systems is a processor upgrade.

I suppose there are a couple of possibilities: pgoff_t + radix tree
expansion or double radix tree in the buffer code.  This should probably
be taken to fsdevel where they might have better ideas.

James
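
The 16TiB figure in this thread follows from simple arithmetic, shown here as a worked sketch. It assumes 4 KiB pages and a 32-bit page cache index (the thread states that pgoff_t has 4 bytes on 32-bit kernels); `max_mapping_bytes()` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte size one mapping can cover when the page index has
 * index_bits bits and each page holds page_size bytes. The caller must
 * keep the product within 64 bits.
 */
uint64_t max_mapping_bytes(unsigned int index_bits, uint64_t page_size)
{
	assert(index_bits > 0 && index_bits < 64);
	return ((uint64_t)1 << index_bits) * page_size;
}
```

For a 32-bit index and 4096-byte pages this gives 2^32 * 2^12 = 2^44 bytes, which is 16 TiB. Any offset beyond that cannot be represented as a page index without widening the type or adding another level of indexing, which are the two possibilities James lists.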




Re: [PATCH 2/2] retry hw init when it fails

2014-01-30 Thread Greg KH
On Fri, Jan 31, 2014 at 12:16:23AM -0500, Olivier Langlois wrote:
> rtl_ps_enable_nic() is called from loops that will loop until this function 
> returns true or a
> maximum number of retries is performed.
> 
> hw_init() returns non-zero on error. In that situation return false to
> restore the original design intent to retry hw init when it fails.
> 
> Signed-off-by: Olivier Langlois 
> ---
>  drivers/net/wireless/rtlwifi/ps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)




This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.



Same goes for patch 1/2 as well...


[PATCH] regulator: gpio: bugfix: add gpios-status for DT

2014-01-30 Thread Kuninori Morimoto
From: Kuninori Morimoto 

config->gpios[x].flags indicates the initial pin status,
and it is used for drvdata->state
in gpio_regulator_probe().
However, the current of_get_gpio_regulator_config() ignores
these flags.
This patch adds a new gpios-states property so that the
initial pin status can be specified from the device tree.

Signed-off-by: Kuninori Morimoto 
---
 .../bindings/regulator/gpio-regulator.txt  |1 +
 drivers/regulator/gpio-regulator.c |   11 +++
 2 files changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/gpio-regulator.txt b/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
index 63c6598..3ecb585 100644
--- a/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/gpio-regulator.txt
@@ -8,6 +8,7 @@ Required properties:
 Optional properties:
 - enable-gpio  : GPIO to use to enable/disable the regulator.
 - gpios: GPIO group used to control voltage.
+- gpios-states : gpios pin's initial states. 1 means HIGH
 - startup-delay-us : Startup time in microseconds.
 - enable-active-high   : Polarity of GPIO is active high (default is low).
 
diff --git a/drivers/regulator/gpio-regulator.c b/drivers/regulator/gpio-regulator.c
index c0a1d00..7c8e37a 100644
--- a/drivers/regulator/gpio-regulator.c
+++ b/drivers/regulator/gpio-regulator.c
@@ -172,11 +172,22 @@ of_get_gpio_regulator_config(struct device *dev, struct device_node *np)
if (!config->gpios)
return ERR_PTR(-ENOMEM);
 
+   prop = of_find_property(np, "gpios-states", NULL);
+   if (prop) {
+   proplen = prop->length / sizeof(int);
+   if (proplen != config->nr_gpios) {
+   /* gpios <-> gpios-states mismatch */
+   prop = NULL;
+   }
+   }
+
for (i = 0; i < config->nr_gpios; i++) {
gpio = of_get_named_gpio(np, "gpios", i);
if (gpio < 0)
break;
config->gpios[i].gpio = gpio;
+   if (prop && be32_to_cpup((int *)prop->value + i))
+   config->gpios[i].flags = GPIOF_OUT_INIT_HIGH;
}
 
/* Fetch states. */
-- 
1.7.9.5
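
The parsing rule the patch applies to the new property (ignore it when the cell count does not match the number of gpios, otherwise treat each non-zero cell as "start high") can be sketched in userspace. The names `toy_be32()`, `toy_apply_states()` and `TOY_INIT_HIGH` are hypothetical stand-ins; the byte-wise decode only models the big-endian 32-bit cells of a device tree property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INIT_HIGH 1u /* hypothetical stand-in for GPIOF_OUT_INIT_HIGH */

/* Decode one big-endian 32-bit cell. */
uint32_t toy_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Fill flags[0..nr_gpios) from a raw property of prop_len bytes.
 * Returns 0 when the property was applied, -1 when it was ignored
 * because it is absent or its cell count differs from nr_gpios.
 */
int toy_apply_states(const unsigned char *prop, size_t prop_len,
		     unsigned int *flags, size_t nr_gpios)
{
	assert(flags != NULL);
	for (size_t i = 0; i < nr_gpios; i++)
		flags[i] = 0; /* default: no initial-high request */

	if (prop == NULL || prop_len / 4 != nr_gpios)
		return -1; /* gpios <-> gpios-states mismatch */

	for (size_t i = 0; i < nr_gpios; i++)
		if (toy_be32(prop + 4 * i))
			flags[i] = TOY_INIT_HIGH;
	return 0;
}
```

A two-cell property holding the values 1 and 0 sets the first pin's flag and leaves the second at its default. The same property applied to three gpios is ignored entirely, which mirrors the mismatch branch in the patch.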



[PATCH 1/2] rtl8192ce is disabling for too long the irqs

2014-01-30 Thread Olivier Langlois
rtl8192ce disables the local interrupts for too long during hw 
initialisation when performing scans.

The observable symptoms in dmesg can be:

- underruns from ALSA playback
- clock freezes (timestamps do not change for several dmesg entries until irqs 
are finally reenabled):

[  250.817669] rtlwifi:rtl_op_config():<0-0-0> 0x100
[  250.817685] rtl8192ce:_rtl92ce_phy_set_rf_power_state():<0-1-0> IPS Set eRf 
nic enable
[  250.817732] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.817796] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.817910] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818024] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818139] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818253] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818367] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[  250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:98053f15:10
[  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[  250.818472] rtl8192c_common:rtl92c_download_fw():<0-1-0> Firmware 
Version(49), Signature(0x88c1),Size(32)
[  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> 
PairwiseEncAlgorithm = 0 GroupEncAlgorithm = 0
[  250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> The 
SECR-value cc
[  250.818472] 
rtl8192c_common:rtl92c_dm_check_txpower_tracking_thermal_meter():<0-1-0> 
Schedule TxPowerTracking direct call!!
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
rtl92c_dm_txpower_tracking_callback_thermalmeter
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Initial pathA ele_d reg0xc80 = 0x4000, ofdm_index=0xc
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Initial reg0xa24 = 0x90e1317, cck_index=0xc, ch14 0
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> 
Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf 
delta 0x1 delta_lck 0x0 delta_iqk 0x0
[  250.818472] 
rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> <===
[  250.818472] 
rtl8192c_common:rtl92c_dm_initialize_txpower_tracking_thermalmeter():<0-1-0> 
pMgntInfo->txpower_tracking = 1
[  250.818472] rtl8192ce:rtl92ce_led_control():<0-1-0> ledaction 3
[  250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[  250.818472] rtlwifi:rtl_ips_nic_on():<0-1-0> before spin_unlock_irqrestore
[  251.154656] PCM: Lost interrupts? [Q]-0 (stream=0, delta=15903, 
new_hw_ptr=293408, old_hw_ptr=277505)

The exact code flow that causes that is:

1. wpa_supplicant sends a start_scan request to the nl80211 driver
2. the mac80211 module calls rtl_op_config() with IEEE80211_CONF_CHANGE_IDLE
3.   rtl_ips_nic_on() is called, which disables local irqs
4. rtl92c_phy_set_rf_power_state() is called
5.   rtl_ps_enable_nic() is called, hw_init() is executed, and then the
interrupts on the device are enabled

A good solution could be to refactor the code to avoid calling
rtl92ce_hw_init() with the irqs disabled, but a quick and dirty solution
that has proven to work is to reenable the irqs during rtl92ce_hw_init().

I think that it is safe to do so since the device interrupt will only be
enabled after the init function succeeds.

Signed-off-by: Olivier Langlois 
---
 drivers/net/wireless/rtlwifi/rtl8192ce/hw.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c 
b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
index a82b30a..2eb0b38 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
@@ -937,14 +937,26 @@ int rtl92ce_hw_init(struct ieee80211_hw *hw)
bool is92c;
int err;
u8 tmp_u1b;
+   unsigned long flags;
 
rtlpci->being_init_adapter = true;
+
+   /* Since this function can take a very long time (up to 350 ms)
+* and can be called with irqs disabled, reenable the irqs
+* to let the other devices continue being serviced.
+*
+* It is safe doing so since our own interrupts will only be enabled
+* in a subsequent step.
+*/
+   local_save_flags(flags);
+   local_irq_enable();
+
rtlpriv->intf_ops->disable_aspm(hw);
rtstatus = _rtl92ce_init_mac(hw);
if (

Re: [PATCH] rtl8192ce is disabling the irqs for too long

2014-01-30 Thread Olivier Langlois
On Thu, 2014-01-30 at 14:19 -0600, Larry Finger wrote:
> On 01/30/2014 12:22 AM, Olivier Langlois wrote:
> > Signed-off-by: Olivier Langlois 
> > ---
> >   drivers/net/wireless/rtlwifi/ps.c   |  2 +-
> >   drivers/net/wireless/rtlwifi/rtl8192ce/hw.c | 18 --
> >   2 files changed, 17 insertions(+), 3 deletions(-)
> 
> Olivier,
> 
> I like the fact that you have proposed a solution that has minimal effect on
> the other PCI drivers under rtlwifi. That certainly decreases the amount of
> testing needed. Of course, all of them may need the same kind of fix.

This has been done on purpose. I knew that you would appreciate it. The
problem is probably present with the other cards, but since I could not
validate or test the patch with them, I have left them unchanged. If you
were to determine that it applies to all of them, then maybe the patch
could be moved to ps.c to wrap the hw_init() call.

> 
> The changes are certainly minimal; however, I do need to test them before
> giving my Ack.
> 
> As the problem(s) fixed by this patch will affect stable kernels, you should
> add a "Cc: Stable " patch.
> 
> Originally, I was going to have you add a comment to the commit message on
> why you were changing the return value in rtl_ps_enable_nic(), but now I am
> thinking that this should be a separate commit as it fixes a totally
> different bug. Yes, it is involved with the callback to hw_init, but the bug
> is independent.

done. I am about to resend the patch split into 2 commits.
> 
> Good work on finding this problem.
> 
my pleasure. it has been fun and gratifying.




[PATCH 2/2] retry hw init when it fails

2014-01-30 Thread Olivier Langlois
rtl_ps_enable_nic() is called from loops that repeat until this function
returns true or a maximum number of retries has been performed.

hw_init() returns non-zero on error. In that situation return false to
restore the original design intent to retry hw init when it fails.
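
As a rough user-space illustration of why the old return value defeated the
retry loop: in a function returning bool, "return 1" reads as true, so the
caller stops retrying after the first failed init. Everything below
(fake_hw_init, enable_nic_old, enable_nic_new, run_retry_loop) is a
hypothetical stand-in and not rtlwifi code.

```c
#include <stdbool.h>

static int failures_left;   /* how many more times the fake init will fail */
static int init_calls;      /* how often the fake init has been invoked */

/* Hypothetical stand-in for hw_init(): returns non-zero on error. */
static int fake_hw_init(void)
{
	init_calls++;
	if (failures_left > 0) {
		failures_left--;
		return -1;
	}
	return 0;
}

/* Old behaviour: a failed init returns 1, which is true for a bool. */
static bool enable_nic_old(void)
{
	if (fake_hw_init())
		return 1;
	return true;
}

/* Patched behaviour: a failed init reports false so the caller retries. */
static bool enable_nic_new(void)
{
	if (fake_hw_init())
		return false;
	return true;
}

/* Caller loop as described above: stop on true or after max_retries.
 * Returns how many times the fake init was actually attempted. */
static int run_retry_loop(bool (*enable)(void), int failures, int max_retries)
{
	int i;

	failures_left = failures;
	init_calls = 0;
	for (i = 0; i < max_retries; i++)
		if (enable())
			break;
	return init_calls;
}
```

With two simulated failures, the old variant gives up after a single attempt,
while the patched variant keeps going until the third attempt succeeds.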

Signed-off-by: Olivier Langlois 
---
 drivers/net/wireless/rtlwifi/ps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rtlwifi/ps.c 
b/drivers/net/wireless/rtlwifi/ps.c
index 0d81f76..a56e9b3 100644
--- a/drivers/net/wireless/rtlwifi/ps.c
+++ b/drivers/net/wireless/rtlwifi/ps.c
@@ -48,7 +48,7 @@ bool rtl_ps_enable_nic(struct ieee80211_hw *hw)
 
/*<2> Enable Adapter */
if (rtlpriv->cfg->ops->hw_init(hw))
-   return 1;
+   return false;
RT_CLEAR_PS_LEVEL(ppsc, RT_RF_OFF_LEVL_HALT_NIC);
 
/*<3> Enable Interrupt */
-- 
1.8.5.2



Re: [PATCH v2 3/5] spi: sunxi: Add Allwinner A31 SPI controller driver

2014-01-30 Thread Kevin Hilman
On Thu, Jan 30, 2014 at 6:29 PM, Felipe Balbi  wrote:
> Hi,
>
> On Thu, Jan 30, 2014 at 03:52:16PM -0800, Kevin Hilman wrote:
>> On Wed, Jan 29, 2014 at 5:32 AM, Maxime Ripard
>>  wrote:
>> > On Wed, Jan 29, 2014 at 12:25:20PM +, Mark Brown wrote:
>> >> On Wed, Jan 29, 2014 at 12:10:48PM +0100, Maxime Ripard wrote:
>> >>
>> >> > +config SPI_SUN6I
>> >> > +   tristate "Allwinner A31 SPI controller"
>> >> > +   depends on ARCH_SUNXI || COMPILE_TEST
>> >> > +   select PM_RUNTIME
>> >> > +   help
>> >> > + This enables using the SPI controller on the Allwinner A31 SoCs.
>> >> > +
>> >>
>> >> A select of PM_RUNTIME is both surprising and odd - why is that there?
>> >> The usual idiom is that the device starts out powered up (flagged using
>> >> pm_runtime_set_active()) and then runtime PM then suspends it when it's
>> >> compiled in.  That way if for some reason people want to avoid runtime
>> >> PM they can still use the device.
>> >
>> > Since pm_runtime_set_active and all the pm_runtime* callbacks in
>> > general are defined to pretty much empty functions, how the
>> > suspend/resume callbacks are called then? Obviously, we need them to
>> > be run, hence why I added the select here, but now I'm seeing a
>> > construct like what's following acceptable then?
>>
>> Even with your 'select', The runtime PM callbacks will never be called
>> in the current driver.  pm_runtime_enable() doesn't do any runtime PM
>> transitions.  It just allows transitions to happen when they're
>> triggered by _get()/_put()/etc.
>>
>> > pm_runtime_enable(&pdev->dev);
>> > if (!pm_runtime_enabled(&pdev->dev))
>> >sun6i_spi_runtime_resume(&pdev->dev);
>>
>> Similarly here, it's not the pm_runtime_enable that will fail when
>> runtime PM is disabled (or not built-in), it's a pm_runtime_get_sync()
>> that will fail.
>>
>> What you want is something like this in ->probe()
>>
>>sun6i_spi_runtime_resume();
>>/* now, device is always activated whether or not runtime PM is enabled */
>>pm_runtime_enable();
>>pm_runtime_set_active();  /* tells runtime PM core device is already active */
>
> shouldn't this be done before pm_runtime_enable() ?

hmm, possibly yes.  I was doing this off the top of my head without
looking too closely at the code.

>>pm_runtime_get_sync();
>>
>> This 'get' will increase the usecount, but not actually call the
>> callbacks because we told the RPM core that the device was already
>> activated with _set_active().
>>
>> And then, in ->remove(), you'll want
>>
>>pm_runtime_put();
>
> in ->remove() you actually want a put_sync() right ? You don't want to
> schedule anything since you're just about to disable pm_runtime.

Yes, you're correct.

Thanks for the corrections.
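
Putting both corrections together, the ordering can be modelled in plain
user-space C. This is only a toy model and not the runtime PM core: struct
fake_dev and the fake_* helpers are made up for illustration, and the
assumption that set_active is refused once runtime PM has been enabled is
my reading of why it has to come first.

```c
#include <stdbool.h>

/* Toy stand-in for the runtime PM state of one device (hypothetical). */
struct fake_dev {
	bool hw_on;         /* hardware actually powered */
	bool rpm_enabled;   /* models pm_runtime_enable() having been called */
	bool rpm_active;    /* what the fake "core" believes about the device */
	int usage;          /* usage counter changed by get/put */
	int resume_calls;   /* times the core invoked the resume callback */
	int suspend_calls;  /* times the core invoked the suspend callback */
};

/* Models pm_runtime_set_active(): assumed to be refused once enabled. */
static int fake_set_active(struct fake_dev *d)
{
	if (d->rpm_enabled)
		return -1;
	d->rpm_active = true;
	return 0;
}

/* Models pm_runtime_enable(). */
static void fake_enable(struct fake_dev *d)
{
	d->rpm_enabled = true;
}

/* Models pm_runtime_get_sync(): resume only if the core thinks we are off. */
static void fake_get_sync(struct fake_dev *d)
{
	d->usage++;
	if (d->rpm_enabled && !d->rpm_active) {
		d->resume_calls++;
		d->hw_on = true;
		d->rpm_active = true;
	}
}

/* Models pm_runtime_put_sync(): suspend right away when the count hits 0. */
static void fake_put_sync(struct fake_dev *d)
{
	if (--d->usage == 0 && d->rpm_enabled && d->rpm_active) {
		d->suspend_calls++;
		d->hw_on = false;
		d->rpm_active = false;
	}
}

static int fake_probe(struct fake_dev *d)
{
	int ret;

	d->hw_on = true;           /* direct call to the driver's resume routine */
	ret = fake_set_active(d);  /* before enable, as suggested in the thread */
	if (ret)
		return ret;
	fake_enable(d);
	fake_get_sync(d);          /* takes a reference, no callback needed */
	return 0;
}

static void fake_remove(struct fake_dev *d)
{
	fake_put_sync(d);          /* synchronous, nothing is left scheduled */
	d->rpm_enabled = false;    /* models pm_runtime_disable() */
}
```

In this model the probe path never triggers the resume callback, and the
remove path suspends the device synchronously before runtime PM is disabled.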

Kevin


[PATCH 0/3] powerpc: Free up an IPI message slot for tick broadcast IPIs

2014-01-30 Thread Preeti U Murthy
This patchset is a precursor for enabling deep idle states on powerpc,
in which the local CPU timers stop. The tick broadcast framework in
the Linux kernel today handles wakeup of such CPUs at their next timer event
by using an external clock device. At the expiry of this clock device, IPIs
are sent to the CPUs in deep idle states so that they wake up to handle their
respective timers. This patchset frees up one of the IPI slots on powerpc
so that it can be used to handle the tick broadcast IPI.

On certain implementations of powerpc, such an external clock device is absent.
Adding support to the tick broadcast framework to handle wakeup of CPUs from
deep idle states on such implementations is currently under discussion.
https://lkml.org/lkml/2014/1/15/86
https://lkml.org/lkml/2014/1/24/28

Either way this patchset is essential to enable handling the tick broadcast 
IPIs.
---

Preeti U Murthy (1):
  cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines

Srivatsa S. Bhat (2):
  powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
  powerpc: Implement tick broadcast IPI as a fixed IPI message


 arch/powerpc/include/asm/smp.h  |2 -
 arch/powerpc/include/asm/time.h |1 
 arch/powerpc/kernel/smp.c   |   23 ++--
 arch/powerpc/kernel/time.c  |   86 ++-
 arch/powerpc/platforms/cell/interrupt.c |2 -
 arch/powerpc/platforms/ps3/smp.c|2 -
 6 files changed, 71 insertions(+), 45 deletions(-)

-- 



[PATCH 3/3] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines

2014-01-30 Thread Preeti U Murthy
From: Preeti U Murthy 

Split timer_interrupt(), which is the local timer interrupt handler on ppc,
into routines called during regular interrupt handling and __timer_interrupt(),
which takes care of running local timers and collecting time related stats.

This will enable callers interested only in running expired local timers to
directly call into __timer_interrupt(). One of the use cases of this is the
tick broadcast IPI handling in which the sleeping CPUs need to handle the local
timers that have expired.

Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/kernel/time.c |   81 +---
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+void __timer_interrupt(void)
+{
+   struct pt_regs *regs = get_irq_regs();
+   u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+   struct clock_event_device *evt = &__get_cpu_var(decrementers);
+   u64 now;
+
+   trace_timer_interrupt_entry(regs);
+
+   if (test_irq_work_pending()) {
+   clear_irq_work_pending();
+   irq_work_run();
+   }
+
+   now = get_tb_or_rtc();
+   if (now >= *next_tb) {
+   *next_tb = ~(u64)0;
+   if (evt->event_handler)
+   evt->event_handler(evt);
+   __get_cpu_var(irq_stat).timer_irqs_event++;
+   } else {
+   now = *next_tb - now;
+   if (now <= DECREMENTER_MAX)
+   set_dec((int)now);
+   /* We may have raced with new irq work */
+   if (test_irq_work_pending())
+   set_dec(1);
+   __get_cpu_var(irq_stat).timer_irqs_others++;
+   }
+
+#ifdef CONFIG_PPC64
+   /* collect purr register values often, for accurate calculations */
+   if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+   cu->current_tb = mfspr(SPRN_PURR);
+   }
+#endif
+
+   trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
struct pt_regs *old_regs;
u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-   struct clock_event_device *evt = &__get_cpu_var(decrementers);
-   u64 now;
 
/* Ensure a positive value is written to the decrementer, or else
 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
old_regs = set_irq_regs(regs);
irq_enter();
 
-   trace_timer_interrupt_entry(regs);
-
-   if (test_irq_work_pending()) {
-   clear_irq_work_pending();
-   irq_work_run();
-   }
-
-   now = get_tb_or_rtc();
-   if (now >= *next_tb) {
-   *next_tb = ~(u64)0;
-   if (evt->event_handler)
-   evt->event_handler(evt);
-   __get_cpu_var(irq_stat).timer_irqs_event++;
-   } else {
-   now = *next_tb - now;
-   if (now <= DECREMENTER_MAX)
-   set_dec((int)now);
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
-   __get_cpu_var(irq_stat).timer_irqs_others++;
-   }
-
-#ifdef CONFIG_PPC64
-   /* collect purr register values often, for accurate calculations */
-   if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-   cu->current_tb = mfspr(SPRN_PURR);
-   }
-#endif
-
-   trace_timer_interrupt_exit(regs);
-
+   __timer_interrupt();
irq_exit();
set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode 
mode,
 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+   u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+   *next_tb = get_tb_or_rtc();
+   __timer_interrupt();
 }
 
 static void register_decrementer_clockevent(int cpu)



[PATCH 1/3] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message

2014-01-30 Thread Preeti U Murthy
From: Srivatsa S. Bhat 

The IPI handlers for both PPC_MSG_CALL_FUNCTION and PPC_MSG_CALL_FUNC_SINGLE map
to a common implementation - generic_smp_call_function_single_interrupt(). So,
we can consolidate them and save one of the IPI message slots (which are
precious on powerpc, since only 4 of those slots are available).

So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNCTION itself and release its IPI message slot, so that it can be
used for something else in the future, if desired.
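
The idea can be sketched in user-space C as a small bitmask demultiplexer in
which both kinds of call-function request post the same message. This is only
an illustration: the MSG_BIT() encoding, the counters and the send_*/demux
helpers are hypothetical and do not mirror the kernel's IPI_MESSAGE() macro or
smp_ipi_demux().

```c
/* Message numbers as in the patch; slot 2 is left free. */
enum {
	MSG_CALL_FUNCTION  = 0,
	MSG_RESCHEDULE     = 1,
	MSG_UNUSED         = 2,
	MSG_DEBUGGER_BREAK = 3,
};

/* Hypothetical one-bit-per-message encoding. */
#define MSG_BIT(m) (1u << (m))

static unsigned int pending;     /* messages posted to one fake CPU */
static int call_function_runs;   /* times the shared handler ran */
static int reschedule_runs;

/* Both senders post the same message after the consolidation. */
static void send_call_function_single(void)
{
	pending |= MSG_BIT(MSG_CALL_FUNCTION);
}

static void send_call_function_mask(void)
{
	pending |= MSG_BIT(MSG_CALL_FUNCTION);
}

static void send_reschedule(void)
{
	pending |= MSG_BIT(MSG_RESCHEDULE);
}

/* Drain everything that is pending, one handler per message bit. */
static void demux(void)
{
	unsigned int all = pending;

	pending = 0;
	if (all & MSG_BIT(MSG_CALL_FUNCTION))
		call_function_runs++;   /* stands in for the common handler */
	if (all & MSG_BIT(MSG_RESCHEDULE))
		reschedule_runs++;
}
```

Two requests that arrive before the receiver runs collapse into one pass
through the shared handler, which is fine as long as that handler drains all
queued work, and the bit for slot 2 is never set.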

Signed-off-by: Srivatsa S. Bhat 
Signed-off-by: Preeti U. Murthy 
Acked-by: Geoff Levand  [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h  |2 +-
 arch/powerpc/kernel/smp.c   |   12 +---
 arch/powerpc/platforms/cell/interrupt.c |2 +-
 arch/powerpc/platforms/ps3/smp.c|2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE  1
-#define PPC_MSG_CALL_FUNC_SINGLE   2
+#define PPC_MSG_UNUSED 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
 }
 
-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
 {
-   generic_smp_call_function_single_interrupt();
+   /* This slot is unused and hence available for use, if needed */
return IRQ_HANDLED;
 }
 
@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] =  call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
-   [PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+   [PPC_MSG_UNUSED] = unused_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-   [PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+   [PPC_MSG_UNUSED] = "ipi unused",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
-   if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
-   generic_smp_call_function_single_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-   do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+   do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c 
b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
iic_request_ipi(PPC_MSG_CALL_FUNCTION);
iic_request_ipi(PPC_MSG_RESCHEDULE);
-   iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+   iic_request_ipi(PPC_MSG_UNUSED);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION!= 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE   != 1);
-   BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+   BUILD_BUG_ON(PPC_MSG_UNUSED   != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
for (i = 0; i < MSG_COUNT; i++) {



[PATCH 2/3] powerpc: Implement tick broadcast IPI as a fixed IPI message

2014-01-30 Thread Preeti U Murthy
From: Srivatsa S. Bhat 

For scalability and performance reasons, we want the tick broadcast IPIs
to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and hence
they don't involve costly operations such as adding IPI handlers to the target
CPU's function queue, acquiring locks for synchronization etc.

Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.

Signed-off-by: Srivatsa S. Bhat 
[Functions renamed to tick_broadcast* and Changelog modified by
 Preeti U. Murthy]
Signed-off-by: Preeti U. Murthy 
Acked-by: Geoff Levand  [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h  |2 +-
 arch/powerpc/include/asm/time.h |1 +
 arch/powerpc/kernel/smp.c   |   19 +++
 arch/powerpc/kernel/time.c  |5 +
 arch/powerpc/platforms/cell/interrupt.c |2 +-
 arch/powerpc/platforms/ps3/smp.c|2 +-
 6 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE  1
-#define PPC_MSG_UNUSED 2
+#define PPC_MSG_TICK_BROADCAST 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
 struct rtc_time;
 extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);
 
 extern void generic_calibrate_decr(void);
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..6f06f05 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
 }
 
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
 {
-   /* This slot is unused and hence available for use, if needed */
+   tick_broadcast_ipi_handler();
return IRQ_HANDLED;
 }
 
@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] =  call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
-   [PPC_MSG_UNUSED] = unused_action,
+   [PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-   [PPC_MSG_UNUSED] = "ipi unused",
+   [PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
+   if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+   tick_broadcast_ipi_handler();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -289,6 +292,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask 
*mask)
do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
+void tick_broadcast(const struct cpumask *mask)
+{
+   unsigned int cpu;
+
+   for_each_cpu(cpu, mask)
+   do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 void smp_send_debugger_break(void)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode 
mode,
decrementer_set_next_event(DECREMENTER_MAX, dev);
 }
 
+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}
+
 static void register_decrementer_clockevent(int cpu)
 {
struct clock_event_device *dec = &per_cpu(decrementers, cpu);
diff --git a/arch/powerpc/platforms/cell/interrupt.c 
b/arch/powerpc/platforms/cell/interrupt.

Re: [PATCH 4/4] power_supply: bq24261 charger driver

2014-01-30 Thread Jenny Tc
On Thu, Jan 30, 2014 at 06:01:54PM +0100, Pavel Machek wrote:
> Hi!
> 
> > diff --git a/drivers/power/Makefile b/drivers/power/Makefile
> > index 77535fd..6d184c8 100644
> > --- a/drivers/power/Makefile
> > +++ b/drivers/power/Makefile
> > @@ -59,4 +59,5 @@ obj-$(CONFIG_CHARGER_BQ24735) += bq24735-charger.o
> >  obj-$(CONFIG_POWER_AVS)+= avs/
> >  obj-$(CONFIG_CHARGER_SMB347)   += smb347-charger.o
> >  obj-$(CONFIG_CHARGER_TPS65090) += tps65090-charger.o
> > +obj-$(CONFIG_BQ24261_CHARGER)  += bq24261_charger.o
> >  obj-$(CONFIG_POWER_RESET)  += reset/
> 
> I believe I commented on this one before. Spot two inconsistencies.

Sorry, couldn't find two inconsistencies. One is use of _. And the other?


Re: kmem_cache_alloc panic in 3.10+

2014-01-30 Thread dormando
> On Thu, Jan 30, 2014 at 6:16 PM, Eric Dumazet  wrote:
> > On Wed, 2014-01-29 at 23:05 -0800, dormando wrote:
> >
> >> We hit the routing code fairly hard. Any hints for what to look at or how
> >> to instrument it? Or if it's fixed already? It's a real pain to iterate
> >> since it takes ~30 days to crash, usually. Sometimes.
>
> sounds like adding mdelay() didn't help to crash it sooner. Then I don't
> see how my dst fix was causing it to crash more often. Something odd.
> fyi just to check it more thoroughly I've been running with mdelay()
> and config_slub_debug_on for a week without issues.

Sorry, I'm actually trying to deal with two separate crashes at once :/
One is this 3.10.15 one, and one was the regression in 3.10.23 - I haven't
had time to attempt the mdelay test yet. The two crashes have fairly
distinct traces.

For what it's worth though the machines I have with that one patch
reverted are still running fine.

> > I really wonder... it looks like a possible in SLUB. (might be already
> > fixed)
> >
> > Could you try using SLAB instead ?
>
> try config_slub_debug_on=y ? it should catch double free and other things.
>

Any slowdowns/issues with that?


Re: [PATCH V3 4/8] phy: Initialize phy core with subsys_initcall

2014-01-30 Thread Pratyush Anand
On Thu, Jan 30, 2014 at 08:44:58PM +0800, Arnd Bergmann wrote:
> On Thursday 30 January 2014, Pratyush Anand wrote:
> > On Thu, Jan 30, 2014 at 07:43:37PM +0800, Kishon Vijay Abraham I wrote:
> > > Hi,
> > > 
> > > On Thursday 30 January 2014 04:18 PM, Mohit Kumar wrote:
> > > > From: Pratyush Anand 
> > > > 
> > > > PCIe RC drivers are initialized with subsys_initcall. Few PCIe drivers
> > > > like SPEAr13xx needs phy drivers to be initialized.
> > > 
> > > Instead change PCIe RC drivers to module init. Phy drivers should be
> > > loaded very early otherwise. (Hint: drivers/Makefile).
> > 
> > I think PCIe RC driver can not be made module init. Bjorn can comment
> > better.
> 
> I don't think there is any problem here: the PCI devices will only appear
> after the PCIe root bus has been probed. All drivers using the regular
> pci_driver framework should work fine even if they are loaded before the
> device is found. There are a handful of drivers using 'pci_get_device'
> rather than pci_register_driver, and those will break. As far as I can
> tell, those drivers are all x86 specific, and you should not worry about
> them.
> 
> Having the PHY driver get initialized after the PCI root driver should
> also work, but it requires correct handling of -EPROBE_DEFER: if phy_get

I had an issue with the phy-core driver getting initialized after the pcie rc
driver. I saw a kernel crash, as devm_phy_get was called before
phy_class was created. I think this too needs to be fixed; we should
not see a crash.

Anyway, I will keep the spear phy and rc drivers both with module_init and
an -EPROBE_DEFER implementation.

Regards
Pratyush
> returns this error, the PCI driver must silently return the same error
> from its probe() function so it will get called again at a later time
> (after some other devices have been probed successfully).
> 
>   Arnd
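
The deferral scheme Arnd describes can be sketched in user-space C as follows.
The fake_* functions and the phy_registered flag are hypothetical stand-ins,
and EPROBE_DEFER is defined locally because it is a kernel-internal error code
(517 in include/linux/errno.h) that user-space headers do not provide.

```c
/* Kernel-internal error code, defined here only for the sketch. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

static int phy_registered;   /* set once the fake PHY driver has probed */
static int pcie_bound;       /* set once the fake PCIe driver has bound */

/* Stand-in for a PHY lookup: not available yet means "try again later". */
static int fake_phy_get(void)
{
	return phy_registered ? 0 : -EPROBE_DEFER;
}

/* Stand-in for the PCIe probe: pass the error through unchanged. */
static int fake_pcie_probe(void)
{
	int ret = fake_phy_get();

	if (ret)
		return ret;   /* no error message, just hand it back to the core */
	pcie_bound = 1;
	return 0;
}

/* Stand-in for the PHY driver binding at some later point. */
static void fake_phy_probe(void)
{
	phy_registered = 1;
}
```

The first probe attempt returns -EPROBE_DEFER without binding; once the PHY
has been registered, a repeated probe attempt succeeds.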


Re: kmem_cache_alloc panic in 3.10+

2014-01-30 Thread Alexei Starovoitov
On Thu, Jan 30, 2014 at 6:16 PM, Eric Dumazet  wrote:
> On Wed, 2014-01-29 at 23:05 -0800, dormando wrote:
>
>> We hit the routing code fairly hard. Any hints for what to look at or how
>> to instrument it? Or if it's fixed already? It's a real pain to iterate
>> since it takes ~30 days to crash, usually. Sometimes.

sounds like adding mdelay() didn't help to crash it sooner. Then I don't
see how my dst fix was causing it to crash more often. Something odd.
fyi just to check it more thoroughly I've been running with mdelay()
and config_slub_debug_on for a week without issues.

> I really wonder... it looks like a possible in SLUB. (might be already
> fixed)
>
> Could you try using SLAB instead ?

try config_slub_debug_on=y ? it should catch double free and other things.


Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued

2014-01-30 Thread Jason Low
On Wed, 2014-01-29 at 12:51 +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
> > > But urgh, nasty problem. Lemme ponder this a bit.
> 
> OK, please have a very careful look at the below. It survived a boot
> with udev -- which usually stresses mutex contention enough to explode
> (in fact it did a few time when I got the contention/cancel path wrong),
> however I have not ran anything else on it.

I tested this patch on a 2 socket, 8 core machine with the AIM7 fserver
workload. After 100 users, the system gets soft lockups.

Some condition may be causing threads to not leave the "goto unqueue"
loop. I added a debug counter, and threads were able to reach more than
1,000,000,000 "goto unqueue".

I was also initially wondering whether there can be problems when multiple
threads see need_resched() and unqueue at the same time. As an example, 2
nodes that need to reschedule are next to each other in the middle of
the MCS queue. The 1st node executes "while (!(next =
ACCESS_ONCE(node->next)))" and exits the while loop because next is not
NULL. Then, the 2nd node executes its "if (cmpxchg(&prev->next, node,
NULL) != node)". We may then end up in a situation where the node before
the 1st node gets linked with the outdated 2nd node.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/22] Rewrite XIP code and add XIP support to ext4

2014-01-30 Thread Dave Chinner
On Thu, Jan 30, 2014 at 08:25:37PM +1100, Dave Chinner wrote:
> On Thu, Jan 30, 2014 at 05:42:30PM +1100, Dave Chinner wrote:
> > On Wed, Jan 15, 2014 at 08:24:18PM -0500, Matthew Wilcox wrote:
> > > This series of patches add support for XIP to ext4.  Unfortunately,
> > > it turns out to be necessary to rewrite the existing XIP support code
> > > first due to races that are unfixable in the current design.
> > > 
> > > Since v4 of this patchset, I've improved the documentation, fixed a
> > > couple of warnings that a newer version of gcc emitted, and fixed a
> > > bug where we would read/write the wrong address for I/Os that were not
> > > aligned to PAGE_SIZE.
> > 
> > Looks like there's something fundamentally broken with the patch set
> > as it stands. I get this same data corruption on both ext4 and XFS
> > with XIP using fsx. It's as basic as it gets - the first read after
> > a mmapped write fails to see the data written by mmap:
> > 
> > $ sudo mkfs.xfs -f /dev/ram0
> > > meta-data=/dev/ram0              isize=256    agcount=4, agsize=256000 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=0
> > > data     =                       bsize=4096   blocks=1024000, imaxpct=25
> > >          =                       sunit=0      swidth=0 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> > > log      =internal log           bsize=4096   blocks=12800, version=2
> > >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > $ sudo mount -o xip /dev/ram0 /mnt/scr
> > $ sudo chmod 777 /mnt/scr
> > $ ltp/fsx -d -N 1000 -S 0 /mnt/scr/fsx
> 
> > operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
> > LOG DUMP (9 total operations):
> > > 1(  1 mod 256): MAPWRITE 0x3db39 thru 0x3       (0x24c7 bytes)
> > > 2(  2 mod 256): MAPREAD  0x2e947 thru 0x33163   (0x481d bytes)
> > > 3(  3 mod 256): READ     0x2e836 thru 0x3cba1   (0xe36c bytes)
> > > 4(  4 mod 256): PUNCH    0x2e7 thru 0x5c42      (0x595c bytes)
> > > 5(  5 mod 256): MAPWRITE 0xcaea thru 0x13ba9    (0x70c0 bytes)  **
> > > 6(  6 mod 256): PUNCH    0x31645 thru 0x38d1c   (0x76d8 bytes)
> > > 7(  7 mod 256): FALLOC   0x24f92 thru 0x2f2b7   (0xa325 bytes) INTERIOR
> > > 8(  8 mod 256): FALLOC   0xbcf1 thru 0x171ac    (0xb4bb bytes) INTERIOR **
> > > 9(  9 mod 256): READ     0x126f thru 0x11136    (0xfec8 bytes)  ******
> > Correct content saved for comparison
> > (maybe hexdump "/mnt/scr/fsx" vs "/mnt/scr/fsx.fsxgood")
> > 
> > XFS gives a good indication that we aren't doing something correctly
> > w.r.t. mapped XIP writes, as trying to fiemap the file ASSERT fails
> > with a delayed allocation extent somewhere inside the file after a
> > sync. I shall keep digging.
> 
> Ok, I understand the XFS ASSERT failure, but I don't really
> understand the reason for the read failure. XFS assert failed
> because I was using the delayed allocation enabled xfs_get_blocks()
> to xip_fault/xip_mkwrite, so it was creating a delalloc extent
> rather than allocating blocks, and then not having any pages in the
> page cache to write back to convert the delalloc extent. This
> doesn't explain the zeros being read, though.
> 
> So I changed to use the direct IO version, and that leaves me with
> an unwritten extent over the mapped write code. Why? Because there's
> no IO completion being run from either xip_fault() or xip_mkwrite()
> to zero the buffers and run IO completion to convert the extent to
> written:
> 
> $ xfs_io -f -c "truncate 8k" -c "mmap 0 8k" -c "mwrite 0 4k" \
> > -c "bmap -vp" -c "pread -v 0 8k" -c "bmap -vp" /mnt/scr/foo
> 
> /mnt/scr/foo:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..7]:          224..231          0 (224..231)           8 1
>    1: [8..15]:         hole                                     8
> $
> 
> We're trying to do something that the get_block callback has never
> supported.  I note that you added zeroing to ext4_map_blocks() when
> an unwritten extent is found and call xip_clear_blocks() from there
> to try and handle this within the allocation context without
> actually making it obvious why it is necessary.
> 
> Essentially what we need get_blocks(create = 1) to do here is this:
> 
>   if (hole)
>   transactionally allocate and zero block in requested region
>   if (unwritten)
>   transactionally convert to written and zero block
>   if (written)
>   map blocks
> 
> I think we can get away with this from a crash recovery perspective
> because the zeroing of the blocks is synchronous and within the
> allocation transaction. I'm implementing a new xfs_get_blocks_xip to
> keep this new behaviour "separate" from the direct IO path
> semantics.
> 
> I also got rid of the read block map followed by the "create" block
> map. Just a single c

Reverting warning for vmalloc and kmemcheck faults in NMI

2014-01-30 Thread Michel Lespinasse
Hi,

Way back in 2010, Frederic added commit
ebc8827f75954fe315492883eee5cb3f355d547d to warn us about cases where
faults were incorrectly firing during NMI handling on x86, as the IRET
from such faults would possibly trigger nested NMIs.

Later (2012), Salman added commit
28696f434fef0efa97534b59986ad33b9c4df7f8 to enable nested NMI
handling. See http://lwn.net/Articles/484932/

So, I believe such faults nesting under NMI are not an issue anymore,
and we could revert ebc8827f75954fe315492883eee5cb3f355d547d ?

Background: I'm asking this because at Google we like to dump memory
regions pointed to by registers in our arch_trigger_all_cpu_backtrace
handler, and this occasionally causes the vmalloc_fault in_nmi()
warning to fire.

Patch is trivial, I can send it up for review if people don't object
to the idea.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: same ext4 file system corruption on different machines

2014-01-30 Thread Theodore Ts'o
On Thu, Jan 30, 2014 at 08:59:09AM +0100, Luca Ognibene wrote:
> Yes, it's indeed very strange. I tend to rule out application errors
> because I don't write directly to the device, so I don't think I can
> break a filesystem from userspace. I've checked the previous and next blocks
> and they seem OK; only block 524320 is getting corrupted. Any idea
> what I should look for now?

Are you willing to try a 3.12.9 or 3.13.1 upstream kernel?  Let's see if
changing the kernel makes any difference.  I don't recall any ext4
problems like this, but maybe it's a device driver problem.

The other thing I'd ask is whether you can swap out the hard drive
interface --- can you use a USB 3.0 attached drive, or something like
that?

One final thing that you could try doing, depending on how
easily/quickly you can reproduce the problem, is to use blktrace and
see if you can catch who or what is writing to that specific block
which is getting corrupted.

- Ted


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

> > A device may be accessed directly (by opening /dev/sdX) and it creates a
> > mapping too - thus, the size of a mapping limits the size of a block
> > device.
> 
> Right, that's what I suspected below.  We can't damage large block
> support on filesystems just because of this corner case.

Devices larger than 16TiB never worked on a 32-bit kernel, so this patch
isn't damaging anything.

Note that if you attach a 16TiB block device and only mount it, without
opening it directly, it still won't work, because the buffer cache uses the
page cache (see the function __find_get_block_slow and the variable
"pgoff_t index" - that variable would overflow if the filesystem accessed a
buffer beyond 16TiB).
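
The overflow described above can be sketched in user space. This is only a
minimal illustration, not kernel code: demo_pgoff_t, demo_page_index() and the
other demo_* names are hypothetical stand-ins, and 4 KiB pages with 512-byte
sectors are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 512-byte sectors. */
#define DEMO_PAGE_SHIFT   12
#define DEMO_SECTOR_SHIFT 9

/* Hypothetical stand-in for pgoff_t on a 32-bit kernel. */
typedef uint32_t demo_pgoff_t;

/* Page cache index of a byte offset, truncated to 32 bits. */
static demo_pgoff_t demo_page_index(uint64_t byte_offset)
{
	return (demo_pgoff_t)(byte_offset >> DEMO_PAGE_SHIFT);
}

/* PAGE_SIZE * (2^32 - 1): the largest capacity the index can cover. */
static uint64_t demo_max_capacity_bytes(void)
{
	return (uint64_t)UINT32_MAX << DEMO_PAGE_SHIFT;
}

/* Nonzero if a device with nr_sectors sectors stays within that limit. */
static int demo_capacity_ok(uint64_t nr_sectors)
{
	return nr_sectors <= (demo_max_capacity_bytes() >> DEMO_SECTOR_SHIFT);
}

/* Self-check: the last byte below 16TiB is fine, the next one wraps. */
static void demo_self_check(void)
{
	const uint64_t tib16 = (uint64_t)1 << 44;

	assert(demo_page_index(tib16 - 1) == UINT32_MAX);
	assert(demo_page_index(tib16) == 0);	/* index wrapped to zero */
	assert(!demo_capacity_ok(tib16 >> DEMO_SECTOR_SHIFT));
}
```

With these assumptions, a check along the lines of demo_capacity_ok() is the
kind of up-front refusal being argued for, as opposed to letting the index
wrap silently at access time.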

> > The main problem is that pgoff_t has 4 bytes - changing it to 8 bytes may
> > fix it - but there may be some hidden places where pgoff is converted to 
> > unsigned long - who knows, if they exist or not?
> 
> I don't think we want to do that ... it will make struct page fatter and
> have knock on impacts in the radix tree code.  To fix this, we need to
> make the corner case (i.e. opening large block devices without a
> filesystem) bear the pain.  It sort of looks like we want to do a linear
> array of mappings of 64TB for the device so the page cache calculations
> don't overflow.

The code that reads and writes data to block devices and files is shared - 
the functions in mm/filemap.c work for both files and block devices.

So, if you want 64-bit page offsets, you need to increase pgoff_t size, 
and that will increase the limit for both files and block devices.

You shouldn't have separate functions for managing pages on files and 
separate functions for managing pages on block devices - that would 
increase code size and cause maintenance problems.

> > Though, we need to know if the people who designed memory management agree 
> > with changing pgoff_t to 64 bits.
> 
> I don't think we can change the size of pgoff_t ... because it won't
> just be that, it will be other problems like the radix tree.

If we can't change it, then we must stay with the current 16TiB limit. 
There's no other way.

> However, you also have to bear in mind that truncating large block
> device support to 64TB on 32 bits is a technical ABI break.  Hopefully
> it is only technical because I don't know of any current consumer block
> device that is 64TB yet, but anyone who'd created a filesystem >64TB
> would find it no-longer mounted on 32 bits.
> James

It is not an ABI break, because block devices larger than 16TiB never worked
on 32-bit architectures. So it's better to refuse them outright than to
cause subtle lockups or data corruption.

Mikulas


Re: RFC: KEYS: Is this too-big a behavioural change for a system call?

2014-01-30 Thread Linus Torvalds
On Thu, Jan 30, 2014 at 9:03 AM, David Howells  wrote:
>
> I've been asked by Kerberos developers to slightly change the behaviour of the
> add_key() and request_key() system calls and a couple of the keyctl() functions
> - and I'm wondering if you'd be okay with it.

So the rule about ABI changes has always been: "If somebody notices
it, we revert it".

IOW, it's not so much that you can't make changes, it's that you
cannot make changes that break programs.

And quite frankly, I have no good way to judge. Your (5) _sounds_
fair, but who knows what odd things some distributions do. We'd have
to be very careful, in *particular* we'd need to make it very very
clear to the krb people that if the change screws over any old users,
it gets reverted with extreme prejudice. At that point, maybe they say
"never mind, we might as well just always make sure we have a session
keyring".

And no, we are *not* going to say "we'll just stop doing this in the
kernel and expect pam_keyinit to do it for us".

But James' suggestion to perhaps have a new version of add_key() with
new semantics could work - if it's worth the pain (because we *would*
have to maintain the old interface basically forever, so it would be
more of a "the new system call doesn't really deprecate the old one,
it just has more convenient semantics").

  Linus


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-30 Thread Thomas Glanzmann
Hello Jesse,

> This looks like the kernel module included with upstream Linux instead
> of from OVS git, is that correct?

Correct.

> Can you please describe what you are doing instead of just giving your script?

I created 8 hosts. Two hosts are connected to each switch. That gives
me 4 switches, which are connected in a ring topology. The reason for
that is that I want to test the Layer2, Layer3 IPv4 and IPv6
capabilities of OpenDayLight.

Cheers,
Thomas


Re: [PATCH v2 3/5] spi: sunxi: Add Allwinner A31 SPI controller driver

2014-01-30 Thread Felipe Balbi
Hi,

On Thu, Jan 30, 2014 at 03:52:16PM -0800, Kevin Hilman wrote:
> On Wed, Jan 29, 2014 at 5:32 AM, Maxime Ripard
>  wrote:
> > On Wed, Jan 29, 2014 at 12:25:20PM +, Mark Brown wrote:
> >> On Wed, Jan 29, 2014 at 12:10:48PM +0100, Maxime Ripard wrote:
> >>
> >> > +config SPI_SUN6I
> >> > +   tristate "Allwinner A31 SPI controller"
> >> > +   depends on ARCH_SUNXI || COMPILE_TEST
> >> > +   select PM_RUNTIME
> >> > +   help
> >> > + This enables using the SPI controller on the Allwinner A31 SoCs.
> >> > +
> >>
> >> A select of PM_RUNTIME is both surprising and odd - why is that there?
> >> The usual idiom is that the device starts out powered up (flagged using
> >> pm_runtime_set_active()) and then runtime PM then suspends it when it's
> >> compiled in.  That way if for some reason people want to avoid runtime
> >> PM they can still use the device.
> >
> > Since pm_runtime_set_active and all the pm_runtime* callbacks in
> > general are defined to pretty much empty functions, how are the
> > suspend/resume callbacks called then? Obviously, we need them to
> > be run, hence why I added the select here. Would a construct like
> > the following be acceptable then?
> 
> Even with your 'select', The runtime PM callbacks will never be called
> in the current driver.  pm_runtime_enable() doesn't do any runtime PM
> transitions.  It just allows transitions to happen when they're
> triggered by _get()/_put()/etc.
> 
> > pm_runtime_enable(&pdev->dev);
> > if (!pm_runtime_enabled(&pdev->dev))
> >sun6i_spi_runtime_resume(&pdev->dev);
> 
> Similarly here, it's not the pm_runtime_enable that will fail when
> runtime PM is disabled (or not built-in), it's a pm_runtime_get_sync()
> that will fail.
> 
> What you want is something like this in ->probe()
> 
>sun6i_spi_runtime_resume();
>/* now, device is always activated whether or not runtime PM is enabled */
>pm_runtime_enable();
>    pm_runtime_set_active();  /* tells runtime PM core device is already active */

shouldn't this be done before pm_runtime_enable() ?

>pm_runtime_get_sync();
> 
> This 'get' will increase the usecount, but not actually call the
> callbacks because we told the RPM core that the device was already
> activated with _set_active().
> 
> And then, in ->remove(), you'll want
> 
>pm_runtime_put();

in ->remove() you actually want a put_sync(), right? You don't want to
schedule anything since you're just about to disable pm_runtime.

-- 
balbi




vmwgfx: Fix unitialized stack read in vmw_setup_otable_base

2014-01-30 Thread Dave Jones
One of the error paths in vmw_setup_otable_base causes us to return with
'ret' having never been set to anything causing us to return whatever was
on the stack.

Found with Coverity

Signed-off-by: Dave Jones 

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_mob.c b/drivers/gpu/drm/vmwgfx/vmwgfx_mob.c
index 4910e7b81811..d4a5a19cb8c3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_mob.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_mob.c
@@ -134,6 +134,7 @@ static int vmw_setup_otable_base(struct vmw_private *dev_priv,
 	cmd = vmw_fifo_reserve(dev_priv, sizeof(*cmd));
 	if (unlikely(cmd == NULL)) {
 		DRM_ERROR("Failed reserving FIFO space for OTable setup.\n");
+		ret = -ENOMEM;
 		goto out_no_fifo;
 	}
 


linux-next: Tree for Jan 31

2014-01-30 Thread Stephen Rothwell
Hi all,

There will probably be no linux-next release next Monday (next-20140203).

Please do *not* add material destined for v3.15 to your linux-next
included trees until after v3.14-rc1 is released.

This tree fails (more than usual) the powerpc allyesconfig build.

Changes since 20140130:

The powerpc tree still had its build failure.

Non-merge commits (relative to Linus' tree): 1855
 2391 files changed, 162089 insertions(+), 40509 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final
link) and i386, sparc, sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

I am currently merging 208 trees (counting Linus' and 28 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (53d8ab29f8f6 Merge branch 'for-3.14/drivers' of git://git.kernel.dk/linux-block)
Merging fixes/master (b0031f227e47 Merge tag 's2mps11-build' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator)
Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" not depend on vmlinux)
Merging arc-current/for-curr (7e22e91102c6 Linux 3.13-rc8)
Merging arm-current/fixes (d326b65c57d6 ARM: fix building with gcc 4.6.4)
Merging m68k-current/for-linus (56931d73697c m68k/mac: Make SCC reset work more reliably)
Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2)
Merging powerpc-merge/merge (b3084f4db3ae powerpc/thp: Fix crash on mremap)
Merging sparc/master (9b0cd304f26b Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux)
Merging net/master (9b0cd304f26b Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux)
Merging ipsec/master (965cdea82569 dccp: catch failed request_module call in dccp_probe init)
Merging sound-current/for-linus (c083be45bd2a ALSA: hda - Fix inconsistent Mic mute LED)
Merging pci-current/for-linus (f0b75693cbb2 MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers)
Merging wireless/master (7d0d46da750a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging driver-core.current/driver-core-linus (90804ed61f24 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging tty.current/tty-linus (413541dd66d5 Linux 3.13-rc5)
Merging usb.current/usb-linus (90804ed61f24 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging staging.current/staging-linus (77d143de7581 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml)
Merging char-misc.current/char-misc-linus (90804ed61f24 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging input-current/for-linus (55df811f2066 Merge branch 'next' into for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (ee97dc7db4cb crypto: s390 - fix des and des3_ede ctr concurrency issue)
Merging ide/master (9b0cd304f26b Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux)
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merg

Re: [PATCH v2 00/21] pinctrl: mvebu: restructure and remove hardcoded addresses from Dove pinctrl

2014-01-30 Thread Sebastian Hesselbarth

On 01/30/2014 09:25 PM, Andrew Lunn wrote:
> On Thu, Jan 30, 2014 at 07:50:34PM +0100, Sebastian Hesselbarth wrote:
>> On 01/30/2014 07:29 PM, Andrew Lunn wrote:
>>> On Tue, Jan 28, 2014 at 01:39:12AM +0100, Sebastian Hesselbarth wrote:
>>>> This patch set is one required step for Dove to hop into mach-mvebu.
>>>> Until now, pinctrl-dove was hardcoding some registers that do not
>>>> directly belong to MPP core registers. This is not compatible with
>>>> what we want for mach-mvebu.
>>>
>>> I think there might be something wrong here
>>
>> There _is_ something wrong. I'll have a look at it. For the record,
>> what SoC are you testing with? From the base address, I guess it is
>> Kirkwood?
>
> Yes, Kirkwood. Sorry for not saying.


This time I push a branch before sending out the patches. Also, I
think I'll postpone removal of hardcoded addresses until this is
sorted out. The patch set was growing way too quickly and I have to
do this step-by-step for me and everybody else to actually understand ;)

So, at least the MVEBU guys should test the following branch on
their SoCs. Again, I have tested Dove and now confirmed that settings
are still correct. The others are compile-tested.

https://github.com/shesselba/linux-dove.git unstable/mvebu-pinctrl-v3.14_v3

@Thomas, Gregory: Do you think that the above branch will be
restructured enough to allow support for orion5x and mv78xx0? I had a
quick look at mach-{orion5x,mv78xx0}/mpp.h and didn't see anything
weird.

Sebastian


Re: kmem_cache_alloc panic in 3.10+

2014-01-30 Thread Eric Dumazet
On Wed, 2014-01-29 at 23:05 -0800, dormando wrote:

> We hit the routing code fairly hard. Any hints for what to look at or how
> to instrument it? Or if it's fixed already? It's a real pain to iterate
> since it takes ~30 days to crash, usually. Sometimes.

I really wonder... it looks like a possible bug in SLUB. (might be already
fixed)

Could you try using SLAB instead ?




drm/radeon/dpm: fix uninitialized read from stack in kv_dpm_late_enable

2014-01-30 Thread Dave Jones
If we take the false branch of the if quoted in the diff below, we
end up doing a return ret, without ever having initialized it.

Picked up by coverity.

Signed-off-by: Dave Jones 
 
diff --git a/drivers/gpu/drm/radeon/kv_dpm.c b/drivers/gpu/drm/radeon/kv_dpm.c
index b6e01d5d2cce..351db361239d 100644
--- a/drivers/gpu/drm/radeon/kv_dpm.c
+++ b/drivers/gpu/drm/radeon/kv_dpm.c
@@ -1223,7 +1223,7 @@ int kv_dpm_enable(struct radeon_device *rdev)
 
 int kv_dpm_late_enable(struct radeon_device *rdev)
 {
-	int ret;
+	int ret = 0;
 
 	if (rdev->irq.installed &&
 	    r600_is_internal_thermal_sensor(rdev->pm.int_thermal_type)) {


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 19:20 -0500, Mikulas Patocka wrote:
> 
> On Thu, 30 Jan 2014, James Bottomley wrote:
> 
> > On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
> > > 
> > > On Thu, 30 Jan 2014, James Bottomley wrote:
> > > 
> > > > Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
> > > > allow 64 bit offsets for block devices on 32 bit.  It sounds like
> > > > there's somewhere not using sector_t ... or using it wrongly which needs
> > > > fixing.
> > > 
> > > The page cache uses unsigned long as a page index. Therefore, if unsigned 
> > > long is 32-bit, the block device may have at most 2^32-1 pages.
> > 
> > Um, that's the index into the mapping, not the device; a device can have
> > multiple mappings and each mapping has a radix tree of pages.  For most
> > filesystems a mapping is equivalent to a file, so we can have large
> > filesystems, but they can't have files over actually 4GB on 32 bits
> > otherwise mmap fails.
> 
> A device may be accessed directly (by opening /dev/sdX) and it creates a
> mapping too - thus, the size of a mapping limits the size of a block 
> device.

Right, that's what I suspected below.  We can't damage large block
support on filesystems just because of this corner case.

> The main problem is that pgoff_t has 4 bytes - changing it to 8 bytes may
> fix it - but there may be some hidden places where pgoff is converted to 
> unsigned long - who knows, if they exist or not?

I don't think we want to do that ... it will make struct page fatter and
have knock on impacts in the radix tree code.  To fix this, we need to
make the corner case (i.e. opening large block devices without a
filesystem) bear the pain.  It sort of looks like we want to do a linear
array of mappings of 64TB for the device so the page cache calculations
don't overflow.

> > Are we running into a problem with struct address_space where we've
> > assumed the inode belongs to the file and lvm is doing something where
> > it's the whole device?
> 
> lvm creates a 64TiB device, udev runs blkid on that device and blkid opens 
> the device and gets stuck because of unsigned long overflow.

Well, a simple open won't cause this ... it must be trying to read the
end of the device for some reason.  But anyway, the way to fix this is
to fix the large block open as a corner case.

> > > > > On 32-bit architectures, we must limit block device size to
> > > > > PAGE_SIZE*(2^32-1).
> > > > 
> > > > So you're saying CONFIG_LBDAF can never work, why?
> > > > 
> > > > James
> > > 
> > > CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
> > > without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
> > > 16TiB (4096*2^32).
> > 
> > I don't think the people who did the large block device work expected to
> > gain only 3 bits for all their pain.
> > 
> > James
> 
> One could change it to have three choices:
> 2TiB limit - 32-bit sector_t and 32-bit pgoff_t
> 16TiB limit - 64-bit sector_t and 32-bit pgoff_t
> 32PiB limit - 64-bit sector_t and 64-bit pgoff_t
> 
> Though, we need to know if the people who designed memory management agree 
> with changing pgoff_t to 64 bits.

I don't think we can change the size of pgoff_t ... because it won't
just be that, it will be other problems like the radix tree.

However, you also have to bear in mind that truncating large block
device support to 64TB on 32 bits is a technical ABI break.  Hopefully
it is only technical because I don't know of any current consumer block
device that is 64TB yet, but anyone who'd created a filesystem >64TB
would find it no-longer mounted on 32 bits.
James



Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-30 Thread Jesse Gross
On Thu, Jan 30, 2014 at 12:44 PM, Thomas Glanzmann  wrote:
> Hello,
> open vswitch git head with Linus tip OOPses for me reproducibly when I
> load the following mininet topology:

This looks like the kernel module included with upstream Linux instead
of from OVS git, is that correct?

Can you please describe what you are doing instead of just giving your script?

It would also be helpful if you could use GDB to find out the source
of the faulting address.


Re: [PATCH v10 00/16] Volatile Ranges v10

2014-01-30 Thread John Stultz
On 01/29/2014 10:30 AM, Johannes Weiner wrote:
> On Tue, Jan 28, 2014 at 05:43:54PM -0800, John Stultz wrote:
>> On 01/28/2014 04:03 PM, Johannes Weiner wrote:
>>> On Thu, Jan 02, 2014 at 04:12:08PM +0900, Minchan Kim wrote:
>>>> o Syscall interface
>>> Why do we need another syscall for this?  Can't we extend madvise to
>>> take MADV_VOLATILE, MADV_NONVOLATILE, and return -ENOMEM if something
>>> in the range was purged?
>> So the madvise interface is insufficient to provide the semantics
>> needed. Not so much for MADV_VOLATILE, but MADV_NONVOLATILE. For the
>> NONVOLATILE call, we have to atomically unmark the volatility status of
>> the byte range and provide the purge status, which informs the caller if
>> any of the data in the specified range was discarded (and thus needs to
>> be regenerated).
>>
>> The problem is that by clearing the range, we may need to allocate
>> memory (possibly by splitting an existing range segment into two),
>> which possibly could fail. Unfortunately this could happen after we've
>> modified the volatile state of part of that range.  At this point we
>> can't just fail, because we've modified state and we also need to return
>> the purge status of the modified state.
> munmap() can theoretically fail for the same reason (splitting has to
> allocate a new vma) but it's not even documented.  The allocator does
> not fail allocations of that order.
>
> I'm not sure this is good enough, but to me it sounds a bit overkill
> to design a new system call around a non-existent problem.

I still think it's a problematic design issue. With munmap, I think
re-calling on failure should be fine. But with _NONVOLATILE we could
possibly lose the purge status on a second call (for instance if only
the first page of memory was purged, but we errored out mid-call w/
ENOMEM, on the second call it will seem like the range was successfully
set non-volatile with no memory purged).
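
A toy user-space model may make the lost-status scenario above concrete. It
is only a sketch under loud assumptions: the demo_* names are hypothetical,
pages are tracked in a flat array rather than range trees, the allocation
failure is injected by hand, and the purge status is assumed to disappear
once a page is marked non-volatile. The second function returns the purge
status through a separate out parameter, which is one way an interface could
keep that status across a partial failure; it is not the proposed syscall
itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_NPAGES 8

/* Toy per-page state; a real implementation would not use an array. */
struct demo_page {
	int is_volatile;	/* page may be discarded under memory pressure */
	int was_purged;		/* contents were discarded while volatile */
};

/*
 * madvise()-style interface: one int carries either an error or the purge
 * status. fail_at injects an allocation failure just before that page is
 * processed (pass DEMO_NPAGES for "never fail").
 * Returns 1 if anything was purged, 0 if not, -ENOMEM on injected failure.
 */
static int demo_madvise_nonvolatile(struct demo_page *pages, size_t start,
				    size_t end, size_t fail_at)
{
	int purged = 0;
	size_t i;

	assert(start <= end && end <= DEMO_NPAGES);
	for (i = start; i < end; i++) {
		if (i == fail_at)
			return -ENOMEM;	/* status gathered so far is dropped */
		if (pages[i].is_volatile && pages[i].was_purged)
			purged = 1;
		pages[i].is_volatile = 0;
		pages[i].was_purged = 0;
	}
	return purged;
}

/*
 * Out-parameter interface: the purge status is written through *purged,
 * so it is still available to the caller when the call fails part way.
 * Returns 0 on success, -ENOMEM on injected failure.
 */
static int demo_vrange_nonvolatile(struct demo_page *pages, size_t start,
				   size_t end, size_t fail_at, int *purged)
{
	size_t i;

	assert(start <= end && end <= DEMO_NPAGES);
	*purged = 0;
	for (i = start; i < end; i++) {
		if (i == fail_at)
			return -ENOMEM;	/* *purged covers the pages done */
		if (pages[i].is_volatile && pages[i].was_purged)
			*purged = 1;
		pages[i].is_volatile = 0;
		pages[i].was_purged = 0;
	}
	return 0;
}
```

In this model, if only the first page was purged and the first call fails at
the third page, a retry of demo_madvise_nonvolatile() returns 0, as if nothing
had been purged, while demo_vrange_nonvolatile() has already reported the
purge through its out parameter.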

And even if the current allocator never ever fails, I worry at some
point in the future that rule might change and then we'd have a broken
interface.



>>>> o Not bind with vma split/merge logic to prevent mmap_sem cost and
>>>> o Not bind with vma split/merge logic to avoid vm_area_struct memory
>>>>   footprint.
>>> VMAs are there to track attributes of memory ranges.  Duplicating
>>> large parts of their functionality and co-maintaining both structures
>>> on create, destroy, split, and merge means duplicate code and complex
>>> interactions.
>>>
>>> 1. You need to define semantics and coordinate what happens when the
>>>    vma underlying a volatile range changes.
>>>
>>>    Either you have to strictly co-maintain both range objects, or you
>>>    have weird behavior like volatility outliving a vma and then applying
>>>    to a separate vma created in its place.
>> So indeed this is a difficult problem!  My initial approach is simply
>> when any new mapping is made, we clear the volatility of the affected
>> process memory. Admittedly this has extra overhead and Minchan has an
>> alternative here (which I'm not totally sold on yet, but may be ok). 
>> I'm almost convinced that for anonymous volatility, storing the
>> volatility in the vma would be ok, but Minchan is worried about the
>> performance overhead of the required locking for manipulating the vmas.
>>
>> For file volatility, this is more complicated, because since the
>> volatility is shared, the ranges have to be tracked against the
>> address_space structure, and can't be stored in per-process vmas. So
>> this is partially why we've kept range trees hanging off of the mm and
>> address_spaces structures, since it allows the range manipulation logic
>> to be shared in both cases.
> The fs people probably have not noticed yet what you've done to struct
> address_space / struct inode ;-) I doubt that this is mergeable in its
> current form, so we have to think about a separate mechanism for shmem
> page ranges either way.

Yea. But given the semantics will likely be *very* similar, it seems
strange to try to force separate mechanisms.

That said, in an earlier implementation I stored the range tree in a
hash so we wouldn't have to add anything to the address_space structure.
But for now I want to make it clear that the ranges are tied to the
address space (and it gives the fs folks something to notice ;).


>>>    Userspace won't get this right, and even in the kernel this is
>>>    error prone and adds a lot to the complexity of vma management.
>> Not sure exactly I understand what you mean by "userspace won't get this
>> right" ?
> I meant, userspace being responsible for keeping vranges coherent with
> its mmap and munmap operations, instead of the kernel doing it.
>
>>> 2. If page reclaim discards a page from the upper end of a range,
>>>    you mark the whole range as purged.  If the user later marks the
>>>    lower half of the range as non-volatile, the syscall will report
>>>    purged=1 even though all requested pages are still there.
>> To me t

[GIT PULL] x86/asmlinkage (LTO) for v3.14

2014-01-30 Thread H. Peter Anvin
Hi Linus,

This patchset adds more infrastructure for link time optimization
(LTO).

This patchset was pulled into my tree late because of a
miscommunication (part of the patchset was picked up by other
maintainers.)  However, the patchset is strictly build-related and
seems to be okay in testing.

The following changes since commit d8ec26d7f8287f5788a494f56e8814210f0e64be:

  Linux 3.13 (2014-01-19 18:40:07 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asmlinkage-for-linus

for you to fetch changes up to 07ba06d9d293d3c0a512f1cb9189645c6e0424e2:

  x86, asmlinkage, xen: Fix type of NMI (2014-01-29 22:17:18 -0800)


Andi Kleen (6):
  x86, asmlinkage, lguest: Fix C functions used by inline assembler
  x86, asmlinkage, paravirt: Don't rely on local assembler labels
  x86, asmlinkage, paravirt: Make paravirt thunks global
  x86: Use inline assembler instead of global register variable to get sp
  x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible
  x86, asmlinkage, xen: Fix type of NMI

 arch/x86/include/asm/paravirt.h       |  2 +-
 arch/x86/include/asm/paravirt_types.h |  9 +
 arch/x86/include/asm/thread_info.h    |  8 +---
 arch/x86/kernel/kvm.c                 |  2 +-
 arch/x86/kernel/vsmp_64.c             |  8
 arch/x86/lguest/boot.c                | 12 ++--
 arch/x86/xen/irq.c                    |  8
 arch/x86/xen/mmu.c                    | 16
 arch/x86/xen/setup.c                  |  4 ++--
 arch/x86/xen/spinlock.c               |  2 +-
 10 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 401f350ef71b..cd6e1610e29e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -781,9 +781,9 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  */
 #define PV_CALLEE_SAVE_REGS_THUNK(func)					\
 	extern typeof(func) __raw_callee_save_##func;			\
-	static void *__##func##__ __used = func;			\
 									\
 	asm(".pushsection .text;"					\
+	    ".globl __raw_callee_save_" #func " ; "			\
 	    "__raw_callee_save_" #func ": "				\
 	    PV_SAVE_ALL_CALLER_REGS					\
 	    "call " #func ";"						\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index aab8f671b523..7549b8b369e4 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -388,10 +388,11 @@ extern struct pv_lock_ops pv_lock_ops;
 	_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
 
 /* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code)					\
-	extern const char start_##ops##_##name[] __visible,		\
-			  end_##ops##_##name[] __visible;		\
-	asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
+#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
+
+#define DEF_NATIVE(ops, name, code)					\
+	__visible extern const char start_##ops##_##name[], end_##ops##_##name[];	\
+	asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
 
 unsigned paravirt_patch_nop(void);
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 3ba3de457d05..e1940c06ed02 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -163,9 +163,11 @@ struct thread_info {
  */
 #ifndef __ASSEMBLY__
 
-
-/* how to get the current stack pointer from C */
-register unsigned long current_stack_pointer asm("esp") __used;
+#define current_stack_pointer ({   \
+   unsigned long sp;   \
+   asm("mov %%esp,%0" : "=g" (sp));\
+   sp; \
+})
 
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6dd802c6d780..cd1b362e4a23 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -673,7 +673,7 @@ static cpumask_t waiting_cpus;
 /* Track spinlock on which a cpu is waiting */
 static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);
 
-static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+__visible void kv

Re: RFC: KEYS: Is this too-big a behavioural change for a system call?

2014-01-30 Thread James Morris
On Thu, 30 Jan 2014, David Howells wrote:

>  (5) Don't implicitly create a new anonymous keyring and don't implicitly set
>      the session keyring to the user-session keyring, but rather just fall back
>      to using the user-session keyring if there isn't a session keyring.
> 

> 
> That said, I do think that the Kerberos people have a valid point.  The current
> behaviour is poor.  I'm inclined to implement (5) or (6), probably (5).
> 
> This won't make any difference to most processes, ie.:
> 
>  (*) Those run from pam_keyinit-managed login shells.
> 
>  (*) Those that don't make use of libkrb5 or keyrings.
> 

So there are existing apps which will see semantic changes?

If so, we can't accept this change.

> In many ways, I'd like to just get rid of the user and user-session keyrings
> from the kernel entirely and have them created and maintained by pam_keyinit.
> The special keyring IDs:
> 
>   KEY_SPEC_USER_KEYRING
>   KEY_SPEC_USER_SESSION_KEYRING
> 
> and:
> 
>   KEY_REQKEY_DEFL_USER_KEYRING
>   KEY_REQKEY_DEFL_USER_SESSION_KEYRING
> 
> would then search your session keyring for keyrings called "_uid" and
> "_uid_ses" and return those.  Unfortunately, I think this is probably a
> much-too-big change at this point.
> 
> Any thoughts?

Getting it right is more important than the size of the change.

What about creating a new system call with the desired behavior, and
deprecating the current one (or at least, making it a wrapper for the new
call)?

-- 
James Morris



Re: [PATCH] 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers

2014-01-30 Thread Richard Yao
On Jan 30, 2014, at 7:44 PM, David Miller  wrote:

> From: David Miller 
> Date: Thu, 30 Jan 2014 16:29:26 -0800 (PST)
> 
>> From: Richard Yao 
>> Date: Thu, 30 Jan 2014 13:02:48 -0500
>> 
>>> The 9p-virtio transport does zero copy on things larger than 1024 bytes
>>> in size. It accomplishes this by returning the physical addresses of
>>> pages to the virtio-pci device. At present, the translation is usually a
>>> bit shift.
>>> 
>>> However, that approach produces an invalid page address when we
>>> read/write to vmalloc buffers, such as those used for Linux kernel
>>> modules. This causes QEMU to die printing:
>>> 
>>> qemu-system-x86_64: virtio: trying to map MMIO memory
>>> 
>>> This patch enables 9p-virtio to correctly handle this case. This not
>>> only enables us to load Linux kernel modules off virtfs, but also
>>> enables ZFS file-based vdevs on virtfs to be used without killing QEMU.
>>> 
>>> Also, special thanks to both Avi Kivity and Alexander Graf for their
>>> interpretation of QEMU backtraces. Without their guidance, tracking down
>>> this bug would have taken much longer.
>>> 
>>> Signed-off-by: Richard Yao 
>>> Acked-by: Alexander Graf 
>>> Reviewed-by: Will Deacon 
>> 
>> Applied, thanks.
> 
> Actually I had to revert, is_vmalloc_or_malloc_addr() is not exported to
> modules, so this change breaks the build.

Thanks for catching that. I had originally used is_vmalloc_addr() instead of
is_vmalloc_or_malloc_addr(), but changed it after realizing that it did not
correct the problem on all architectures. is_vmalloc_addr() lives in
headers. I will send out a patch to get is_vmalloc_or_malloc_addr() exported
and resubmit this after it is merged.


Re: [PATCH] rtnetlink: return the newly created link in response to newlink

2014-01-30 Thread Tom Gundersen
Hi Thomas,

Thanks for your reply.

On Thu, Jan 30, 2014 at 3:27 PM, Thomas Graf  wrote:
> On 01/30/14 at 02:05pm, Tom Gundersen wrote:
>> Userspace needs to reliably know the ifindex of the netdevs it creates,
>> as we cannot rely on the ifname staying unchanged.
>>
>> Earlier, a simple NLMSG_ERROR would be returned, but this returns the
>> corresponding RTM_NEWLINK on success instead.
>
> This breaks existing Netlink applications in user space. User space
> apps are not prepared to receive both a RTM_NEWLINK reply _and_
> the ACK unless they have set NLM_F_ECHO in the original request.
>
> You can already reliably retrieve the ifindex by listening to
> RTNLGRP_LINK messages and be notified about the link created
> including all follow-up renames.

Ok, we'll keep doing this instead.
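
For reference, the approach Thomas describes can be sketched roughly as
below. This is a minimal illustration, not code from networkd: the helper
names open_link_monitor() and ifindex_from_nlmsg() are made up here.
RTMGRP_LINK is the legacy bitmask form of the RTNLGRP_LINK group, which is
what sockaddr_nl.nl_groups expects.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a rtnetlink socket subscribed to link notifications. */
int open_link_monitor(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK;	/* bitmask for RTNLGRP_LINK */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return the ifindex carried by an RTM_NEWLINK message, or -1. */
int ifindex_from_nlmsg(const struct nlmsghdr *nh)
{
	const struct ifinfomsg *ifi;

	if (nh->nlmsg_type != RTM_NEWLINK)
		return -1;
	if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifinfomsg)))
		return -1;

	ifi = NLMSG_DATA(nh);
	return ifi->ifi_index;
}
```

A caller would recv() on the returned descriptor, walk the buffer with
NLMSG_OK()/NLMSG_NEXT(), and pass each header to ifindex_from_nlmsg().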

Cheers,

Tom


Re: how to use memmap= option

2014-01-30 Thread Randy Dunlap
On 01/08/2014 06:06 AM, Long Wind wrote:
> On 1/8/14, Long Wind  wrote:
>> I have asked Debian users, they don't seem to know
>> kernel 2.4/2.6 fail to boot on my PC
>> probably because it can't detect my memory
>> so I have to tell kernel memory map
>>
>> the following is copied from kernel-parameters.txt:
>>
>> memmap=exactmap  [KNL,X86] Enable setting of an exact
>>  E820 memory map, as specified by the user.
>>  Such memmap=exactmap lines can be constructed based on
>>  BIOS output or other requirements. See the memmap=nn@ss
>>  option description.
>>
>>
>>
>> Could you tell me how to construct memmap= lines?
>>
> 
> Sorry, it seems that I have posted to the wrong list.
> If you could tell me which list I should join, I would appreciate it.
> I'm leaving the list.


No, this is the correct mailing list for your question (or use
linux...@kvack.org).

First you obtain a listing of your computer's valid memory map,
either from Linux booting on it (maybe not so likely on your computer)
or maybe from BIOS SETUP.

With that information, you construct a kernel command line with multiple
memmap= options.
You can see an example of this here:
http://lkml.indiana.edu/hypermail/linux/kernel/0809.1/0664.html
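
As a rough illustration of what such a command line looks like, the small
C helper below formats one from a list of usable RAM regions, using the
memmap=exactmap and memmap=nn[KMG]@ss[KMG] forms quoted above. The struct
and function names are invented for this sketch, and the regions passed in
must come from your machine's real memory map, not from this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of one usable RAM region, in KiB. */
struct mem_region {
	unsigned long long start_kib;
	unsigned long long size_kib;
};

/*
 * Write "memmap=exactmap memmap=<size>K@<start>K ..." into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int build_memmap_cmdline(char *buf, size_t buflen,
			 const struct mem_region *r, size_t n)
{
	int used = snprintf(buf, buflen, "memmap=exactmap");

	if (used < 0 || (size_t)used >= buflen)
		return -1;

	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + used, buflen - used,
				 " memmap=%lluK@%lluK",
				 r[i].size_kib, r[i].start_kib);

		if (w < 0 || (size_t)w >= buflen - used)
			return -1;
		used += w;
	}
	return used;
}
```

For two made-up regions, 640K at 0 and 1023M at 1M, this produces
"memmap=exactmap memmap=640K@0K memmap=1047552K@1024K", which is then
appended to the kernel command line in the boot loader configuration.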


Hope that helps you.
-- 
~Randy


Re: [RFC PATCH] net: core: move core networking work to power efficient workqueue

2014-01-30 Thread Joe Perches
On Fri, 2014-01-31 at 01:48 +0100, Antonio Quartulli wrote:
> On 31/01/14 01:33, Zoran Markovic wrote:
> > From: Shaibal Dutta 
> 
> [...]
> 
> > -   schedule_delayed_work(&linkwatch_work, delay);
> > +   queue_delayed_work(system_power_efficient_wq,
> > +   &linkwatch_work, delay);
> 
> before talking about technical details, here and in other spots of this
> patch the alignment is wrong. I think checkpatch should have said
> something about it. The first parameter on the new line should be
> aligned up to the column after the opening parenthesis.

Using "scripts/checkpatch.pl --strict" would emit an
alignment message, otherwise not.



Re: [PATCH] net: set default DEVTYPE for all ethernet based devices

2014-01-30 Thread Tom Gundersen
Hi Veaceslav,

Thanks for your quick reply.

On Thu, Jan 30, 2014 at 4:05 PM, Veaceslav Falico  wrote:
> On Thu, Jan 30, 2014 at 02:20:02PM +0100, Tom Gundersen wrote:
>>
>> In systemd's networkd and udevd, we would like to give the administrator a
>> simple way to filter net devices by their DEVTYPE [0][1]. Other software
>> such as ConnMan and NetworkManager uses a similar filtering already.
>>
>> Currently, plain ethernet devices have DEVTYPE=(null). This patch sets the
>> devtype to "ethernet" instead. This avoids the need for special-casing the
>> DEVTYPE=(null) case in userspace, and also avoids false positives, as there
>> are several other types of netdevs that also have DEVTYPE=(null).
>
>
> There are quite a few users at least in usb and wireless drivers:
>
> net#git grep alloc_etherdev drivers/net/wireless/ drivers/net/usb | wc -l
> 18
>
> In usb, though, there might be some false positives of this grep, as
> there are a few devices which might be considered ethernet.

Ah, yes I missed the #define of alloc_etherdev(). Looking through
these, it shouldn't be too hard to keep this patch and additionally
fix up the false positives to opt-out of setting the DEVTYPE. Does
that sound like something that would be acceptable?

Cheers,

Tom


[PATCH v6 1/2] mm: add kstrimdup function

2014-01-30 Thread Sebastian Capella
kstrimdup creates a whitespace-trimmed duplicate of the passed
in null-terminated string.  This is useful for strings coming
from sysfs that often include trailing whitespace due to user
input.

Thanks to Joe Perches for this implementation.

Signed-off-by: Sebastian Capella 
Cc: Andrew Morton 
Cc: Joe Perches 
Cc: Mikulas Patocka 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Michel Lespinasse 
Cc: Shaohua Li 
Cc: Jerome Marchand 
Cc: Mikulas Patocka 
Cc: Joonsoo Kim 
---
 include/linux/string.h |    1 +
 mm/util.c              |   30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index ac889c5..f29f9a0 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -114,6 +114,7 @@ void *memchr_inv(const void *s, int c, size_t n);
 
 extern char *kstrdup(const char *s, gfp_t gfp);
 extern char *kstrndup(const char *s, size_t len, gfp_t gfp);
+extern char *kstrimdup(const char *s, gfp_t gfp);
 extern void *kmemdup(const void *src, size_t len, gfp_t gfp);
 
 extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
diff --git a/mm/util.c b/mm/util.c
index 808f375..a8b731c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1,6 +1,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,6 +64,35 @@ char *kstrndup(const char *s, size_t max, gfp_t gfp)
 EXPORT_SYMBOL(kstrndup);
 
 /**
+ * kstrimdup - Trim and copy a %NUL terminated string.
+ * @s: the string to trim and duplicate
+ * @gfp: the GFP mask used in the kmalloc() call when allocating memory
+ *
+ * Returns an address, which the caller must kfree, containing
+ * a duplicate of the passed string with leading and/or trailing
+ * whitespace (as defined by isspace) removed.
+ */
+char *kstrimdup(const char *s, gfp_t gfp)
+{
+	char *buf;
+	char *begin = skip_spaces(s);
+	size_t len = strlen(begin);
+
+	while (len && isspace(begin[len - 1]))
+		len--;
+
+	buf = kmalloc_track_caller(len + 1, gfp);
+	if (!buf)
+		return NULL;
+
+	memcpy(buf, begin, len);
+	buf[len] = '\0';
+
+	return buf;
+}
+EXPORT_SYMBOL(kstrimdup);
+
+/**
  * kmemdup - duplicate region of memory
  *
  * @src: memory region to duplicate
-- 
1.7.9.5



[GIT PULL] ARM: SoC late DT changes for v3.14

2014-01-30 Thread Kevin Hilman
Hi Linus,

These are a few changes that arrived a little late but were considered
self-contained enough to still go in for v3.14.

They are all device tree updates this time around, and mainly for
Broadcom SoCs.

There is one trivial add/add conflict in one of the device tree files.

Please pull,

Kevin




The following changes since commit 53d8ab29f8f6d67e37857b68189b38fa3d87dd8e:

  Merge branch 'for-3.14/drivers' of git://git.kernel.dk/linux-block

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/late-dt-for-linus

for you to fetch changes up to 929267cb3525daf72f730f4d4c4e1e9e2b135e61:

  ARM: moxart: move fixed rate clock child node to board level dts



Alex Elder (1):
  clk: bcm281xx: define kona clock binding

Jonas Jensen (1):
  ARM: moxart: move fixed rate clock child node to board level dts

Kevin Hilman (1):
  Merge tag 'bcm-for-3.14-dt' of git://github.com/broadcom/bcm11351 into next/dt

Matt Porter (1):
  ARM: dts: add usb udc support to bcm281xx

Tim Kryger (8):
  ARM: dts: bcm28155-ap: Enable all the i2c busses
  ARM: dts: Declare clocks as fixed on bcm11351
  ARM: dts: bcm281xx: Add i2c busses
  ARM: dts: Specify clocks for UARTs on bcm11351
  Documentation: dt: kona-sdhci: Add clocks property
  ARM: dts: Specify clocks for SDHCIs on bcm11351
  Documentation: dt: kona-timer: Add clocks property
  ARM: dts: Specify clocks for timer on bcm11351


 .../devicetree/bindings/arm/bcm/kona-timer.txt  |   7 +-
 .../bindings/clock/bcm-kona-clock.txt   |  93 ++
 .../devicetree/bindings/mmc/kona-sdhci.txt  |   4 +
 arch/arm/boot/dts/bcm11351-brt.dts  |   6 +
 arch/arm/boot/dts/bcm11351.dtsi | 169 ++-
 arch/arm/boot/dts/bcm28155-ap.dts   |  28 +++
 arch/arm/boot/dts/moxart-uc7112lx.dts   |   8 +
 arch/arm/boot/dts/moxart.dtsi   |   6 -
 8 files changed, 309 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/bcm-kona-clock.txt


[PATCH v6 0/2] hibernation related patches

2014-01-30 Thread Sebastian Capella
Patchset related to hibernation resume:
  - enhancement to make the use of an existing resume file more general
  - add kstrimdup function which trims and duplicates a string

  Both patches are based on the 3.13 tag.  This was tested on a
  Beaglebone black with partial hibernation support, and compiled for
  x86_64.

[PATCH v6 1/2] mm: add kstrimdup function
  include/linux/string.h |    1 +
  mm/util.c              |   30 ++++++++++++++++++++++++++++++
  2 files changed, 31 insertions(+)

  Adds the kstrimdup function to duplicate and trim whitespace
  from a string.  This is useful for working with user input to
  sysfs.
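
For readers without a kernel tree at hand, the behaviour can be tried out
with a userspace approximation. This is only a sketch: strimdup() is a
hypothetical name, and malloc() plus an open-coded loop stand in for the
kernel's kmalloc_track_caller() and skip_spaces().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace approximation of kstrimdup(): return a malloc()ed copy of s
 * with leading and trailing whitespace removed.  The caller must free()
 * the result.  Returns NULL if the allocation fails.
 */
char *strimdup(const char *s)
{
	size_t len;
	char *buf;

	/* Skip leading whitespace. */
	while (isspace((unsigned char)*s))
		s++;

	/* Drop trailing whitespace, such as the newline added by echo. */
	len = strlen(s);
	while (len && isspace((unsigned char)s[len - 1]))
		len--;

	buf = malloc(len + 1);
	if (!buf)
		return NULL;

	memcpy(buf, s, len);
	buf[len] = '\0';
	return buf;
}
```

Only len + 1 bytes are allocated, matching the v5 change noted below that
sizes the copy to the trimmed string.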

[PATCH v6 2/2] PM / Hibernate: use name_to_dev_t to parse resume
  kernel/power/hibernate.c |   33 +++++++++++++++++----------------
  1 file changed, 17 insertions(+), 16 deletions(-)

  Use name_to_dev_t to parse the /sys/power/resume file making the
  syntax more flexible.  It supports the previous major:minor syntax
  and additionally can support other formats such as
  /dev/devicenode and UUID= formats.

  By changing /sys/power/resume to accept the same syntax as
  the resume=device parameter, we can parse the resume=device
  in the initrd init script and use the resume device directly
  from the kernel command line.
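
To show why the old interface was restrictive, here is a userspace sketch
of the major:minor parsing that patch 2/2 replaces. parse_maj_min() is a
hypothetical name, and glibc's makedev()/major()/minor() stand in for the
kernel's MKDEV()/MAJOR()/MINOR(); the round-trip check mirrors the one in
the removed code.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Parse "major:minor" the way the old resume_store() did.
 * Returns 0 and stores the device number on success, -1 otherwise.
 */
int parse_maj_min(const char *buf, dev_t *out)
{
	unsigned int maj, min;
	dev_t dev;

	/* Anything that is not two unsigned numbers is rejected. */
	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
		return -1;

	/* Reject values that do not survive encoding into dev_t. */
	dev = makedev(maj, min);
	if (major(dev) != maj || minor(dev) != min)
		return -1;

	*out = dev;
	return 0;
}
```

With this parser "8:3" is accepted, while "/dev/sda3" or a "UUID=" string
fails at the sscanf() step, which is the limitation the patch removes.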

Changes in v6:
--
* Revert tricky / confusing while loop indexing

Changes in v5:
--
* Change kstrimdup to minimize allocated memory.  Now allocates only
  the memory needed for the string instead of using strim.

Changes in v4:
--
* Dropped name_to_dev_t rework in favor of adding kstrimdup
* adjusted resume_store

Changes in v3:
--
* Dropped documentation patch as it went in through trivial
* Added patch for name_to_dev_t to support directly parsing userspace
  buffer

Changes in v2:
--
* Added check for null return of kstrndup in hibernate.c


Thanks,

Sebastian


[PATCH v6 2/2] PM / Hibernate: use name_to_dev_t to parse resume

2014-01-30 Thread Sebastian Capella
Use the name_to_dev_t call to parse the device name echo'd
to /sys/power/resume.  This imitates the method used in hibernate.c
in software_resume, and allows the resume partition to be specified
using other equivalent device formats as well.  By allowing
/sys/power/resume to accept the same syntax as the resume=device
parameter, we can parse the resume=device in the init script and
use the resume device directly from the kernel command line.

Signed-off-by: Sebastian Capella 
Acked-by: Pavel Machek 
Cc: Len Brown 
Cc: "Rafael J. Wysocki" 
---
 kernel/power/hibernate.c |   33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 0121dab..49d7a37 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -972,26 +972,27 @@ static ssize_t resume_show(struct kobject *kobj, struct kobj_attribute *attr,
 static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
 			    const char *buf, size_t n)
 {
-	unsigned int maj, min;
 	dev_t res;
-	int ret = -EINVAL;
+	char *name = kstrimdup(buf, GFP_KERNEL);
 
-	if (sscanf(buf, "%u:%u", &maj, &min) != 2)
-		goto out;
+	if (name == NULL)
+		return -ENOMEM;
 
-	res = MKDEV(maj,min);
-	if (maj != MAJOR(res) || min != MINOR(res))
-		goto out;
+	res = name_to_dev_t(name);
 
-	lock_system_sleep();
-	swsusp_resume_device = res;
-	unlock_system_sleep();
-	printk(KERN_INFO "PM: Starting manual resume from disk\n");
-	noresume = 0;
-	software_resume();
-	ret = n;
- out:
-	return ret;
+	if (res != 0) {
+		lock_system_sleep();
+		swsusp_resume_device = res;
+		unlock_system_sleep();
+		printk(KERN_INFO "PM: Starting manual resume from disk\n");
+		noresume = 0;
+		software_resume();
+	} else {
+		n = -EINVAL;
+	}
+
+	kfree(name);
+	return n;
 }
 
 power_attr(resume);
-- 
1.7.9.5



[PATCH] a Kconfig fix

2014-01-30 Thread nitin . a . kamble
From: Nitin A Kamble 

This is a pull request with a fix for a Kconfig dependency issue. It fixes
a build failure when the kernel configuration option GENERIC_IRQ_CHIP
is enabled without enabling the IRQ_DOMAIN config option in the kernel
configuration.

Thanks,
Nitin

Nitin A Kamble (1):
  irq: fix a Kconfig dependency

 kernel/irq/Kconfig | 1 +
 1 file changed, 1 insertion(+)

-- 
1.8.1.4



[PATCH] irq: fix a Kconfig dependency

2014-01-30 Thread nitin . a . kamble
From: Nitin A Kamble 

generic-chip.c uses interfaces from irqdomain.c, which is controlled by
the IRQ_DOMAIN config option.

Add a select statement in the Kconfig to reflect this requirement.

Without this fix, the generic-chip.c compilation fails like this:

linux/kernel/irq/generic-chip.c:400:11:
  error: 'irq_domain_xlate_onetwocell' undeclared here (not in a function)

Signed-off-by: Nitin A Kamble 
---
 kernel/irq/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 4a1fef0..07cbdfe 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -40,6 +40,7 @@ config IRQ_EDGE_EOI_HANDLER
 # Generic configurable interrupt chip implementation
 config GENERIC_IRQ_CHIP
 	bool
+	select IRQ_DOMAIN
 
 # Generic irq_domain hw <--> linux irq number translation
 config IRQ_DOMAIN
-- 
1.8.1.4



Re: [RFC PATCH] net: core: move core networking work to power efficient workqueue

2014-01-30 Thread Antonio Quartulli
On 31/01/14 01:33, Zoran Markovic wrote:
> From: Shaibal Dutta 

[...]

> - schedule_delayed_work(&linkwatch_work, delay);
> + queue_delayed_work(system_power_efficient_wq,
> + &linkwatch_work, delay);

before talking about technical details, here and in other spots of this
patch the alignment is wrong. I think checkpatch should have said
something about it. The first parameter on the new line should be
aligned up to the column after the opening parenthesis.

Regards,

>  }
>  
>  
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index c03f3de..2c8f839 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -101,7 +101,8 @@ static void queue_process(struct work_struct *work)
>   __netif_tx_unlock(txq);
>   local_irq_restore(flags);
>  
> - schedule_delayed_work(&npinfo->tx_work, HZ/10);
> + queue_delayed_work(system_power_efficient_wq,
> + &npinfo->tx_work, HZ/10);
>   return;
>   }
>   __netif_tx_unlock(txq);
> @@ -423,7 +424,8 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>  
>   if (status != NETDEV_TX_OK) {
>   skb_queue_tail(&npinfo->txq, skb);
> - schedule_delayed_work(&npinfo->tx_work,0);
> + queue_delayed_work(system_power_efficient_wq,
> + &npinfo->tx_work, 0);
>   }
>  }
>  EXPORT_SYMBOL(netpoll_send_skb_on_dev);
> 


-- 
Antonio Quartulli





Re: [PATCH] Documentation: fix memmap= language in kernel-parameters.txt

2014-01-30 Thread David Rientjes
On Thu, 30 Jan 2014, Randy Dunlap wrote:

> From: Randy Dunlap 
> 
> Clean up descriptions of memmap= boot options.
> 
> Add periods (full stops), drop commas, change "used" to
> "reserved" or "marked".
> 
> Signed-off-by: Randy Dunlap 
> Cc: Andiry Xu 
> Cc: David Rientjes 

Acked-by: David Rientjes 


Re: [PATCH] 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers

2014-01-30 Thread David Miller
From: David Miller 
Date: Thu, 30 Jan 2014 16:29:26 -0800 (PST)

> From: Richard Yao 
> Date: Thu, 30 Jan 2014 13:02:48 -0500
> 
>> The 9p-virtio transport does zero copy on things larger than 1024 bytes
>> in size. It accomplishes this by returning the physical addresses of
>> pages to the virtio-pci device. At present, the translation is usually a
>> bit shift.
>> 
>> However, that approach produces an invalid page address when we
>> read/write to vmalloc buffers, such as those used for Linux kernel
>> modules. This causes QEMU to die printing:
>> 
>> qemu-system-x86_64: virtio: trying to map MMIO memory
>> 
>> This patch enables 9p-virtio to correctly handle this case. This not
>> only enables us to load Linux kernel modules off virtfs, but also
>> enables ZFS file-based vdevs on virtfs to be used without killing QEMU.
>> 
>> Also, special thanks to both Avi Kivity and Alexander Graf for their
>> interpretation of QEMU backtraces. Without their guidance, tracking down
>> this bug would have taken much longer.
>> 
>> Signed-off-by: Richard Yao 
>> Acked-by: Alexander Graf 
>> Reviewed-by: Will Deacon 
> 
> Applied, thanks.

Actually I had to revert, is_vmalloc_or_malloc_addr() is not exported to
modules, so this change breaks the build.


[tip:x86/urgent] x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()

2014-01-30 Thread tip-bot for Prarit Bhargava
Commit-ID:  39424e89d64661faa0a2e00c5ad1e6dbeebfa972
Gitweb: http://git.kernel.org/tip/39424e89d64661faa0a2e00c5ad1e6dbeebfa972
Author: Prarit Bhargava 
AuthorDate: Tue, 28 Jan 2014 08:22:11 -0500
Committer:  H. Peter Anvin 
CommitDate: Thu, 30 Jan 2014 16:40:13 -0800

x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()

Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2

kbuild, 0day kernel build service, outputs the warning:

arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes
is larger than 2048 bytes [-Wframe-larger-than=]

because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the
stack.   Fix this by moving the two cpumasks to a global file context.
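
A back-of-the-envelope sketch of why two masks can exhaust the frame
budget is given below. It assumes NR_CPUS=8192, which the commit message
does not state; SKETCH_NR_CPUS and struct cpumask_sketch are illustrative
stand-ins, not the kernel definitions.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption for illustration only: a kernel built for 8192 CPUs. */
#define SKETCH_NR_CPUS		8192
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per possible CPU, stored in an array of unsigned long. */
struct cpumask_sketch {
	unsigned long bits[(SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) /
			   SKETCH_BITS_PER_LONG];
};

/* Bytes of stack taken by two such masks declared as locals. */
size_t two_masks_on_stack(void)
{
	return 2 * sizeof(struct cpumask_sketch);
}
```

Under that assumption each mask is 1024 bytes, so two of them already
account for 2048 bytes before any other locals are counted, which is
consistent with the 2056-byte frame in the warning.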

Reported-by: Fengguang Wu 
Tested-by: David Rientjes 
Signed-off-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-pra...@redhat.com
Cc: Andi Kleen 
Cc: Michel Lespinasse 
Cc: Seiji Aguchi 
Cc: Yang Zhang 
Cc: Paul Gortmaker 
Cc: Janet Morgan 
Cc: Tony Luck 
Cc: Ruiv Wang 
Cc: Gong Chen 
Cc: Yinghai Lu 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/irq.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index dbb6087..d99f31d 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -266,6 +266,14 @@ __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
 EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
 
 #ifdef CONFIG_HOTPLUG_CPU
+
+/* These two declarations are only used in check_irq_vectors_for_cpu_disable()
+ * below, which is protected by stop_machine().  Putting them on the stack
+ * results in a stack frame overflow.  Dynamically allocating could result in a
+ * failure so declare these two cpumasks as global.
+ */
+static struct cpumask affinity_new, online_new;
+
 /*
  * This cpu is going to be removed and its vectors migrated to the remaining
  * online cpus.  Check to see if there are enough vectors in the remaining cpus.
@@ -277,7 +285,6 @@ int check_irq_vectors_for_cpu_disable(void)
unsigned int this_cpu, vector, this_count, count;
struct irq_desc *desc;
struct irq_data *data;
-   struct cpumask affinity_new, online_new;
 
this_cpu = smp_processor_id();
cpumask_copy(&online_new, cpu_online_mask);


[PATCH] Documentation: fix memmap= language in kernel-parameters.txt

2014-01-30 Thread Randy Dunlap
From: Randy Dunlap 

Clean up descriptions of memmap= boot options.

Add periods (full stops), drop commas, change "used" to
"reserved" or "marked".

Signed-off-by: Randy Dunlap 
Cc: Andiry Xu 
Cc: David Rientjes 
---
 Documentation/kernel-parameters.txt |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- lnx-313.orig/Documentation/kernel-parameters.txt
+++ lnx-313/Documentation/kernel-parameters.txt
@@ -1668,16 +1668,16 @@ bytes respectively. Such letter suffixes
option description.
 
memmap=nn[KMG]@ss[KMG]
-   [KNL] Force usage of a specific region of memory
-   Region of memory to be used, from ss to ss+nn.
+   [KNL] Force usage of a specific region of memory.
+   Region of memory to be used is from ss to ss+nn.
 
memmap=nn[KMG]#ss[KMG]
[KNL,ACPI] Mark specific memory as ACPI data.
-   Region of memory to be used, from ss to ss+nn.
+   Region of memory to be marked is from ss to ss+nn.
 
memmap=nn[KMG]$ss[KMG]
[KNL,ACPI] Mark specific memory as reserved.
-   Region of memory to be used, from ss to ss+nn.
+   Region of memory to be reserved is from ss to ss+nn.
Example: Exclude memory from 0x1869-0x1869
 memmap=64K$0x1869
 or


[RFC PATCH] net: core: move core networking work to power efficient workqueue

2014-01-30 Thread Zoran Markovic
From: Shaibal Dutta 

This patch moves the following work to the power efficient workqueue:
  - Transmit work of netpoll
  - Destination cache garbage collector work
  - Link watch event handler work

In general, assignment of CPUs to pending work could be deferred to
the scheduler in order to extend idle residency time and improve
power efficiency. I would value community's opinion on the migration
of this work to the power efficient workqueue, with an emphasis on
migration of netpoll's transmit work.

This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.

Cc: "David S. Miller" 
Cc: Jiri Pirko 
Cc: YOSHIFUJI Hideaki 
Cc: Eric Dumazet 
Cc: Julian Anastasov 
Cc: Flavio Leitner 
Cc: Neil Horman 
Cc: Patrick McHardy 
Cc: John Fastabend 
Cc: Amerigo Wang 
Cc: Joe Perches 
Cc: Jason Wang 
Cc: Antonio Quartulli 
Cc: Simon Horman 
Cc: Nikolay Aleksandrov 
Signed-off-by: Shaibal Dutta 
[zoran.marko...@linaro.org: Rebased to latest kernel version. Edited
calls to mod_delayed_work to reference power efficient workqueue.
Added commit message.]
Signed-off-by: Zoran Markovic 
---
 net/core/dst.c|5 +++--
 net/core/link_watch.c |5 +++--
 net/core/netpoll.c|6 --
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index ca4231e..cc28352 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -135,7 +135,8 @@ loop:
 */
if (expires > 4*HZ)
expires = round_jiffies_relative(expires);
-   schedule_delayed_work(&dst_gc_work, expires);
+   queue_delayed_work(system_power_efficient_wq,
+   &dst_gc_work, expires);
}
 
spin_unlock_bh(&dst_garbage.lock);
@@ -223,7 +224,7 @@ void __dst_free(struct dst_entry *dst)
if (dst_garbage.timer_inc > DST_GC_INC) {
dst_garbage.timer_inc = DST_GC_INC;
dst_garbage.timer_expires = DST_GC_MIN;
-   mod_delayed_work(system_wq, &dst_gc_work,
+   mod_delayed_work(system_power_efficient_wq, &dst_gc_work,
 dst_garbage.timer_expires);
}
spin_unlock_bh(&dst_garbage.lock);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 9c3a839..0ae3994 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -135,9 +135,10 @@ static void linkwatch_schedule_work(int urgent)
 * override the existing timer.
 */
if (test_bit(LW_URGENT, &linkwatch_flags))
-   mod_delayed_work(system_wq, &linkwatch_work, 0);
+   mod_delayed_work(system_power_efficient_wq, &linkwatch_work, 0);
else
-   schedule_delayed_work(&linkwatch_work, delay);
+   queue_delayed_work(system_power_efficient_wq,
+   &linkwatch_work, delay);
 }
 
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c03f3de..2c8f839 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -101,7 +101,8 @@ static void queue_process(struct work_struct *work)
__netif_tx_unlock(txq);
local_irq_restore(flags);
 
-   schedule_delayed_work(&npinfo->tx_work, HZ/10);
+   queue_delayed_work(system_power_efficient_wq,
+   &npinfo->tx_work, HZ/10);
return;
}
__netif_tx_unlock(txq);
@@ -423,7 +424,8 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 
if (status != NETDEV_TX_OK) {
skb_queue_tail(&npinfo->txq, skb);
-   schedule_delayed_work(&npinfo->tx_work,0);
+   queue_delayed_work(system_power_efficient_wq,
+   &npinfo->tx_work, 0);
}
 }
 EXPORT_SYMBOL(netpoll_send_skb_on_dev);
-- 
1.7.9.5



Re: [PATCH] afs: proc cells and rootcell are writeable

2014-01-30 Thread David Howells
David Howells  wrote:

> > > I think this is a pretty strong argument. Counter-arguments, anybody?
> > 
> > Yes.  CAP_DAC_READ_SEARCH.
> 
> No, it would seem unlikely it's that, but I guess there's another capability
> override because the process is owned by root.

CAP_DAC_OVERRIDE, I think.

int generic_permission(struct inode *inode, int mask)
{
...
/*
 * Read/write DACs are always overridable.
 * Executable DACs are overridable when there is
 * at least one exec bit set.
 */
if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
if (inode_capable(inode, CAP_DAC_OVERRIDE))
return 0;
...
}
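
The override branch quoted above can be modelled in user space as a small
predicate. This is only a sketch of that one branch, not the kernel's
generic_permission(): the ordinary mode-bit and ACL check that runs first is
left out, the capability test is reduced to a boolean parameter, and
dac_override_applies() is a hypothetical name.

```c
#include <stdbool.h>

/* Access mask bits, with the same values the kernel uses. */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/* Any of the owner/group/other execute bits. */
#define S_IXUGO   0111

/*
 * Returns true when CAP_DAC_OVERRIDE would grant the access even though
 * the mode bits deny it: reads and writes are always overridable, while
 * execute is overridable only if at least one exec bit is set.
 */
static bool dac_override_applies(int mask, unsigned int mode,
				 bool has_cap_dac_override)
{
	if (!(mask & MAY_EXEC) || (mode & S_IXUGO))
		return has_cap_dac_override;
	return false;
}
```

With this model, a write to a 0444 file is allowed for a process holding the
capability and refused otherwise, while an execute request on the same file
is refused either way because no exec bit is set.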

David


Re: [PATCH] afs: proc cells and rootcell are writeable

2014-01-30 Thread David Howells
David Howells  wrote:

> > I think this is a pretty strong argument. Counter-arguments, anybody?
> 
> Yes.  CAP_DAC_READ_SEARCH.

No, it would seem unlikely it's that, but I guess there's another capability
override because the process is owned by root.

David


Re: [PATCH] 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers

2014-01-30 Thread David Miller
From: Richard Yao 
Date: Thu, 30 Jan 2014 13:02:48 -0500

> The 9p-virtio transport does zero copy on things larger than 1024 bytes
> in size. It accomplishes this by returning the physical addresses of
> pages to the virtio-pci device. At present, the translation is usually a
> bit shift.
> 
> However, that approach produces an invalid page address when we
> read/write to vmalloc buffers, such as those used for Linux kernel
> modules. This causes QEMU to die printing:
> 
> qemu-system-x86_64: virtio: trying to map MMIO memory
> 
> This patch enables 9p-virtio to correctly handle this case. This not
> only enables us to load Linux kernel modules off virtfs, but also
> enables ZFS file-based vdevs on virtfs to be used without killing QEMU.
> 
> Also, special thanks to both Avi Kivity and Alexander Graf for their
> interpretation of QEMU backtraces. Without their guidance, tracking down
> this bug would have taken much longer.
> 
> Signed-off-by: Richard Yao 
> Acked-by: Alexander Graf 
> Reviewed-by: Will Deacon 

Applied, thanks.


Re: [PATCH] net: set default DEVTYPE for all ethernet based devices

2014-01-30 Thread David Miller
From: Tom Gundersen 
Date: Thu, 30 Jan 2014 14:20:02 +0100

> In systemd's networkd and udevd, we would like to give the administrator a
> simple way to filter net devices by their DEVTYPE [0][1]. Other software
> such as ConnMan and NetworkManager uses a similar filtering already.
> 
> Currently, plain ethernet devices have DEVTYPE=(null). This patch sets the
> devtype to "ethernet" instead. This avoids the need for special-casing the
> DEVTYPE=(null) case in userspace, and also avoids false positives, as there
> are several other types of netdevs that also have DEVTYPE=(null).
> 
> Notice that this is done, as suggested by Marcel, in alloc_etherdev_mqs(),
> and as best I can tell it will not give any false positives. I considered
> doing it in ether_setup() instead as that seemed more intuitive, but that
> would give a lot of false positives indeed.
> 
> [0]: 
> 
> [1]: 
> 
> Signed-off-by: Tom Gundersen 

Assuming that all users of alloc_etherdev*() are ethernet devices is
really not going to work.


Re: [RFC 0/4] memcg: Low-limit reclaim

2014-01-30 Thread Greg Thelen
On Thu, Jan 30 2014, Michal Hocko wrote:

> On Wed 29-01-14 11:08:46, Greg Thelen wrote:
> [...]
>> The series looks useful.  We (Google) have been using something similar.
>> In practice such a low_limit (or memory guarantee), doesn't nest very
>> well.
>> 
>> Example:
>>   - parent_memcg: limit 500, low_limit 500, usage 500
>> 1 privately charged non-reclaimable page (e.g. mlock, slab)
>>   - child_memcg: limit 500, low_limit 500, usage 499
>
> I am not sure this is a good example. Your setup basically says that no
> single page should be reclaimed. I can imagine this might be useful in
> some cases and I would like to allow it but it sounds too extreme (e.g.
> a load which would start thrashing heavily once the reclaim starts and it
> makes more sense to start it again rather than crawl - think about some
> mathematical simulation which might diverge).

Pages will still be reclaimed when usage_in_bytes exceeds
limit_in_bytes.  I see the low_limit as a way to tell the kernel: don't
reclaim my memory due to external pressure, but internal pressure is
different.

>> If a streaming file cache workload (e.g. sha1sum) starts gobbling up
>> page cache it will lead to an oom kill instead of reclaiming. 
>
> Does it make any sense to protect all of such memory although it is
> easily reclaimable?

I think protection makes sense in this case.  If I know my workload
needs 500 to operate well, then I reserve 500 using low_limit.  My app
doesn't want to run with less than its reservation.

>> One could argue that this is working as intended because child_memcg
>> was promised 500 but can only get 499.  So child_memcg is oom killed
>> rather than being forced to operate below its promised low limit.
>> 
>> This has led to various internal workarounds like:
>> - don't charge any memory to interior tree nodes (e.g. parent_memcg);
>>   only charge memory to cgroup leafs.  This gets tricky when dealing
>>   with reparented memory inherited to parent from child during cgroup
>>   deletion.
>
> Do those need any protection at all?

Interior tree nodes don't need protection from their children.  But
children and interior nodes need protection from siblings and parents.

>> - don't set low_limit on non leafs (e.g. do not set low limit on
>>   parent_memcg).  This constrains the cgroup layout a bit.  Some
>>   customers want to purchase $MEM and setup their workload with a few
>>   child cgroups.  A system daemon hands out $MEM by setting low_limit
>>   for top-level containers (e.g. parent_memcg).  Thereafter such
>>   customers are able to partition their workload with sub memcg below
>>   child_memcg.  Example:
>>  parent_memcg
>>  \
>>   child_memcg
>> / \
>> server   backup
>
> I think that the low_limit makes sense where you actually want to
> protect something from reclaim. And backup sounds like a bad fit for
> that.

The backup job would presumably have a small low_limit, but it may still
have a minimum working set required to make useful forward progress.

Example:
  parent_memcg
  \
   child_memcg limit 500, low_limit 500, usage 500
 / \
 |   backup   limit 10, low_limit 10, usage 10
 |
  server limit 490, low_limit 490, usage 490

One could argue that problems appear when
server.low_limit+backup.low_limit=child_memcg.limit.  So the safer
configuration is to leave some padding:
  server.low_limit + backup.low_limit + padding = child_memcg.limit
but this just defers the problem.  As memory is reparented into parent,
then padding must grow.
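
The padding relation can be written down as a small check. This is a
user-space sketch of the arithmetic only, using the unitless numbers from the
example above; low_limit_padding() and guarantees_fit() are hypothetical
helpers, not part of the proposed memcg interface.

```c
#include <stddef.h>

/*
 * Padding left in the parent: its limit minus the sum of the children's
 * low_limits.  A negative result means the guarantees are overcommitted.
 */
static long low_limit_padding(long parent_limit,
			      const long *child_low_limits, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += child_low_limits[i];
	return parent_limit - sum;
}

/*
 * Returns 1 when every child can still reach its low_limit after the
 * parent's own (e.g. reparented or non-reclaimable) charges are counted.
 */
static int guarantees_fit(long parent_limit, long parent_private_usage,
			  const long *child_low_limits, size_t n)
{
	return low_limit_padding(parent_limit, child_low_limits, n)
		>= parent_private_usage;
}
```

With server at 490 and backup at 10 under a limit of 500 the padding is zero,
so a single page charged to the parent itself is already enough to make the
guarantees unsatisfiable.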

>>   Thereafter customers often want some weak isolation between server and
>>   backup.  To avoid undesired oom kills the server/backup isolation is
>>   provided with a softer memory guarantee (e.g. soft_limit).  The soft
>>   limit acts like the low_limit until priority becomes desperate.
>
> Johannes was already suggesting that the low_limit should allow for a
> weaker semantic as well. I am not very much inclined to that but I can
> live with a knob which would say oom_on_lowlimit (on by default but
> allowed to be set to 0). We would fallback to the full reclaim if
> no groups turn out to be reclaimable.

I like the strong semantic of your low_limit at least at level:1 cgroups
(direct children of root).  But I have also encountered situations where
a strict guarantee is too strict and a mere preference is desirable.
Perhaps the best plan is to continue with the proposed strict low_limit
and eventually provide an additional mechanism which provides weaker
guarantees (e.g. soft_limit or something else if soft_limit cannot be
altered).  These two would offer good support for a variety of use
cases.

I am thinking of something like:

bool mem_cgroup_reclaim_eligible(struct mem_cgroup *memcg,
struct mem_cgroup *root,
int priority)
{
do {
if (memcg == root)
break;
if (!res_counter_low_limit_excess(&mem

Re: [PATCH] afs: proc cells and rootcell are writeable

2014-01-30 Thread David Howells
Linus Torvalds  wrote:

> I think this is a pretty strong argument. Counter-arguments, anybody?

Yes.  CAP_DAC_READ_SEARCH.

David


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

> On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
> > 
> > On Thu, 30 Jan 2014, James Bottomley wrote:
> > 
> > > Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
> > > allow 64 bit offsets for block devices on 32 bit.  It sounds like
> > > there's somewhere not using sector_t ... or using it wrongly which needs
> > > fixing.
> > 
> > The page cache uses unsigned long as a page index. Therefore, if unsigned 
> > long is 32-bit, the block device may have at most 2^32-1 pages.
> 
> Um, that's the index into the mapping, not the device; a device can have
> multiple mappings and each mapping has a radix tree of pages.  For most
> filesystems a mapping is equivalent to a file, so we can have large
> filesystems, but they can't have files over actually 4GB on 32 bits
> otherwise mmap fails.

A device may be accessed directly (by opening /dev/sdX) and it creates a 
mapping too - thus, the size of a mapping limits the size of a block 
device.

The main problem is that pgoff_t has 4 bytes - changing it to 8 bytes may 
fix it - but there may be some hidden places where pgoff is converted to 
unsigned long - who knows, if they exist or not?

> Are we running into a problem with struct address_space where we've
> assumed the inode belongs to the file and lvm is doing something where
> it's the whole device?

lvm creates a 64TiB device, udev runs blkid on that device and blkid opens 
the device and gets stuck because of unsigned long overflow.

> > > > On 32-bit architectures, we must limit block device size to
> > > > PAGE_SIZE*(2^32-1).
> > > 
> > > So you're saying CONFIG_LBDAF can never work, why?
> > > 
> > > James
> > 
> > CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
> > without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
> > 16TiB (4096*2^32).
> 
> I don't think the people who did the large block device work expected to
> gain only 3 bits for all their pain.
> 
> James

One could change it to have three choices:
2TiB limit - 32-bit sector_t and 32-bit pgoff_t
16TiB limit - 64-bit sector_t and 32-bit pgoff_t
32PiB limit - 64-bit sector_t and 64-bit pgoff_t
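
The first two limits follow directly from the type widths. Below is a minimal
sketch of that arithmetic, assuming 512-byte sectors and the 4096-byte page
size used earlier in this thread; the helper names are hypothetical, and the
32PiB figure is not derived here.

```c
#include <stdint.h>

#define SECTOR_BYTES 512ULL
#define TIB          (1ULL << 40)

/* Capacity addressable by a 32-bit sector_t: 2^32 sectors of 512 bytes. */
static uint64_t limit_with_32bit_sector(void)
{
	return (1ULL << 32) * SECTOR_BYTES;
}

/*
 * Capacity addressable through the page cache when pgoff_t is 32 bits:
 * 2^32 page indexes of page_size bytes each.  The exact bound quoted in
 * the thread is one page less, PAGE_SIZE * (2^32 - 1).
 */
static uint64_t limit_with_32bit_pgoff(uint64_t page_size)
{
	return (1ULL << 32) * page_size;
}
```

Dividing by TIB gives 2 for the 32-bit sector_t case and 16 for the 32-bit
pgoff_t case with 4096-byte pages, matching the 2TiB and 16TiB rows above.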

Though, we need to know if the people who designed memory management agree 
with changing pgoff_t to 64 bits.

Mikulas


Re: [PATCH] afs: proc cells and rootcell are writeable

2014-01-30 Thread David Howells

Further:

[root@andromeda ~]# touch /tmp/foo
[root@andromeda ~]# chmod 0444 /tmp/foo
[root@andromeda ~]# ls -l /tmp/foo
-r--r--r--. 1 root root 0 Jan 31 00:17 /tmp/foo
[root@andromeda ~]# echo hello >/tmp/foo
[root@andromeda ~]# ls -l /tmp/foo
-r--r--r--. 1 root root 6 Jan 31 00:17 /tmp/foo
[root@andromeda ~]# 

But:

[root@andromeda ~]# su - dhowells
[dhowells@andromeda ~]$ touch /tmp/bar
[dhowells@andromeda ~]$ chmod 0444 /tmp/bar
[dhowells@andromeda ~]$ ls -l /tmp/bar
-r--r--r--. 1 dhowells dhowells 0 Jan 31 00:19 /tmp/bar
[dhowells@andromeda ~]$ echo hello >/tmp/bar
-bash: /tmp/bar: Permission denied

David


Re: [PATCH v2] ceph: fix posix ACL hooks

2014-01-30 Thread Sage Weil
On Thu, 30 Jan 2014, Linus Torvalds wrote:
> On Wed, Jan 29, 2014 at 11:54 PM, Christoph Hellwig wrote:
> >
> > For ->set_acl that's fairly easily doable and I actually had a version
> > doing that to be able to convert 9p.  But for ->get_acl the path walking
> > caller didn't seem easily feasible.  ->get_acl actually is an invention
> > of yours, so if you got a good idea to get the dentry to it I'd love
> > to be able to pass it.
> 
> Yeah, that's pretty annoying, largely because that path is also
> RCU-walk aware, which does *not* need this all (because it will never
> call down into the filesystem - if the acl isn't found in the cached
> acl's, we just abort).
> 
> And we're going through that very common "generic_permission()" thing
> that in turn is also often called from the low-level filesystems, and
> it's all fairly tightly integrated with __inode_permission() etc.
> 
> In the end, all the original call-sites should have a dentry, and none
> of this is "fundamental". But you're right, it looks like an absolute
> nightmare to add the dentry pointer through the whole chain. Damn.
> 
> So I'm not thrilled about it, but maybe that "d_find_alias(inode)" to
> find the dentry is good enough in practice. It feels very much
> incorrect (it could find a dentry with a path that you cannot actually
> access on the server, and result in user-visible errors), but I
> definitely see your argument. It may just not be worth the pain for
> this odd ceph case.
> 
> That said, if the ceph people decide to try to bite the bullet and do
> the required conversions to pass the dentry to the permissions
> functions, I think I'd take it unless it ends up being *too* horribly
> messy.

FWIW the dentry isn't useful in the get case; it's only on put that it is 
currently used.  And now that I look closely, it is only being used by 
ceph_setattr to associate the update with the parent directory for the 
purposes of fsync(dirfd)... which is, I think, incorrect anyway (that 
should only flush out/wait for namespace modifications, not inode attr 
updates).

So I think it's fine as is, and we'll clean this up later.

I do have a couple patches on top of what's in your tree, though, that 
clean up a couple duplicated lines in your fix and apply Christoph's 
cleanup:

 git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus

Thanks!
sage


Re: [PATCH 1/4] Make vsyscall_gtod_data handling x86 generic

2014-01-30 Thread Andi Kleen
> @@ -1335,7 +1335,6 @@ config ARCH_SPARSEMEM_ENABLE
>  
>  config ARCH_SPARSEMEM_DEFAULT
>   def_bool y
> - depends on X86_64

Is that really needed? Why does the vdso need sparsemem?

>  
>  static inline void __user *arch_compat_alloc_user_space(long len)
>  {
> +#ifdef CONFIG_X86_32
> + struct pt_regs *regs = task_pt_regs(current);
> + return (void __user *)regs->sp - len;
> +#else
>   compat_uptr_t sp;

and that? why does a vdso need to allocate things on the user page?

-Andi


Re: [PATCH] afs: proc cells and rootcell are writeable

2014-01-30 Thread David Howells
Eric W. Biederman  wrote:

> These files have been read-only since this code was merged in 2002.
> Over a decade of not being used seems like a strong indication that no
> one cares about the write path.

Actually, things aren't as simple as they seem.  Without the patch applied:

[root@andromeda ~]# ls -l /proc/fs/afs/cells
-r--r--r--. 1 root root 0 Jan 31 00:04 /proc/fs/afs/cells
[root@andromeda ~]# echo add your-file-system.com 204.29.154.37 >/proc/fs/afs/cells
[root@andromeda ~]# 

You'll observe there is no error reported on the echo command.

Further, looking in dmesg, I see:

kAFS: Added new cell 'your-file-system.com'

So the file *is* writable, *despite* i_mode.

David


[GIT PULL] x86/build additional for v3.14

2014-01-30 Thread H. Peter Anvin
Hi Linus,

Various build-related minor bits.

Most of this is work by David Woodhouse to be able to compile the
early boot code with clang/llvm; we have also managed to push an
actual -m16 option into gcc 4.9 so this makes us use that option if
available instead of hacking it.

The balance is a patch from Michael Davidson to the relocs program to
help manual debugging.

None of these should change the actual compiled binary with currently
released compilers.

The following changes since commit f4bcd8ccddb02833340652e9f46f5127828eb79d:

  Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2014-01-20 14:45:50 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-build-for-linus

for you to fetch changes up to de3accdaec88851874c573031de007283e90b199:

  x86, build: Build 16-bit code with -m16 where possible (2014-01-30 08:05:36 -0800)


David Woodhouse (4):
  x86: Remove duplication of 16-bit CFLAGS
  x86, boot: Use __attribute__((used)) to ensure videocard structs are emitted
  x86, boot: Fix word-size assumptions in has_eflag() inline asm
  x86, build: Build 16-bit code with -m16 where possible

H. Peter Anvin (1):
  Merge commit 'f4bcd8ccddb02833340652e9f46f5127828eb79d' into x86/build

Michael Davidson (1):
  x86, relocs: Add manual debug mode

 arch/x86/Makefile  | 22 ++
 arch/x86/boot/Makefile | 15 +--
 arch/x86/boot/cpuflags.c   | 25 -
 arch/x86/boot/video.h  |  2 +-
 arch/x86/realmode/rm/Makefile  | 17 ++---
 arch/x86/tools/relocs.c| 30 +-
 arch/x86/tools/relocs.h|  7 ---
 arch/x86/tools/relocs_common.c | 16 
 8 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 13b22e0f681d..eeda43abed6e 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -11,6 +11,28 @@ else
 KBUILD_DEFCONFIG := $(ARCH)_defconfig
 endif
 
+# How to compile the 16-bit code.  Note we always compile for -march=i386;
+# that way we can complain to the user if the CPU is insufficient.
+#
+# The -m16 option is supported by GCC >= 4.9 and clang >= 3.5. For
+# older versions of GCC, we need to play evil and unreliable tricks to
+# attempt to ensure that our asm(".code16gcc") is first in the asm
+# output.
+CODE16GCC_CFLAGS := -m32 -include $(srctree)/arch/x86/boot/code16gcc.h \
+   $(call cc-option, -fno-toplevel-reorder,\
+ $(call cc-option, -fno-unit-at-a-time))
+M16_CFLAGS  := $(call cc-option, -m16, $(CODE16GCC_CFLAGS))
+
+REALMODE_CFLAGS:= $(M16_CFLAGS) -g -Os -D__KERNEL__ \
+  -DDISABLE_BRANCH_PROFILING \
+  -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
+  -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
+  -mno-mmx -mno-sse \
+  $(call cc-option, -ffreestanding) \
+  $(call cc-option, -fno-stack-protector) \
+  $(call cc-option, -mpreferred-stack-boundary=2)
+export REALMODE_CFLAGS
+
 # BITS is used as extension for files which are available in a 32 bit
 # and a 64 bit version to simplify shared Makefiles.
 # e.g.: obj-y += foo_$(BITS).o
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index de7066918005..878df7e88cd4 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -51,20 +51,7 @@ $(obj)/cpustr.h: $(obj)/mkcpustr FORCE
 
 # ---
 
-# How to compile the 16-bit code.  Note we always compile for -march=i386,
-# that way we can complain to the user if the CPU is insufficient.
-KBUILD_CFLAGS  := $(USERINCLUDE) -m32 -g -Os -D_SETUP -D__KERNEL__ \
-  -DDISABLE_BRANCH_PROFILING \
-  -Wall -Wstrict-prototypes \
-  -march=i386 -mregparm=3 \
-  -include $(srctree)/$(src)/code16gcc.h \
-  -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
-  -mno-mmx -mno-sse \
-  $(call cc-option, -ffreestanding) \
-  $(call cc-option, -fno-toplevel-reorder,\
-  $(call cc-option, -fno-unit-at-a-time)) \
-  $(call cc-option, -fno-stack-protector) \
-  $(call cc-option, -mpreferred-stack-boundary=2)
+KBUILD_CFLAGS  := $(USERINCLUDE) $(REALMODE_CFLAGS) -D_SETUP
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
 
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
index a9fcb7cfb241..431fa5f84537 100644
--- a/arch/x86/boot/cpuflags.c
+++ b/arch/x86/boot/cpuflags.c
@@ -28,20 +28,35 @@ static int has_fpu(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
 }
 
+/*
+

Re: [PATCH v2 3/5] spi: sunxi: Add Allwinner A31 SPI controller driver

2014-01-30 Thread Kevin Hilman
On Wed, Jan 29, 2014 at 5:32 AM, Maxime Ripard
 wrote:
> On Wed, Jan 29, 2014 at 12:25:20PM +, Mark Brown wrote:
>> On Wed, Jan 29, 2014 at 12:10:48PM +0100, Maxime Ripard wrote:
>>
>> > +config SPI_SUN6I
>> > +   tristate "Allwinner A31 SPI controller"
>> > +   depends on ARCH_SUNXI || COMPILE_TEST
>> > +   select PM_RUNTIME
>> > +   help
>> > + This enables using the SPI controller on the Allwinner A31 SoCs.
>> > +
>>
>> A select of PM_RUNTIME is both surprising and odd - why is that there?
>> The usual idiom is that the device starts out powered up (flagged using
>> pm_runtime_set_active()) and then runtime PM then suspends it when it's
>> compiled in.  That way if for some reason people want to avoid runtime
>> PM they can still use the device.
>
> Since pm_runtime_set_active and all the pm_runtime* callbacks in
> general are defined to pretty much empty functions, how the
> suspend/resume callbacks are called then? Obviously, we need them to
> be run, hence why I added the select here, but now I'm seeing a
> construct like what's following acceptable then?

Even with your 'select', the runtime PM callbacks will never be called
in the current driver.  pm_runtime_enable() doesn't do any runtime PM
transitions.  It just allows transitions to happen when they're
triggered by _get()/_put()/etc.

> pm_runtime_enable(&pdev->dev);
> if (!pm_runtime_enabled(&pdev->dev))
>sun6i_spi_runtime_resume(&pdev->dev);

Similarly here, it's not the pm_runtime_enable that will fail when
runtime PM is disabled (or not built-in), it's a pm_runtime_get_sync()
that will fail.

What you want is something like this in ->probe()

   sun6i_spi_runtime_resume();
   /* now, device is always activated whether or not runtime PM is enabled */
   pm_runtime_enable();
   pm_runtime_set_active();  /* tells runtime PM core device is already active */
   pm_runtime_get_sync();

This 'get' will increase the usecount, but not actually call the
callbacks because we told the RPM core that the device was already
activated with _set_active().

And then, in ->remove(), you'll want

   pm_runtime_put();
   pm_runtime_disable();

And if runtime PM is not enabled in the kernel, then the device will
be left on (which is kinda what you want if you didn't build runtime
PM into the kernel.)

Kevin


RE: WaitForMultipleObjects/etc. In Kernel

2014-01-30 Thread Network Nut
> -Original Message-
> From: Clemens Ladisch [mailto:clem...@ladisch.de]
> Sent: Wednesday, January 29, 2014 2:31 AM
> To: Network Nut
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: WaitForMultipleObjects/etc. In Kernel
> 
> Network Nut wrote:
> >I was looking at POSIX because it allows naming of the primitives.
> 
> Linux uses two orthogonal mechanisms for synchronization primitives and for
> naming/sharing.
> 
> >I need to epoll_wait on inter-process {mutex, event, semaphore}.
> 
> Use eventfd.
> 
> >I need to reference inter-process {mutex, event, semaphore}, each
> >identified by string, if feasible.
> 
> Send the fd through a Unix domain socket.
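
As an illustration of the eventfd suggestion above, here is a minimal sketch
of a counting semaphore built on an eventfd opened with EFD_SEMAPHORE. The
sem_fd_* names are hypothetical wrappers; only eventfd(), read() and write()
are real calls. Passing the descriptor to another process over a Unix domain
socket is not shown.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create a semaphore-style eventfd with the given initial count.
 * EFD_NONBLOCK makes a wait on an empty semaphore fail instead of block.
 */
static int sem_fd_create(unsigned int initial)
{
	return eventfd(initial, EFD_SEMAPHORE | EFD_NONBLOCK);
}

/* Increment the count by one.  Returns 0 on success, -1 on failure. */
static int sem_fd_post(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Try to decrement the count by one.  Returns 0 on success, or -1 if the
 * count is currently zero (read() fails with EAGAIN) or on another error.
 */
static int sem_fd_trywait(int fd)
{
	uint64_t value;

	return read(fd, &value, sizeof(value)) == (ssize_t)sizeof(value) ? 0 : -1;
}
```

Because the semaphore is an ordinary file descriptor, it can be added to an
epoll set alongside sockets and other descriptors; it becomes readable
whenever the count is non-zero.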

Hi Again,

I was thinking that, rather than ask for specifics, I should present my general 
problem, and ask how long-time Linux experts would solve it.

I have a master process M, that executes continually, from the birth to death 
of user-session.

I have many (distinct) processes that will be launched, and these processes, 
P1, P2, ...Pn, expect to see that M is executing. These processes:

1. expect to have access to a shared-memory section that already exists because 
M created it
2. expect to use a semaphore that already exists because M created it
3. expect to use a mutex that exists because M created it

P1, P2, ...Pn all know the path of M's image on disk. They are also permitted
to maintain a fixed string that can be used to "get at" the mutex and semaphore.

How would P1, P2, ...Pn get at the semaphore that M created?

Please note that M cannot have any prior knowledge at all of P1, P2, ...Pn.
P1...etc. must initiate communication with M.

[I don't want to misuse/abuse linux-kernel with my personal questions, so if 
there is a more appropriate group, please let me know.]

Regards,

-Net





Re: [BUG] Description for memmap in kernel-parameters.txt is wrong

2014-01-30 Thread Randy Dunlap
On 01/30/2014 03:43 PM, Andiry Xu wrote:
> On Thu, Jan 30, 2014 at 2:54 PM, Randy Dunlap  wrote:
>> On 01/30/2014 02:17 PM, David Rientjes wrote:
>>> On Thu, 30 Jan 2014, Randy Dunlap wrote:
>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In kernel-parameters.txt, there is following description:
>>>>>>>
>>>>>>> memmap=nn[KMG]$ss[KMG]
>>>>>>> [KNL,ACPI] Mark specific memory as reserved.
>>>>>>> Region of memory to be used, from ss to ss+nn.
>>>>>>
>>>>>> Should be:
>>>>>>   Region of memory to be reserved, from ss to ss+nn.
>>>>>>
>>>>>> but that doesn't help with the problem that you describe, does it?
>>>>>>
>>>>>
>>>>> Actually it should be:
>>>>>  Region of memory to be reserved, from nn to nn+ss.
>>>>>
>>>>> That is, exchange nn and ss.
>>>>
>>>> Yes, I understand that that's what you are reporting.  I just haven't yet
>>>> worked out how the code manages to exchange those 2 values.
>>>>
>>>
>>> It doesn't, the documentation is correct as written and could be improved
>>> by your suggestion of "Region of memory to be reserved, from ss to ss+nn."
>>> I think Andiry probably is having a problem with his bootloader
>>> interpreting the '$' incorrectly (or variable expansion if coming from the
>>> shell) or interpreting the resulting user-defined e820 map incorrectly.
>>> --
>>
>> Yeah, I certainly don't see a problem with the code and I would want to
>> see/understand that before I exchanged the 2 values in the documentation.
>>
>> I'll submit a patch to make the wording a bit better.
>>
> 
> I'm using Ubuntu 13.04 with GRUB2. If it's a bootloader issue, what should I do?

See https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/448413

i.e., escape the '$' with the shell escape character '\'.
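
To make the documented semantics concrete, here is a small userspace
sketch of how "nn[KMG]$ss[KMG]" maps to a reserved region from ss to
ss+nn. The helper names are hypothetical; parse_size() is only loosely
modelled on the kernel's memparse() and is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; simplified stand-in
 * for the kernel's memparse(), for illustration only. */
static unsigned long long parse_size(const char *s, const char **end)
{
	char *e;
	unsigned long long v;

	assert(s && end);
	v = strtoull(s, &e, 0);
	switch (*e) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; e++; break;
	default: break;
	}
	*end = e;
	return v;
}

/* Parse "nn[KMG]$ss[KMG]". Returns 0 and fills [start, end) on success,
 * -1 if the '$' separator is missing from the string. */
static int parse_memmap_reserve(const char *arg,
				unsigned long long *start,
				unsigned long long *end)
{
	const char *p;
	unsigned long long nn = parse_size(arg, &p);

	if (*p != '$')
		return -1;
	*start = parse_size(p + 1, &p);
	*end = *start + nn;
	return 0;
}
```

With "2G$4G" this yields a region starting at 4G and ending at 6G. If
something earlier in the boot chain swallows the "$4G" part, only "2G"
is left and the sketch returns -1, which is why the '$' has to survive
the bootloader's own parsing.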

-- 
~Randy


Re: [BUG] Description for memmap in kernel-parameters.txt is wrong

2014-01-30 Thread Andiry Xu
On Thu, Jan 30, 2014 at 2:54 PM, Randy Dunlap  wrote:
> On 01/30/2014 02:17 PM, David Rientjes wrote:
>> On Thu, 30 Jan 2014, Randy Dunlap wrote:
>>
>>>>>> Hi,
>>>>>>
>>>>>> In kernel-parameters.txt, there is following description:
>>>>>>
>>>>>> memmap=nn[KMG]$ss[KMG]
>>>>>> [KNL,ACPI] Mark specific memory as reserved.
>>>>>> Region of memory to be used, from ss to ss+nn.
>>>>>
>>>>> Should be:
>>>>>   Region of memory to be reserved, from ss to ss+nn.
>>>>>
>>>>> but that doesn't help with the problem that you describe, does it?
>>>>>
>>>>
>>>> Actually it should be:
>>>>  Region of memory to be reserved, from nn to nn+ss.
>>>>
>>>> That is, exchange nn and ss.
>>>
>>> Yes, I understand that that's what you are reporting.  I just haven't yet
>>> worked out how the code manages to exchange those 2 values.
>>>
>>
>> It doesn't, the documentation is correct as written and could be improved
>> by your suggestion of "Region of memory to be reserved, from ss to ss+nn."
>> I think Andiry probably is having a problem with his bootloader
>> interpreting the '$' incorrectly (or variable expansion if coming from the
>> shell) or interpreting the resulting user-defined e820 map incorrectly.
>> --
>
> Yeah, I certainly don't see a problem with the code and I would want to
> see/understand that before I exchanged the 2 values in the documentation.
>
> I'll submit a patch to make the wording a bit better.
>

I'm using Ubuntu 13.04 with GRUB2. If it's a bootloader issue, what should I do?

Thanks,
Andiry


[RFC PATCH] net: ipv4: move inetpeer garbage collector work to power efficient workqueue

2014-01-30 Thread Zoran Markovic
From: Shaibal Dutta 

Garbage collector work does not have to be bound to the CPU that scheduled
it. By moving work to the power-efficient workqueue, the selection of
CPU executing the work is left to the scheduler. This extends idle
residency times and conserves power.

This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.

Cc: "David S. Miller" 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Signed-off-by: Shaibal Dutta 
[zoran.marko...@linaro.org: Rebased to latest kernel version. Added
commit message.]
Signed-off-by: Zoran Markovic 
---
 net/ipv4/inetpeer.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 48f4244..87155aa 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -161,7 +161,8 @@ static void inetpeer_gc_worker(struct work_struct *work)
 	list_splice(&list, &gc_list);
 	spin_unlock_bh(&gc_lock);
 
-	schedule_delayed_work(&gc_work, gc_delay);
+	queue_delayed_work(system_power_efficient_wq,
+		&gc_work, gc_delay);
 }
 
 /* Called from ip_output.c:ip_init  */
@@ -576,7 +577,8 @@ static void inetpeer_inval_rcu(struct rcu_head *head)
 	list_add_tail(&p->gc_list, &gc_list);
 	spin_unlock_bh(&gc_lock);
 
-	schedule_delayed_work(&gc_work, gc_delay);
+	queue_delayed_work(system_power_efficient_wq,
+		&gc_work, gc_delay);
 }
 
 void inetpeer_invalidate_tree(struct inet_peer_base *base)
-- 
1.7.9.5



Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
> 
> On Thu, 30 Jan 2014, James Bottomley wrote:
> 
> > Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
> > allow 64 bit offsets for block devices on 32 bit.  It sounds like
> > there's somewhere not using sector_t ... or using it wrongly which needs
> > fixing.
> 
> The page cache uses unsigned long as a page index. Therefore, if unsigned 
> long is 32-bit, the block device may have at most 2^32-1 pages.

Um, that's the index into the mapping, not the device; a device can have
multiple mappings and each mapping has a radix tree of pages.  For most
filesystems a mapping is equivalent to a file, so we can have large
filesystems, but they can't actually have files over 4GB on 32 bits,
otherwise mmap fails.

Are we running into a problem with struct address_space where we've
assumed the inode belongs to the file and lvm is doing something where
it's the whole device?

> > > On 32-bit architectures, we must limit block device size to
> > > PAGE_SIZE*(2^32-1).
> > 
> > So you're saying CONFIG_LBDAF can never work, why?
> > 
> > James
> 
> CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
> without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
> 16TiB (4096*2^32).

I don't think the people who did the large block device work expected to
gain only 3 bits for all their pain.

James





Re: [PATCH] kthread: ensure locality of task_struct allocations

2014-01-30 Thread David Rientjes
On Thu, 30 Jan 2014, Nishanth Aravamudan wrote:

> In the presence of memoryless nodes, numa_node_id() will return the
> current CPU's NUMA node, but that may not be where we expect to allocate
> memory from. Instead, we should rely on the fallback code in the
> memory allocator itself, by using NUMA_NO_NODE. Also, when calling
> kthread_create_on_node(), use the nearest node with memory to the cpu in
> question, rather than the node it is running on.
> 
> Signed-off-by: Nishanth Aravamudan 

Acked-by: David Rientjes 
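
For readers following along, a toy userspace model of the distinction
the changelog draws between "the node a CPU sits on" and "the nearest
node that has memory". The topology, distances and toy_* names are all
invented for illustration; this is not how the kernel stores or computes
these values.

```c
#include <assert.h>

#define TOY_NR_NODES 3
#define TOY_NR_CPUS  6
#define TOY_NO_NODE  (-1)   /* stands in for NUMA_NO_NODE */

/* Hypothetical topology: node 1 has CPUs but no memory. */
static const int toy_node_has_memory[TOY_NR_NODES] = { 1, 0, 1 };
static const int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 1, 1, 2, 2 };
static const int toy_distance[TOY_NR_NODES][TOY_NR_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 30 },
	{ 40, 30, 10 },
};

/* Models cpu_to_node(): the node the CPU sits on, memory or not. */
static int toy_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return toy_cpu_node[cpu];
}

/* Models cpu_to_mem(): the nearest node that actually has memory. */
static int toy_cpu_to_mem(int cpu)
{
	int home = toy_cpu_to_node(cpu);
	int best = TOY_NO_NODE;

	for (int n = 0; n < TOY_NR_NODES; n++) {
		if (!toy_node_has_memory[n])
			continue;
		if (best == TOY_NO_NODE ||
		    toy_distance[home][n] < toy_distance[home][best])
			best = n;
	}
	return best;
}
```

For CPU 2 in this made-up topology, toy_cpu_to_node() reports the
memoryless node 1, while toy_cpu_to_mem() reports node 0, the closest
node an allocation could actually be satisfied from.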


Re: [PATCH V0] linux PVH: Set CR4 flags

2014-01-30 Thread Mukesh Rathor
On Thu, 30 Jan 2014 11:40:44 +
Roger Pau Monné  wrote:

> On 30/01/14 00:15, Mukesh Rathor wrote:
> > Konrad,
> > 
> > The CR4 settings were dropped from my earlier patch because you
> > didn't wanna enable them. But since you do now, we need to set them
> > in the APs also. If you decide not to again, please apply my prev
> > patch "pvh: disable pse feature for now".
> 
> Hello Mukesh,
> 
> Could you push your CR related patches to a git repo branch? I'm
> currently having a bit of a mess in figuring out which ones should be
> applied and in which order.
> 
> Thanks, Roger.

Hey Roger,

Unfortunately, I don't have them in a tree because my first patch was
changed during merge, and also the tree was refreshed.  Basically, the end
result is that we leave features enabled on the linux side, thus setting not
only the cr0 bits, but also the cr4 PSE and PGE for APs (they were already
set for the BSP).

Konrad only merged the CR0 setting part of my first patch, hence this 
patch to set the CR4 bits. Hope that makes sense. My latest tree is:

http://oss.us.oracle.com/git/mrathor/linux.git  muk2

thanks
mukesh


Re: [PATCH] ipv6: default route for link local address is not added while assigning a address

2014-01-30 Thread Hannes Frederic Sowa
Sorry for replying so late...

On Wed, Jan 29, 2014 at 11:38:47AM +0100, Nicolas Dichtel wrote:
> Le 29/01/2014 07:41, Sohny Thomas a écrit :
> >Resending this on netdev mailing list:
> >Default route for link local address is configured automatically if
> >NETWORKING_IPV6=yes is in ifcfg-eth*.
> >When the route table for the interface is flushed and a new address is
> >added to the same device without removing the linklocal addr, the default
> >route for the link local address has to be added by default.
> >
> >I have found the issue to be caused by this checkin
> >
> >http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/ipv6?id=62b54dd91567686a1cb118f76a72d5f4764a86dd
> >
> >
> >According to this change:
> >He removes adding a link local route if any other address is added,
> >applicable across all interfaces though only the lo interface is mentioned
> >So below patch fixes for other devices
> >
> >Signed-off-by: Sohny THomas 
> Your email client has corrupted the patch, it cannot be applied.
> Please read Documentation/email-clients.txt
> 
> About the patch, I still think that the flush is too aggressive. Link local
> routes are marked as 'proto kernel', removing them without the link local
> address is wrong.

Actually I am not so sure, there is no defined semantic of flush. I would
be ok with all three solutions: leave it as is, always add link-local
address (it does not matter if we don't have a link-local address on
that interface, as a global scoped one is just fine enough) or make flush not
remove the link-local address (but this seems a bit too special cased for me).

Greetings,

  Hannes


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

> Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
> allow 64 bit offsets for block devices on 32 bit.  It sounds like
> there's somewhere not using sector_t ... or using it wrongly which needs
> fixing.

The page cache uses unsigned long as a page index. Therefore, if unsigned 
long is 32-bit, the block device may have at most 2^32-1 pages.

> > On 32-bit architectures, we must limit block device size to
> > PAGE_SIZE*(2^32-1).
> 
> So you're saying CONFIG_LBDAF can never work, why?
> 
> James

CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
16TiB (4096*2^32).
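
The arithmetic behind those two limits, as a small userspace sketch.
It assumes 512-byte sectors with a 32-bit sector number for the
non-LBDAF case and 4096-byte pages with a 32-bit page index for the
LBDAF case; the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT   9   /* 512-byte sectors */
#define PAGE_SHIFT_4K 12   /* 4096-byte pages, as on x86 */

/* Bytes addressable with an index of 'index_bits' bits counting units
 * of (1 << unit_shift) bytes. */
static uint64_t addressable_bytes(unsigned index_bits, unsigned unit_shift)
{
	assert(index_bits + unit_shift < 64);
	return (uint64_t)1 << (index_bits + unit_shift);
}

/* Page index of a byte offset, truncated the way a 32-bit
 * unsigned long would hold it. */
static uint32_t page_index_32(uint64_t byte_offset)
{
	return (uint32_t)(byte_offset >> PAGE_SHIFT_4K);
}
```

addressable_bytes(32, SECTOR_SHIFT) is 2^41 bytes (2TiB) and
addressable_bytes(32, PAGE_SHIFT_4K) is 2^44 bytes (16TiB), a factor of
8. page_index_32() of a 64TiB offset comes out as 0, which shows how the
index wraps once the device is larger than 16TiB.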

Mikulas


[RFC PATCH] net: wireless: move regulatory timeout work to power efficient workqueue

2014-01-30 Thread Zoran Markovic
From: Shaibal Dutta 

For better use of CPU idle time, allow the scheduler to select the CPU
on which the timeout work of regulatory settings would be executed.
This extends CPU idle residency time and saves power.

This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.

Cc: Johannes Berg 
Cc: "John W. Linville" 
Cc: "David S. Miller" 
Signed-off-by: Shaibal Dutta 
[zoran.marko...@linaro.org: Rebased to latest kernel. Added commit message.]
Signed-off-by: Zoran Markovic 
---
 net/wireless/reg.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 9b897fc..6e21011 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1703,7 +1703,8 @@ static void reg_process_hint(struct regulatory_request *reg_request)
 		if (treatment == REG_REQ_OK ||
 		    treatment == REG_REQ_ALREADY_SET)
 			return;
-		schedule_delayed_work(&reg_timeout, msecs_to_jiffies(3142));
+		queue_delayed_work(system_power_efficient_wq,
+				   &reg_timeout, msecs_to_jiffies(3142));
 		return;
 	case NL80211_REGDOM_SET_BY_DRIVER:
 		treatment = reg_process_hint_driver(wiphy, reg_request);
@@ -2294,7 +2295,8 @@ static int reg_set_rd_driver(const struct ieee80211_regdomain *rd,
 
 	request_wiphy = wiphy_idx_to_wiphy(driver_request->wiphy_idx);
 	if (!request_wiphy) {
-		schedule_delayed_work(&reg_timeout, 0);
+		queue_delayed_work(system_power_efficient_wq,
+				   &reg_timeout, 0);
 		return -ENODEV;
 	}
 
@@ -2354,7 +2356,8 @@ static int reg_set_rd_country_ie(const struct ieee80211_regdomain *rd,
 
 	request_wiphy = wiphy_idx_to_wiphy(country_ie_request->wiphy_idx);
 	if (!request_wiphy) {
-		schedule_delayed_work(&reg_timeout, 0);
+		queue_delayed_work(system_power_efficient_wq,
+				   &reg_timeout, 0);
 		return -ENODEV;
 	}
 
-- 
1.7.9.5



[PATCH] kthread: ensure locality of task_struct allocations

2014-01-30 Thread Nishanth Aravamudan
On 30.01.2014 [14:47:05 -0800], David Rientjes wrote:
> On Wed, 29 Jan 2014, Eric Dumazet wrote:
> 
> > > Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware 
> > > kthread_create_on_node()") or was it always numa_node_id() from the 
> > > beginning?
> > 
> > > Hmm, I think I did not try this, it's absolutely possible NUMA_NO_NODE
> > was better here.
> > 
> 
> Nishanth, could you change your patch to just return NUMA_NO_NODE for the 
> non-kthreadd case?

Something like the following?


In the presence of memoryless nodes, numa_node_id() will return the
current CPU's NUMA node, but that may not be where we expect to allocate
memory from. Instead, we should rely on the fallback code in the
memory allocator itself, by using NUMA_NO_NODE. Also, when calling
kthread_create_on_node(), use the nearest node with memory to the cpu in
question, rather than the node it is running on.

Signed-off-by: Nishanth Aravamudan 
Cc: Anton Blanchard 
Cc: Christoph Lameter 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: Oleg Nesterov 
Cc: Jan Kara 
Cc: David Rientjes 
Cc: Thomas Gleixner 
Cc: Tetsuo Handa 
Cc: linux-kernel@vger.kernel.org
Cc: Wanpeng Li 
Cc: Joonsoo Kim 
Cc: Ben Herrenschmidt 

---
Note that I haven't yet tested this change on the system that reproduces
the original problem.

diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..9a130ec 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
 	if (tsk == kthreadd_task)
 		return tsk->pref_node_fork;
 #endif
-	return numa_node_id();
+	return NUMA_NO_NODE;
 }
 
 static void create_kthread(struct kthread_create_info *create)
@@ -369,7 +369,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 {
 	struct task_struct *p;
 
-	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+	p = kthread_create_on_node(threadfn, data, cpu_to_mem(cpu), namefmt,
 				   cpu);
 	if (IS_ERR(p))
 		return p;



Re: [PATCH] x86, cpu hotplug, Fix stack frame warning in check_irq_vectors_for_cpu_disable()

2014-01-30 Thread David Rientjes
On Tue, 28 Jan 2014, Prarit Bhargava wrote:

> Further discussion here: 
> http://marc.info/?l=linux-kernel&m=139073901101034&w=2
> 
> kbuild, 0day kernel build service, outputs the warning:
> 
> arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes
> is larger than 2048 bytes [-Wframe-larger-than=]
> 
> because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the
> stack.   Fix this by moving the two cpumasks to a global file context.
> 
> Signed-off-by: Prarit Bhargava 

Reported-by: Fengguang Wu 
Tested-by: David Rientjes 


Re: [PATCH v2] x86, boot: Fix word-size assumptions in has_eflag () inline asm

2014-01-30 Thread David Rientjes
On Thu, 30 Jan 2014, David Woodhouse wrote:

> Commit dd78b97367bd575918204cc89107c1479d3fc1a7 ("x86, boot: Move CPU
> flags out of cpucheck") introduced ambiguous inline asm in the
> has_eflag() function. In 16-bit mode we want the instruction to be
> 'pushfl', but we just say 'pushf' and hope the compiler does what we
> wanted.
> 
> When building with 'clang -m16', it won't, because clang doesn't use
> the horrid '.code16gcc' hack that even 'gcc -m16' uses internally.
> 
> Say what we mean and don't make the compiler make assumptions.
> 
> Signed-off-by: David Woodhouse 

Fixes the x86-build-for-linus build error for me, thanks!


Re: [BUG] Description for memmap in kernel-parameters.txt is wrong

2014-01-30 Thread Randy Dunlap
On 01/30/2014 02:17 PM, David Rientjes wrote:
> On Thu, 30 Jan 2014, Randy Dunlap wrote:
> 
>>>>> Hi,
>>>>>
>>>>> In kernel-parameters.txt, there is following description:
>>>>>
>>>>> memmap=nn[KMG]$ss[KMG]
>>>>> [KNL,ACPI] Mark specific memory as reserved.
>>>>> Region of memory to be used, from ss to ss+nn.
>>>>
>>>> Should be:
>>>>   Region of memory to be reserved, from ss to ss+nn.
>>>>
>>>> but that doesn't help with the problem that you describe, does it?
>>>>
>>>
>>> Actually it should be:
>>>  Region of memory to be reserved, from nn to 
>>> nn+ss.
>>>
>>> That is, exchange nn and ss.
>>
>> Yes, I understand that that's what you are reporting.  I just haven't yet
>> worked out how the code manages to exchange those 2 values.
>>
> 
> It doesn't, the documentation is correct as written and could be improved 
> by your suggestion of "Region of memory to be reserved, from ss to ss+nn."  
> I think Andiry probably is having a problem with his bootloader 
> interpreting the '$' incorrectly (or variable expansion if coming from the 
> shell) or interpreting the resulting user-defined e820 map incorrectly.
> --

Yeah, I certainly don't see a problem with the code and I would want to
see/understand that before I exchanged the 2 values in the documentation.

I'll submit a patch to make the wording a bit better.

Thanks.

-- 
~Randy


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 15:40 -0500, Mikulas Patocka wrote:
> When running the LVM2 testsuite on 32-bit kernel, there are unkillable
> processes stuck in the kernel consuming 100% CPU:
> blkid   R running  0  2005   1409 0x0004
> ce009d00 0082 ffcf c11280ba 0060 560b5dfd 3111 00fe41cb
>  ce009d00  d51cfeb0  001e 0002 
> 0002 c10748c1 0002 c106cca4    
> Call Trace:
> [] ? radix_tree_next_chunk+0xda/0x2c0
> [] ? release_pages+0x61/0x160
> [] ? find_get_pages+0x84/0x100
> [] ? _cond_resched+0x1e/0x40
> [] ? truncate_inode_pages_range+0x12b/0x440
> [] ? truncate_inode_pages+0x17/0x20
> [] ? __blkdev_put+0x3a/0x140
> [] ? blkdev_close+0x1b/0x40
> [] ? __fput+0x72/0x1c0
> [] ? task_work_run+0x61/0xa0
> [] ? work_notifysig+0x24/0x35
> 
> This is caused by the fact that the LVM2 testsuite creates 64TB device.
> The kernel uses "unsigned long" to index pages in files and block devices,
> on 64TB device "unsigned long" overflows (it can address up to 16TB with
> 4k pages), causing the infinite loop.

Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
allow 64 bit offsets for block devices on 32 bit.  It sounds like
there's somewhere not using sector_t ... or using it wrongly which needs
fixing.

> On 32-bit architectures, we must limit block device size to
> PAGE_SIZE*(2^32-1).

So you're saying CONFIG_LBDAF can never work, why?

James





Re: [PATCH] kthread: ensure locality of task_struct allocations

2014-01-30 Thread David Rientjes
On Wed, 29 Jan 2014, Eric Dumazet wrote:

> > Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware 
> > kthread_create_on_node()") or was it always numa_node_id() from the 
> > beginning?
> 
> > Hmm, I think I did not try this, it's absolutely possible NUMA_NO_NODE
> was better here.
> 

Nishanth, could you change your patch to just return NUMA_NO_NODE for the 
non-kthreadd case?

