Re: [PATCH net] net: flow_dissector: fail on evil iph->ihl

2013-11-01 Thread David Miller
From: Jason Wang 
Date: Fri,  1 Nov 2013 15:01:10 +0800

> We don't validate iph->ihl, which may lead to a dead loop if we meet an IPIP
> skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
> is evil (less than 5).
> 
> This issue was introduced by commit ec5efe7946280d1e84603389a1030ccec0a767ae
> (rps: support IPIP encapsulation).
> 
> Cc: Eric Dumazet 
> Cc: Petr Matousek 
> Cc: Michael S. Tsirkin 
> Cc: Daniel Borkmann 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks Jason.
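
For readers following the thread, the check described in the quoted commit message can be sketched in user-space C. This is only an illustration under stated assumptions: next_header_offset() is a made-up helper, not the kernel's flow dissector, and it works on a raw byte buffer rather than an skb. The point it shows is that ihl counts 32-bit words, so a value below 5 cannot describe a valid IPv4 header, and a value of 0 would never advance past the header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): given an IPv4 header starting at
 * pkt[off], return the offset of whatever follows it, or -1 if the header
 * is malformed.
 */
static long next_header_offset(const uint8_t *pkt, size_t len, size_t off)
{
	unsigned int ihl;
	size_t next;

	if (off >= len)
		return -1;

	ihl = pkt[off] & 0x0f;		/* IHL is the low nibble of byte 0 */
	if (ihl < 5)
		return -1;		/* ihl == 0 would loop forever on IPIP */

	next = off + (size_t)ihl * 4;	/* IHL is in units of 32-bit words */
	if (next > len)
		return -1;

	return (long)next;
}
```

A caller that walks nested IPIP headers would stop as soon as this returns -1 instead of re-parsing the same offset.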
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-01 Thread Gao feng
Hi Eric,

On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
> 
> Rely on the fact that another flavor of the filesystem is already
> mounted and do not rely on state in the user namespace.
> 
> Verify that the mounted filesystem is not covered in any significant
> way.  I would love to verify that the previously mounted filesystem
> has no mounts on top but there are at least the directories
> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
> for other filesystems to mount on top of.
> 
> Refactor the test into a function named fs_fully_visible and call that
> function from the mount routines of proc and sysfs.  This makes this
> test local to the filesystems involved and the results current of when
> the mounts take place, removing a weird threading of the user
> namespace, the mount namespace and the filesystems themselves.
> 
> Signed-off-by: "Eric W. Biederman" 
> ---
>  fs/namespace.c |   37 +
>  fs/proc/root.c |7 +--
>  fs/sysfs/mount.c   |3 ++-
>  include/linux/fs.h |1 +
>  include/linux/user_namespace.h |4 
>  kernel/user.c  |2 --
>  kernel/user_namespace.c|2 --
>  7 files changed, 33 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 64627f8..877e427 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>   return chrooted;
>  }
>  
> -void update_mnt_policy(struct user_namespace *userns)
> +bool fs_fully_visible(struct file_system_type *type)
>  {
>   struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>   struct mount *mnt;
> + bool visible = false;
>  
> - down_read(&namespace_sem);
> + if (unlikely(!ns))
> + return false;
> +
> + namespace_lock();
>   list_for_each_entry(mnt, &ns->list, mnt_list) {
> - switch (mnt->mnt.mnt_sb->s_magic) {
> - case SYSFS_MAGIC:
> - userns->may_mount_sysfs = true;
> - break;
> - case PROC_SUPER_MAGIC:
> - userns->may_mount_proc = true;
> - break;
> + struct mount *child;
> + if (mnt->mnt.mnt_sb->s_type != type)
> + continue;
> +
> + /* This mount is not fully visible if there are any child mounts
> +  * that cover anything except for empty directories.
> +  */
> + list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
> + struct inode *inode = child->mnt_mountpoint->d_inode;
> + if (!S_ISDIR(inode->i_mode))
> + goto next;
> + if (inode->i_nlink != 2)
> + goto next;


I ran into a problem where the proc filesystem fails to mount in a user
namespace. The problem is that the i_nlink of sysctl entries under the proc
filesystem is not 2: it is always 1, even for a directory (see
proc_sys_make_inode). On btrfs, the i_nlink of an empty dir is not 2
either. It seems to depend on the filesystem itself, not on the VFS. On my
system binfmt_misc is mounted on /proc/sys/fs/binfmt_misc, and the i_nlink
of that directory's inode is 1.

By the way, I don't quite understand what the inode->i_nlink != 2 check
means here. Is it testing whether the directory is empty? As far as I know,
when we create a file (not a dir) under a dir, the i_nlink of that dir does
not increase.

Another question: it looks like if we don't already have proc/sysfs
mounted, then mounting proc/sysfs will fail?

Thanks!
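
A small sketch of the heuristic under discussion may help. On traditional filesystems a directory's link count is 2 (its own "." plus the entry in its parent) plus one for each subdirectory's "..", so i_nlink == 2 really means "no subdirectories", and regular files do not change it. The helper names here are made up for illustration and only mirror the two tests in the quoted hunk; they are not kernel functions.

```c
#include <stdbool.h>

/* Link count a traditional filesystem reports for a directory. */
static unsigned int traditional_dir_nlink(unsigned int nr_subdirs)
{
	return 2 + nr_subdirs;	/* "." + parent entry + one ".." per subdir */
}

/* Hypothetical mirror of the S_ISDIR / i_nlink == 2 test in the patch. */
static bool passes_empty_dir_check(bool is_dir, unsigned int nlink)
{
	return is_dir && nlink == 2;
}
```

With these definitions, a directory that reports a link count of 1, as described above for /proc/sys, fails the check regardless of whether it has any entries.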

>   }
> - if (userns->may_mount_sysfs && userns->may_mount_proc)
> - break;
> + visible = true;
> + goto found;
> + next:   ;
>   }
> - up_read(&namespace_sem);
> +found:
> + namespace_unlock();
> + return visible;
>  }
>  
>  static void *mntns_get(struct task_struct *task)
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 38bd5d4..45e5fb7 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -110,8 +110,11 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
>   ns = task_active_pid_ns(current);
>   options = data;
>  
> - if (!current_user_ns()->may_mount_proc ||
> - !ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> + if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
> + return ERR_PTR(-EPERM);
> +
> + /* Does the mounter have privilege over the pid namespace? */
> + if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
>   return ERR_PTR(-EPERM);
>   }
>  
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index afd8327..4a2da3a 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -112,7 +112,8 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
>  

Re: [PATCH] bnx2: Use dev_kfree_skb_any() in bnx2_tx_int()

2013-11-01 Thread David Miller
From: Ben Hutchings 
Date: Fri, 1 Nov 2013 23:34:50 +

> As you've said, the ndo_start_xmit and NAPI poll operations are intended
> to be called in softirq context, so everything that interlocks with them
> will use spin_lock_bh().  Calling them from hardirq context obviously
> opens the possibility of a deadlock.  How do you expect anyone to solve
> that?

That's not what I said.

I did not say that it must be invoked in softirq context.

I said that it MUST LOOK like it is being invoked in softirq
context as far as the ->poll() code paths can tell.

But yes, that's hard.

The thing is, ->poll() is atomic.  You never will have poll calls
recurse into each other.  This is why we strongly encourage all driver
authors to make their ->poll() implementations lockless.

And, wouldn't you know it, tg3 is a driver that does this
properly.

Therefore, there are no netpoll locking problems, because the poll
implementation takes no locks and therefore doesn't care.

That's why tg3 doesn't have any of these netpoll issues.


Re: [PATCH 1/3] perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8

2013-11-01 Thread Borislav Petkov
On Thu, Oct 31, 2013 at 12:23:30PM +0100, Frederic Weisbecker wrote:
> Ok we can keep that naming then, at least on the feature symbol. But
> add a comment on it.

Great, in the latest F16h BKDG the CPUID bit is called
"DataBreakpointExtension". So BPEXT could mean anything :)

So the comment is with the definition of the bit:

+#define X86_FEATURE_BPEXT  (6*32+26) /* data breakpoint extension */

Oh well...

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [RFC 4/9] of/irq: Refactor interrupt-map parsing

2013-11-01 Thread Ming Lei
On Sat, Nov 2, 2013 at 2:54 AM, Grant Likely  wrote:
>
> That one was broken. Try this instead.
>
> From bcbffc3d16f49451ef505dc021480aa061465a15 Mon Sep 17 00:00:00 2001
> From: Grant Likely 
> Date: Fri, 1 Nov 2013 10:50:50 -0700
> Subject: [PATCH] of: Fixup interrupt parsing failure.
>
> Signed-off-by: Grant Likely 
> ---
>  drivers/of/irq.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)

This one does fix the boot hang on the Arndale board.

Tested-by: Ming Lei 

Thanks,
-- 
Ming Lei


Re: [PATCH] drivers: w1: make w1_slave::flags long to avoid casts

2013-11-01 Thread Рустафа Джамурахметов
Hi

01.11.2013, 23:30, "Andrew Morton" :

> set_bit() operates on longs.  So if we do
>
> struct foo {
> u32 a;
> u32 b;
> } f;
>
> ...
> set_bit(0, (long *)&f.a);
> ...
>
> then we'll scribble on f.b on a big-endian 64-bit machine.

Argh, why don't we just do that? It's an in-memory field, it can be anything.
I wouldn't be surprised if it could even be non-atomic, because the proper locks
are already being held.
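
The hazard in the quoted example can be modelled without a big-endian machine. The sketch below is an assumption-laden illustration, not kernel code: set_bit_be64() is a made-up stand-in that lays out a 64-bit long the way a big-endian 64-bit CPU would, with the most significant byte first, so bit 0 lives in the last of the eight bytes.

```c
#include <stdint.h>
#include <string.h>

struct foo {
	uint32_t a;
	uint32_t b;
};

/*
 * Model of set_bit(nr, addr) on a big-endian 64-bit machine: bit nr of the
 * 8-byte word is stored in byte (7 - nr / 8).
 */
static void set_bit_be64(unsigned int nr, void *addr)
{
	uint8_t *bytes = addr;

	bytes[7 - nr / 8] |= (uint8_t)(1u << (nr % 8));
}

/* Returns 1 if "setting bit 0 of f.a" left f.a untouched and hit f.b. */
static int bit0_lands_in_b(void)
{
	struct foo f;

	memset(&f, 0, sizeof f);
	/* &f is the same address as &f.a; mirrors set_bit(0, (long *)&f.a) */
	set_bit_be64(0, &f);
	return f.a == 0 && f.b != 0;
}
```

Byte 7 of the struct belongs to b, which is why the cast scribbles on the neighbouring field; declaring the field as long removes the need for the cast.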


Re: [PATCH v2 05/13] uprobes: add arch write opcode hook

2013-11-01 Thread David Long

On 10/29/13 15:59, Oleg Nesterov wrote:

On 10/28, Oleg Nesterov wrote:


Yes, yes, sorry for confusion. What I actually tried to suggest is
something like the trivial patch below.

Then arm can do:

uprobe_opcode_t arch_uprobe_swbp_insn(struct arch_uprobe *auprobe)
{
return __opcode_to_mem_arm(auprobe->bpinsn);
}

No?


I notice there don't seem to be any alternative set_swbp functions
in the (rc6) kernel tree


Yes... I think we should simply make it "static". And set_orig_insn()
too.


Or. arm can actually reimplement set_swbp(). This doesn't mean the
duplication of write_opcode() code, we can simply export this helper.



That actually looks to me like the cleanest approach.  I have changed 
the static write_opcode() to a global uprobe_write_opcode(), and now 
call it from an arm set_swbp().


Please do *not* make set_swbp() (and set_orig_insn()) static's.  It 
looks like we now have a use for at least one of them.


Thanks,
-dl



Re: [PATCH 0/4] per anon_vma lock and turn anon_vma rwsem lock to rwlock_t

2013-11-01 Thread Davidlohr Bueso
On Fri, 2013-11-01 at 11:55 -0700, Linus Torvalds wrote:
> On Fri, Nov 1, 2013 at 11:47 AM, Michel Lespinasse  wrote:
> >
> > Should copy Andrea on this. I talked with him during KS, and there are
> > no current in-tree users who are doing such sleeping; however there
> > are prospective users for networking (RDMA) or GPU stuff who want to
> > use this to let hardware directly copy data into user mappings.
> 
> Tough.
> 
> I spoke up the first time this came up and I'll say the same thing
> again: we're not screwing over the VM subsystem because some crazy
> user might want to do crazy and stupid things that nobody sane cares
> about.
> 
> The whole "somebody might want to .." argument is just irrelevant.

Ok, I was under the impression that this was something already in the
kernel and hence "too late to go back". Based on the results I'm
definitely in favor of the whole rwlock conversion.

Thanks,
Davidlohr



Re: [PATCH 0/4] per anon_vma lock and turn anon_vma rwsem lock to rwlock_t

2013-11-01 Thread Davidlohr Bueso
On Fri, 2013-11-01 at 18:16 +0800, Yuanhan Liu wrote:
> On Fri, Nov 01, 2013 at 09:21:46AM +0100, Ingo Molnar wrote:
> > 
> > * Yuanhan Liu  wrote:
> > 
> > > > Btw., another _really_ interesting comparison would be against 
> > > > the latest rwsem patches. Mind doing such a comparison?
> > > 
> > > Sure. Where can I get it? Are they on some git tree?
> > 
> > I've Cc:-ed Tim Chen who might be able to point you to the latest 
> > version.
> > 
> > The last on-lkml submission was in this thread:
> > 
> >   Subject: [PATCH v8 0/9] rwsem performance optimizations
> > 
> 
> Thanks.
> 
> I queued bunchs of tests about one hour ago, and already got some
> results(If necessary, I can add more data tomorrow when those tests are
> finished):

What kind of system are you using to run these workloads on?

> 
> 
>    v3.12-rc7  fe001e3de090e179f95d
>                    -9.3%   brickland1/micro/aim7/shared
>                    +4.3%   lkp-ib03/micro/aim7/fork_test
>                    +2.2%   lkp-ib03/micro/aim7/shared
>                    -2.6%   TOTAL aim7.2000.jobs-per-min
> 

Sorry if I'm missing something, but could you elaborate more on what
these percentages represent? Are they anon vma rwsem + optimistic
spinning patches vs anon vma rwlock?

Also, I see you're running aim7; you might be interested in some of the
results I found when trying out Ingo's rwlock conversion patch on a
largish 80 core system: https://lkml.org/lkml/2013/9/29/280

Thanks,
Davidlohr



Re: zram/zsmalloc issues in very low memory conditions

2013-11-01 Thread Bob Liu
Hi Olav,

On 11/02/2013 08:59 AM, Olav Haugan wrote:

> 
> I tried the above suggestion but it does not seem to have any noticeable
> impact. The system is still trying to swap out at a very high rate after
> zram reported failure to swap out. The error logging is actually so much
> that my system crashed due to excessive logging (we have a watchdog that
> is not getting pet because the kernel is busy logging kernel messages).
> 

I have a question: why didn't the low memory killer get triggered in
this situation?
Is it possible to make the LMK a bit more aggressive?

> There isn't anything that can be set to tell the fs layer to back off
> completely for a while (congestion control)?
> 

Another way that I think might fix your issue is the one you mentioned
in your previous email:
set the congested bit for the swap device as well.
Like:

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 91d94b5..c4fc63e 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -474,6 +474,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
if (!handle) {
pr_info("Error allocating memory for compressed page: %u, size=%zu\n",
index, clen);
+   blk_set_queue_congested(zram->disk->queue, BLK_RW_ASYNC);
ret = -ENOMEM;
goto out;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ed1b77..1c790ee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -394,8 +394,6 @@ static inline int is_page_cache_freeable(struct page *page)
 static int may_write_to_queue(struct backing_dev_info *bdi,
  struct scan_control *sc)
 {
-   if (current->flags & PF_SWAPWRITE)
-   return 1;

--

As for updating the congested state of zram, I think you can clear it
from user space, e.g. after the LMK has triggered and reclaimed some memory.

Of course this depends on the zram driver exporting a sysfs node like
"/sys/block/zram0/clear_congested".

-- 
Regards,
-Bob


Re: [PATCH 2/8] trace/trace_stat: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Steven Rostedt
On Fri,  1 Nov 2013 15:38:46 -0700
Cody P Schafer  wrote:

> Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
> of opencoding an alternate postorder iteration that modifies the tree
> 
> Signed-off-by: Cody P Schafer 
> ---
>  kernel/trace/trace_stat.c | 42 ++
>  1 file changed, 6 insertions(+), 36 deletions(-)
> 
> diff --git a/kernel/trace/trace_stat.c b/kernel/trace/trace_stat.c
> index 847f88a..fa53acc 100644
> --- a/kernel/trace/trace_stat.c
> +++ b/kernel/trace/trace_stat.c
> @@ -43,46 +43,16 @@ static DEFINE_MUTEX(all_stat_sessions_mutex);
>  /* The root directory for all stat files */
>  static struct dentry *stat_dir;
>  
> -/*
> - * Iterate through the rbtree using a post order traversal path
> - * to release the next node.
> - * It won't necessary release one at each iteration
> - * but it will at least advance closer to the next one
> - * to be released.
> - */
> -static struct rb_node *release_next(struct tracer_stat *ts,
> - struct rb_node *node)
> +static void __reset_stat_session(struct stat_session *session)
>  {
> - struct stat_node *snode;
> - struct rb_node *parent = rb_parent(node);
> -
> - if (node->rb_left)
> - return node->rb_left;
> - else if (node->rb_right)
> - return node->rb_right;
> - else {
> - if (!parent)
> - ;
> - else if (parent->rb_left == node)
> - parent->rb_left = NULL;
> - else
> - parent->rb_right = NULL;
> + struct stat_node *snode, *n;
>  
> - snode = container_of(node, struct stat_node, node);
> - if (ts->stat_release)
> - ts->stat_release(snode->stat);
> + rbtree_postorder_for_each_entry_safe(snode, n, &session->stat_root,
> + node) {

This is one of those cases that a line break is uglier than keeping it
on the same line. Heck, it's only 4 characters over the 80 char limit.

Other than that, I'm fine with this patch. Want me to take this
separately?

-- Steve


> + if (session->ts->stat_release)
> + session->ts->stat_release(snode->stat);
>   kfree(snode);
> -
> - return parent;
>   }
> -}
> -
> -static void __reset_stat_session(struct stat_session *session)
> -{
> - struct rb_node *node = session->stat_root.rb_node;
> -
> - while (node)
> - node = release_next(session->ts, node);
>  
>   session->stat_root = RB_ROOT;
>  }
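
The idea behind the quoted change is that a tree can be torn down safely if every node is freed only after both of its children. The sketch below shows that ordering on a plain binary tree in user space. It is an illustration only: struct node, tree_insert() and free_postorder() are invented names, and the traversal is recursive for brevity, whereas the kernel helper named in the patch iterates over an rbtree without recursion.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *left, *right;
	int value;
};

/* Plain (unbalanced) binary search tree insert, enough to build a test tree. */
static struct node *tree_insert(struct node *root, int value)
{
	if (!root) {
		struct node *n = calloc(1, sizeof *n);

		if (n)
			n->value = value;
		return n;
	}
	if (value < root->value)
		root->left = tree_insert(root->left, value);
	else
		root->right = tree_insert(root->right, value);
	return root;
}

/*
 * Post-order teardown: both subtrees are released before the node itself,
 * so no freed node is ever dereferenced and no parent links need fixing up.
 * Returns the number of nodes freed.
 */
static size_t free_postorder(struct node *n)
{
	size_t freed;

	if (!n)
		return 0;
	freed = free_postorder(n->left);
	freed += free_postorder(n->right);
	free(n);
	return freed + 1;
}
```

Because nothing is unlinked during the walk, the caller simply resets its root pointer afterwards, much like the session->stat_root = RB_ROOT line in the quoted patch.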



Re: [PATCH 3.11 00/66] 3.11.7-stable review

2013-11-01 Thread Guenter Roeck
On Fri, Nov 01, 2013 at 03:06:36PM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.11.7 release.
> There are 66 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Nov  3 22:04:49 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.7-rc1.gz
> and the diffstat can be found below.
> 
Build results look good:
total: 110 pass: 108 skipped: 2 fail: 0

qemu tests all pass.

Details are at http://server.roeck-us.net:8010/builders.

Guenter


Re: [PATCH 3.10 00/54] 3.10.18-stable review

2013-11-01 Thread Guenter Roeck
On Fri, Nov 01, 2013 at 03:03:28PM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.10.18 release.
> There are 54 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Nov  3 22:00:53 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.18-rc1.gz
> and the diffstat can be found below.
> 
Build results look good:
total: 110 pass: 110 skipped: 0 fail: 0

qemu tests all pass.

Details are at http://server.roeck-us.net:8010/builders.

Thanks,
Guenter



Re: [ 00/32] 3.4.68-stable review

2013-11-01 Thread Guenter Roeck
On Fri, Nov 01, 2013 at 02:43:11PM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.4.68 release.
> There are 32 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Nov  3 21:41:40 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.68-rc1.gz
> and the diffstat can be found below.
> 
Looks like PATCH in the headline got lost.

Test results look good:
total: 103 pass: 89 skipped: 10 fail: 4

qemu tests all pass. 

The result matches results seen with the previous release.

Details are at http://server.roeck-us.net:8010/builders.

Thanks,
Guenter


Re: [RFC PATCH] PCI: export MSI mode using attributes, not kobjects

2013-11-01 Thread Neil Horman
On Fri, Nov 01, 2013 at 05:40:02PM -0600, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2013 at 3:46 PM, Greg Kroah-Hartman
>  wrote:
> > From: Greg Kroah-Hartman 
> >
> > The PCI MSI sysfs code is a mess with kobjects for things that don't
> > really need to be kobjects.  This patch creates attributes dynamically
> > for the MSI interrupts instead of using kobjects.
> >
> > Note, this does not delete the existing sysfs MSI code, but puts the
> > attributes under a "msi_irqs_2" directory for testing / example.
> >
> > Also note, this removes a directory from the current MSI interrupt sysfs
> > code:
> >
> > old MSI kobjects:
> > pci_device
> >└── msi_irqs
> >└── 40
> >└── mode
> >
> > new MSI attributes:
> > pci_device
> >└── msi_irqs_2
> >└── 40
> >
> > As there was only one file "mode" with the kobject model, the interrupt
> > number is now a file that returns the "mode" of the interrupt (msi vs.
> > msix).
> >
> > Signed-off-by: Greg Kroah-Hartman 
> > ---
> >
> > Bjorn, I can make up a patch that rips out the existing kobject code
> > here, but I figured this patch would make things easier to follow
> > instead of having to dig through the removed logic at the same time.
> >
> > I'll clean up the error handling path for the create attribute logic as
> > well, this was just a proof of concept that this could be done.
> >
> > Do you think that anyone cares about the current mode files in sysfs to
> > move things in this manner?
> 
> I like this a lot better than trying to fix all the holes in the
> current kobject code.
> 
> I have no idea who, if anybody, cares about the "mode" files.  I
> assume there's a way to create the "mode" files with attributes, too?
> If so, we could replicate the existing structure with one patch, and
> simplify it with a second patch, so it would be easier to revert the
> directory change while keeping the fix.
> 
> Bjorn
> 
FWIW, I created the mode files because I wanted to be able to add other
attributes to an irq in the future, in case we needed them.  I think time has
shown that additional attributes seem unnecessary, so it makes sense to me to
just include the mode attribute in the irq number file.
Neil



Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-11-01 Thread Neil Horman
On Fri, Nov 01, 2013 at 01:26:52PM -0700, Joe Perches wrote:
> On Fri, 2013-11-01 at 15:58 -0400, Neil Horman wrote:
> > On Fri, Nov 01, 2013 at 12:45:29PM -0700, Joe Perches wrote:
> > > On Fri, 2013-11-01 at 13:37 -0400, Neil Horman wrote:
> > > 
> > > > I think it would be better if we just did the prefetch here
> > > > and re-addressed this area when AVX (or adcx/adox) instructions were
> > > > available for testing on hardware.
> > > 
> > > Could there be a difference if only a single software
> > > prefetch was done at the beginning of transfer before
> > > the while loop and hardware prefetches did the rest?
> > > 
> > > I wouldn't think so.  If hardware was going to do any prefetching based on
> > > memory access patterns it will do so regardless of the leading prefetch, and
> > > that first prefetch isn't helpful because we still wind up stalling on the
> > > adds while it's completing
> 
> I imagine one benefit to be helping prevent
> prefetching beyond the actual data required.
> 
> Maybe some hardware optimizes prefetch stride
> better than 5*64.
> 
> I wonder also if using
> 
>   if (count > some_length)
>   prefetch
>   while (...)
> 
> helps small lengths more than the test/jump cost.
> 
We've already done this and it is in fact the best performing.  I'll be posting
that patch along with Ingo's request to add do_csum to the perf bench code when I
have that done.
Best
Neil
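
The shape being discussed (one prefetch, issued only for buffers above some length, ahead of the summing loop) can be sketched as follows. This is not the kernel's do_csum() and makes several assumptions: PREFETCH_MIN_LEN is an arbitrary placeholder that would have to be measured, the 64-byte distance is a guess at a cache line, __builtin_prefetch is a GCC/Clang builtin, and the sum is a simple byte-wise 16-bit ones' complement sum rather than the wide-add loop from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_MIN_LEN 256	/* hypothetical threshold, needs measuring */

/* 16-bit ones' complement sum over big-endian words, folded to 16 bits. */
static uint16_t csum_fold16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i = 0;

#if defined(__GNUC__)
	/* Skip the hint for short buffers, where it would not pay off. */
	if (len > PREFETCH_MIN_LEN)
		__builtin_prefetch(buf + 64);
#endif

	while (i + 1 < len) {
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
		i += 2;
	}
	if (i < len)				/* odd trailing byte */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Whether the length test costs more than it saves on small packets is exactly the kind of question the perf bench addition mentioned above would answer.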

> 


[PULL] symbol fix for ARM kallsyms

2013-11-01 Thread Rusty Russell
The following changes since commit 12aee278b50c4a94a93fa0b4d201ae35d792c696:

  Merge branch 'akpm' (fixes from Andrew Morton) (2013-10-30 14:27:10 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/fixes-for-linus

for you to fetch changes up to f6537f2f0eba4eba3354e48dbe3047db6d8b6254:

  scripts/kallsyms: filter symbols not in kernel address space (2013-11-02 09:13:02 +1030)


Last minute perf unbreakage for ARM modules; spent a day in linux-next.


Ming Lei (1):
  scripts/kallsyms: filter symbols not in kernel address space

 scripts/kallsyms.c  | 12 +++-
 scripts/link-vmlinux.sh |  2 ++
 2 files changed, 13 insertions(+), 1 deletion(-)


Re: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Madper Xie

rich...@nod.at writes:

> On 01.11.2013 20:22, Seiji Aguchi wrote:
> +{
> +   char id_str[64];
> +   u64 id = 0;
> +
> +   sprintf(id_str, "%lu%u%d", timestamp, part, count);
> +   if (kstrtoull(id_str, 10, &id))
> +   pr_warn("efi-pstore: failed to generate id\n");
> +   return id;
> +}

 This is just odd. You make a string from three ints and then a parse
 it to a int again.
>>>
>>> Agreed.  I liked your ((timestamp * 100 + part) * 100 + count function much
>>> more than this.
>> 
>> I was worried that the part and count could be more than 100.
>> If it happens, the id may not be unique...
>> 
>> But, currently, size of nvram storage is limited, so it is a corner case.
>> I respect your opinion.
>
Is it really safe? Right now I have more than 100 entries:
[root@dhcp-13-41 rhel6]# ls -l /dev/pstore/ | wc -l
124
The maximum part among my records is 16, but I'm not sure whether overflow
could happen in some special case, such as a very long dmesg output, or a
server that never reboots where too many warnings keep incrementing count.
So is it necessary to check whether count < 100 or 100 <= count < 1000?
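
To make the uniqueness concern concrete, here is a small sketch of both schemes from this thread. The function names are invented for illustration and this is not the efi-pstore code; the formulas are the ones quoted above. The multiply-by-100 form collides as soon as count reaches 100, and the decimal-concatenation form is ambiguous because the field boundaries are lost.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The ((timestamp * 100 + part) * 100 + count) scheme. */
static uint64_t id_mul100(uint64_t timestamp, unsigned int part, int count)
{
	return (timestamp * 100 + part) * 100 + (uint64_t)count;
}

/* The "%lu%u%d" concatenation scheme, left as a string for comparison. */
static void id_concat(char *buf, size_t len, unsigned long timestamp,
		      unsigned int part, int count)
{
	snprintf(buf, len, "%lu%u%d", timestamp, part, count);
}

/* Returns 1 if two (part, count) pairs concatenate to the same digits. */
static int concat_collides(unsigned long timestamp,
			   unsigned int part_a, int count_a,
			   unsigned int part_b, int count_b)
{
	char a[64], b[64];

	id_concat(a, sizeof a, timestamp, part_a, count_a);
	id_concat(b, sizeof b, timestamp, part_b, count_b);
	return strcmp(a, b) == 0;
}
```

For example, id_mul100(t, 1, 100) equals id_mul100(t, 2, 0), and (part 1, count 23) concatenates to the same digits as (part 12, count 3) for any shared timestamp.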
> What about feeding the bytes of all three integers into a non-cryptographic 
> hash function?
> Using this way you get a cheap unique id.
>
> Thanks,
> //richard
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


-- 
Best,
Madper Xie.


Re: [PATCH] KVM: IOMMU: hva align mapping page size

2013-11-01 Thread Marcelo Tosatti
On Fri, Nov 01, 2013 at 10:08:55AM -0600, Greg Edwards wrote:
> When determining the page size we could use to map with the IOMMU, the
> page size should be aligned with the hva, not the gfn.  The gfn may not
> reflect the real alignment within the hugetlbfs file.
> 
> Most of the time, this works fine.  However, if the hugetlbfs file is
> backed by non-contiguous huge pages, a multi-huge page memslot starts at
> an unaligned offset within the hugetlbfs file, and the gfn is aligned
> with respect to the huge page size, kvm_host_page_size() will return the
> huge page size and we will use that to map with the IOMMU.
> 
> When we later unpin that same memslot, the IOMMU returns the unmap size
> as the huge page size, and we happily unpin that many pfns in
> monotonically increasing order, not realizing we are spanning
> non-contiguous huge pages and partially unpin the wrong huge page.
> 
> Instead, ensure the IOMMU mapping page size is aligned with the hva
> corresponding to the gfn, which does reflect the alignment within the
> hugetlbfs file.
> 
> Signed-off-by: Greg Edwards 
> Cc: sta...@vger.kernel.org
> ---
> This resolves the bug previously reported (and misdiagnosed) here:
> 
>  http://www.spinics.net/lists/kvm/msg97599.html
> 
>  virt/kvm/iommu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index 72a130b..0e2ff32 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -99,8 +99,8 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
>   while ((gfn + (page_size >> PAGE_SHIFT)) > end_gfn)
>   page_size >>= 1;
>  
> - /* Make sure gfn is aligned to the page size we want to map */
> - while ((gfn << PAGE_SHIFT) & (page_size - 1))
> + /* Make sure hva is aligned to the page size we want to map */
> + while (__gfn_to_hva_memslot(slot, gfn) & (page_size - 1))
>   page_size >>= 1;

gfn should be aligned to page size as well (IOMMU requirement), so don't
drop that check.
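
One way to read this request is that the mapping size should shrink until both addresses are aligned to it. The sketch below shows that combined test in user-space C. It is an assumption-based illustration rather than the eventual kernel change: pick_map_size() is a made-up name, PAGE_SHIFT is fixed at 12 (4 KiB pages), and the hva is passed in directly instead of being derived from the memslot.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB base pages */

/*
 * Hypothetical helper: starting from page_size (a power of two, in bytes),
 * halve it until both the guest physical address and the host virtual
 * address are aligned to it, never going below one base page.
 */
static uint64_t pick_map_size(uint64_t gfn, uint64_t hva, uint64_t page_size)
{
	uint64_t gpa = gfn << PAGE_SHIFT;

	while (page_size > (1ULL << PAGE_SHIFT) &&
	       ((gpa | hva) & (page_size - 1)))
		page_size >>= 1;

	return page_size;
}
```

OR-ing the two addresses before masking checks both alignments in a single pass, so neither the gfn test nor the hva test is lost.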



Re: [PATCH 02/11] devicetree: bindings: Document Qualcomm cpus and enable-method

2013-11-01 Thread Rob Herring
On Fri, Nov 1, 2013 at 5:08 PM, Stephen Boyd  wrote:
> From: Rohit Vaswani 
>
> Scorpion and Krait are Qualcomm cpus. These cpus don't use the
> spin-table enable-method. Instead they rely on mmio register
> accesses to enable power and clocks to bring CPUs out of reset.
>
> Cc: 
> Signed-off-by: Rohit Vaswani 
> [sboyd: Split off into separate patch, renamed method to
> qcom,mmio]
> Signed-off-by: Stephen Boyd 
> ---
>
> This slightly conflicts with my krait EDAC series.
>
>  Documentation/devicetree/bindings/arm/cpus.txt | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
> index 37258f9..e2969fa2 100644
> --- a/Documentation/devicetree/bindings/arm/cpus.txt
> +++ b/Documentation/devicetree/bindings/arm/cpus.txt
> @@ -44,6 +44,8 @@ For the ARM architecture every CPU node must contain the following properties:
> "marvell,mohawk"
> "marvell,xsc3"
> "marvell,xscale"
> +   "qcom,scorpion"
> +   "qcom,krait"
>
>  And the following optional properties:
>
> @@ -52,6 +54,7 @@ And the following optional properties:
>  different types of cpus.
>  This should be one of:
>  "spin-table"
> +"qcom,mmio"

Not exactly specific. How would you handle variations in the enable
method? Is the mmio enable method tied to the core type or to the SoC
type?

Rob


Re: [PATCH 01/11] devicetree: bindings: Document cpu enable-method for ARM CPUs

2013-11-01 Thread Rob Herring
On Fri, Nov 1, 2013 at 5:08 PM, Stephen Boyd  wrote:
> From: Rohit Vaswani 
>
> According to the ePAPR CPUs should have an enable method. On ARM
> the enable-method property has not been used so far, so document
> this property as an optional property and add the spin-table
> method as one value
>
> Cc: 
> Signed-off-by: Rohit Vaswani 
> [sboyd: Split off into separate patch]
> Signed-off-by: Stephen Boyd 

Acked-by: Rob Herring 


Re: zram/zsmalloc issues in very low memory conditions

2013-11-01 Thread Olav Haugan
On 10/25/2013 2:19 AM, Minchan Kim wrote:
> Hello,
> 
> I have not had enough time to think over your great questions since I have
> been enjoying Edinburgh, so sorry if I miss something!
> 
> On Wed, Oct 23, 2013 at 02:51:34PM -0700, Olav Haugan wrote:
>> I am trying to use zram in very low memory conditions and I am having
>> some issues. zram is in the reclaim path. So if the system is very low
>> on memory the system is trying to reclaim pages by swapping out (in this
>> case to zram). However, since we are very low on memory zram fails to
>> get a page from zsmalloc and thus zram fails to store the page. We get
>> into a cycle where the system is low on memory so it tries to swap out
>> to get more memory but swap out fails because there is not enough memory
>> in the system! The major problem I am seeing is that there does not seem
>> to be a way for zram to tell the upper layers to stop swapping out
> 
> True. zram is a block device, so for the moment I don't want to make zram
> swap-specific if possible.
> 
>> because the swap device is essentially "full" (since there is no more
>> memory available for zram pages). Has anyone thought about this issue
>> already and have ideas how to solve this or am I missing something and I
>> should not be seeing this issue?
> 
> It's true. We might need a feedback loop, and it shouldn't be specific to
> zram-swap. One thing I can imagine is that we could move failed victim
> pages onto the LRU active list when swapout fails, so the VM gives more
> weight to file pages than to anon ones. For details, see
> AOP_WRITEPAGE_ACTIVATE and get_scan_count.
> 
> The problem is that this lives in the fs layer while zram is in the block
> layer, so what I can think of at the moment is the following:
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8ed1b77..c80b0b4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -502,6 +502,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
> if (!PageWriteback(page)) {
> /* synchronous write or broken a_ops? */
> ClearPageReclaim(page);
> +   if (PageError(page))
> +   return PAGE_ACTIVATE;
> }
> trace_mm_vmscan_writepage(page, trace_reclaim_flags(page));
> inc_zone_page_state(page, NR_VMSCAN_WRITE);
> 
> 
> It doesn't prevent swapout at all, but it should throttle the selection of
> anonymous pages for reclaim, so the VM will prefer file-backed pages; at some
> point zsmalloc will succeed in allocating a free page and swapout will resume.

I tried the above suggestion but it does not seem to have any noticeable
impact. The system is still trying to swap out at a very high rate after
zram reported failure to swap out. There is actually so much error logging
that my system crashed (we have a watchdog that is not getting petted
because the kernel is busy logging kernel messages).

Isn't there anything that can be set to tell the fs layer to back off
completely for a while (congestion control)?


Olav Haugan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: [RESEND PATCH 1/4] i2c: i2c-bcm-kona: Introduce Broadcom I2C Driver

2013-11-01 Thread Tim Kryger
On Fri, Nov 1, 2013 at 5:49 AM, Wolfram Sang  wrote:
> On Wed, Oct 16, 2013 at 03:01:46PM -0700, Tim Kryger wrote:
>> Introduce support for Broadcom Serial Controller (BSC) I2C bus found
>> in the Kona family of Mobile SoCs.  FIFO hardware is utilized but only
>> standard mode (100kHz), fast mode (400kHz), and fast mode plus (1MHz)
>> bus speeds are supported.
>>
>> Signed-off-by: Tim Kryger 
>> Reviewed-by: Matt Porter 
>> Reviewed-by: Markus Mayer 
>
> Looks mostly good, some remarks:
>
>> +struct bcm_kona_i2c_dev {
>> + /* Pointer to linux device struct */
>> + struct device *device;
>> +
>> + /* Virtual address where registers are mapped */
>> + void __iomem *base;
>> +
>> + /* Interrupt */
>> + int irq;
>> +
>> + /* Standard Speed configuration */
>> + const struct bus_speed_cfg *std_cfg;
>> +
>> + /* Linux I2C adapter struct */
>> + struct i2c_adapter adapter;
>> +
>> + /* Lock for the I2C device */
>> + struct mutex i2c_bcm_lock;
>> +
>> + /* Completion to signal an operation finished */
>> + struct completion done;
>> +
>> + /* Handle for external clock */
>> + struct clk *external_clk;
>> +};
>
> IMO most of the comments could go. Kind of stating the obvious :)

Sure thing.

>
>> +/* Read any amount of data using the RX FIFO from the i2c bus */
>> +static int bcm_kona_i2c_read_fifo(struct bcm_kona_i2c_dev *dev,
>> +   struct i2c_msg *msg)
>> +{
>> + unsigned int bytes_to_read = MAX_RX_FIFO_SIZE;
>> + unsigned int last_byte_nak = 0;
>> + unsigned int bytes_read = 0;
>> + unsigned int rc;
>
> Should be signed.

Absolutely.  Thanks for catching this one.
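
For illustration, a minimal sketch of why the unsigned rc matters.
fake_transfer() is a hypothetical stand-in, not a function from the
driver:

```c
#include <errno.h>

/* Hypothetical helper standing in for a transfer routine that fails. */
static int fake_transfer(void)
{
	return -ETIMEDOUT;
}

/* Buggy: rc is unsigned, so "rc < 0" is always false and the error is missed. */
static int error_seen_unsigned(void)
{
	unsigned int rc = fake_transfer();

	return rc < 0; /* always 0; compilers may warn about this comparison */
}

/* Fixed: a signed rc preserves the negative errno value. */
static int error_seen_signed(void)
{
	int rc = fake_transfer();

	return rc < 0;
}
```

With the unsigned variable the negative errno wraps to a large positive
value, so the usual "if (rc < 0)" error path can never run.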

>
>> +/* Write any amount of data using TX FIFO to the i2c bus */
>> +static int bcm_kona_i2c_write_fifo(struct bcm_kona_i2c_dev *dev,
>> +struct i2c_msg *msg)
>> +{
>> + unsigned int bytes_to_write = MAX_TX_FIFO_SIZE;
>> + unsigned int bytes_written = 0;
>> + unsigned int rc;
>
> Ditto signed.

I shall fix it.

>
>> +/* Master transfer function */
>> +static int bcm_kona_i2c_xfer(struct i2c_adapter *adapter,
>> +  struct i2c_msg msgs[], int num)
>> +{
>> + struct bcm_kona_i2c_dev *dev = i2c_get_adapdata(adapter);
>> + struct i2c_msg *pmsg;
>> + int rc = 0;
>> + int i;
>> +
>> + mutex_lock(&dev->i2c_bcm_lock);
>
> Huh? Why do you need that? The core has locks per bus.

In that case, I will remove the local mutex.

>> +static int bcm_kona_i2c_assign_bus_speed(struct bcm_kona_i2c_dev *dev)
>> +{
>> + unsigned int bus_speed;
>> + int ret = of_property_read_u32(dev->device->of_node, "clock-frequency",
>> +&bus_speed);
>> + if (ret < 0) {
>> + dev_err(dev->device, "missing clock-frequency property\n");
>> + return -ENODEV;
>> + }
>> +
>> + switch (bus_speed) {
>> + case 100000:
>> + dev->std_cfg = &std_cfg_table[BCM_SPD_100K];
>> + break;
>> + case 400000:
>> + dev->std_cfg = &std_cfg_table[BCM_SPD_400K];
>> + break;
>> + case 1000000:
>> + dev->std_cfg = &std_cfg_table[BCM_SPD_1MHZ];
>> + break;
>> + default:
>> + pr_err("%d hz bus speed not supported\n", bus_speed);
>> + return -EINVAL;
>> + };
>
> Unneeded semicolon.
>

Sure.  It will be removed.

>> +
>> + return 0;
>> +}
>> +
>> +static int bcm_kona_i2c_probe(struct platform_device *pdev)
>> +{
>> + int rc = 0;
>> + struct bcm_kona_i2c_dev *dev;
>> + struct i2c_adapter *adap;
>> + struct resource *iomem;
>> +
>> + /* Allocate memory for private data structure */
>> + dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
>> + if (!dev) {
>> + rc = -ENOMEM;
>> + goto probe_return;
>
> I'd prefer simply return -Esomething. The printout at probe_return is
> also available via driver core.

Agreed.

> ...
>
>> + /* Add the i2c adapter */
>> + adap = &dev->adapter;
>> + i2c_set_adapdata(adap, dev);
>> + adap->owner = THIS_MODULE;
>> + adap->class = UINT_MAX; /* can be used by any I2C device */
>
> Why do you need class based instantiation? It will most likely cost
> boot-time and you have devicetree means for doing instantiation.
>
>

Agreed.

>> + strlcpy(adap->name, "Broadcom I2C adapter", sizeof(adap->name));
>> + adap->algo = &bcm_algo;
>> + adap->dev.parent = &pdev->dev;
>> + adap->nr = pdev->id;
>> + adap->dev.of_node = pdev->dev.of_node;
>> +
>> + rc = i2c_add_numbered_adapter(adap);
>
> Maybe you want simply i2c_add_adapter here since the core has no
> built-in support for of aliases?
>

Okay.  I will switch to i2c_add_adapter and remove the adap->nr assignment.

>> +
>> +MODULE_AUTHOR("Broadcom");
>
> Some email address please.
>
>> +MODULE_DESCRIPTION("Broadcom Kona I2C Driver");
>> +MODULE_LICENSE("GPL v2");

Correct parameter size for BLKSSZGET ioctl.

2013-11-01 Thread Jason Cipriani
In blkdiscard in util-linux, at least since version 2.23, the
following code is used to retrieve a device's physical sector size:

  uint64_t secsize;
  ioctl(fd, BLKSSZGET, &secsize);

On my machine (Ubuntu 12.04 -- 3.2.0-55-generic-pae #85-Ubuntu SMP Wed
Oct 2 14:03:15 UTC 2013 i686 i686 i386 GNU/Linux) this yields
incorrect results, as it seems a 32-bit int is expected. This causes
subsequent sector alignment calculations in blkdiscard to be
incorrect, which in turn causes blkdiscard's trim ioctls to fail in
certain situations (or even worse, to trim the wrong blocks).
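
To make the suspected failure mode concrete, here is a small user-space
sketch. It does not call the real ioctl: fake_blksszget() is a
hypothetical stand-in that, like a put_int-style handler, stores exactly
sizeof(int) bytes at the caller's address. The 512 value and the stale
bit pattern are made up for the demonstration:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel side of BLKSSZGET: it stores
 * exactly sizeof(int) bytes at the caller's address.
 */
static void fake_blksszget(void *arg, int sector_size)
{
	memcpy(arg, &sector_size, sizeof(sector_size));
}

/* Caller passes a uint64_t: only part of it is overwritten. */
static uint64_t read_into_u64(uint64_t stale)
{
	uint64_t secsize = stale; /* models leftover stack contents */

	fake_blksszget(&secsize, 512);
	return secsize;
}

/* Caller passes an int: the whole object is written. */
static int read_into_int(void)
{
	int secsize = -1;

	fake_blksszget(&secsize, 512);
	return secsize;
}
```

If the handler really writes an int, the uint64_t variant only yields the
right number when the untouched bytes happen to be zero (and, on
big-endian machines, not even then).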

I have seen BLKSSZGET implemented in two places. In block/ioctl.c it
is implemented using put_int
(http://lxr.free-electrons.com/source/block/ioctl.c#L365) which works
on an "int" (which doesn't actually have a fixed size -- it is defined
by the particular C compiler). In block/compat_ioctl.c it is
implemented using compat_put_int, which ultimately operates on a
compat_int_t, which, for all platforms I looked at, is explicitly
defined to be a 32-bit type (e.g. "typedef s32 compat_int_t").

My goal is to determine if blkdiscard is incorrectly using a uint64_t,
or if the Ubuntu kernel is incorrectly using some 32-bit type.
Therefore I would like to know precisely what parameter ioctl 0x1268
(BLKSSZGET) is specified to take.

My question, then, is where is this specified? Currently I have only
found scattered implementations (many using the vague "int") of
ioctl.c and compat_ioctl.c, uncommented header declarations (e.g.
fs.h), and anecdotal evidence and claims.

In other words: What is the parameter size for BLKSSZGET and, more
importantly, *how do you know that*? Is blkdiscard broken, or is my
kernel's implementation broken? If I wrote a device driver that
supported this ioctl, what data type would I use? Surely at some point
in the past somebody decided that 0x1268 would retrieve the physical
sector size of a device, and documented that somewhere.

Please CC me on replies; I am not subscribed to this list.

Thanks,
Jason


Re: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Tony Luck
On Fri, Nov 1, 2013 at 1:57 PM, Seiji Aguchi  wrote:
>> What about feeding the bytes of all three integers into a non-cryptographic 
>> hash function?
>> Using this way you get a cheap unique id.
>
> It is reasonable to me.

How does the efivars backend handle "unlink(2)" in the pstore file system?
pstore will call the backend->erase function passing the "id".  The
backend should then erase the right record from persistent storage.

With the ((timestamp * 100 + part) * 100 + count) function, you can
easily reverse it to find timestamp, part and count. Would that make life
easier for the backend to find the record to be erased?  If you use a
hash function you will need to check each record and compute the
hash to see if it matches (probably not a big deal because the backend
will generally only hold a handful of records).
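
A minimal sketch of that reversal, assuming part and count are both below
100 (otherwise the fields overlap). The struct and function names are
made up for illustration and are not the efi-pstore code:

```c
#include <stdint.h>

/* The three values that identify a record (illustrative layout). */
struct record_key {
	uint64_t timestamp;
	unsigned int part;
	unsigned int count;
};

/* Pack the three values into one id; assumes part < 100 and count < 100. */
static uint64_t make_id(uint64_t timestamp, unsigned int part,
			unsigned int count)
{
	return (timestamp * 100 + part) * 100 + count;
}

/* Undo make_id() with plain division and remainder. */
static struct record_key split_id(uint64_t id)
{
	struct record_key key;

	key.count = id % 100;
	key.part = (id / 100) % 100;
	key.timestamp = id / 10000;
	return key;
}
```

The erase path could then recover all three fields from the id directly,
instead of hashing every stored record to find a match.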

-Tony


Re: Strange location and name for platform devices when device-tree is used.

2013-11-01 Thread NeilBrown
On Sat, 02 Nov 2013 10:10:25 +1100 Benjamin Herrenschmidt
 wrote:

> On Fri, 2013-11-01 at 13:47 -0700, Greg Kroah-Hartman wrote:
> 
> > > > On my device I seem to have some platform devices registered through
> > > > device-tree, and some registered through platform_device_add (e.g.
> > > > 'alarmtimer').  Guaranteeing they remain disjoint sets if the kernel is
> > > > allowed to evolve independently of the devicetree might be tricky
> > > > Maybe we need "/sys/devices/platform" and "/sys/devices/dt_platform" ??
> > > 
> > > No, I think device-tree created platform devices should go
> > > to /sys/devices/platform like the "classic" ones.
> > > 
> > > The problem is really how to deal with potential name duplication. We
> > > could try to register, if we get -EEXIST (assuming sysfs returns the
> > > right stuff), try again with ".1" etc...
> > 
> > How can there be device name collisions?  All platform devices _should_
> > be named uniquely, if not, you have bigger problems...
> 
> The problem is how to create a unique name for a platform device created
> from a device-tree node.
> 
> Device tree nodes aren't necessarily uniquely named. They are unique
> under a given parent but that hierarchy isn't preserved when creating
> corresponding platform devices (and it would be very tricky to do so).
> 
> Currently, we simply append a number to the name when creating them,
> which is obtained from a global counter.
> 
> Neil is unhappy about that because on his specific hardware, the device
> has a unique name and thus we introduce a naming difference between
> device-tree usage and old-style "hard coded" board file usage.

It occurs to me that a different approach could solve my problem.

My problem stems from the fact that the name of the device on the
platform-bus is used as the name of the device in the "backlight" class.

As Greg writes elsewhere, depending on names within /sys/devices is not
supported - we need to accept that bus-names might change.
However names in class devices tend to be a lot more stable.
Several devices allow these to be explicitly set.
 leds have 'label'
 regulators have "regulator-name"
 gpio-keys has 'label'.

I could just teach pwm_bl to allow a 'label' property which would be used in
place of the platform-bus device name when creating the class/backlight
device.

The maxim "you cannot trust names to remain stable in /sys/devices" can
justify both the movement of platform devices into /sys/devices/platform, and
the use of "label" rather than the device-name for creating the class device.

Does that sound convincing?

Thanks,
NeilBrown

> 
> It would be nice if we could do something that only appends the "global
> number" at the end of the name if the name isn't already unique. Thus my
> proposal of trying first with the base name, and trying again if that
> returns -EEXIST in some kind of loop.
> 
> Do you have a better idea ?
> 
> Cheers,
> Ben.
> 





Re: [RFC PATCH] PCI: export MSI mode using attributes, not kobjects

2013-11-01 Thread Bjorn Helgaas
On Tue, Oct 29, 2013 at 3:46 PM, Greg Kroah-Hartman
 wrote:
> From: Greg Kroah-Hartman 
>
> The PCI MSI sysfs code is a mess with kobjects for things that don't
> really need to be kobjects.  This patch creates attributes dynamically
> for the MSI interrupts instead of using kobjects.
>
> Note, this does not delete the existing sysfs MSI code, but puts the
> attributes under a "msi_irqs_2" directory for testing / example.
>
> Also note, this removes a directory from the current MSI interrupt sysfs
> code:
>
> old MSI kobjects:
> pci_device
>└── msi_irqs
>└── 40
>└── mode
>
> new MSI attributes:
> pci_device
>└── msi_irqs_2
>└── 40
>
> As there was only one file "mode" with the kobject model, the interrupt
> number is now a file that returns the "mode" of the interrupt (msi vs.
> msix).
>
> Signed-off-by: Greg Kroah-Hartman 
> ---
>
> Bjorn, I can make up a patch that rips out the existing kobject code
> here, but I figured this patch would make things easier to follow
> instead of having to dig through the removed logic at the same time.
>
> I'll clean up the error handling path for the create attribute logic as
> well, this was just a proof of concept that this could be done.
>
> Do you think that anyone cares about the current mode files in sysfs to
> move things in this manner?

I like this a lot better than trying to fix all the holes in the
current kobject code.

I have no idea who, if anybody, cares about the "mode" files.  I
assume there's a way to create the "mode" files with attributes, too?
If so, we could replicate the existing structure with one patch, and
simplify it with a second patch, so it would be easier to revert the
directory change while keeping the fix.

Bjorn

>  drivers/pci/msi.c   |   85
>  include/linux/pci.h |1
>  2 files changed, 86 insertions(+)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index d5f90d63..53848ab9 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -353,6 +353,9 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
>  static void free_msi_irqs(struct pci_dev *dev)
>  {
> struct msi_desc *entry, *tmp;
> +   struct attribute **msi_attrs;
> +   struct device_attribute *dev_attr;
> +   int count = 0;
>
> list_for_each_entry(entry, &dev->msi_list, list) {
> int i, nvec;
> @@ -388,6 +391,22 @@ static void free_msi_irqs(struct pci_dev *dev)
> list_del(&entry->list);
> kfree(entry);
> }
> +
> +   if (dev->msi_irq_groups) {
> +   sysfs_remove_groups(&dev->dev.kobj, dev->msi_irq_groups);
> +   msi_attrs = dev->msi_irq_groups[0]->attrs;
> +   list_for_each_entry(entry, &dev->msi_list, list) {
> +   dev_attr = container_of(msi_attrs[count],
> +   struct device_attribute, attr);
> +   kfree(dev_attr->attr.name);
> +   kfree(dev_attr);
> +   ++count;
> +   }
> +   kfree(msi_attrs);
> +   kfree(dev->msi_irq_groups[0]);
> +   kfree(dev->msi_irq_groups);
> +   dev->msi_irq_groups = NULL;
> +   }
>  }
>
>  static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
> @@ -517,13 +536,79 @@ static struct kobj_type msi_irq_ktype = {
> .default_attrs = msi_irq_default_attrs,
>  };
>
> +static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
> +char *buf)
> +{
> +   struct pci_dev *pdev = to_pci_dev(dev);
> +   struct msi_desc *entry;
> +   unsigned long irq;
> +   int retval;
> +
> +   retval = kstrtoul(attr->attr.name, 10, &irq);
> +   if (retval)
> +   return retval;
> +
> +   list_for_each_entry(entry, &pdev->msi_list, list) {
> +   if (entry->irq == irq) {
> +   return sprintf(buf, "%s\n",
> +  entry->msi_attrib.is_msix ? "msix" : "msi");
> +   }
> +   }
> +   return -ENODEV;
> +}
> +
>  static int populate_msi_sysfs(struct pci_dev *pdev)
>  {
> +   struct attribute **msi_attrs;
> +   struct device_attribute *msi_dev_attr;
> +   struct attribute_group *msi_irq_group;
> +   const struct attribute_group **msi_irq_groups;
> struct msi_desc *entry;
> struct kobject *kobj;
> int ret;
> +   int num_msi = 0;
> int count = 0;
>
> +   /* Determine how many msi entries we have */
> +   list_for_each_entry(entry, &pdev->msi_list, list) {
> +   ++num_msi;
> +   }
> +   if (!num_msi)
> +   return 0;
> +
> +   /* Dynamically create the MSI attributes for the PCI device */
> +   msi_attrs = kzalloc(sizeof(void *) * (num_msi + 1), GFP_KERNEL);
> +   if (!msi_attrs)
> 

Re: [PATCH] bnx2: Use dev_kfree_skb_any() in bnx2_tx_int()

2013-11-01 Thread Ben Hutchings
On Fri, 2013-11-01 at 18:01 -0400, David Miller wrote:
> From: Cong Wang 
> Date: Thu, 31 Oct 2013 21:19:16 -0700
> 
> > On 2013-10-30 at 9:26 PM, "David Miller" wrote:
> >>
> >> We have to provide a softint compatible environment for this callback
> >> to run in else everything is completely broken.
> >>
> >> All these drivers can safely assume softirq safe locking is
> >> sufficient, you're suggesting we need to take this hardirq safety and
> >> I'm really not willing to allow things to go that far.  A lot of
> >> effort has been expended precisely to avoid that kind of overhead and
> >> cost.
> > 
> > Alright, I am thinking to move netpoll_poll_dev() to a delayed work.
>  
> What if the printk is outputting a message that will help us discover
> that work queues are deadlocked?
> 
> You can't delay the message, because every layer of indirection you
> add increases the possibility that the message is never seen.  You
> have to do it synchronously.

As you've said, the ndo_start_xmit and NAPI poll operations are intended
to be called in softirq context, so everything that interlocks with them
will use spin_lock_bh().  Calling them from hardirq context obviously
opens the possibility of a deadlock.  How do you expect anyone to solve
that?

I think that most of the time netpoll doesn't actually call the NAPI
poll function, and the driver ndo_start_xmit function doesn't take any
locks, so we don't actually hit the deadlock in practice (on mainline
kernels - RT is a different story).

Obviously, the less machinery netpoll relies on continuing to work, the
better, so it should preferably defer to softirq context rather than
workqueue context.  I think this means hooking queue_process() into
net_tx_action(), and then cutting out much of the rest of netpoll.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH] extcon-gpio: add devicetree support.

2013-11-01 Thread NeilBrown
On Fri, 1 Nov 2013 10:16:44 -0700 Mark Rutland  wrote:

> Hi Neil,
> 
> While I'm not fundamentally opposed to this binding, I have some issues with
> its current form and would not want to see this version hit mainline.
> 

Thanks for the review.

> On Fri, Nov 01, 2013 at 09:50:05AM +0000, NeilBrown wrote:
> > 
> > As this device is not vendor specific, I haven't included any "vendor,"
> > prefixes.  For my model I used "regulator-gpio" which takes a similar
> > approach.
> > 
> > Signed-off-by: NeilBrown 
> > 
> > diff --git a/Documentation/devicetree/bindings/extcon/extcon-gpio.txt b/Documentation/devicetree/bindings/extcon/extcon-gpio.txt
> > new file mode 100644
> > index ..2346b61cc620
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/extcon/extcon-gpio.txt
> > @@ -0,0 +1,26 @@
> > +* EXTCON detector using GPIO
> 
> EXTCON is _extremely_ Linux-specific. The binding document needs a description
> of the class of device it's intended to describe that does not just refer to
> Linux internals.
> 
> I would prefer if we could have a better name for this that was not tied to
> the Linux driver name. Perhaps "gpio-presence-detector"?

Maybe "cable-presence-detector" as in this case the GPIO is just an
implementation detail.  Which isn't much different from "external-connector"
which is where "extcon" comes from...

I propose "external-connector" if you don't like "extcon".


> 
> > +
> > +Required Properties:
> > + - compatible: "extcon-gpio"
> > + - gpios: gpio line that detects connector
> > + - interrupts: interrupt generated by that gpio
> 
> We don't need this. If we need the interrupt a gpio generates, we should ask
> the gpio controller driver to map the gpio to an interrupt.
> 
> We have gpiod_to_irq for this in Linux.

The reason I did this was that the pre-existing platform_data wants
'irq_flags'.  I could have an 'irq-flags' property, but it seems to make more
sense to use "interrupts" as that already provides a way to pass irq-flags to
a device.

On reflection though, I cannot imagine why any extcon-gpio would use anything
other than  IRQ_TYPE_EDGE_BOTH.  Maybe MyungJoo Ham can explain that???

If there is no need for specifying irq-flags per-platform, the "interrupts"
property can definitely go.


> 
> > + - debounce-delay-ms: debouncing delay
> > +
> > +Optional Properties:
> > + - label: name for connector.  If not given, device name is used.
> > + - state-on: string to report when GPIO is high (else '0')
> > + - state-off: string to report when GPIO is low (else '1')
> 
> I do not like these properties, they are very much a Linux implementation
> detail.
> 
> Are extcon devices ever used standalone? If so, why?

I'm not sure what you mean by stand alone - it is part of a mobile-phone
motherboard so it certainly isn't alone :-)

The board has a GPS which is connected to a serial port.  So the kernel
doesn't really need to know much about it.
There is an internal antenna and a connector for an external antenna.  There
is some clever electronics that detects when the external antenna is plugged
in and re-routes the antenna power to the external (and away from the
internal).
A gpio can read the state of this electronic switch.  It seems sensible to
present this to user-space as an 'extcon' device.  It is stand-alone only in
that the kernel doesn't "know" that it is related to anything else.  I and
the user-space software know that it isn't "alone".

Given that there are two antennas, internal and external, and always one is
connected, it seems sensible to present it that way.

However I don't object to the connection being called  "external-antenna"
which reports either "0" or "1".
That would make "state-on" and "state-off" unnecessary and I see no
problem with removing them.

I do think it is necessary to have a "label" for the external connector
though.  It is a specific connector with a specific purpose and deserves to
have a "label", in exactly the same way that "leds" devices can provide a
label for each LED.

> 
> If not I see _no_ reason at all for the label property. If a userspace
> application needs to detect the presence of a particular external connector,
> it will need to know this in relation to the device the external connectors are
> attached to. In that case the application should find that device and traverse
> its set of extcon devices. The names for the external connections will be a
> property of the device, not the extcon devices themselves (along the same
> lines as clocks), and need not be a property of the extcon device.

This sounds interesting but I don't follow exactly what you mean.
In particular, where would an application "find that device" and how would it
"traverse the set of extcon devices"?
And if there are multiple extcons in a parent, how can the name of the extcon
be a property of the parent?

Confused... maybe I should explore the 'clocks' to which you refer...

> 
> As for state-on and state-off, we are exposing a binary

Re: linux-next: manual merge of the dt-rh tree with the powerpc tree

2013-11-01 Thread Benjamin Herrenschmidt
On Fri, 2013-11-01 at 17:24 -0500, Rob Herring wrote:
> On 11/01/2013 12:20 AM, Stephen Rothwell wrote:
> > Hi Rob,
> > 
> > Today's linux-next merge of the dt-rh tree got a conflict in 
> > arch/powerpc/include/asm/prom.h between commit a3e31b458844 ("of:
> > Move definition of of_find_next_cache_node into common code") from
> > the powerpc tree and commit 0c3f061c195c ("of: implement
> > of_node_to_nid as a weak function") from the dt-rh tree.
> 
> Ben, I can pick these 2 patches up instead if you want to drop them
> and avoid the conflict.

I'd rather not rebase my tree, the conflict seems to be rather trivial
to solve.

Cheers,
Ben.




Re: Strange location and name for platform devices when device-tree is used.

2013-11-01 Thread Benjamin Herrenschmidt
On Fri, 2013-11-01 at 13:47 -0700, Greg Kroah-Hartman wrote:

> > > On my device I seem to have some platform devices registered through
> > > device-tree, and some registered through platform_device_add (e.g.
> > > 'alarmtimer').  Guaranteeing they remain disjoint sets if the kernel is
> > > allowed to evolve independently of the devicetree might be tricky
> > > Maybe we need "/sys/devices/platform" and "/sys/devices/dt_platform" ??
> > 
> > No, I think device-tree created platform devices should go
> > to /sys/devices/platform like the "classic" ones.
> > 
> > The problem is really how to deal with potential name duplication. We
> > could try to register, if we get -EEXIST (assuming sysfs returns the
> > right stuff), try again with ".1" etc...
> 
> How can there be device name collisions?  All platform devices _should_
> be named uniquely, if not, you have bigger problems...

The problem is how to create a unique name for a platform device created
from a device-tree node.

Device tree nodes aren't necessarily uniquely named. They are unique
under a given parent but that hierarchy isn't preserved when creating
corresponding platform devices (and it would be very tricky to do so).

Currently, we simply append a number to the name when creating them,
which is obtained from a global counter.

Neil is unhappy about that because on his specific hardware, the device
has a unique name and thus we introduce a naming difference between
device-tree usage and old-style "hard coded" board file usage.

It would be nice if we could do something that only appends the "global
number" at the end of the name if the name isn't already unique. Thus my
proposal of trying first with the base name, and trying again if that
returns -EEXIST in some kind of loop.
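
Roughly what I have in mind, as a user-space sketch only. The in-memory
registry and both function names are made up for illustration; the real
thing would go through the platform device registration path:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 16
#define NAME_LEN 32

/* Hypothetical in-memory registry standing in for the platform bus. */
static char registry[MAX_DEVS][NAME_LEN];
static int nr_devs;

/* Register a name; returns 0, -EEXIST if taken, or -ENOSPC if full. */
static int fake_register(const char *name)
{
	int i;

	for (i = 0; i < nr_devs; i++)
		if (strcmp(registry[i], name) == 0)
			return -EEXIST;
	if (nr_devs == MAX_DEVS)
		return -ENOSPC;
	snprintf(registry[nr_devs++], NAME_LEN, "%s", name);
	return 0;
}

/*
 * Try the bare name first; on -EEXIST retry with ".1", ".2", ...
 * The name actually used is written to out (NAME_LEN bytes).
 */
static int register_unique(const char *base, char *out)
{
	int suffix, ret;

	snprintf(out, NAME_LEN, "%s", base);
	ret = fake_register(out);
	for (suffix = 1; ret == -EEXIST && suffix < 100; suffix++) {
		snprintf(out, NAME_LEN, "%s.%d", base, suffix);
		ret = fake_register(out);
	}
	return ret;
}
```

A device whose name is already unique keeps it unchanged, and only a
second device with the same base name picks up a numeric suffix.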

Do you have a better idea ?

Cheers,
Ben.




Re: Strange location and name for platform devices when device-tree is used.

2013-11-01 Thread Benjamin Herrenschmidt
On Fri, 2013-11-01 at 13:47 -0700, Greg Kroah-Hartman wrote:

> > > On my device I seem to have some platform devices registered through
> > > device-tree, and some registered through platform_device_add (e.g.
> > > 'alarmtimer').  Guaranteeing they remain disjoint sets if the kernel is
> > > allowed to evolve independently of the devicetree might be tricky
> > > Maybe we need "/sys/devices/platform" and "/sys/devices/dt_platform" ??
> > 
> > No, I think device-tree created platform devices should go
> > to /sys/devices/platform like the "classic" ones.
> > 
> > The problem is really how to deal with potential name duplication. We
> > could try to register, if we get -EEXIST (assuming sysfs returns the
> > right stuff), try again with ".1" etc...
> 
> How can there be device name collisions?  All platform devices _should_
> be named uniquely, if not, you have bigger problems...

The problem is how to create a unique name for a platform device
created from a device-tree node.

Device tree nodes aren't necessarily uniquely named. They are unique
under a given parent but that hierarchy isn't preserved when creating
corresponding platform devices (and it would be very tricky to do so).

Currently, we simply append a number to the name when creating them,
which is obtained from a global counter.

Neil is unhappy about that because on his specific hardware, the device
has a unique name and thus we introduce a naming difference between
device-tree usage and old-style "hard coded" board file usage.

It would be nice if we could do something that only appends the "global
number" at the end of the name if the name isn't already unique. Thus my
proposal of trying first with the base name, and trying again if that
returns -EEXIST in some kind of loop.
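
A minimal userspace sketch of that retry loop, under stated assumptions: fake_register() and the registered[] table are hypothetical stand-ins for the real registration path and for the names already present in sysfs, and the ".1", ".2" suffix format is only the one floated earlier in this thread, not a settled convention.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 16
#define NAME_LEN 32

/* Hypothetical stand-in for the set of names already registered. */
static char registered[MAX_DEVS][NAME_LEN];
static int nr_registered;

/* Hypothetical registration: fails with -EEXIST on a duplicate name. */
static int fake_register(const char *name)
{
	for (int i = 0; i < nr_registered; i++)
		if (strcmp(registered[i], name) == 0)
			return -EEXIST;
	if (nr_registered == MAX_DEVS)
		return -ENOMEM;
	snprintf(registered[nr_registered++], NAME_LEN, "%s", name);
	return 0;
}

/*
 * Try the bare name first; only on -EEXIST append ".1", ".2", ...
 * The name that was finally used is written to out.
 */
static int register_unique(const char *base, char *out, size_t len)
{
	int ret;

	snprintf(out, len, "%s", base);
	ret = fake_register(out);
	for (int n = 1; ret == -EEXIST && n < MAX_DEVS; n++) {
		snprintf(out, len, "%s.%d", base, n);
		ret = fake_register(out);
	}
	return ret;
}
```

With this shape, a device whose name is already unique keeps the same name it would have had with a board file, and only a second "serial" ends up as "serial.1".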

Do you have a better idea?

Cheers,
Ben.




Re: [PATCH v2 1/4] ARM: EXYNOS: Add support for EXYNOS5410 SoC

2013-11-01 Thread Tomasz Figa
Hi Rob,

On Friday 01 of November 2013 16:52:44 Rob Herring wrote:
> On 10/14/2013 10:08 AM, Vyacheslav Tyrtov wrote:
> > From: Tarek Dakhran 
> > 
> > EXYNOS5410 is SoC in Samsung's Exynos5 SoC series.
> > Add initial support for this SoC.
> 
> I think this entire patch is mostly unnecessary and this information
> should all be coming from DT. I'll leave it to arm-soc maintainers
> whether they want to accept this addition rather than see some clean-up
> here.

The clean-up is already planned, but it covers much more than can be
seen from this patch alone, so it needs some time.

> 
> "samsung,exynos5410" does need to be documented though.

Right.

Best regards,
Tomasz



Re: [PATCH v2 7/9] phy: add Broadcom Kona USB2 PHY DT binding

2013-11-01 Thread Matt Porter
On Fri, Nov 01, 2013 at 09:54:10PM +0100, Arend van Spriel wrote:
> On 11/01/2013 08:45 PM, Matt Porter wrote:
> >Add a binding that describes the Broadcom Kona USB2 PHY found
> >on the BCM281xx family of SoCs.
> >
> >Signed-off-by: Matt Porter 
> >---
> >  .../devicetree/bindings/phy/bcm-kona-usb2-phy.txt | 15 
> > +++
> >  1 file changed, 15 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/phy/bcm-kona-usb2-phy.txt
> >
> >diff --git a/Documentation/devicetree/bindings/phy/bcm-kona-usb2-phy.txt 
> >b/Documentation/devicetree/bindings/phy/bcm-kona-usb2-phy.txt
> >new file mode 100644
> >index 000..db309e2
> >--- /dev/null
> >+++ b/Documentation/devicetree/bindings/phy/bcm-kona-usb2-phy.txt
> >@@ -0,0 +1,15 @@
> >+BROADCOM KONA USB2 PHY
> >+
> >+Required properties:
> >+ - compatible: brcm,kona-usb2-phy
> >+ - regs: offset and length of the PHY registers
> >+ - #phy-cells: must be 0
> >+Refer to phy/phy-bindings.txt for the generic PHY binding properties
> >+
> >+Example:
> >+
> >+usbphy: usbphy@3f13 {
> >+compatible = "brcm,kona-usb2-phy";
> >+reg = <0x3f13 0x28>;
> 
> I expect 'regs' iso 'reg' in this example.

Yes, will fix the typo in that property for v3.

Thanks,
Matt

> >+#phy-cells = <0>;
> >+};
> >
> 
> 


Re: [PATCH] power: Add legacy pm ops usage warning

2013-11-01 Thread Rafael J. Wysocki
On Friday, November 01, 2013 09:07:04 AM Shuah Khan wrote:
> Add legacy pm_ops usage checks to device_pm_add() when a device gets added
> to PM core's list of active devices. If legacy pm_ops usage is found at its
> class, bus, driver level, print warning message to indicate the driver code
> needs updating to use dev pm ops interfaces. This will help serve as a way
> to track drivers that still use legacy pm ops and fix them.

I think it would be much better to do these checks during bus type, class or
driver registration, because if you register a bus type with legacy PM, for
example, the check in device_pm_add() will trigger for all devices with that
bus type.

Thanks!

> Signed-off-by: Shuah Khan 
> ---
>  drivers/base/power/main.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 9f098a8..4dc26dc 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -112,6 +112,23 @@ void device_pm_unlock(void)
>  }
>  
>  /**
> + * check for legacy pm_ops usage and warn
> + */
> +static void device_legacy_pm_ops_check(struct device *dev)
> +{
> + char *info = "Please update driver to use dev pm_ops";
> +
> + if (dev->class && (dev->class->suspend || dev->class->resume))
> + dev_warn(dev, "Driver uses legacy class pm ops - %s\n", info);
> +
> + if (dev->bus && (dev->bus->suspend || dev->bus->resume))
> + dev_warn(dev, "Driver uses legacy bus pm ops - %s\n", info);
> +
> + if (dev->driver && (dev->driver->suspend || dev->driver->resume))
> + dev_warn(dev, "Driver uses legacy pm ops - %s\n", info);
> +}
> +
> +/**
>   * device_pm_add - Add a device to the PM core's list of active devices.
>   * @dev: Device to add to the list.
>   */
> @@ -123,6 +140,7 @@ void device_pm_add(struct device *dev)
>   if (dev->parent && dev->parent->power.is_prepared)
>   dev_warn(dev, "parent %s should not be sleeping\n",
>   dev_name(dev->parent));
> + device_legacy_pm_ops_check(dev);
>   list_add_tail(&dev->power.entry, &dpm_list);
>   mutex_unlock(&dpm_list_mtx);
>  }
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH v2 9/9] ARM: dts: add usb udc support to bcm281xx

2013-11-01 Thread Matt Porter
On Fri, Nov 01, 2013 at 11:56:33PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 11/01/2013 10:45 PM, Matt Porter wrote:
> 
> >Adds USB OTG/PHY and clock support to BCM281xx and enables
> >UDC support on the bcm11351-brt and bcm28155-ap boards.
> 
> >Signed-off-by: Matt Porter 
> >Reviewed-by: Markus Mayer 
> >Reviewed-by: Tim Kryger 
> >---
> >  arch/arm/boot/dts/bcm11351-brt.dts |  6 ++
> >  arch/arm/boot/dts/bcm11351.dtsi| 18 ++
> >  arch/arm/boot/dts/bcm28155-ap.dts  |  8 
> >  3 files changed, 32 insertions(+)
> 
> [...]
> >diff --git a/arch/arm/boot/dts/bcm11351.dtsi 
> >b/arch/arm/boot/dts/bcm11351.dtsi
> >index 0755f43..247f9fd 100644
> >--- a/arch/arm/boot/dts/bcm11351.dtsi
> >+++ b/arch/arm/boot/dts/bcm11351.dtsi
> >@@ -284,4 +284,22 @@
> > #clock-cells = <0>;
> > };
> > };
> >+
> >+usbotg: usbotg@3f12 {
> 
> According to the ePAPR spec [1], the node name should be "usb@3f12".

Will address in v3.

> >+compatible = "snps,dwc2";
> >+reg = <0x3f12 0x1>;
> >+interrupts = ;
> >+clocks = <&usb_otg_ahb_clk>;
> >+clock-names = "otg";
> >+phys = <&usbphy>;
> >+phy-names = "usb2-phy";
> >+status = "disabled";
> >+};
> >+
> >+usbphy: usbphy@3f13 {
> 
> This one should probably be named "usb-phy@3f13", just like
> "ethernet-phy" from the ePAPR spec.

Yes, agreed that will follow the same naming convention. I'll roll this
in v3. Thanks for the review.

-Matt

> >+compatible = "brcm,kona-usb2-phy";
> >+reg = <0x3f13 0x28>;
> >+#phy-cells = <0>;
> >+status = "disabled";
> >+};
> >  };
> 
> [1] http://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.0.pdf
> 
> WBR, Sergei
> 


[PATCH 3.10 01/54] tcp: TSO packets automatic sizing

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commits 6d36824e730f247b602c90e8715a792003e3c5a7,
  02cf4ebd82ff0ac7254b88e466820a290ed8289a, and parts of
  7eec4174ff29cd42f2acfae8112f51c228545d40 ]

After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses a heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt
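
As a rough worked example of the formula above (a sketch, not the kernel code): the helper names, the choice of microseconds for srtt, and the omission of any upper cap on the TSO size are assumptions made for illustration only. The one-packet-per-ms target and the minimum of 2 segments come from the description above.

```c
#include <stdint.h>

/*
 * sk_pacing_rate = 2 * cwnd * mss / srtt, in bytes per second.
 * srtt_us is the smoothed RTT in microseconds (a unit chosen for this
 * sketch only) and must be non-zero.
 */
static uint64_t pacing_rate(uint32_t cwnd, uint32_t mss, uint32_t srtt_us)
{
	return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/*
 * Number of segments per TSO packet so that roughly one packet leaves
 * every millisecond, never going below min_segs.
 */
static uint32_t tso_segs_per_ms(uint64_t rate, uint32_t mss, uint32_t min_segs)
{
	uint64_t segs = rate / 1000 / mss;

	return segs < min_segs ? min_segs : (uint32_t)segs;
}
```

For cwnd = 10, mss = 1448 and srtt = 50 ms this gives 579200 bytes/s, which is less than one MSS per ms, so the TSO size falls back to the minimum of 2 segments. For cwnd = 100 and srtt = 10 ms the rate is 28960000 bytes/s, or 20 segments per ms.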

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Van Jacobson 
Cc: Tom Herbert 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/networking/ip-sysctl.txt |9 
 include/net/sock.h |2 +
 include/net/tcp.h  |1 
 net/core/sock.c|1 
 net/ipv4/sysctl_net_ipv4.c |   10 +
 net/ipv4/tcp.c |   28 ++-
 net/ipv4/tcp_input.c   |   34 -
 net/ipv4/tcp_output.c  |2 -
 8 files changed, 80 insertions(+), 7 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -478,6 +478,15 @@ tcp_syn_retries - INTEGER
 tcp_timestamps - BOOLEAN
Enable timestamps as defined in RFC1323.
 
+tcp_min_tso_segs - INTEGER
+   Minimal number of segments per TSO frame.
+   Since linux-3.12, TCP does an automatic sizing of TSO frames,
+   depending on flow rate, instead of filling 64Kbytes packets.
+   For specific usages, it's possible to force TCP to build big
+   TSO frames. Note that TCP stack might split too big TSO packets
+   if available window is too small.
+   Default: 2
+
 tcp_tso_win_divisor - INTEGER
This allows control over what percentage of the congestion window
can be consumed by a single TSO frame.
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -230,6 +230,7 @@ struct cg_proto;
   *@sk_wmem_queued: persistent queue size
   *@sk_forward_alloc: space allocated forward
   *@sk_allocation: allocation mode
+  *@sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler)
   *@sk_sndbuf: size of send buffer in bytes
   *@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
   *   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -355,6 +356,7 @@ struct sock {
kmemcheck_bitfield_end(flags);
int sk_wmem_queued;
gfp_t   sk_allocation;
+   u32 sk_pacing_rate; /* bytes per second */
netdev_features_t   sk_route_caps;
netdev_features_t   sk_route_nocaps;
int sk_gso_type;
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -287,6 +287,7 @@ extern int sysctl_tcp_thin_dupack;
 extern int sysctl_tcp_early_retrans;
 extern int sysctl_tcp_limit_output_bytes;
 extern int sysctl_tcp_challenge_ack_limit;
+extern int sysctl_tcp_min_tso_segs;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2271,6 +2271,7 @@ void sock_init_data(struct socket *sock,
 
sk->sk_stamp = ktime_set(-1L, 0);
 
+   sk->sk_pacing_rate = ~0U;
/*
 * Before updating sk_refcnt, we must commit prior changes to memory
 * (Documentation/RCU/rculist_nulls.txt for details)
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -29,6 +29,7 @@
 static int zero;
 static int one = 

[PATCH 3.10 00/54] 3.10.18-stable review

2013-11-01 Thread Greg Kroah-Hartman
This is the start of the stable review cycle for the 3.10.18 release.
There are 54 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Nov  3 22:00:53 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.18-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
Linux 3.10.18-rc1

Enrico Mioso 
usb: serial: option: blacklist Olivetti Olicard200

Greg Kroah-Hartman 
USB: serial: option: add support for Inovia SEW858 device

Diego Elio Pettenò 
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.

Roel Kluin 
serial: vt8500: add missing braces

Johannes Berg 
wireless: radiotap: fix parsing buffer overrun

Fengguang Wu 
writeback: fix negative bdi max pause

David Henningsson 
ALSA: hda - Fix inverted internal mic not indicated on some machines

Takashi Iwai 
ALSA: us122l: Fix pcm_usb_stream mmapping regression

Hugh Dickins 
mm: fix BUG in __split_huge_page_pmd

James Ralston 
i2c: ismt: initialize DMA buffer

Mikulas Patocka 
dm snapshot: fix data corruption

Mika Westerberg 
gpio/lynxpoint: check if the interrupt is enabled in IRQ handler

Linus Walleij 
ARM: integrator: deactivate timer0 on the Integrator/CP

AKASHI Takahiro 
ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()

Mariusz Ceier 
davinci_emac.c: Fix IFF_ALLMULTI setup

Hannes Frederic Sowa 
ipv6: probe routes asynchronous in rt6_probe

Julian Anastasov 
netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper

Julian Anastasov 
ipv6: fill rt6i_gateway with nexthop address

Julian Anastasov 
ipv6: always prefer rt6i_gateway if present

Hannes Frederic Sowa 
inet: fix possible memory corruption with UDP_CORK and UFO

Seif Mazareeb 
net: fix cipso packet validation when !NETLABEL

Daniel Borkmann 
net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race

Vasundhara Volam 
be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd

Salva Peiró 
wanxl: fix info leak in ioctl

Vlad Yasevich 
sctp: Perform software checksum if packet has to be fragmented.

Fan Du 
sctp: Use software crc32 checksum when xfrm transform will happen.

Vlad Yasevich 
net: dst: provide accessor function to dst->xfrm

Vlad Yasevich 
bridge: Correctly clamp MAX forward_delay when enabling STP

Jason Wang 
virtio-net: refill only when device is up during setting queues

Jason Wang 
virtio-net: fix the race between channels setting and refill

Jason Wang 
virtio-net: don't respond to cpu hotplug notifier if we're not ready

Eric Dumazet 
bnx2x: record rx queue for LRO packets

Mathias Krause 
connector: use nlmsg_len() to check message length

Mathias Krause 
unix_diag: fix info leak

Salva Peiró 
farsync: fix info leak in ioctl

Eric Dumazet 
l2tp: must disable bh before calling l2tp_xmit_skb()

Christophe Gouault 
vti: get rid of nf mark rule in prerouting

Marc Kleine-Budde 
net: vlan: fix nlmsg size calculation in vlan_get_size()

Paul Durrant 
xen-netback: Don't destroy the netdev until the vif is shut down

Fabio Estevam 
net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected

Marc Kleine-Budde 
can: dev: fix nlmsg size calculation in can_get_size()

Jiri Benc 
ipv4: fix ineffective source address selection

Mathias Krause 
proc connector: fix info leaks

Dan Carpenter 
net: heap overflow in __audit_sockaddr()

Sebastian Hesselbarth 
net: mv643xx_eth: fix orphaned statistics timer crash

Sebastian Hesselbarth 
net: mv643xx_eth: update statistics timer from timer context only

David S. Miller 
l2tp: Fix build warning with ipv6 disabled.

François CACHEREUL 
l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses

Eric Dumazet 
net: do not call sock_put() on TIMEWAIT sockets

Yuchung Cheng 
tcp: fix incorrect ca_state in tail loss probe

Eric Dumazet 
tcp: do not forget FIN in tcp_shifted_skb()

Eric Dumazet 
tcp: must unclone packets before mangling them

Eric Dumazet 
tcp: TSQ can use a dynamic limit

Eric Dumazet 
tcp: TSO packets automatic sizing


-

Diffstat:

 Documentation/networking/ip-sysctl.txt  |  9 +
 Makefile|  4 +--
 arch/arm/boot/dts/integratorcp.dts  |  9 +++--
 arch/arm/include/asm/syscall.h  |  6 
 drivers/connector/cn_proc.c | 18 ++
 drivers/connector/connector.c   |  7 ++--
 drivers/gpio/gpio-lynxpoint.c   |  5 +--
 drivers/i2c/busses/i2c-ismt.c

[PATCH 3.10 02/54] tcp: TSQ can use a dynamic limit

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit c9eeec26e32e087359160406f96e0949b3cc6f10 ]

When TCP Small Queues was added, we used a sysctl to limit the number of
packets queued on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.
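
The limit computed in the tcp_write_xmit() hunk of this patch is max(skb->truesize, sk->sk_pacing_rate >> 10). A small userspace sketch of that arithmetic follows; the function names are hypothetical, and the plain integers stand in for the skb and socket fields. Shifting a bytes-per-second rate right by 10 divides it by 1024, which is roughly the number of bytes sent in 1 ms.

```c
#include <stdint.h>

/* Byte limit: one skb's truesize, or about 1 ms of data at the pacing rate. */
static uint32_t tsq_limit(uint32_t truesize, uint32_t pacing_rate)
{
	uint32_t per_ms = pacing_rate >> 10;	/* rate / 1024 ~= bytes per ms */

	return truesize > per_ms ? truesize : per_ms;
}

/* Non-zero when the flow already has more than the limit queued below TCP. */
static int tsq_throttled(uint32_t wmem_alloc, uint32_t truesize,
			 uint32_t pacing_rate)
{
	return wmem_alloc > tsq_limit(truesize, pacing_rate);
}
```

At a pacing rate of 579200 bytes/s the shifted value is only 565 bytes, so the skb's truesize sets the limit. At 2500000000 bytes/s it is 2441406 bytes, so many packets may be queued before the flow is throttled.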

Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on a 10Gbps link controlled by FQ/pacing:

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 1p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet 
Cc: Wei Liu 
Cc: Cong Wang 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_output.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -887,8 +887,7 @@ static int tcp_transmit_skb(struct sock
 
skb_orphan(skb);
skb->sk = sk;
-   skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
- tcp_wfree : sock_wfree;
+   skb->destructor = tcp_wfree;
atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
/* Build TCP header and checksum it. */
@@ -1832,7 +1831,6 @@ static bool tcp_write_xmit(struct sock *
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
 
-
tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
BUG_ON(!tso_segs);
 
@@ -1861,13 +1859,20 @@ static bool tcp_write_xmit(struct sock *
break;
}
 
-   /* TSQ : sk_wmem_alloc accounts skb truesize,
-* including skb overhead. But thats OK.
+   /* TCP Small Queues :
+* Control number of packets in qdisc/devices to two packets / or ~1 ms.
+* This allows for :
+*  - better RTT estimation and ACK scheduling
+*  - faster recovery
+*  - high rates
 */
-   if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) {
+   limit = max(skb->truesize, sk->sk_pacing_rate >> 10);
+
+   if (atomic_read(&sk->sk_wmem_alloc) > limit) {
set_bit(TSQ_THROTTLED, &tp->tsq_flags);
break;
}
+
limit = mss_now;
if (tso_segs > 1 && !tcp_urg_mode(tp))
limit = tcp_mss_split_point(sk, skb, mss_now,




[PATCH 3.10 14/54] can: dev: fix nlmsg size calculation in can_get_size()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Marc Kleine-Budde 

[ Upstream commit fe119a05f8ca481623a8d02efcc984332e612528 ]

This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().
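
To illustrate why a bare sizeof() undercounts, here is a userspace restatement of the netlink attribute size arithmetic. It is a sketch: the macros mirror the usual netlink definitions (4-byte attribute header, 4-byte alignment) but are redefined locally, and the payload sizes in the note below are arbitrary illustrative values, not the real sizes of the CAN structures.

```c
#include <stddef.h>

#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	4	/* aligned size of the attribute header */

/* Space one attribute occupies in a message: header + payload + padding. */
static size_t nla_total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}
```

An 8-byte payload therefore needs 12 bytes in the message, and a 6-byte payload also needs 12 because of padding. Summing raw sizeof() values misses the header and padding of every attribute.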

Signed-off-by: Marc Kleine-Budde 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/can/dev.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -705,14 +705,14 @@ static size_t can_get_size(const struct
size_t size;
 
size = nla_total_size(sizeof(u32));   /* IFLA_CAN_STATE */
-   size += sizeof(struct can_ctrlmode);  /* IFLA_CAN_CTRLMODE */
+   size += nla_total_size(sizeof(struct can_ctrlmode));  /* IFLA_CAN_CTRLMODE */
size += nla_total_size(sizeof(u32));  /* IFLA_CAN_RESTART_MS */
-   size += sizeof(struct can_bittiming); /* IFLA_CAN_BITTIMING */
-   size += sizeof(struct can_clock); /* IFLA_CAN_CLOCK */
+   size += nla_total_size(sizeof(struct can_bittiming)); /* IFLA_CAN_BITTIMING */
+   size += nla_total_size(sizeof(struct can_clock)); /* IFLA_CAN_CLOCK */
if (priv->do_get_berr_counter)/* IFLA_CAN_BERR_COUNTER */
-   size += sizeof(struct can_berr_counter);
+   size += nla_total_size(sizeof(struct can_berr_counter));
if (priv->bittiming_const)/* IFLA_CAN_BITTIMING_CONST */
-   size += sizeof(struct can_bittiming_const);
+   size += nla_total_size(sizeof(struct can_bittiming_const));
 
return size;
 }




[PATCH 2/8] trace/trace_stat: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
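
For readers unfamiliar with the idea, a minimal userspace sketch of postorder destruction follows. It uses a plain unbalanced binary tree with parent pointers as a stand-in for the kernel rbtree, and all names in it are hypothetical. The point it shows is that a postorder walk visits both children before their parent, so each node can be freed without first clearing the parent's links, provided the successor is fetched before the free.

```c
#include <stdlib.h>

struct node {
	struct node *left, *right, *parent;
	int key;
};

/* Plain unbalanced BST insert; returns the (possibly new) root. */
static struct node *insert(struct node *root, int key)
{
	struct node *n = calloc(1, sizeof(*n));
	struct node **link = &root, *parent = NULL;

	if (!n)
		return root;
	n->key = key;
	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	*link = n;
	return root;
}

/* Descend to a leaf, preferring the left child at each step. */
static struct node *left_deepest(struct node *n)
{
	while (n->left || n->right)
		n = n->left ? n->left : n->right;
	return n;
}

static struct node *first_postorder(struct node *root)
{
	return root ? left_deepest(root) : NULL;
}

/* After a left child comes the right subtree (if any), else the parent. */
static struct node *next_postorder(struct node *n)
{
	struct node *p = n->parent;

	if (p && n == p->left && p->right)
		return left_deepest(p->right);
	return p;
}

/* Free every node; order[] records the keys in the order they are freed. */
static int destroy(struct node *root, int *order)
{
	struct node *n = first_postorder(root), *next;
	int count = 0;

	while (n) {
		next = next_postorder(n);	/* fetch before freeing n */
		order[count++] = n->key;
		free(n);
		n = next;
	}
	return count;
}
```

For a tree built by inserting 2, 1, 3, the nodes are freed in the order 1, 3, 2: the root goes last, and no pointer into already freed memory is ever followed.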

Signed-off-by: Cody P Schafer 
---
 kernel/trace/trace_stat.c | 42 ++
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/kernel/trace/trace_stat.c b/kernel/trace/trace_stat.c
index 847f88a..fa53acc 100644
--- a/kernel/trace/trace_stat.c
+++ b/kernel/trace/trace_stat.c
@@ -43,46 +43,16 @@ static DEFINE_MUTEX(all_stat_sessions_mutex);
 /* The root directory for all stat files */
 static struct dentry   *stat_dir;
 
-/*
- * Iterate through the rbtree using a post order traversal path
- * to release the next node.
- * It won't necessary release one at each iteration
- * but it will at least advance closer to the next one
- * to be released.
- */
-static struct rb_node *release_next(struct tracer_stat *ts,
-   struct rb_node *node)
+static void __reset_stat_session(struct stat_session *session)
 {
-   struct stat_node *snode;
-   struct rb_node *parent = rb_parent(node);
-
-   if (node->rb_left)
-   return node->rb_left;
-   else if (node->rb_right)
-   return node->rb_right;
-   else {
-   if (!parent)
-   ;
-   else if (parent->rb_left == node)
-   parent->rb_left = NULL;
-   else
-   parent->rb_right = NULL;
+   struct stat_node *snode, *n;
 
-   snode = container_of(node, struct stat_node, node);
-   if (ts->stat_release)
-   ts->stat_release(snode->stat);
+   rbtree_postorder_for_each_entry_safe(snode, n, &session->stat_root,
+   node) {
+   if (session->ts->stat_release)
+   session->ts->stat_release(snode->stat);
kfree(snode);
-
-   return parent;
}
-}
-
-static void __reset_stat_session(struct stat_session *session)
-{
-   struct rb_node *node = session->stat_root.rb_node;
-
-   while (node)
-   node = release_next(session->ts, node);
 
session->stat_root = RB_ROOT;
 }
-- 
1.8.4.2



[PATCH 3/8] fs/ubifs: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Signed-off-by: Cody P Schafer 
---
 fs/ubifs/debug.c| 22 +++---
 fs/ubifs/log.c  | 21 ++---
 fs/ubifs/orphan.c   | 21 ++---
 fs/ubifs/recovery.c | 21 +++--
 fs/ubifs/super.c| 24 
 fs/ubifs/tnc.c  | 22 +++---
 6 files changed, 17 insertions(+), 114 deletions(-)

diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c
index 6e025e0..378c179 100644
--- a/fs/ubifs/debug.c
+++ b/fs/ubifs/debug.c
@@ -2118,26 +2118,10 @@ out_free:
  */
 static void free_inodes(struct fsck_data *fsckd)
 {
-   struct rb_node *this = fsckd->inodes.rb_node;
-   struct fsck_inode *fscki;
+   struct fsck_inode *fscki, *n;
 
-   while (this) {
-   if (this->rb_left)
-   this = this->rb_left;
-   else if (this->rb_right)
-   this = this->rb_right;
-   else {
-   fscki = rb_entry(this, struct fsck_inode, rb);
-   this = rb_parent(this);
-   if (this) {
-   if (this->rb_left == &fscki->rb)
-   this->rb_left = NULL;
-   else
-   this->rb_right = NULL;
-   }
-   kfree(fscki);
-   }
-   }
+   rbtree_postorder_for_each_entry_safe(fscki, n, &fsckd->inodes, rb)
+   kfree(fscki);
 }
 
 /**
diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
index 36bd4ef..a902c59 100644
--- a/fs/ubifs/log.c
+++ b/fs/ubifs/log.c
@@ -574,27 +574,10 @@ static int done_already(struct rb_root *done_tree, int lnum)
  */
 static void destroy_done_tree(struct rb_root *done_tree)
 {
-   struct rb_node *this = done_tree->rb_node;
-   struct done_ref *dr;
+   struct done_ref *dr, *n;
 
-   while (this) {
-   if (this->rb_left) {
-   this = this->rb_left;
-   continue;
-   } else if (this->rb_right) {
-   this = this->rb_right;
-   continue;
-   }
-   dr = rb_entry(this, struct done_ref, rb);
-   this = rb_parent(this);
-   if (this) {
-   if (this->rb_left == &dr->rb)
-   this->rb_left = NULL;
-   else
-   this->rb_right = NULL;
-   }
+   rbtree_postorder_for_each_entry_safe(dr, n, done_tree, rb)
kfree(dr);
-   }
 }
 
 /**
diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index ba32da3..f1c3e5a1 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -815,27 +815,10 @@ static int dbg_find_check_orphan(struct rb_root *root, ino_t inum)
 
 static void dbg_free_check_tree(struct rb_root *root)
 {
-   struct rb_node *this = root->rb_node;
-   struct check_orphan *o;
+   struct check_orphan *o, *n;
 
-   while (this) {
-   if (this->rb_left) {
-   this = this->rb_left;
-   continue;
-   } else if (this->rb_right) {
-   this = this->rb_right;
-   continue;
-   }
-   o = rb_entry(this, struct check_orphan, rb);
-   this = rb_parent(this);
-   if (this) {
-   if (this->rb_left == &o->rb)
-   this->rb_left = NULL;
-   else
-   this->rb_right = NULL;
-   }
+   rbtree_postorder_for_each_entry_safe(o, n, root, rb)
kfree(o);
-   }
 }
 
 static int dbg_orphan_check(struct ubifs_info *c, struct ubifs_zbranch *zbr,
diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 065096e..c14adb2 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -1335,29 +1335,14 @@ static void remove_ino(struct ubifs_info *c, ino_t inum)
  */
 void ubifs_destroy_size_tree(struct ubifs_info *c)
 {
-   struct rb_node *this = c->size_tree.rb_node;
-   struct size_entry *e;
+   struct size_entry *e, *n;
 
-   while (this) {
-   if (this->rb_left) {
-   this = this->rb_left;
-   continue;
-   } else if (this->rb_right) {
-   this = this->rb_right;
-   continue;
-   }
-   e = rb_entry(this, struct size_entry, rb);
+   rbtree_postorder_for_each_entry_safe(e, n, &c->size_tree, rb) {
if (e->inode)
iput(e->inode);
-   this = rb_parent(this);
-   if (this) {
-   if (this->rb_left == &e->rb)
-  

[PATCH 4/8] fs/ext4: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Signed-off-by: Cody P Schafer 
---
 fs/ext4/block_validity.c | 33 -
 fs/ext4/dir.c| 35 +--
 2 files changed, 9 insertions(+), 59 deletions(-)

diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c
index 3f11656..41eb9dc 100644
--- a/fs/ext4/block_validity.c
+++ b/fs/ext4/block_validity.c
@@ -180,37 +180,12 @@ int ext4_setup_system_zone(struct super_block *sb)
 /* Called when the filesystem is unmounted */
 void ext4_release_system_zone(struct super_block *sb)
 {
-   struct rb_node  *n = EXT4_SB(sb)->system_blks.rb_node;
-   struct rb_node  *parent;
-   struct ext4_system_zone *entry;
+   struct ext4_system_zone *entry, *n;
 
-   while (n) {
-   /* Do the node's children first */
-   if (n->rb_left) {
-   n = n->rb_left;
-   continue;
-   }
-   if (n->rb_right) {
-   n = n->rb_right;
-   continue;
-   }
-   /*
-* The node has no children; free it, and then zero
-* out parent's link to it.  Finally go to the
-* beginning of the loop and try to free the parent
-* node.
-*/
-   parent = rb_parent(n);
-   entry = rb_entry(n, struct ext4_system_zone, node);
+   rbtree_postorder_for_each_entry_safe(entry, n,
+   &EXT4_SB(sb)->system_blks, node)
kmem_cache_free(ext4_system_zone_cachep, entry);
-   if (!parent)
-   EXT4_SB(sb)->system_blks = RB_ROOT;
-   else if (parent->rb_left == n)
-   parent->rb_left = NULL;
-   else if (parent->rb_right == n)
-   parent->rb_right = NULL;
-   n = parent;
-   }
+
EXT4_SB(sb)->system_blks = RB_ROOT;
 }
 
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 680bb33..d638c57 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -353,41 +353,16 @@ struct fname {
  */
 static void free_rb_tree_fname(struct rb_root *root)
 {
-   struct rb_node  *n = root->rb_node;
-   struct rb_node  *parent;
-   struct fname*fname;
-
-   while (n) {
-   /* Do the node's children first */
-   if (n->rb_left) {
-   n = n->rb_left;
-   continue;
-   }
-   if (n->rb_right) {
-   n = n->rb_right;
-   continue;
-   }
-   /*
-* The node has no children; free it, and then zero
-* out parent's link to it.  Finally go to the
-* beginning of the loop and try to free the parent
-* node.
-*/
-   parent = rb_parent(n);
-   fname = rb_entry(n, struct fname, rb_hash);
+   struct fname *fname, *next;
+
+   rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
while (fname) {
struct fname *old = fname;
fname = fname->next;
kfree(old);
}
-   if (!parent)
-   *root = RB_ROOT;
-   else if (parent->rb_left == n)
-   parent->rb_left = NULL;
-   else if (parent->rb_right == n)
-   parent->rb_right = NULL;
-   n = parent;
-   }
+
+   *root = RB_ROOT;
 }
 
 
-- 
1.8.4.2



[PATCH 1/8] net ipset: use rbtree postorder iteration instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
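
For readers unfamiliar with the helper, here is a minimal userspace sketch of the idea, not the kernel implementation. It assumes a plain binary tree with parent pointers; struct node, deepest_first(), postorder_first(), postorder_next(), bst_insert() and destroy_tree() are hypothetical names invented for this illustration. What it tries to show is that a postorder walk visits both children before their parent, so each node can be freed without first clearing the parent's link to it, provided the successor is looked up before the free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree node; this is not struct rb_node. */
struct node {
	struct node *parent, *left, *right;
	int key;
};

/* Descend to the first node a postorder walk of this subtree visits. */
static struct node *deepest_first(struct node *n)
{
	for (;;) {
		if (n->left)
			n = n->left;
		else if (n->right)
			n = n->right;
		else
			return n;
	}
}

static struct node *postorder_first(struct node *root)
{
	return root ? deepest_first(root) : NULL;
}

/*
 * Successor in postorder.  Only n, its parent, and the parent's right
 * link are read, so a link to an already-freed left sibling is never used.
 */
static struct node *postorder_next(struct node *n)
{
	struct node *parent = n->parent;

	if (parent && parent->right && parent->right != n)
		return deepest_first(parent->right);
	return parent;
}

/* Unbalanced BST insert, present only to build a tree for the demo. */
static int bst_insert(struct node **root, int key)
{
	struct node *parent = NULL, **link = root;
	struct node *n;

	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->key = key;
	n->parent = parent;
	*link = n;
	return 0;
}

/* Free every node without editing tree links; returns the number freed. */
static int destroy_tree(struct node **root)
{
	struct node *n = postorder_first(*root), *next;
	int freed = 0;

	while (n) {
		next = postorder_next(n);	/* looked up before n is freed */
		free(n);
		freed++;
		n = next;
	}
	*root = NULL;	/* same role as "*root = RB_ROOT" in the patches */
	return freed;
}
```

With the keys 5, 3, 8, 1, 4, 9 inserted in that order, the walk visits 1, 4, 3, 9, 8, 5, so every child is released before its parent.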

Signed-off-by: Cody P Schafer 
---
 net/netfilter/ipset/ip_set_hash_netiface.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 7d798d5..99dba4c 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -45,31 +45,12 @@ struct iface_node {
 static void
 rbtree_destroy(struct rb_root *root)
 {
-   struct rb_node *p, *n = root->rb_node;
-   struct iface_node *node;
-
-   /* Non-recursive destroy, like in ext3 */
-   while (n) {
-   if (n->rb_left) {
-   n = n->rb_left;
-   continue;
-   }
-   if (n->rb_right) {
-   n = n->rb_right;
-   continue;
-   }
-   p = rb_parent(n);
-   node = rb_entry(n, struct iface_node, node);
-   if (!p)
-   *root = RB_ROOT;
-   else if (p->rb_left == n)
-   p->rb_left = NULL;
-   else if (p->rb_right == n)
-   p->rb_right = NULL;
+   struct iface_node *node, *next;
 
+   rbtree_postorder_for_each_entry_safe(node, next, root, node)
kfree(node);
-   n = p;
-   }
+
+   *root = RB_ROOT;
 }
 
 static int
-- 
1.8.4.2



[PATCH 6/8] fs/ext3: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Signed-off-by: Cody P Schafer 
---
 fs/ext3/dir.c | 36 +---
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index bafdd48..a331ad1 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -309,43 +309,17 @@ struct fname {
  */
 static void free_rb_tree_fname(struct rb_root *root)
 {
-   struct rb_node  *n = root->rb_node;
-   struct rb_node  *parent;
-   struct fname*fname;
-
-   while (n) {
-   /* Do the node's children first */
-   if (n->rb_left) {
-   n = n->rb_left;
-   continue;
-   }
-   if (n->rb_right) {
-   n = n->rb_right;
-   continue;
-   }
-   /*
-* The node has no children; free it, and then zero
-* out parent's link to it.  Finally go to the
-* beginning of the loop and try to free the parent
-* node.
-*/
-   parent = rb_parent(n);
-   fname = rb_entry(n, struct fname, rb_hash);
+   struct fname *fname, *next;
+
+   rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
while (fname) {
struct fname * old = fname;
fname = fname->next;
kfree (old);
}
-   if (!parent)
-   *root = RB_ROOT;
-   else if (parent->rb_left == n)
-   parent->rb_left = NULL;
-   else if (parent->rb_right == n)
-   parent->rb_right = NULL;
-   n = parent;
-   }
-}
 
+   *root = RB_ROOT;
+}
 
 static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
   loff_t pos)
-- 
1.8.4.2



[PATCH 8/8] sh/dwarf: use rbtree postorder iteration helper instead of solution using repeated rb_erase()

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of using repeated rb_erase() calls

Signed-off-by: Cody P Schafer 
---
 arch/sh/kernel/dwarf.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/sh/kernel/dwarf.c b/arch/sh/kernel/dwarf.c
index 49c09c7..67a049e 100644
--- a/arch/sh/kernel/dwarf.c
+++ b/arch/sh/kernel/dwarf.c
@@ -995,29 +995,19 @@ static struct unwinder dwarf_unwinder = {
 
 static void dwarf_unwinder_cleanup(void)
 {
-   struct rb_node **fde_rb_node = &fde_root.rb_node;
-   struct rb_node **cie_rb_node = &cie_root.rb_node;
+   struct dwarf_fde *fde, *next_fde;
+   struct dwarf_cie *cie, *next_cie;
 
/*
 * Deallocate all the memory allocated for the DWARF unwinder.
 * Traverse all the FDE/CIE lists and remove and free all the
 * memory associated with those data structures.
 */
-   while (*fde_rb_node) {
-   struct dwarf_fde *fde;
-
-   fde = rb_entry(*fde_rb_node, struct dwarf_fde, node);
-   rb_erase(*fde_rb_node, &fde_root);
+   rbtree_postorder_for_each_entry_safe(fde, next_fde, &fde_root, node)
kfree(fde);
-   }
 
-   while (*cie_rb_node) {
-   struct dwarf_cie *cie;
-
-   cie = rb_entry(*cie_rb_node, struct dwarf_cie, node);
-   rb_erase(*cie_rb_node, &cie_root);
+   rbtree_postorder_for_each_entry_safe(cie, next_cie, &cie_root, node)
kfree(cie);
-   }
 
kmem_cache_destroy(dwarf_reg_cachep);
kmem_cache_destroy(dwarf_frame_cachep);
-- 
1.8.4.2



[PATCH 7/8] mtd/ubi: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Signed-off-by: Cody P Schafer 
---
 drivers/mtd/ubi/attach.c | 49 +++-
 drivers/mtd/ubi/wl.c | 25 +++-
 2 files changed, 10 insertions(+), 64 deletions(-)

diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
index c071d41..6de0786 100644
--- a/drivers/mtd/ubi/attach.c
+++ b/drivers/mtd/ubi/attach.c
@@ -1133,27 +1133,11 @@ static int late_analysis(struct ubi_device *ubi, struct ubi_attach_info *ai)
  */
 static void destroy_av(struct ubi_attach_info *ai, struct ubi_ainf_volume *av)
 {
-   struct ubi_ainf_peb *aeb;
-   struct rb_node *this = av->root.rb_node;
-
-   while (this) {
-   if (this->rb_left)
-   this = this->rb_left;
-   else if (this->rb_right)
-   this = this->rb_right;
-   else {
-   aeb = rb_entry(this, struct ubi_ainf_peb, u.rb);
-   this = rb_parent(this);
-   if (this) {
-   if (this->rb_left == &aeb->u.rb)
-   this->rb_left = NULL;
-   else
-   this->rb_right = NULL;
-   }
+   struct ubi_ainf_peb *aeb, *next;
+
+   rbtree_postorder_for_each_entry_safe(aeb, next, &av->root, u.rb)
+   kmem_cache_free(ai->aeb_slab_cache, aeb);
 
-   kmem_cache_free(ai->aeb_slab_cache, aeb);
-   }
-   }
kfree(av);
 }
 
@@ -1164,8 +1148,7 @@ static void destroy_av(struct ubi_attach_info *ai, struct ubi_ainf_volume *av)
 static void destroy_ai(struct ubi_attach_info *ai)
 {
struct ubi_ainf_peb *aeb, *aeb_tmp;
-   struct ubi_ainf_volume *av;
-   struct rb_node *rb;
+   struct ubi_ainf_volume *av, *next;
 
list_for_each_entry_safe(aeb, aeb_tmp, &ai->alien, u.list) {
list_del(&aeb->u.list);
@@ -1185,26 +1168,8 @@ static void destroy_ai(struct ubi_attach_info *ai)
}
 
/* Destroy the volume RB-tree */
-   rb = ai->volumes.rb_node;
-   while (rb) {
-   if (rb->rb_left)
-   rb = rb->rb_left;
-   else if (rb->rb_right)
-   rb = rb->rb_right;
-   else {
-   av = rb_entry(rb, struct ubi_ainf_volume, rb);
-
-   rb = rb_parent(rb);
-   if (rb) {
-   if (rb->rb_left == &av->rb)
-   rb->rb_left = NULL;
-   else
-   rb->rb_right = NULL;
-   }
-
-   destroy_av(ai, av);
-   }
-   }
+   rbtree_postorder_for_each_entry_safe(av, next, &ai->volumes, rb)
+   destroy_av(ai, av);
 
if (ai->aeb_slab_cache)
kmem_cache_destroy(ai->aeb_slab_cache);
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index c95bfb1..1af3899 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -1760,29 +1760,10 @@ int ubi_wl_flush(struct ubi_device *ubi, int vol_id, int lnum)
  */
 static void tree_destroy(struct rb_root *root)
 {
-   struct rb_node *rb;
-   struct ubi_wl_entry *e;
-
-   rb = root->rb_node;
-   while (rb) {
-   if (rb->rb_left)
-   rb = rb->rb_left;
-   else if (rb->rb_right)
-   rb = rb->rb_right;
-   else {
-   e = rb_entry(rb, struct ubi_wl_entry, u.rb);
-
-   rb = rb_parent(rb);
-   if (rb) {
-   if (rb->rb_left == &e->u.rb)
-   rb->rb_left = NULL;
-   else
-   rb->rb_right = NULL;
-   }
+   struct ubi_wl_entry *e, *next;
 
-   kmem_cache_free(ubi_wl_entry_slab, e);
-   }
-   }
+   rbtree_postorder_for_each_entry_safe(e, next, root, u.rb)
+   kmem_cache_free(ubi_wl_entry_slab, e);
 }
 
 /**
-- 
1.8.4.2



[PATCH 3.10 07/54] l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: François CACHEREUL 

[ Upstream commit e18503f41f9b12132c95d7c31ca6ee5155e44e5c ]

IPv4 mapped addresses cause kernel panic.
The patch just checks whether the IPv6 address is an IPv4-mapped
address. If so, use IPv4 API instead of IPv6.
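
As an illustration of the check described above, here is a hedged userspace sketch; the patch itself relies on the in-kernel ipv6_addr_v4mapped() helper instead. An IPv4-mapped address has the form ::ffff:a.b.c.d, and the POSIX macro IN6_IS_ADDR_V4MAPPED() tests for it. The function name extract_v4mapped() and its return convention are assumptions made for this example only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: parse an IPv6 address in text form.
 * Returns  1 and stores the embedded IPv4 address if it is IPv4-mapped,
 *          0 if it is a valid IPv6 address that is not IPv4-mapped,
 *         -1 if the text does not parse as an IPv6 address.
 */
static int extract_v4mapped(const char *text, struct in_addr *v4)
{
	struct in6_addr a6;

	if (inet_pton(AF_INET6, text, &a6) != 1)
		return -1;
	if (!IN6_IS_ADDR_V4MAPPED(&a6))
		return 0;

	/* The last four bytes carry the IPv4 address in network order. */
	memcpy(&v4->s_addr, &a6.s6_addr[12], sizeof(v4->s_addr));
	return 1;
}
```

A caller could use the return value to decide between an IPv4 and an IPv6 transmit path, which is the same decision the tunnel->v4mapped flag records in the patch.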

[  940.026915] general protection fault:  [#1]
[  940.026915] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppox ppp_generic slhc loop psmouse
[  940.026915] CPU: 0 PID: 3184 Comm: memcheck-amd64- Not tainted 3.11.0+ #1
[  940.026915] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  940.026915] task: 880007130e20 ti: 88000737e000 task.ti: 88000737e000
[  940.026915] RIP: 0010:[]  [] ip6_xmit+0x276/0x326
[  940.026915] RSP: 0018:88000737fd28  EFLAGS: 00010286
[  940.026915] RAX: c748521a75ceff48 RBX: 88c30800 RCX: 
[  940.026915] RDX: 8875cc4e RSI: 0028 RDI: 8800060e5a40
[  940.026915] RBP: 8800060e5a40 R08:  R09: 8875cc90
[  940.026915] R10:  R11:  R12: 88000737fda0
[  940.026915] R13:  R14: 2000 R15: 880005d3b580
[  940.026915] FS:  7f163dc5e800() GS:81623000() knlGS:
[  940.026915] CS:  0010 DS:  ES:  CR0: 80050033
[  940.026915] CR2: 0004032dc940 CR3: 05c25000 CR4: 06f0
[  940.026915] Stack:
[  940.026915]  8875cc4e 81694e90 88c30b38 0020
[  940.026915]  1100523c4bac 88000737fdb4  88c30800
[  940.026915]  880005d3b580 88c30b38 8800060e5a40 0020
[  940.026915] Call Trace:
[  940.026915]  [] ? inet6_csk_xmit+0xa4/0xc4
[  940.026915]  [] ? l2tp_xmit_skb+0x503/0x55a [l2tp_core]
[  940.026915]  [] ? pskb_expand_head+0x161/0x214
[  940.026915]  [] ? pppol2tp_xmit+0xf2/0x143 [l2tp_ppp]
[  940.026915]  [] ? ppp_channel_push+0x36/0x8b [ppp_generic]
[  940.026915]  [] ? ppp_write+0xaf/0xc5 [ppp_generic]
[  940.026915]  [] ? vfs_write+0xa2/0x106
[  940.026915]  [] ? SyS_write+0x56/0x8a
[  940.026915]  [] ? system_call_fastpath+0x16/0x1b
[  940.026915] Code: 00 49 8b 8f d8 00 00 00 66 83 7c 11 02 00 74 60 49
8b 47 58 48 83 e0 fe 48 8b 80 18 01 00 00 48 85 c0 74 13 48 8b 80 78 02
00 00 <48> ff 40 28 41 8b 57 68 48 01 50 30 48 8b 54 24 08 49 c7 c1 51
[  940.026915] RIP  [] ip6_xmit+0x276/0x326
[  940.026915]  RSP 
[  940.057945] ---[ end trace be8aba9a61c8b7f3 ]---
[  940.058583] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: François CACHEREUL 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/l2tp/l2tp_core.c |   27 +++
 net/l2tp/l2tp_core.h |3 +++
 2 files changed, 26 insertions(+), 4 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -499,6 +499,7 @@ out:
 static inline int l2tp_verify_udp_checksum(struct sock *sk,
   struct sk_buff *skb)
 {
+   struct l2tp_tunnel *tunnel = (struct l2tp_tunnel *)sk->sk_user_data;
struct udphdr *uh = udp_hdr(skb);
u16 ulen = ntohs(uh->len);
__wsum psum;
@@ -507,7 +508,7 @@ static inline int l2tp_verify_udp_checks
return 0;
 
 #if IS_ENABLED(CONFIG_IPV6)
-   if (sk->sk_family == PF_INET6) {
+   if (sk->sk_family == PF_INET6 && !tunnel->v4mapped) {
if (!uh->check) {
LIMIT_NETDEBUG(KERN_INFO "L2TP: IPv6: checksum is 0\n");
return 1;
@@ -1071,7 +1072,7 @@ static int l2tp_xmit_core(struct l2tp_se
/* Queue the packet to IP for output */
skb->local_df = 1;
 #if IS_ENABLED(CONFIG_IPV6)
-   if (skb->sk->sk_family == PF_INET6)
+   if (skb->sk->sk_family == PF_INET6 && !tunnel->v4mapped)
error = inet6_csk_xmit(skb, NULL);
else
 #endif
@@ -1198,7 +1199,7 @@ int l2tp_xmit_skb(struct l2tp_session *s
 
/* Calculate UDP checksum if configured to do so */
 #if IS_ENABLED(CONFIG_IPV6)
-   if (sk->sk_family == PF_INET6)
+   if (sk->sk_family == PF_INET6 && !tunnel->v4mapped)
l2tp_xmit_ipv6_csum(sk, skb, udp_len);
else
 #endif
@@ -1647,6 +1648,24 @@ int l2tp_tunnel_create(struct net *net,
if (cfg != NULL)
tunnel->debug = cfg->debug;
 
+#if IS_ENABLED(CONFIG_IPV6)
+   if (sk->sk_family == PF_INET6) {
+   struct ipv6_pinfo *np = inet6_sk(sk);
+
+   if (ipv6_addr_v4mapped(&np->saddr) &&
+   ipv6_addr_v4mapped(&np->daddr)) {
+   struct inet_sock *inet = inet_sk(sk);
+
+   tunnel->v4mapped = true;
+   inet->inet_saddr = np->saddr.s6_addr32[3];
+   inet->inet_rcv_saddr = np->rcv_saddr.s6_addr32[3];
+

[PATCH 5/8] fs/jffs2: use rbtree postorder iteration helper instead of opencoding

2013-11-01 Thread Cody P Schafer
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Signed-off-by: Cody P Schafer 
---
 fs/jffs2/nodelist.c  | 28 ++--
 fs/jffs2/readinode.c | 26 +++---
 2 files changed, 5 insertions(+), 49 deletions(-)

diff --git a/fs/jffs2/nodelist.c b/fs/jffs2/nodelist.c
index 975a1f5..9a5449b 100644
--- a/fs/jffs2/nodelist.c
+++ b/fs/jffs2/nodelist.c
@@ -564,25 +564,10 @@ struct jffs2_node_frag *jffs2_lookup_node_frag(struct rb_root *fragtree, uint32_
they're killed. */
 void jffs2_kill_fragtree(struct rb_root *root, struct jffs2_sb_info *c)
 {
-   struct jffs2_node_frag *frag;
-   struct jffs2_node_frag *parent;
-
-   if (!root->rb_node)
-   return;
+   struct jffs2_node_frag *frag, *next;
 
dbg_fragtree("killing\n");
-
-   frag = (rb_entry(root->rb_node, struct jffs2_node_frag, rb));
-   while(frag) {
-   if (frag->rb.rb_left) {
-   frag = frag_left(frag);
-   continue;
-   }
-   if (frag->rb.rb_right) {
-   frag = frag_right(frag);
-   continue;
-   }
-
+   rbtree_postorder_for_each_entry_safe(frag, next, root, rb) {
if (frag->node && !(--frag->node->frags)) {
/* Not a hole, and it's the final remaining frag
   of this node. Free the node */
@@ -591,17 +576,8 @@ void jffs2_kill_fragtree(struct rb_root *root, struct jffs2_sb_info *c)
 
jffs2_free_full_dnode(frag->node);
}
-   parent = frag_parent(frag);
-   if (parent) {
-   if (frag_left(parent) == frag)
-   parent->rb.rb_left = NULL;
-   else
-   parent->rb.rb_right = NULL;
-   }
 
jffs2_free_node_frag(frag);
-   frag = parent;
-
cond_resched();
}
 }
diff --git a/fs/jffs2/readinode.c b/fs/jffs2/readinode.c
index ae81b01..386303d 100644
--- a/fs/jffs2/readinode.c
+++ b/fs/jffs2/readinode.c
@@ -543,33 +543,13 @@ static int jffs2_build_inode_fragtree(struct jffs2_sb_info *c,
 
 static void jffs2_free_tmp_dnode_info_list(struct rb_root *list)
 {
-   struct rb_node *this;
-   struct jffs2_tmp_dnode_info *tn;
-
-   this = list->rb_node;
+   struct jffs2_tmp_dnode_info *tn, *next;
 
-   /* Now at bottom of tree */
-   while (this) {
-   if (this->rb_left)
-   this = this->rb_left;
-   else if (this->rb_right)
-   this = this->rb_right;
-   else {
-   tn = rb_entry(this, struct jffs2_tmp_dnode_info, rb);
+   rbtree_postorder_for_each_entry_safe(tn, next, list, rb) {
jffs2_free_full_dnode(tn->fn);
jffs2_free_tmp_dnode_info(tn);
-
-   this = rb_parent(this);
-   if (!this)
-   break;
-
-   if (this->rb_left == &tn->rb)
-   this->rb_left = NULL;
-   else if (this->rb_right == &tn->rb)
-   this->rb_right = NULL;
-   else BUG();
-   }
}
+
*list = RB_ROOT;
 }
 
-- 
1.8.4.2



[PATCH 3.10 04/54] tcp: do not forget FIN in tcp_shifted_skb()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 5e8a402f831dbe7ee831340a91439e46f0d38acd ]

Yuchung found the following problem:

 There are bugs in the SACK processing code, in the merging part of
 tcp_shift_skb_data(), that incorrectly reset or ignore the SACKed
 skb's FIN flag. When a receiver first SACKs the FIN sequence and later
 throws away its out-of-order (ofo) queue (e.g., SACK reneging), the
 sender stops retransmitting the FIN flag and hangs forever.

The following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792 
+.000 > S. 0:0(0) ack 1 
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(1) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 
+.050 < . 1:1(0) ack 2001 win 257 
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.
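
To make the two fixes concrete, here is a small userspace sketch under stated assumptions; it is not the kernel code. struct seg, merge_segment() and the FLAG_* names are hypothetical, and the model only covers the case where the whole of the later segment is folded into the earlier one. The flag bits use the standard TCP header values for FIN and PSH.

```c
#include <stdint.h>

#define FLAG_FIN 0x01	/* FIN bit as in the TCP header */
#define FLAG_PSH 0x08	/* PSH bit as in the TCP header */

/* Hypothetical, simplified stand-in for the per-skb control block. */
struct seg {
	uint32_t seq;		/* first sequence number covered */
	uint32_t end_seq;	/* one past the last sequence number covered */
	uint32_t len;		/* payload bytes */
	uint8_t flags;
};

/* Fold all of cur into prev; prev is the segment that survives. */
static void merge_segment(struct seg *prev, struct seg *cur)
{
	uint32_t shifted = cur->len;

	prev->len += shifted;
	prev->end_seq += shifted;
	cur->seq += shifted;
	cur->len = 0;

	/* Direction matters: the flags must land on the survivor. */
	prev->flags |= cur->flags;

	/* A FIN occupies one sequence number beyond the payload. */
	if (cur->flags & FLAG_FIN)
		prev->end_seq++;
}
```

Using the numbers from the packetdrill script, merging a 2000-byte FIN segment that starts at 10001 into a segment covering 1 to 10001 should leave the survivor ending at 12002 with FIN set, so a later retransmission still carries the FIN.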

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Cc: Ilpo Järvinen 
Acked-by: Ilpo Järvinen 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1292,7 +1292,10 @@ static bool tcp_shifted_skb(struct sock
tp->lost_cnt_hint -= tcp_skb_pcount(prev);
}
 
-   TCP_SKB_CB(skb)->tcp_flags |= TCP_SKB_CB(prev)->tcp_flags;
+   TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
+   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+   TCP_SKB_CB(prev)->end_seq++;
+
if (skb == tcp_highest_sack(sk))
tcp_advance_highest_sack(sk, skb);
 




[PATCH 3.10 13/54] ipv4: fix ineffective source address selection

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Benc 

[ Upstream commit 0a7e22609067ff524fc7bbd45c6951dd08561667 ]

When sending out multicast messages, the source address in inet->mc_addr is
ignored and overwritten with an autoselected one. This is caused by a typo in
commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").
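
The intended behaviour can be sketched in a few lines of userspace C. This is an illustration only: pick_saddr() is a hypothetical name, and addresses are modelled as plain 32-bit values where 0 means "not set".

```c
#include <stdint.h>

/*
 * Hypothetical model of the source address choice: autoselect only
 * when the caller did not supply a source address (0 means unset).
 */
static uint32_t pick_saddr(uint32_t requested, uint32_t autoselected)
{
	if (!requested)
		return autoselected;
	return requested;
}
```

With the condition inverted, as in the buggy code, a caller-supplied address would be replaced and an unset one would stay unset.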

Signed-off-by: Jiri Benc 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/route.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2020,7 +2020,7 @@ struct rtable *__ip_route_output_key(str
  RT_SCOPE_LINK);
goto make_route;
}
-   if (fl4->saddr) {
+   if (!fl4->saddr) {
if (ipv4_is_multicast(fl4->daddr))
fl4->saddr = inet_select_addr(dev_out, 0,
  fl4->flowi4_scope);




[PATCH 3.10 08/54] l2tp: Fix build warning with ipv6 disabled.

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: "David S. Miller" 

[ Upstream commit 8d8a51e26a6d415e1470759f2cf5f3ee3ee86196 ]

net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]

Create a helper "l2tp_tunnel()" to facilitate this, and as a side
effect get rid of a bunch of unnecessary void pointer casts.
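
A rough userspace sketch of why an accessor avoids the warning follows. All names here (WITH_IPV6, FAMILY_V6, struct tunnel, struct sock_like, sock_tunnel(), wants_v6_checksum()) are hypothetical stand-ins, not kernel identifiers. The lookup happens inside the conditionally compiled code, so no local variable is left unused when that code is compiled out, and because void * converts implicitly in C the accessor needs no cast.

```c
#define WITH_IPV6 1	/* hypothetical stand-in for IS_ENABLED(CONFIG_IPV6) */
#define FAMILY_V6 10	/* hypothetical address family value */

struct tunnel {
	int v4mapped;
};

struct sock_like {
	int family;
	void *user_data;	/* untyped, in the spirit of sk->sk_user_data */
};

/* Typed accessor: void * converts to struct tunnel * without a cast. */
static inline struct tunnel *sock_tunnel(const struct sock_like *sk)
{
	return sk->user_data;
}

static int wants_v6_checksum(const struct sock_like *sk)
{
#if WITH_IPV6
	/* The tunnel is fetched only where it is actually needed. */
	if (sk->family == FAMILY_V6 && !sock_tunnel(sk)->v4mapped)
		return 1;
#endif
	(void)sk;	/* keeps the parameter referenced if the block is compiled out */
	return 0;
}
```

Setting WITH_IPV6 to 0 compiles the lookup away entirely, which is the situation that produced the warning when the tunnel pointer was a local variable.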

Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/l2tp/l2tp_core.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -115,6 +115,11 @@ struct l2tp_net {
 static void l2tp_session_set_header_len(struct l2tp_session *session, int version);
 static void l2tp_tunnel_free(struct l2tp_tunnel *tunnel);
 
+static inline struct l2tp_tunnel *l2tp_tunnel(struct sock *sk)
+{
+   return sk->sk_user_data;
+}
+
 static inline struct l2tp_net *l2tp_pernet(struct net *net)
 {
BUG_ON(!net);
@@ -499,7 +504,6 @@ out:
 static inline int l2tp_verify_udp_checksum(struct sock *sk,
   struct sk_buff *skb)
 {
-   struct l2tp_tunnel *tunnel = (struct l2tp_tunnel *)sk->sk_user_data;
struct udphdr *uh = udp_hdr(skb);
u16 ulen = ntohs(uh->len);
__wsum psum;
@@ -508,7 +512,7 @@ static inline int l2tp_verify_udp_checks
return 0;
 
 #if IS_ENABLED(CONFIG_IPV6)
-   if (sk->sk_family == PF_INET6 && !tunnel->v4mapped) {
+   if (sk->sk_family == PF_INET6 && !l2tp_tunnel(sk)->v4mapped) {
if (!uh->check) {
LIMIT_NETDEBUG(KERN_INFO "L2TP: IPv6: checksum is 0\n");
return 1;
@@ -1248,10 +1252,9 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
  */
 static void l2tp_tunnel_destruct(struct sock *sk)
 {
-   struct l2tp_tunnel *tunnel;
+   struct l2tp_tunnel *tunnel = l2tp_tunnel(sk);
struct l2tp_net *pn;
 
-   tunnel = sk->sk_user_data;
if (tunnel == NULL)
goto end;
 
@@ -1619,7 +1622,7 @@ int l2tp_tunnel_create(struct net *net,
}
 
/* Check if this socket has already been prepped */
-   tunnel = (struct l2tp_tunnel *)sk->sk_user_data;
+   tunnel = l2tp_tunnel(sk);
if (tunnel != NULL) {
/* This socket has already been prepped */
err = -EBUSY;




[PATCH 3.10 05/54] tcp: fix incorrect ca_state in tail loss probe

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Yuchung Cheng 

[ Upstream commit 031afe4990a7c9dbff41a3a742c44d3e740ea0a1 ]

On receiving an ACK that covers the loss probe sequence, TLP
immediately sets the congestion state to Open, even though some packets
are not recovered and retransmissions are on the way.  The later ACKs
may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g.,
https://bugzilla.redhat.com/show_bug.cgi?id=989251

The fix is to follow a procedure similar to the one used in recovery by calling
tcp_try_keep_open(). The sender switches to Open state if no packets
are retransmitted. Otherwise it goes to Disorder and lets subsequent
ACKs move the state to Recovery or Open.
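
The state choice described above can be modelled roughly as follows. This is a simplified sketch, not the kernel function: struct conn, enum ca_state and try_keep_open() are hypothetical names, and the real tcp_try_keep_open() does additional bookkeeping that is left out here.

```c
/* Hypothetical, reduced set of congestion states for the sketch. */
enum ca_state { CA_OPEN, CA_DISORDER };

/* Hypothetical per-connection counters. */
struct conn {
	enum ca_state state;
	unsigned int sacked_out;	/* segments SACKed by the peer */
	unsigned int lost_out;		/* segments marked lost */
	unsigned int retrans_out;	/* retransmissions still in flight */
};

/* Go to Open only if nothing is outstanding; otherwise stay cautious. */
static void try_keep_open(struct conn *c)
{
	enum ca_state state = CA_OPEN;

	if (c->sacked_out || c->lost_out || c->retrans_out)
		state = CA_DISORDER;

	c->state = state;
}
```

Compared with setting Open unconditionally, this leaves the connection in Disorder while retransmissions are pending, which is the situation that triggered the WARN_ON.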

Reported-By: Michael Sterrett 
Tested-By: Dormando 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3345,7 +3345,7 @@ static void tcp_process_tlp_ack(struct s
tcp_init_cwnd_reduction(sk, true);
tcp_set_ca_state(sk, TCP_CA_CWR);
tcp_end_cwnd_reduction(sk);
-   tcp_set_ca_state(sk, TCP_CA_Open);
+   tcp_try_keep_open(sk);
NET_INC_STATS_BH(sock_net(sk),
 LINUX_MIB_TCPLOSSPROBERECOVERY);
}




[PATCH 3.10 09/54] net: mv643xx_eth: update statistics timer from timer context only

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Sebastian Hesselbarth 

[ Upstream commit 041b4ddb84989f06ff1df0ca869b950f1ee3cb1c ]

Each port driver installs a periodic timer to update port statistics
by calling mib_counters_update. As mib_counters_update is also called
from non-timer context, we should not reschedule the timer there but
rather move it to timer-only context.
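
The split can be pictured with a toy model. Nothing below is driver code: struct port, counters_update() and counters_timer_cb() are hypothetical names, and a counter stands in for calls to mod_timer().

```c
/* Hypothetical port state; rearm_count stands in for mod_timer() calls. */
struct port {
	unsigned long rx_discard;
	unsigned int rearm_count;
};

/* Callable from any context: accumulates statistics, never rearms. */
static void counters_update(struct port *p, unsigned long hw_discard)
{
	p->rx_discard += hw_discard;
}

/* Timer path only: update first, then schedule the next run. */
static void counters_timer_cb(struct port *p, unsigned long hw_discard)
{
	counters_update(p, hw_discard);
	p->rearm_count++;
}
```

A statistics read from non-timer context now changes only the counters, while each timer expiry rearms exactly once.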

Signed-off-by: Sebastian Hesselbarth 
Acked-by: Jason Cooper 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/ethernet/marvell/mv643xx_eth.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1125,15 +1125,13 @@ static void mib_counters_update(struct m
p->rx_discard += rdlp(mp, RX_DISCARD_FRAME_CNT);
p->rx_overrun += rdlp(mp, RX_OVERRUN_FRAME_CNT);
spin_unlock_bh(&mp->mib_counters_lock);
-
-   mod_timer(&mp->mib_counters_timer, jiffies + 30 * HZ);
 }
 
 static void mib_counters_timer_wrapper(unsigned long _mp)
 {
struct mv643xx_eth_private *mp = (void *)_mp;
-
mib_counters_update(mp);
+   mod_timer(&mp->mib_counters_timer, jiffies + 30 * HZ);
 }
 
 




[PATCH 3.10 11/54] net: heap overflow in __audit_sockaddr()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Dan Carpenter 

[ Upstream commit 1661bf364ae9c506bc8795fef70d1532931be1e8 ]

We need to cap ->msg_namelen or it leads to a buffer overflow when we
do the memcpy() in __audit_sockaddr().  It requires CAP_AUDIT_CONTROL to
exploit this bug.

The call tree is:
___sys_recvmsg()
  move_addr_to_user()
audit_sockaddr()
  __audit_sockaddr()
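
The bound being enforced can be sketched in userspace like this. It is an illustration under assumptions, not the kernel code: copy_name() is a hypothetical helper, and the destination is assumed to be a buffer of exactly sizeof(struct sockaddr_storage) bytes, which is the limit the patch checks against.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: copy a caller-supplied address into fixed-size
 * storage, rejecting lengths that would overflow the destination.
 */
static int copy_name(struct sockaddr_storage *dst, const void *src,
		     size_t namelen)
{
	if (namelen > sizeof(*dst))
		return -EINVAL;	/* reject before any copy takes place */

	memcpy(dst, src, namelen);
	return 0;
}
```

Checking once, where the length first enters from user space, means later consumers of the length can rely on it being in range.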

Reported-by: Jüri Aedla 
Signed-off-by: Dan Carpenter 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/compat.c |2 ++
 net/socket.c |   24 
 2 files changed, 22 insertions(+), 4 deletions(-)

--- a/net/compat.c
+++ b/net/compat.c
@@ -71,6 +71,8 @@ int get_compat_msghdr(struct msghdr *kms
__get_user(kmsg->msg_controllen, &umsg->msg_controllen) ||
__get_user(kmsg->msg_flags, &umsg->msg_flags))
return -EFAULT;
+   if (kmsg->msg_namelen > sizeof(struct sockaddr_storage))
+   return -EINVAL;
kmsg->msg_name = compat_ptr(tmp1);
kmsg->msg_iov = compat_ptr(tmp2);
kmsg->msg_control = compat_ptr(tmp3);
--- a/net/socket.c
+++ b/net/socket.c
@@ -1956,6 +1956,16 @@ struct used_address {
unsigned int name_len;
 };
 
+static int copy_msghdr_from_user(struct msghdr *kmsg,
+struct msghdr __user *umsg)
+{
+   if (copy_from_user(kmsg, umsg, sizeof(struct msghdr)))
+   return -EFAULT;
+   if (kmsg->msg_namelen > sizeof(struct sockaddr_storage))
+   return -EINVAL;
+   return 0;
+}
+
 static int ___sys_sendmsg(struct socket *sock, struct msghdr __user *msg,
 struct msghdr *msg_sys, unsigned int flags,
 struct used_address *used_address)
@@ -1974,8 +1984,11 @@ static int ___sys_sendmsg(struct socket
if (MSG_CMSG_COMPAT & flags) {
if (get_compat_msghdr(msg_sys, msg_compat))
return -EFAULT;
-   } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
-   return -EFAULT;
+   } else {
+   err = copy_msghdr_from_user(msg_sys, msg);
+   if (err)
+   return err;
+   }
 
if (msg_sys->msg_iovlen > UIO_FASTIOV) {
err = -EMSGSIZE;
@@ -2183,8 +2196,11 @@ static int ___sys_recvmsg(struct socket
if (MSG_CMSG_COMPAT & flags) {
if (get_compat_msghdr(msg_sys, msg_compat))
return -EFAULT;
-   } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
-   return -EFAULT;
+   } else {
+   err = copy_msghdr_from_user(msg_sys, msg);
+   if (err)
+   return err;
+   }
 
if (msg_sys->msg_iovlen > UIO_FASTIOV) {
err = -EMSGSIZE;




[PATCH 3.10 12/54] proc connector: fix info leaks

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

[ Upstream commit e727ca82e0e9616ab4844301e6bae60ca7327682 ]

Initialize event_data for all possible message types to prevent leaking
kernel stack contents to userland (up to 20 bytes). Also set the flags
member of the connector message to 0 to prevent leaking two more stack
bytes this way.
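
The leak pattern and the fix can be shown with a small userspace model. The layout below is an assumption made for illustration and does not match struct proc_event; struct event and fill_small_event() are hypothetical names. The point is that writing only the smaller member of a union leaves the rest of the union holding whatever was in memory before, unless it is cleared first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event with a union whose members differ in size. */
struct event {
	uint32_t what;
	union {
		struct {
			uint32_t pid;
		} small;
		struct {
			uint32_t pid;
			uint32_t tgid;
			uint32_t extra[3];
		} large;
	} data;
};

/* Fill in a "small" event; the memset covers the bytes small never writes. */
static void fill_small_event(struct event *ev, uint32_t pid)
{
	memset(&ev->data, 0, sizeof(ev->data));
	ev->what = 1;
	ev->data.small.pid = pid;
}
```

If the whole struct is later copied out by sizeof, as the connector does with msg->len = sizeof(*ev), the cleared tail contains zeros rather than stale stack contents.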

Signed-off-by: Mathias Krause 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/connector/cn_proc.c |   18 ++
 1 file changed, 18 insertions(+)

--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -65,6 +65,7 @@ void proc_fork_connector(struct task_str
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -80,6 +81,7 @@ void proc_fork_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
/*  If cn_netlink_send() failed, the data is not sent */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
@@ -96,6 +98,7 @@ void proc_exec_connector(struct task_str
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -106,6 +109,7 @@ void proc_exec_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -122,6 +126,7 @@ void proc_id_connector(struct task_struc
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->what = which_id;
ev->event_data.id.process_pid = task->pid;
ev->event_data.id.process_tgid = task->tgid;
@@ -145,6 +150,7 @@ void proc_id_connector(struct task_struc
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -160,6 +166,7 @@ void proc_sid_connector(struct task_stru
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -170,6 +177,7 @@ void proc_sid_connector(struct task_stru
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -185,6 +193,7 @@ void proc_ptrace_connector(struct task_s
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -203,6 +212,7 @@ void proc_ptrace_connector(struct task_s
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -218,6 +228,7 @@ void proc_comm_connector(struct task_str
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -229,6 +240,7 @@ void proc_comm_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -244,6 +256,7 @@ void proc_coredump_connector(struct task
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&

[PATCH 3.10 06/54] net: do not call sock_put() on TIMEWAIT sockets

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ]

commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/inet_hashtables.c  |2 +-
 net/ipv6/inet6_hashtables.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -287,7 +287,7 @@ begintw:
if (unlikely(!INET_TW_MATCH(sk, net, acookie,
saddr, daddr, ports,
dif))) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -116,7 +116,7 @@ begintw:
}
if (unlikely(!INET6_TW_MATCH(sk, net, saddr, daddr,
 ports, dif))) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;




[PATCH 3.10 10/54] net: mv643xx_eth: fix orphaned statistics timer crash

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Sebastian Hesselbarth 

[ Upstream commit f564412c935111c583b787bcc18157377b208e2e ]

The periodic statistics timer gets started at port _probe() time, but
is stopped on _stop() only. In a modular environment, this can cause
the timer to access already deallocated memory, if the module is unloaded
without starting the eth device. To fix this, we add the timer right
before the port is started, instead of at _probe() time.

Signed-off-by: Sebastian Hesselbarth 
Acked-by: Jason Cooper 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/ethernet/marvell/mv643xx_eth.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2229,6 +2229,7 @@ static int mv643xx_eth_open(struct net_d
mp->int_mask |= INT_TX_END_0 << i;
}
 
+   add_timer(&mp->mib_counters_timer);
port_start(mp);
 
wrlp(mp, INT_MASK_EXT, INT_EXT_LINK_PHY | INT_EXT_TX);
@@ -2737,7 +2738,6 @@ static int mv643xx_eth_probe(struct plat
mp->mib_counters_timer.data = (unsigned long)mp;
mp->mib_counters_timer.function = mib_counters_timer_wrapper;
mp->mib_counters_timer.expires = jiffies + 30 * HZ;
-   add_timer(&mp->mib_counters_timer);
 
spin_lock_init(&mp->mib_counters_lock);
 




[PATCH 3.10 16/54] xen-netback: Don't destroy the netdev until the vif is shut down

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Paul Durrant 

[ upstream commit id: 279f438e36c0a70b23b86d2090aeec50155034a9 ]

Without this patch, if a frontend cycles through states Closing
and Closed (which Windows frontends need to do) then the netdev
will be destroyed and requires re-invocation of hotplug scripts
to restore state before the frontend can move to Connected. Thus
when udev is not in use the backend gets stuck in InitWait.

With this patch, the netdev is left alone whilst the backend is
still online and is only de-registered and freed just prior to
destroying the vif (which is also nicely symmetrical with the
netdev allocation and registration being done during probe) so
no re-invocation of hotplug scripts is required.

Signed-off-by: Paul Durrant 
Cc: David Vrabel 
Cc: Wei Liu 
Cc: Ian Campbell 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/xen-netback/common.h|1 +
 drivers/net/xen-netback/interface.c |   12 ++--
 drivers/net/xen-netback/xenbus.c|   17 -
 3 files changed, 23 insertions(+), 7 deletions(-)

--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -115,6 +115,7 @@ struct xenvif *xenvif_alloc(struct devic
 int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
   unsigned long rx_ring_ref, unsigned int evtchn);
 void xenvif_disconnect(struct xenvif *vif);
+void xenvif_free(struct xenvif *vif);
 
 void xenvif_get(struct xenvif *vif);
 void xenvif_put(struct xenvif *vif);
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -304,6 +304,9 @@ struct xenvif *xenvif_alloc(struct devic
}
 
netdev_dbg(dev, "Successfully created xenvif\n");
+
+   __module_get(THIS_MODULE);
+
return vif;
 }
 
@@ -369,9 +372,14 @@ void xenvif_disconnect(struct xenvif *vi
if (vif->irq)
unbind_from_irqhandler(vif->irq, vif);
 
-   unregister_netdev(vif->dev);
-
xen_netbk_unmap_frontend_rings(vif);
+}
+
+void xenvif_free(struct xenvif *vif)
+{
+   unregister_netdev(vif->dev);
 
free_netdev(vif->dev);
+
+   module_put(THIS_MODULE);
 }
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -42,7 +42,7 @@ static int netback_remove(struct xenbus_
if (be->vif) {
kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-   xenvif_disconnect(be->vif);
+   xenvif_free(be->vif);
be->vif = NULL;
}
kfree(be);
@@ -203,9 +203,18 @@ static void disconnect_backend(struct xe
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
 
+   if (be->vif)
+   xenvif_disconnect(be->vif);
+}
+
+static void destroy_backend(struct xenbus_device *dev)
+{
+   struct backend_info *be = dev_get_drvdata(&dev->dev);
+
if (be->vif) {
+   kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-   xenvif_disconnect(be->vif);
+   xenvif_free(be->vif);
be->vif = NULL;
}
 }
@@ -237,14 +246,11 @@ static void frontend_changed(struct xenb
case XenbusStateConnected:
if (dev->state == XenbusStateConnected)
break;
-   backend_create_xenvif(be);
if (be->vif)
connect(be);
break;
 
case XenbusStateClosing:
-   if (be->vif)
-   kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
disconnect_backend(dev);
xenbus_switch_state(dev, XenbusStateClosing);
break;
@@ -253,6 +259,7 @@ static void frontend_changed(struct xenb
xenbus_switch_state(dev, XenbusStateClosed);
if (xenbus_dev_is_online(dev))
break;
+   destroy_backend(dev);
/* fall through if not online */
case XenbusStateUnknown:
device_unregister(&dev->dev);




[PATCH 3.10 03/54] tcp: must unclone packets before mangling them

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit c52e2421f7368fd36cbe330d2cf41b10452e39a9 ]

TCP stack should make sure it owns skbs before mangling them.

We had various crashes using bnx2x, and it turned out gso_size
was cleared right before the bnx2x driver was populating the TC descriptor
of the _previous_ packet sent. The TCP stack can sometimes retransmit
packets that are still in the Qdisc.

Of course we could make the bnx2x driver more robust (using
ACCESS_ONCE(shinfo->gso_size) for example), but the bug is in the TCP stack.

We have identified two points where skb_unclone() was needed.

This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.
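
A minimal userspace sketch of the "own it before you mangle it" rule
described above. Everything here (struct shared_buf, struct pkt and the
pkt_*() helpers) is hypothetical and only models the idea; it is not the
kernel's skb implementation. The assumption is that clones share one data
area through a reference count, and that a writer must take a private copy
first so that other holders (a driver still transmitting the previous
packet, for instance) never see the modified gso_size.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the data area shared between clones. */
struct shared_buf {
	int refcnt;            /* number of packets pointing at this area */
	unsigned int gso_size; /* field the sender wants to rewrite */
};

struct pkt {
	struct shared_buf *shared;
};

/* Allocate a packet that owns a fresh shared area; NULL on failure. */
static struct pkt *pkt_new(unsigned int gso_size)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->shared = malloc(sizeof(*p->shared));
	if (!p->shared) {
		free(p);
		return NULL;
	}
	p->shared->refcnt = 1;
	p->shared->gso_size = gso_size;
	return p;
}

/* A clone points at the same shared area and bumps its refcount. */
static struct pkt *pkt_clone(struct pkt *orig)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->shared = orig->shared;
	p->shared->refcnt++;
	return p;
}

static int pkt_cloned(const struct pkt *p)
{
	return p->shared->refcnt > 1;
}

/* Give p a private copy if the area is shared; 0 on success, -1 on OOM. */
static int pkt_unclone(struct pkt *p)
{
	struct shared_buf *copy;

	if (!pkt_cloned(p))
		return 0;
	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	*copy = *p->shared;
	copy->refcnt = 1;
	p->shared->refcnt--;
	p->shared = copy;
	return 0;
}

/* Drop one reference and free the shared area with the last holder. */
static void pkt_free(struct pkt *p)
{
	if (--p->shared->refcnt == 0)
		free(p->shared);
	free(p);
}
```

In this model, a writer calls pkt_unclone() and bails out on failure before
touching gso_size, which mirrors the shape of the two call sites added by
the patch.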

Kudos to Neal for finding the root cause of this bug. It's visible
when using a small MSS.

Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_output.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -976,6 +976,9 @@ static void tcp_queue_skb(struct sock *s
 static void tcp_set_skb_tso_segs(const struct sock *sk, struct sk_buff *skb,
 unsigned int mss_now)
 {
+   /* Make sure we own this skb before messing gso_size/gso_segs */
+   WARN_ON_ONCE(skb_cloned(skb));
+
if (skb->len <= mss_now || !sk_can_gso(sk) ||
skb->ip_summed == CHECKSUM_NONE) {
/* Avoid the costly divide in the normal
@@ -1057,9 +1060,7 @@ int tcp_fragment(struct sock *sk, struct
if (nsize < 0)
nsize = 0;
 
-   if (skb_cloned(skb) &&
-   skb_is_nonlinear(skb) &&
-   pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+   if (skb_unclone(skb, GFP_ATOMIC))
return -ENOMEM;
 
/* Get a new skb... force flag on. */
@@ -2334,6 +2335,8 @@ int __tcp_retransmit_skb(struct sock *sk
int oldpcount = tcp_skb_pcount(skb);
 
if (unlikely(oldpcount > 1)) {
+   if (skb_unclone(skb, GFP_ATOMIC))
+   return -ENOMEM;
tcp_init_tso_segs(sk, skb, cur_mss);
tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
}




[PATCH 3.10 19/54] l2tp: must disable bh before calling l2tp_xmit_skb()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 455cc32bf128e114455d11ad919321ab89a2c312 ]

François Cachereul made a very nice bug report and suspected
the bh_lock_sock() / bh_unlock_sock() pair used in l2tp_xmit_skb() from
process context was not good.

This problem was added by commit 6af88da14ee284aaad6e4326da09a89191ab6165
("l2tp: Fix locking in l2tp_core.c").

l2tp_eth_dev_xmit() runs from BH context, so we must disable BH
from other l2tp_xmit_skb() users.

[  452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662]
[  452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox
ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod
virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring 
virtio [last unloaded: scsi_wait_scan]
[  452.064012] CPU 1
[  452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643]
[  452.080015] CPU 2
[  452.080015]
[  452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs 
Bochs
[  452.080015] RIP: 0010:[]  [] 
do_raw_spin_lock+0x17/0x1f
[  452.080015] RSP: 0018:88007125fc18  EFLAGS: 0293
[  452.080015] RAX: aba9 RBX: 811d0703 RCX: 
[  452.080015] RDX: 00ab RSI: 8800711f6896 RDI: 8800745c8110
[  452.080015] RBP: 88007125fc18 R08: 0020 R09: 
[  452.080015] R10:  R11: 0280 R12: 0286
[  452.080015] R13: 0020 R14: 0240 R15: 
[  452.080015] FS:  7fdc0cc24700() GS:8800b6f0() 
knlGS:
[  452.080015] CS:  0010 DS:  ES:  CR0: 80050033
[  452.080015] CR2: 7fdb054899b8 CR3: 74404000 CR4: 06a0
[  452.080015] DR0:  DR1:  DR2: 
[  452.080015] DR3:  DR6: 0ff0 DR7: 0400
[  452.080015] Process accel-pppd (pid: 6643, threadinfo 88007125e000, task 
8800b27e6dd0)
[  452.080015] Stack:
[  452.080015]  88007125fc28 81256559 88007125fc98 
a01b2bd1
[  452.080015]  88007125fc58 000c 029490d0 
009c71dbe25e
[  452.080015]  005c 0008000e  
880071170600
[  452.080015] Call Trace:
[  452.080015]  [] _raw_spin_lock+0xe/0x10
[  452.080015]  [] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [] sock_sendmsg+0xa1/0xb6
[  452.080015]  [] ? __schedule+0x5c1/0x616
[  452.080015]  [] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [] ? fget_light+0x75/0x89
[  452.080015]  [] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [] sys_sendto+0x10c/0x13b
[  452.080015]  [] system_call_fastpath+0x16/0x1b
[  452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d 
c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 
eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3
[  452.080015] Call Trace:
[  452.080015]  [] _raw_spin_lock+0xe/0x10
[  452.080015]  [] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [] sock_sendmsg+0xa1/0xb6
[  452.080015]  [] ? __schedule+0x5c1/0x616
[  452.080015]  [] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [] ? fget_light+0x75/0x89
[  452.080015]  [] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [] sys_sendto+0x10c/0x13b
[  452.080015]  [] system_call_fastpath+0x16/0x1b
[  452.064012]
[  452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs 
Bochs
[  452.064012] RIP: 0010:[]  [] 
do_raw_spin_lock+0x19/0x1f
[  452.064012] RSP: 0018:8800b6e83ba0  EFLAGS: 0297
[  452.064012] RAX: aaa9 RBX: 8800b6e83b40 RCX: 0002
[  452.064012] RDX: 00aa RSI: 000a RDI: 8800745c8110
[  452.064012] RBP: 8800b6e83ba0 R08: c802 R09: 001c
[  452.064012] R10: 880071096c4e R11: 0006 R12: 8800b6e83b18
[  452.064012] R13: 8125d51e R14: 8800b6e83ba0 R15: 880072a589c0
[  452.064012] FS:  7fdc0b81e700() GS:8800b6e8() 
knlGS:
[  452.064012] CS:  0010 DS:  ES:  CR0: 80050033
[  452.064012] CR2: 00625208 CR3: 74404000 CR4: 06a0
[  452.064012] DR0:  DR1:  DR2: 
[  452.064012] DR3:  DR6: 0ff0 DR7: 0400
[  452.064012] Process accel-pppd (pid: 6662, threadinfo 88007129a000, task 
8800744f7410)
[  452.064012] Stack:
[  452.064012]  8800b6e83bb0 81256559 8800b6e83bc0 
8121c64a
[  452.064012]  8800

[PATCH 3.10 18/54] vti: get rid of nf mark rule in prerouting

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Christophe Gouault 

[ Upstream commit 7263a5187f9e9de45fcb51349cf0e031142c19a1 ]

This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
  packets destined to each vti interface, via netfilter rules in
  the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
  commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
  is done with flag TUNNEL_NO_KEY, so there is no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to look up
  the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
  netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
  packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
  as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
  o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
  used nor changed by the vti interface. Only the vti interface o_key
  is used.
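
The "temporarily mark, check, restore" behaviour in the list above can be
sketched in plain C as follows. This is only an illustration under stated
assumptions: struct sketch_pkt, sketch_check_with_mark() and the sample
policy function are hypothetical names, and the policy callback stands in
for the real xfrm policy check.

```c
#include <stdint.h>

/* Hypothetical packet carrying only the field of interest. */
struct sketch_pkt {
	uint32_t mark;
};

/* Stand-in for a policy check that matches on the packet mark. */
typedef int (*sketch_policy_fn)(const struct sketch_pkt *pkt);

/* Sample policy for demonstration: accept only mark == 1. */
static int sketch_policy_mark_is_one(const struct sketch_pkt *pkt)
{
	return pkt->mark == 1;
}

/*
 * Run the policy check with the tunnel o_key as the mark, then put the
 * caller's mark back so the interface neither uses nor changes it.
 */
static int sketch_check_with_mark(struct sketch_pkt *pkt, uint32_t o_key,
				  sketch_policy_fn check)
{
	uint32_t oldmark = pkt->mark;
	int ret;

	pkt->mark = o_key;
	ret = check(pkt);
	pkt->mark = oldmark;
	return ret;
}
```

Restoring the mark on both the accept and reject paths is what lets the
packet keep whatever mark it arrived with.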

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

   ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

   ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
  proto esp mode tunnel
   ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
  proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

   ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
 enc "cbc(aes)" "azertyuiopqsdfgh"

   ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
 enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should not use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

   ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/ip_vti.c |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -285,8 +285,17 @@ static int vti_rcv(struct sk_buff *skb)
tunnel = vti_tunnel_lookup(dev_net(skb->dev), iph->saddr, iph->daddr);
if (tunnel != NULL) {
struct pcpu_tstats *tstats;
+   u32 oldmark = skb->mark;
+   int ret;
 
-   if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
+
+   /* temporarily mark the skb with the tunnel o_key, to
+* only match policies with this mark.
+*/
+   skb->mark = be32_to_cpu(tunnel->parms.o_key);
+   ret = xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb);
+   skb->mark = oldmark;
+   if (!ret)
return -1;
 
tstats = this_cpu_ptr(tunnel->dev->tstats);
@@ -295,7 +304,6 @@ static int vti_rcv(struct sk_buff *skb)
tstats->rx_bytes += skb->len;
u64_stats_update_end(&tstats->syncp);
 
-   skb->mark = 0;
secpath_reset(skb);
skb->dev = tunnel->dev;
return 1;
@@ -327,7 +335,7 @@ static netdev_tx_t vti_tunnel_xmit(struc
 
memset(&fl4, 0, sizeof(fl4));
flowi4_init_output(&fl4, tunnel->parms.link,
-  be32_to_cpu(tunnel->parms.i_key), RT_TOS(tos),
+  be32_to_cpu(tunnel->parms.o_key), RT_TOS(tos),
   RT_SCOPE_UNIVERSE,
   IPPROTO_IPIP, 0,
   dst, tiph->saddr, 0, 0);




[PATCH 3.10 17/54] net: vlan: fix nlmsg size calculation in vlan_get_size()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Marc Kleine-Budde 

[ Upstream commit c33a39c575068c2ea9bffb22fd6de2df19c74b89 ]

This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().
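
To make the size difference concrete, here is a small userspace sketch of
the attribute-size arithmetic. The sketch_* names are hypothetical; the
constants (a 4-byte attribute header and 4-byte alignment) and the
two-u32 layout of the flags structure are assumptions modelled on the
netlink attribute format, not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NLA_ALIGNTO 4u /* assumed attribute alignment */
#define SKETCH_NLA_HDRLEN  4u /* assumed aligned attribute header size */

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Space one attribute needs: header plus payload, padded to alignment. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}

/* Stand-in with the same shape as the flags payload: two 32-bit words. */
struct sketch_vlan_flags {
	uint32_t flags;
	uint32_t mask;
};

/* Bytes the old calculation left out for the flags attribute. */
static size_t sketch_flags_undercount(void)
{
	return sketch_nla_total_size(sizeof(struct sketch_vlan_flags)) -
	       sizeof(struct sketch_vlan_flags);
}
```

Under these assumptions the bare sizeof() counts 8 bytes where the
attribute occupies 12, so the old estimate was short by the header.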

Cc: Patrick McHardy 
Signed-off-by: Marc Kleine-Budde 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/8021q/vlan_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -171,7 +171,7 @@ static size_t vlan_get_size(const struct
 
return nla_total_size(2) +  /* IFLA_VLAN_PROTOCOL */
   nla_total_size(2) +  /* IFLA_VLAN_ID */
-  sizeof(struct ifla_vlan_flags) + /* IFLA_VLAN_FLAGS */
+  nla_total_size(sizeof(struct ifla_vlan_flags)) + /* IFLA_VLAN_FLAGS */
   vlan_qos_map_size(vlan->nr_ingress_mappings) +
   vlan_qos_map_size(vlan->nr_egress_mappings);
 }




[PATCH 3.10 22/54] connector: use nlmsg_len() to check message length

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

[ Upstream commit 162b2bedc084d2d908a04c93383ba02348b648b0 ]

The current code checks that the whole netlink message is at least
long enough to hold a cn_msg. This is wrong, as nlmsg_len includes
the length of the netlink message header. Use nlmsg_len() instead to
fix this "off-by-NLMSG_HDRLEN" size check.
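
A short sketch of why the two checks differ. The sketch_* names are
hypothetical, the header layout and its 16-byte length are assumptions
modelled on the netlink message header, and the minimum payload is passed
in as a parameter rather than taken from the real cn_msg definition. Only
the lower-bound test is modelled; the upper bound and the skb length
comparison from the patch are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NLMSG_HDRLEN 16 /* assumed aligned header length */

/* Stand-in for the netlink message header. */
struct sketch_nlmsghdr {
	uint32_t nlmsg_len;   /* total length, header included */
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Payload length: total length minus the header. */
static int sketch_nlmsg_len(const struct sketch_nlmsghdr *nlh)
{
	return (int)nlh->nlmsg_len - SKETCH_NLMSG_HDRLEN;
}

/* Old check: compares the total length against the payload minimum. */
static int sketch_old_check_ok(const struct sketch_nlmsghdr *nlh,
			       size_t min_payload)
{
	return nlh->nlmsg_len >= min_payload;
}

/* New check: compares the payload length against the payload minimum. */
static int sketch_new_check_ok(const struct sketch_nlmsghdr *nlh,
			       size_t min_payload)
{
	return sketch_nlmsg_len(nlh) >= (int)min_payload;
}
```

With a 20-byte minimum payload, a message whose total length is 20 passes
the old check even though it carries only 4 payload bytes; the new check
rejects it.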

Signed-off-by: Mathias Krause 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/connector/connector.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -157,17 +157,18 @@ static int cn_call_callback(struct sk_bu
 static void cn_rx_skb(struct sk_buff *__skb)
 {
struct nlmsghdr *nlh;
-   int err;
struct sk_buff *skb;
+   int len, err;
 
skb = skb_get(__skb);
 
if (skb->len >= NLMSG_HDRLEN) {
nlh = nlmsg_hdr(skb);
+   len = nlmsg_len(nlh);
 
-   if (nlh->nlmsg_len < sizeof(struct cn_msg) ||
+   if (len < (int)sizeof(struct cn_msg) ||
skb->len < nlh->nlmsg_len ||
-   nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) {
+   len > CONNECTOR_MAX_MSG_SIZE) {
kfree_skb(skb);
return;
}




[PATCH 3.10 20/54] farsync: fix info leak in ioctl

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Salva Peiró 

[ Upstream commit 96b340406724d87e4621284ebac5e059d67b2194 ]

The fst_get_iface() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.
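
The padding problem can be reproduced in userspace with a structure of the
same shape. This is a sketch only: struct sketch_settings and the helpers
are hypothetical, and the "two padding bytes" figure assumes a common ABI
with 4-byte int and 2-byte short. Clearing the whole object first is what
keeps stale bytes out of the tail; assigning the members alone would leave
the padding untouched.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with two 32-bit fields followed by one 16-bit field. */
struct sketch_settings {
	unsigned int clock_rate;
	unsigned int clock_type;
	unsigned short loopback;
	/* the compiler typically adds tail padding here */
};

/* Number of tail padding bytes after the last member. */
static size_t sketch_padding_bytes(void)
{
	return sizeof(struct sketch_settings) -
	       (offsetof(struct sketch_settings, loopback) +
		sizeof(unsigned short));
}

/* Clear the whole object, padding included, then fill the members. */
static void sketch_fill(struct sketch_settings *s, unsigned int rate,
			unsigned int type, unsigned short loopback)
{
	memset(s, 0, sizeof(*s));
	s->clock_rate = rate;
	s->clock_type = type;
	s->loopback = loopback;
}
```

Copying sizeof(struct sketch_settings) bytes out of such an object without
the memset() would also copy whatever happened to be in the padding.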

Signed-off-by: Dan Carpenter 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/wan/farsync.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/wan/farsync.c
+++ b/drivers/net/wan/farsync.c
@@ -1972,6 +1972,7 @@ fst_get_iface(struct fst_card_info *card
}
 
i = port->index;
+   memset(&sync, 0, sizeof(sync));
sync.clock_rate = FST_RDL(card, portConfig[i].lineSpeed);
/* Lucky card and linux use same encoding here */
sync.clock_type = FST_RDB(card, portConfig[i].internalClock) ==




[PATCH 3.10 21/54] unix_diag: fix info leak

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

[ Upstream commit 6865d1e834be84ddd5808d93d5035b492346c64a ]

When filling the netlink message we fail to wipe the pad field,
and therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.

Signed-off-by: Mathias Krause 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/unix/diag.c |1 +
 1 file changed, 1 insertion(+)

--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -124,6 +124,7 @@ static int sk_diag_fill(struct sock *sk,
rep->udiag_family = AF_UNIX;
rep->udiag_type = sk->sk_type;
rep->udiag_state = sk->sk_state;
+   rep->pad = 0;
rep->udiag_ino = sk_ino;
sock_diag_save_cookie(sk, rep->udiag_cookie);
 




[PATCH 3.10 15/54] net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Fabio Estevam 

[ Upstream commit cb03db9d0e964568407fb08ea46cc2b6b7f67587 ]

net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected.

Building a defconfig with both of these symbols unselected (Using the ARM
at91sam9rl_defconfig, for example) leads to the following build warning:

$ make at91sam9rl_defconfig
#
# configuration written to .config
#

$ make net/core/secure_seq.o
scripts/kconfig/conf --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
  CALLscripts/checksyscalls.sh
  CC  net/core/secure_seq.o
net/core/secure_seq.c:17:13: warning: 'net_secret_init' defined but not used 
[-Wunused-function]

Fix this warning by protecting the definition of net_secret() with these
symbols.

Reported-by: Olof Johansson 
Signed-off-by: Fabio Estevam 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/core/secure_seq.c |2 ++
 1 file changed, 2 insertions(+)

--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -10,6 +10,7 @@
 
 #include 
 
+#if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
 #define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
 static u32 net_secret[NET_SECRET_SIZE] cacheline_aligned;
@@ -29,6 +30,7 @@ static void net_secret_init(void)
cmpxchg(&net_secret[--i], 0, tmp);
}
 }
+#endif
 
 #ifdef CONFIG_INET
 static u32 seq_scale(u32 seq)




[PATCH 3.10 26/54] virtio-net: refill only when device is up during setting queues

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jason Wang 

[ Upstream commit 35ed159bfd96a7547ec277ed8b550c7cbd9841b6 ]

We used to schedule the refill work unconditionally after changing the
number of queues. This may lead to an issue if the device is not
up. Since we only try to cancel the work in ndo_stop(), the refill
work may still run after the device has been removed. Fix this by only
scheduling the work when the device is up.

The bug was introduced by commit 9b9cd8024a2882e896c65222aa421d461354e3f2
(virtio-net: fix the race between channels setting and refill).

Signed-off-by: Jason Wang 
Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/virtio_net.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -916,7 +916,9 @@ static int virtnet_set_queues(struct vir
return -EINVAL;
} else {
vi->curr_queue_pairs = queue_pairs;
-   schedule_delayed_work(&vi->refill, 0);
+   /* virtnet_open() will refill when device is going to up. */
+   if (dev->flags & IFF_UP)
+   schedule_delayed_work(&vi->refill, 0);
}
 
return 0;
@@ -1714,7 +1716,9 @@ static int virtnet_restore(struct virtio
vi->config_enable = true;
mutex_unlock(&vi->config_lock);
 
+   rtnl_lock();
virtnet_set_queues(vi, vi->curr_queue_pairs);
+   rtnl_unlock();
 
return 0;
 }




[PATCH 3.10 27/54] bridge: Correctly clamp MAX forward_delay when enabling STP

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

[ Upstream commit 4b6c7879d84ad06a2ac5b964808ed599187a188d ]

Commit be4f154d5ef0ca147ab6bcd38857a774133f5450
bridge: Clamp forward_delay when enabling STP
had a typo when attempting to clamp maximum forward delay.

It is possible to set bridge_forward_delay to be higher than the
permitted maximum when STP is off.  When turning STP on, the
higher than allowed delay has to be clamped down to the max value.
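
The effect of the typo is easy to see in a standalone sketch. Both
functions below are hypothetical helpers written for illustration; the
bounds are plain parameters rather than the kernel's BR_MIN_FORWARD_DELAY
and BR_MAX_FORWARD_DELAY values.

```c
/* Intended behaviour: pull the delay into the range [min, max]. */
static unsigned long sketch_clamp_delay(unsigned long delay,
					unsigned long min, unsigned long max)
{
	if (delay < min)
		return min;
	else if (delay > max)
		return max;
	return delay;
}

/*
 * Behaviour with the typo: the second comparison is inverted, so valid
 * in-range delays are forced up to max and too-large delays pass through.
 */
static unsigned long sketch_clamp_delay_buggy(unsigned long delay,
					      unsigned long min,
					      unsigned long max)
{
	if (delay < min)
		return min;
	else if (delay < max)
		return max;
	return delay;
}
```

With bounds of 2 and 30, a delay of 40 is left at 40 by the buggy variant
and a perfectly valid delay of 10 becomes 30; the corrected comparison
yields 30 and 10 respectively.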

Signed-off-by: Vlad Yasevich 
CC: Herbert Xu 
CC: Stephen Hemminger 
Reviewed-by: Veaceslav Falico 
Acked-by: Herbert Xu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/bridge/br_stp_if.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -134,7 +134,7 @@ static void br_stp_start(struct net_brid
 
if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY)
__br_set_forward_delay(br, BR_MIN_FORWARD_DELAY);
-   else if (br->bridge_forward_delay < BR_MAX_FORWARD_DELAY)
+   else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY)
__br_set_forward_delay(br, BR_MAX_FORWARD_DELAY);
 
if (r == 0) {




[PATCH 3.10 25/54] virtio-net: fix the race between channels setting and refill

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jason Wang 

[ Upstream commit 9b9cd8024a2882e896c65222aa421d461354e3f2 ]

Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues
which are being used) tries to refill on demand when changing the number of
channels by calling try_fill_recv() directly. This may race with:

- the refill work, which may do the refill at the same time
- the try_fill_recv() called in bh, since napi was not disabled

This may lead the guest to complain while setting channels:

virtio_net virtio0: input.1:id 0 is not a head!

Solve this issue by scheduling a refill work which can guarantee the
serialization of refill.

Signed-off-by: Jason Wang 
Cc: Sasha Levin 
Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Signed-off-by: Rusty Russell 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/virtio_net.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -902,7 +902,6 @@ static int virtnet_set_queues(struct vir
struct scatterlist sg;
struct virtio_net_ctrl_mq s;
struct net_device *dev = vi->dev;
-   int i;
 
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
@@ -916,10 +915,8 @@ static int virtnet_set_queues(struct vir
 queue_pairs);
return -EINVAL;
} else {
-   for (i = vi->curr_queue_pairs; i < queue_pairs; i++)
-   if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
-   schedule_delayed_work(&vi->refill, 0);
vi->curr_queue_pairs = queue_pairs;
+   schedule_delayed_work(&vi->refill, 0);
}
 
return 0;




[PATCH 3.10 39/54] ipv6: probe routes asynchronous in rt6_probe

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Hannes Frederic Sowa 

[ Upstream commit c2f17e827b419918c856131f592df9521e1a38e3 ]

Routes need to be probed asynchronously, otherwise the call stack gets
exhausted when the kernel attempts to deliver another skb inline, as
e.g. xt_TEE does, and we probe at the same time.

We still update neigh->updated at once, otherwise we would send too
many probes.

Cc: Julian Anastasov 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv6/route.c |   38 +++---
 1 file changed, 31 insertions(+), 7 deletions(-)

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -473,6 +473,24 @@ out:
 }
 
 #ifdef CONFIG_IPV6_ROUTER_PREF
+struct __rt6_probe_work {
+   struct work_struct work;
+   struct in6_addr target;
+   struct net_device *dev;
+};
+
+static void rt6_probe_deferred(struct work_struct *w)
+{
+   struct in6_addr mcaddr;
+   struct __rt6_probe_work *work =
+   container_of(w, struct __rt6_probe_work, work);
+
+   addrconf_addr_solict_mult(&work->target, &mcaddr);
+   ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
+   dev_put(work->dev);
+   kfree(w);
+}
+
 static void rt6_probe(struct rt6_info *rt)
 {
struct neighbour *neigh;
@@ -496,17 +514,23 @@ static void rt6_probe(struct rt6_info *r
 
if (!neigh ||
time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-   struct in6_addr mcaddr;
-   struct in6_addr *target;
+   struct __rt6_probe_work *work;
 
-   if (neigh) {
+   work = kmalloc(sizeof(*work), GFP_ATOMIC);
+
+   if (neigh && work)
neigh->updated = jiffies;
+
+   if (neigh)
write_unlock(&neigh->lock);
-   }
 
-   target = (struct in6_addr *)&rt->rt6i_gateway;
-   addrconf_addr_solict_mult(target, &mcaddr);
-   ndisc_send_ns(rt->dst.dev, NULL, target, &mcaddr, NULL);
+   if (work) {
+   INIT_WORK(&work->work, rt6_probe_deferred);
+   work->target = rt->rt6i_gateway;
+   dev_hold(rt->dst.dev);
+   work->dev = rt->dst.dev;
+   schedule_work(&work->work);
+   }
} else {
 out:
write_unlock(&neigh->lock);
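
The deferral pattern used by this patch can be modelled in userspace. The
following is a minimal sketch under stated assumptions, not the kernel code:
struct work, queue_work_model(), run_pending() and probe_async() are
hypothetical stand-ins for work_struct, schedule_work(), the workqueue thread
and rt6_probe(). It only shows that the caller queues a heap-allocated item
and returns, and that the handler later recovers its container with
container_of() and frees it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's work_struct. */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Everything the deferred handler needs is copied into the work item. */
struct probe_work {
	struct work work;
	int target;	/* stands in for the gateway address */
	int *sent_log;	/* where the model "sends" the probe */
};

static struct work *pending;

/* Model of schedule_work(): only link the item, do not run it. */
void queue_work_model(struct work *w)
{
	w->next = pending;
	pending = w;
}

/* Model of the worker thread: run queued items, return how many ran. */
int run_pending(void)
{
	int n = 0;

	while (pending) {
		struct work *w = pending;

		pending = w->next;
		w->fn(w);
		n++;
	}
	return n;
}

static void probe_deferred(struct work *w)
{
	struct probe_work *pw = container_of(w, struct probe_work, work);

	*pw->sent_log = pw->target;	/* the actual "send" happens here */
	free(pw);
}

/* Model of the caller: allocate, fill in, queue, return immediately. */
int probe_async(int target, int *sent_log)
{
	struct probe_work *pw = malloc(sizeof(*pw));

	if (!pw)
		return -1;
	pw->work.fn = probe_deferred;
	pw->target = target;
	pw->sent_log = sent_log;
	queue_work_model(&pw->work);
	return 0;
}
```

In this model nothing is sent until run_pending() is called, so the send no
longer runs on the caller's stack.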




[PATCH 3.10 38/54] netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Julian Anastasov 

[ Upstream commit 56e42441ed54b092d6c7411138ce60d049e7c731 ]

Now that rt6_nexthop() can return the nexthop address, we can use it
for proper nexthop comparison of directly connected destinations.
For more information refer to commit bbb5823cf742a7
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").

Signed-off-by: Julian Anastasov 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/netfilter/nf_conntrack_h323_main.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -778,8 +778,8 @@ static int callforward_do_filter(const u
   flowi6_to_flowi(&fl1), false)) {
if (!afinfo->route(&init_net, (struct dst_entry **)&rt2,
   flowi6_to_flowi(&fl2), false)) {
-   if (!memcmp(&rt1->rt6i_gateway, &rt2->rt6i_gateway,
-   sizeof(rt1->rt6i_gateway)) &&
+   if (ipv6_addr_equal(rt6_nexthop(rt1),
+   rt6_nexthop(rt2)) &&
rt1->dst.dev == rt2->dst.dev)
ret = 1;
dst_release(&rt2->dst);




[PATCH 3.10 30/54] sctp: Perform software checksum if packet has to be fragmented.

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

[ Upstream commit d2dbbba77e95dff4b4f901fee236fef6d9552072 ]

IP/IPv6 fragmentation knows how to compute only TCP/UDP checksums.
This causes problems if an SCTP packet has to be fragmented and
ip_summed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discovery,
or when INIT or other control chunks are larger than the MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.

CC: Fan Du 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/sctp/output.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -548,7 +548,7 @@ int sctp_packet_transmit(struct sctp_pac
 */
if (!sctp_checksum_disable) {
if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
-   (dst_xfrm(dst) != NULL)) {
+   (dst_xfrm(dst) != NULL) || packet->ipfragok) {
__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
/* 3) Put the resultant value into the checksum field in the
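
The condition that selects the software checksum after this patch can be
written as a small predicate. This is a sketch only: sctp_needs_sw_csum() is a
hypothetical name, and its three parameters model the NETIF_F_SCTP_CSUM device
feature, a non-NULL dst_xfrm(dst), and packet->ipfragok.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the test in sctp_packet_transmit(): compute the
 * CRC32-C in software unless the device can offload it AND the packet
 * will be neither transformed by xfrm nor allowed to be fragmented.
 */
bool sctp_needs_sw_csum(bool dev_has_sctp_csum, bool has_xfrm, bool ipfragok)
{
	return !dev_has_sctp_csum || has_xfrm || ipfragok;
}
```

Hardware offload is only chosen when all three reasons for a software
checksum are absent.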




[PATCH 3.10 24/54] virtio-net: dont respond to cpu hotplug notifier if were not ready

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Jason Wang 

[ Upstream commit 3ab098df35f8b98b6553edc2e40234af512ba877 ]

We're trying to re-configure the affinity unconditionally in the cpu hotplug
callback. This may cause problems when resuming from s3/s4 since

- virt queues haven't been allocated at that time.
- it's unnecessary since the thaw method will re-configure the affinity.

Fix this issue by checking config_enable and doing nothing if we're not ready.

The bug was introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
(virtio-net: reset virtqueue affinity when doing cpu hotplug).

Acked-by: Michael S. Tsirkin 
Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Cc: Wanlong Gao 
Reviewed-by: Wanlong Gao 
Signed-off-by: Jason Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/virtio_net.c |8 
 1 file changed, 8 insertions(+)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1097,6 +1097,11 @@ static int virtnet_cpu_callback(struct n
 {
struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
 
+   mutex_lock(&vi->config_lock);
+
+   if (!vi->config_enable)
+   goto done;
+
switch(action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
case CPU_DOWN_FAILED:
@@ -1109,6 +1114,9 @@ static int virtnet_cpu_callback(struct n
default:
break;
}
+
+done:
+   mutex_unlock(&vi->config_lock);
return NOTIFY_OK;
 }
 




[PATCH 3.10 34/54] net: fix cipso packet validation when !NETLABEL

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Seif Mazareeb 

[ Upstream commit f2e5ddcc0d12f9c4c7b254358ad245c9dddce13b ]

When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter + 1] == 0. This will cause a kernel
crash in an SMP system, since the CPU executing this function will
stall and not respond to IPIs.

This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:

"icmpsic -s rand -d  -r 123456"
wait (1-2 min)

Signed-off-by: Seif Mazareeb 
Acked-by: Paul Moore 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 include/net/cipso_ipv4.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/include/net/cipso_ipv4.h
+++ b/include/net/cipso_ipv4.h
@@ -290,6 +290,7 @@ static inline int cipso_v4_validate(cons
unsigned char err_offset = 0;
u8 opt_len = opt[1];
u8 opt_iter;
+   u8 tag_len;
 
if (opt_len < 8) {
err_offset = 1;
@@ -302,11 +303,12 @@ static inline int cipso_v4_validate(cons
}
 
for (opt_iter = 6; opt_iter < opt_len;) {
-   if (opt[opt_iter + 1] > (opt_len - opt_iter)) {
+   tag_len = opt[opt_iter + 1];
+   if ((tag_len == 0) || (opt[opt_iter + 1] > (opt_len - opt_iter))) {
err_offset = opt_iter + 1;
goto out;
}
-   opt_iter += opt[opt_iter + 1];
+   opt_iter += tag_len;
}
 
 out:
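
The loop fixed above is a length-prefixed tag walk, and the bug is the usual
one for that pattern: a zero length never advances the iterator. A minimal
standalone sketch follows. walk_tags() is a hypothetical helper, not the
kernel function; the layout assumptions (total length in opt[1], tags starting
at offset 6, each tag's length in its second byte) are taken from the diff.
The buf_len parameter and the check for a missing length byte are additions
for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 if every tag is well formed, otherwise the offset of the
 * offending byte.
 */
int walk_tags(const uint8_t *opt, size_t buf_len)
{
	uint8_t opt_len, opt_iter, tag_len;

	if (buf_len < 2)
		return 1;
	opt_len = opt[1];
	if (opt_len < 8 || opt_len > buf_len)
		return 1;

	for (opt_iter = 6; opt_iter < opt_len;) {
		/* a tag needs at least its type and length bytes */
		if (opt_len - opt_iter < 2)
			return opt_iter + 1;
		tag_len = opt[opt_iter + 1];
		/* tag_len == 0 would leave opt_iter unchanged forever */
		if (tag_len == 0 || tag_len > opt_len - opt_iter)
			return opt_iter + 1;
		opt_iter += tag_len;
	}
	return 0;
}
```

Because tag_len is at least 1 whenever the loop continues, opt_iter strictly
increases and the walk terminates.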




[PATCH 3.10 31/54] wanxl: fix info leak in ioctl

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Salva Peiró 

[ Upstream commit 2b13d06c9584b4eb773f1e80bbaedab9a1c344e1 ]

The wanxl_ioctl() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Salva Peiró 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/wan/wanxl.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/wan/wanxl.c
+++ b/drivers/net/wan/wanxl.c
@@ -355,6 +355,7 @@ static int wanxl_ioctl(struct net_device
ifr->ifr_settings.size = size; /* data size wanted */
return -ENOBUFS;
}
+   memset(&line, 0, sizeof(line));
line.clock_type = get_status(port)->clocking;
line.clock_rate = 0;
line.loopback = 0;
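
The zero-before-fill pattern can be shown outside the driver. The sketch below
is an assumption-laden model: struct settings_model only mimics the shape
described in the commit message (two unsigned ints followed by an unsigned
short, which on common ABIs leaves two trailing padding bytes), and
fill_settings() and count_nonzero_bytes() are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Model of a settings struct with trailing padding on common ABIs. */
struct settings_model {
	unsigned int clock_rate;
	unsigned int clock_type;
	unsigned short loopback;
};

/* Clear the whole object first so padding cannot carry stale bytes. */
void fill_settings(struct settings_model *line, unsigned int clock_type)
{
	memset(line, 0, sizeof(*line));
	line->clock_type = clock_type;
	line->clock_rate = 0;
	line->loopback = 0;
}

/* Count bytes of an object that are not zero, padding included. */
size_t count_nonzero_bytes(const void *p, size_t len)
{
	const unsigned char *b = p;
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if (b[i])
			n++;
	return n;
}
```

Without the memset(), whatever was previously on the stack in the padding
bytes would be copied out along with the fields.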




[PATCH 3.10 29/54] sctp: Use software crc32 checksum when xfrm transform will happen.

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Fan Du 

[ Upstream commit 27127a82561a2a3ed955ce207048e1b066a80a2a ]

igb/ixgbe have hardware sctp checksum support. When this feature is enabled
and IPsec is also armed to protect sctp traffic, ugly things happen, as
xfrm_output checks CHECKSUM_PARTIAL to do the checksum operation (sum everything
up and pack the 16-bit result in the checksum field). The result is a failure
to establish sctp communication.

Signed-off-by: Fan Du 
Cc: Neil Horman 
Cc: Steffen Klassert 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/sctp/output.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -547,7 +547,8 @@ int sctp_packet_transmit(struct sctp_pac
 * by CRC32-C as described in .
 */
if (!sctp_checksum_disable) {
-   if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
+   if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
+   (dst_xfrm(dst) != NULL)) {
__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
/* 3) Put the resultant value into the checksum field in the




[PATCH 3.10 32/54] be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Vasundhara Volam 

[ Upstream commit 0fb88d61bc60779dde88b0fc268da17eb81d0412 ]

It is a required field for all TX_CREATE cmd versions > 0.
This fixes a driver initialization failure, caused by recent SH-R Firmwares
(versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is
not passed.

Signed-off-by: Sathya Perla 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/ethernet/emulex/benet/be_cmds.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1150,7 +1150,6 @@ int be_cmd_txq_create(struct be_adapter
 
if (lancer_chip(adapter)) {
req->hdr.version = 1;
-   req->if_id = cpu_to_le16(adapter->if_handle);
} else if (BEx_chip(adapter)) {
if (adapter->function_caps & BE_FUNCTION_CAPS_SUPER_NIC)
req->hdr.version = 2;
@@ -1158,6 +1157,8 @@ int be_cmd_txq_create(struct be_adapter
req->hdr.version = 2;
}
 
+   if (req->hdr.version > 0)
+   req->if_id = cpu_to_le16(adapter->if_handle);
req->num_pages = PAGES_4K_SPANNED(q_mem->va, q_mem->size);
req->ulp_num = BE_ULP1_NUM;
req->type = BE_ETH_TX_RING_TYPE_STANDARD;




[PATCH 3.10 37/54] ipv6: fill rt6i_gateway with nexthop address

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Julian Anastasov 

[ Upstream commit 550bab42f83308c9d6ab04a980cc4333cef1c8fa ]

Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.

Signed-off-by: Julian Anastasov 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 include/net/ip6_route.h |6 ++
 net/ipv6/ip6_output.c   |4 ++--
 net/ipv6/route.c|8 ++--
 3 files changed, 10 insertions(+), 8 deletions(-)

--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -194,11 +194,9 @@ static inline int ip6_skb_dst_mtu(struct
   skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
 }
 
-static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr *dest)
+static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt)
 {
-   if (rt->rt6i_flags & RTF_GATEWAY || !ipv6_addr_any(&rt->rt6i_gateway))
-   return &rt->rt6i_gateway;
-   return dest;
+   return &rt->rt6i_gateway;
 }
 
 #endif
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -130,7 +130,7 @@ static int ip6_finish_output2(struct sk_
}
 
rcu_read_lock_bh();
-   nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+   nexthop = rt6_nexthop((struct rt6_info *)dst);
neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
if (unlikely(!neigh))
neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
@@ -898,7 +898,7 @@ static int ip6_dst_lookup_tail(struct so
 */
rt = (struct rt6_info *) *dst;
rcu_read_lock_bh();
-   n = __ipv6_neigh_lookup_noref(rt->dst.dev, rt6_nexthop(rt, &fl6->daddr));
+   n = __ipv6_neigh_lookup_noref(rt->dst.dev, rt6_nexthop(rt));
err = n && !(n->nud_state & NUD_VALID) ? -EINVAL : 0;
rcu_read_unlock_bh();
 
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -848,7 +848,6 @@ static struct rt6_info *rt6_alloc_cow(st
if (ort->rt6i_dst.plen != 128 &&
ipv6_addr_equal(&ort->rt6i_dst.addr, daddr))
rt->rt6i_flags |= RTF_ANYCAST;
-   rt->rt6i_gateway = *daddr;
}
 
rt->rt6i_flags |= RTF_CACHE;
@@ -1245,6 +1244,7 @@ struct dst_entry *icmp6_dst_alloc(struct
rt->dst.flags |= DST_HOST;
rt->dst.output  = ip6_output;
atomic_set(&rt->dst.__refcnt, 1);
+   rt->rt6i_gateway  = fl6->daddr;
rt->rt6i_dst.addr = fl6->daddr;
rt->rt6i_dst.plen = 128;
rt->rt6i_idev = idev;
@@ -1801,7 +1801,10 @@ static struct rt6_info *ip6_rt_copy(stru
in6_dev_hold(rt->rt6i_idev);
rt->dst.lastuse = jiffies;
 
-   rt->rt6i_gateway = ort->rt6i_gateway;
+   if (ort->rt6i_flags & RTF_GATEWAY)
+   rt->rt6i_gateway = ort->rt6i_gateway;
+   else
+   rt->rt6i_gateway = *dest;
rt->rt6i_flags = ort->rt6i_flags;
if ((ort->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF)) ==
(RTF_DEFAULT | RTF_ADDRCONF))
@@ -2088,6 +2091,7 @@ struct rt6_info *addrconf_dst_alloc(stru
else
rt->rt6i_flags |= RTF_LOCAL;
 
+   rt->rt6i_gateway  = *addr;
rt->rt6i_dst.addr = *addr;
rt->rt6i_dst.plen = 128;
rt->rt6i_table = fib6_get_table(net, RT6_TABLE_LOCAL);




[PATCH 3.10 50/54] wireless: radiotap: fix parsing buffer overrun

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Johannes Berg 

commit f5563318ff1bde15b10e736e97ffce13be08bc1a upstream.

When parsing an invalid radiotap header, the parser can overrun
the buffer that is passed in because it doesn't correctly check
 1) the minimum radiotap header size
 2) the space for extended bitmaps

The first issue doesn't affect any in-kernel user as they all
check the minimum size before calling the radiotap function.
The second issue could potentially affect the kernel if an skb
is passed in that consists only of the radiotap header with a
lot of extended bitmaps that extend past the SKB. In that case
a read-only buffer overrun by at most 4 bytes is possible.

Fix this by adding the appropriate checks to the parser.

Reported-by: Evan Huus 
Signed-off-by: Johannes Berg 
Signed-off-by: Greg Kroah-Hartman 

---
 net/wireless/radiotap.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/net/wireless/radiotap.c
+++ b/net/wireless/radiotap.c
@@ -97,6 +97,10 @@ int ieee80211_radiotap_iterator_init(
struct ieee80211_radiotap_header *radiotap_header,
int max_length, const struct ieee80211_radiotap_vendor_namespaces *vns)
 {
+   /* check the radiotap header can actually be present */
+   if (max_length < sizeof(struct ieee80211_radiotap_header))
+   return -EINVAL;
+
/* Linux only supports version 0 radiotap format */
if (radiotap_header->it_version)
return -EINVAL;
@@ -131,7 +135,8 @@ int ieee80211_radiotap_iterator_init(
 */
 
if ((unsigned long)iterator->_arg -
-   (unsigned long)iterator->_rtheader >
+   (unsigned long)iterator->_rtheader +
+   sizeof(uint32_t) >
(unsigned long)iterator->_max_length)
return -EINVAL;
}
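
The two checks added by this patch can be sketched on a raw byte buffer. This
is an illustration under assumptions, not the kernel parser:
radiotap_args_offset() and the RT_* constants are hypothetical names, and the
layout used (version byte, pad byte, 16-bit little-endian length, then one or
more 32-bit little-endian present bitmaps, with bit 31 announcing a further
bitmap) is the radiotap header layout.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_HDR_SIZE 8u		/* version, pad, length, first bitmap */
#define RT_EXT_BIT  (1u << 31)	/* another present bitmap follows */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the first argument byte, or -1 if invalid. */
int radiotap_args_offset(const uint8_t *buf, size_t max_length)
{
	size_t it_len, off;

	/* 1) the minimal header must actually be present */
	if (max_length < RT_HDR_SIZE)
		return -1;
	if (buf[0] != 0)	/* only version 0 is handled */
		return -1;
	it_len = (size_t)buf[2] | ((size_t)buf[3] << 8);
	if (it_len < RT_HDR_SIZE || it_len > max_length)
		return -1;

	/* 2) each extended bitmap must fit before it is read */
	off = 4;
	while (get_le32(buf + off) & RT_EXT_BIT) {
		off += 4;
		if (off + 4 > it_len)
			return -1;
	}
	return (int)(off + 4);
}
```

The second check includes the four bytes of the bitmap about to be read,
which corresponds to the sizeof(uint32_t) term added in the diff.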




[PATCH 3.10 33/54] net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Daniel Borkmann 

[ Upstream commit 90c6bd34f884cd9cee21f1d152baf6c18bcac949 ]

In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.

On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.

As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.

Before, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
msg_flags=0}, 0) = 5

After, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
msg_flags=0}, 0) = 5

Signed-off-by: Daniel Borkmann 
Cc: Eric Dumazet 
Cc: Eric W. Biederman 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/unix/af_unix.c |   10 ++
 1 file changed, 10 insertions(+)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1245,6 +1245,15 @@ static int unix_socketpair(struct socket
return 0;
 }
 
+static void unix_sock_inherit_flags(const struct socket *old,
+   struct socket *new)
+{
+   if (test_bit(SOCK_PASSCRED, &old->flags))
+   set_bit(SOCK_PASSCRED, &new->flags);
+   if (test_bit(SOCK_PASSSEC, &old->flags))
+   set_bit(SOCK_PASSSEC, &new->flags);
+}
+
 static int unix_accept(struct socket *sock, struct socket *newsock, int flags)
 {
struct sock *sk = sock->sk;
@@ -1279,6 +1288,7 @@ static int unix_accept(struct socket *so
/* attach accepted sock to socket */
unix_state_lock(tsk);
newsock->state = SS_CONNECTED;
+   unix_sock_inherit_flags(sock, newsock);
sock_graft(tsk, newsock);
unix_state_unlock(tsk);
return 0;




[PATCH 3.10 28/54] net: dst: provide accessor function to dst->xfrm

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

[ Upstream commit e87b3998d795123b4139bc3f25490dd236f68212 ]

dst->xfrm is conditionally defined.  Provide an accessor function that
is always available.

Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 include/net/dst.h |   12 
 1 file changed, 12 insertions(+)

--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,10 +477,22 @@ static inline struct dst_entry *xfrm_loo
 {
return dst_orig;
 } 
+
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+   return NULL;
+}
+
 #else
 extern struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 const struct flowi *fl, struct sock *sk,
 int flags);
+
+/* skb attached with this dst needs transformation if dst->xfrm is valid */
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+   return dst->xfrm;
+}
 #endif
 
 #endif /* _NET_DST_H */




[PATCH 3.10 41/54] ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: AKASHI Takahiro 

commit 3c1532df5c1b54b5f6246cdef94eeb73a39fe43a upstream.

In ftrace_syscall_enter(),
    syscall_get_arguments(..., 0, n, ...)
        if (i == 0) { ...; n--; }
        memcpy(..., n * sizeof(args[0]));
If the 'number of arguments (n)' is zero and the 'argument index (i)' is also
zero in syscall_get_arguments(), no arguments should be copied by memcpy().
Otherwise 'n--' wraps around to a big positive number and an unexpected amount
of data will be copied. Tracing system calls which take no arguments, say
sync(void), may hit this case and eventually corrupt the system.
This patch fixes the issue in both syscall_get_arguments() and
syscall_set_arguments().

Acked-by: Will Deacon 
Signed-off-by: AKASHI Takahiro 
Signed-off-by: Will Deacon 
Signed-off-by: Russell King 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/arm/include/asm/syscall.h |6 ++
 1 file changed, 6 insertions(+)

--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -57,6 +57,9 @@ static inline void syscall_get_arguments
 unsigned int i, unsigned int n,
 unsigned long *args)
 {
+   if (n == 0)
+   return;
+
if (i + n > SYSCALL_MAX_ARGS) {
unsigned long *args_bad = args + SYSCALL_MAX_ARGS - i;
unsigned int n_bad = n + i - SYSCALL_MAX_ARGS;
@@ -81,6 +84,9 @@ static inline void syscall_set_arguments
 unsigned int i, unsigned int n,
 const unsigned long *args)
 {
+   if (n == 0)
+   return;
+
if (i + n > SYSCALL_MAX_ARGS) {
pr_warning("%s called with max args %d, handling only %d\n",
   __func__, i + n, SYSCALL_MAX_ARGS);
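
The wrap-around described in the commit message is ordinary unsigned
arithmetic: decrementing an unsigned int that holds 0 yields UINT_MAX. A
simplified userspace model of the fixed logic follows. get_args(), its flat
regs array and the separate orig_r0 parameter are hypothetical stand-ins for
the pt_regs handling, and MAX_ARGS is assumed to be 7 here.

```c
#include <string.h>

#define MAX_ARGS 7	/* assumed value for the sketch */

/*
 * Copy n arguments starting at index i into args[].
 * Returns the number of arguments written.
 */
unsigned int get_args(const unsigned long *regs, unsigned long orig_r0,
		      unsigned int i, unsigned int n, unsigned long *args)
{
	unsigned int copied = 0;

	if (n == 0)		/* without this, n-- below would wrap */
		return 0;
	if (i >= MAX_ARGS)
		return 0;
	if (n > MAX_ARGS - i)	/* clamp instead of reading past regs[] */
		n = MAX_ARGS - i;

	if (i == 0) {
		/* argument 0 comes from the saved original r0 */
		args[0] = orig_r0;
		args++;
		i++;
		n--;
		copied++;
	}
	memcpy(args, regs + i, n * sizeof(args[0]));
	return copied + n;
}
```

With the early return removed, a call with i == 0 and n == 0 would reach the
memcpy() with n equal to UINT_MAX.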




[PATCH 3.10 52/54] USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Diego Elio Pettenò 

commit c9d09dc7ad106492c17c587b6eeb99fe3f43e522 upstream.

Without this change, the USB cable for Freestyle Option and compatible
glucometers will not be detected by the driver.

Signed-off-by: Diego Elio Pettenò 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/serial/ti_usb_3410_5052.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/usb/serial/ti_usb_3410_5052.c
+++ b/drivers/usb/serial/ti_usb_3410_5052.c
@@ -203,6 +203,7 @@ static struct usb_device_id ti_id_table_
{ USB_DEVICE(IBM_VENDOR_ID, IBM_454B_PRODUCT_ID) },
{ USB_DEVICE(IBM_VENDOR_ID, IBM_454C_PRODUCT_ID) },
{ USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_PRODUCT_ID) },
+   { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) },
{ USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) },
{ }
 };




[PATCH 3.10 51/54] serial: vt8500: add missing braces

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Roel Kluin 

commit d969de8d83401683420638c8107dcfedb2146f37 upstream.

Due to missing braces on an if statement, in the presence of a device_node a
port was always assigned -1, regardless of any alias entries in the
device tree. Conversely, if device_node was NULL, an uninitialized port
ended up being used.

This patch adds the missing braces, fixing the issues.

Signed-off-by: Roel Kluin 
Acked-by: Tony Prisk 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/tty/serial/vt8500_serial.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/tty/serial/vt8500_serial.c
+++ b/drivers/tty/serial/vt8500_serial.c
@@ -559,12 +559,13 @@ static int vt8500_serial_probe(struct pl
if (!mmres || !irqres)
return -ENODEV;
 
-   if (np)
+   if (np) {
port = of_alias_get_id(np, "serial");
if (port >= VT8500_MAX_PORTS)
port = -1;
-   else
+   } else {
port = -1;
+   }
 
if (port < 0) {
/* calculate the port id */
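
The effect of the braces can be seen in a standalone model of the selection
logic. This is a sketch with hypothetical names: pick_port() is not a driver
function, has_node stands in for np != NULL, alias_id for the result of
of_alias_get_id(), and MAX_PORTS is an assumed limit.

```c
#define MAX_PORTS 6	/* assumed limit for the sketch */

/*
 * Returns the port to use, or -1 when the caller has to calculate one.
 * Without the braces, the else would pair with the inner if, so a valid
 * alias would be overwritten with -1 and the no-node case would leave
 * port unset.
 */
int pick_port(int has_node, int alias_id)
{
	int port;

	if (has_node) {
		port = alias_id;
		if (port >= MAX_PORTS)
			port = -1;
	} else {
		port = -1;
	}
	return port;
}
```

With the braces in place a valid alias is kept, an out-of-range alias falls
back to -1, and the no-node case is always initialized.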




[PATCH 3.10 35/54] inet: fix possible memory corruption with UDP_CORK and UFO

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Hannes Frederic Sowa 

[ This is a simplified -stable version of a set of upstream commits. ]

This is a replacement patch only for stable which does fix the problems
handled by the following two commits in -net:

"ip_output: do skb ufo init for peeked non ufo skb as well" 
(e93b7d748be887cd7639b113ba7d7ef792a7efb9)
"ip6_output: do skb ufo init for peeked non ufo skb as well" 
(c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b)

Three frames are written on a corked udp socket for which the output
netdevice has UFO enabled.  If the first and third frame are smaller than
the mtu and the second one is bigger, we enqueue the second frame with
skb_append_datato_frags without initializing the gso fields. This leads
to the third frame being appended regularly, thus constructing an invalid skb.

This fixes the problem by always using skb_append_datato_frags as soon
as the first frag got enqueued to the skb without marking the packet
as SKB_GSO_UDP.

The problem with only two frames for ipv6 was fixed by "ipv6: udp
packets following an UFO enqueued packet need also be handled by UFO"
(2811ebac2521ceac84f2bdae402455baa6a7fb47).

Signed-off-by: Hannes Frederic Sowa 
Cc: Jiri Pirko 
Cc: Eric Dumazet 
Cc: David Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 include/linux/skbuff.h |5 +
 net/ipv4/ip_output.c   |2 +-
 net/ipv6/ip6_output.c  |2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1308,6 +1308,11 @@ static inline int skb_pagelen(const stru
return len + skb_headlen(skb);
 }
 
+static inline bool skb_has_frags(const struct sk_buff *skb)
+{
+   return skb_shinfo(skb)->nr_frags;
+}
+
 /**
  * __skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -844,7 +844,7 @@ static int __ip_append_data(struct sock
csummode = CHECKSUM_PARTIAL;
 
cork->length += length;
-   if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+   if (((length > mtu) || (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
err = ip_ufo_append_data(sk, queue, getfrag, from, length,
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1250,7 +1250,7 @@ int ip6_append_data(struct sock *sk, int
skb = skb_peek_tail(&sk->sk_write_queue);
cork->length += length;
if (((length > mtu) ||
-(skb && skb_is_gso(skb))) &&
+(skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO)) {
err = ip6_ufo_append_data(sk, getfrag, from, length,




[PATCH 3.10 42/54] ARM: integrator: deactivate timer0 on the Integrator/CP

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Linus Walleij 

commit 29114fd7db2fc82a34da8340d29b8fa413e03dca upstream.

This fixes a long-standing Integrator/CP regression from
commit 870e2928cf3368ca9b06bc925d0027b0a56bcd8e
"ARM: integrator-cp: convert use CLKSRC_OF for timer init"

When this code was introduced, both aliases pointing the
system to use timer1 as primary (clocksource) and timer2
as secondary (clockevent) were ignored, and the system would
simply use the first two timers found as clocksource and
clockevent.

However, this made the system timeline accelerate by a
factor of 25, as it turns out that the way the clocking
actually works (totally undocumented and found after some
trial-and-error) is that timer0 runs @ 25MHz and timer1
and timer2 run @ 1MHz. Presumably this divider setting
is a boot-on default and configurable, albeit the way to
configure it is not documented.
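
To make the factor of 25 concrete, here is a minimal userspace sketch
(plain C, not kernel code). The function and macro names are made up for
illustration; the only inputs taken from the message above are the two
rates, 25MHz for timer0 and 1MHz for timer1/timer2.

```c
#include <assert.h>
#include <stdint.h>

/* Rates as stated in the commit message; the names are illustrative only. */
#define ASSUMED_HZ   1000000ULL   /* rate the system expects from its clocksource */
#define TIMER0_HZ   25000000ULL   /* rate timer0 actually runs at */

/* Ticks produced by a counter running at real_hz over real_us microseconds. */
static uint64_t ticks_elapsed(uint64_t real_hz, uint64_t real_us)
{
    return real_hz * real_us / 1000000ULL;
}

/* Microseconds the system believes have passed if it converts the
 * tick count using assumed_hz. */
static uint64_t perceived_us(uint64_t ticks, uint64_t assumed_hz)
{
    return ticks * 1000000ULL / assumed_hz;
}
```

Feeding one real second of timer0 ticks through the 1MHz conversion
yields 25 perceived seconds, while a timer that really runs at 1MHz
yields exactly one.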

So as a quick fix to the problem, let's mark timer0 as
disabled, so the code will choose timer1 and timer2 as it
used to.

This also deletes the two aliases for the primary and
secondary timer as they have been superseded by the
auto-selection.

Cc: Rob Herring 
Cc: Russell King 
Signed-off-by: Linus Walleij 
Signed-off-by: Olof Johansson 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/arm/boot/dts/integratorcp.dts |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/arch/arm/boot/dts/integratorcp.dts
+++ b/arch/arm/boot/dts/integratorcp.dts
@@ -9,11 +9,6 @@
model = "ARM Integrator/CP";
compatible = "arm,integrator-cp";
 
-   aliases {
-   arm,timer-primary = &timer2;
-   arm,timer-secondary = &timer1;
-   };
-
chosen {
bootargs = "root=/dev/ram0 console=ttyAMA0,38400n8 earlyprintk";
};
@@ -24,14 +19,18 @@
};
 
timer0: timer@13000000 {
+   /* TIMER0 runs @ 25MHz */
compatible = "arm,integrator-cp-timer";
+   status = "disabled";
};
 
timer1: timer@13000100 {
+   /* TIMER1 runs @ 1MHz */
compatible = "arm,integrator-cp-timer";
};
 
timer2: timer@13000200 {
+   /* TIMER2 runs @ 1MHz */
compatible = "arm,integrator-cp-timer";
};
 




[PATCH 3.11 00/66] 3.11.7-stable review

2013-11-01 Thread Greg Kroah-Hartman
This is the start of the stable review cycle for the 3.11.7 release.
There are 66 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Nov  3 22:04:49 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.7-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
Linux 3.11.7-rc1

Enrico Mioso 
usb: serial: option: blacklist Olivetti Olicard200

Greg Kroah-Hartman 
USB: serial: option: add support for Inovia SEW858 device

Diego Elio Pettenò 
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.

Roel Kluin 
serial: vt8500: add missing braces

Solomon Peachy 
wireless: cw1200: acquire hwbus lock around cw1200_irq_handler() call.

Johannes Berg 
wireless: radiotap: fix parsing buffer overrun

Hans-Frieder Vogt 
w1 - call request_module with w1 master mutex unlocked

Fengguang Wu 
writeback: fix negative bdi max pause

David Henningsson 
ALSA: hda - Fix inverted internal mic not indicated on some machines

Takashi Iwai 
ALSA: us122l: Fix pcm_usb_stream mmapping regression

Hugh Dickins 
mm: fix BUG in __split_huge_page_pmd

Weijie Yang 
mm/zswap: bugfix: memory leak when re-swapon

Cyrill Gorcunov 
mm: migration: do not lose soft dirty bit if page is in migration state

James Ralston 
i2c: ismt: initialize DMA buffer

Mikulas Patocka 
dm snapshot: fix data corruption

Mika Westerberg 
gpio/lynxpoint: check if the interrupt is enabled in IRQ handler

Miklos Szeredi 
ext[34]: fix double put in tmpfile

Linus Walleij 
ARM: integrator: deactivate timer0 on the Integrator/CP

AKASHI Takahiro 
ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()

Mariusz Ceier 
davinci_emac.c: Fix IFF_ALLMULTI setup

Hannes Frederic Sowa 
ipv6: probe routes asynchronous in rt6_probe

Julian Anastasov 
netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper

Julian Anastasov 
ipv6: fill rt6i_gateway with nexthop address

Julian Anastasov 
ipv6: always prefer rt6i_gateway if present

Hannes Frederic Sowa 
inet: fix possible memory corruption with UDP_CORK and UFO

Seif Mazareeb 
net: fix cipso packet validation when !NETLABEL

Daniel Borkmann 
net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race

Vasundhara Volam 
be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd

Salva Peiró 
wanxl: fix info leak in ioctl

Vlad Yasevich 
sctp: Perform software checksum if packet has to be fragmented.

Fan Du 
sctp: Use software crc32 checksum when xfrm transform will happen.

Vlad Yasevich 
net: dst: provide accessor function to dst->xfrm

Vlad Yasevich 
bridge: Correctly clamp MAX forward_delay when enabling STP

Jason Wang 
virtio-net: refill only when device is up during setting queues

Jason Wang 
virtio-net: don't respond to cpu hotplug notifier if we're not ready

Eric Dumazet 
bnx2x: record rx queue for LRO packets

Mathias Krause 
connector: use nlmsg_len() to check message length

Mathias Krause 
unix_diag: fix info leak

Salva Peiró 
farsync: fix info leak in ioctl

stephen hemminger 
netem: free skb's in tree on reset

stephen hemminger 
netem: update backlog after drop

Eric Dumazet 
l2tp: must disable bh before calling l2tp_xmit_skb()

Christophe Gouault 
vti: get rid of nf mark rule in prerouting

Linus Lüssing 
Revert "bridge: only expire the mdb entry when query is received"

Vlad Yasevich 
bridge: update mdb expiration timer upon reports.

Marc Kleine-Budde 
net: vlan: fix nlmsg size calculation in vlan_get_size()

Amir Vadai 
net/mlx4_en: Fix pages never dma unmapped on rx

Amir Vadai 
net/mlx4_en: Rename name of mlx4_en_rx_alloc members

Paul Durrant 
xen-netback: Don't destroy the netdev until the vif is shut down

Fabio Estevam 
net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected

Marc Kleine-Budde 
can: dev: fix nlmsg size calculation in can_get_size()

Jiri Benc 
ipv4: fix ineffective source address selection

Mathias Krause 
proc connector: fix info leaks

Willem de Bruijn 
sit: amend "allow to use rtnl ops on fb tunnel"

Dan Carpenter 
net: heap overflow in __audit_sockaddr()

Sebastian Hesselbarth 
net: mv643xx_eth: fix orphaned statistics timer crash

Sebastian Hesselbarth 
net: mv643xx_eth: update statistics timer from timer context only

David S. Miller 
l2tp: Fix build warning with ipv6 disabled.

François CACHEREUL 
l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses

Matthias Schiffer 
batman-adv: set up network coding packet handlers during

[PATCH 3.10 36/54] ipv6: always prefer rt6i_gateway if present

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Julian Anastasov 

[ Upstream commit 96dc809514fb2328605198a0602b67554d8cce7b ]

In v3.9, commit 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2().") changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.

As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.

Another case is when redirect can create RTF_DYNAMIC
route without RTF_GATEWAY flag, we ignore the rt6i_gateway
in rt6_nexthop().

Fix the above problems by considering the rt6i_gateway if
present, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.

Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.

Thanks to Hannes Frederic Sowa for his review and help in testing.

Reported-by: Phil Oester 
Reported-by: Mark Brooks 
Signed-off-by: Julian Anastasov 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 include/net/ip6_route.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -196,7 +196,7 @@ static inline int ip6_skb_dst_mtu(struct
 
 static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr *dest)
 {
-   if (rt->rt6i_flags & RTF_GATEWAY)
+   if (rt->rt6i_flags & RTF_GATEWAY || !ipv6_addr_any(&rt->rt6i_gateway))
return &rt->rt6i_gateway;
return dest;
 }
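
For readers who want to see the behavioural difference outside the
kernel, the following is a rough userspace model of the old and new
next-hop choice. The struct and function names are stand-ins invented
for this sketch (they are not the kernel's types), and RTF_GATEWAY is
used only as an arbitrary flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RTF_GATEWAY 0x0002      /* used here purely as a flag bit */

struct addr6 { unsigned char b[16]; };   /* stand-in for struct in6_addr */

struct route6 {                          /* stand-in for struct rt6_info */
    unsigned int flags;
    struct addr6 gateway;
};

/* True when the address is all zeroes (the "any" address). */
static bool addr6_any(const struct addr6 *a)
{
    static const struct addr6 zero;
    return memcmp(a, &zero, sizeof(zero)) == 0;
}

/* Behaviour before the patch: the gateway is used only for RTF_GATEWAY routes. */
static const struct addr6 *nexthop_old(const struct route6 *rt,
                                       const struct addr6 *dest)
{
    return (rt->flags & RTF_GATEWAY) ? &rt->gateway : dest;
}

/* Behaviour after the patch: any non-zero gateway is preferred. */
static const struct addr6 *nexthop_new(const struct route6 *rt,
                                       const struct addr6 *dest)
{
    if ((rt->flags & RTF_GATEWAY) || !addr6_any(&rt->gateway))
        return &rt->gateway;
    return dest;
}
```

A route that carries a gateway address but lacks the RTF_GATEWAY flag
(the IPVS, xt_TEE and redirect cases from the description) resolves to
the destination with nexthop_old() and to the gateway with
nexthop_new().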




[PATCH 3.11 02/66] tcp: TSQ can use a dynamic limit

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit c9eeec26e32e087359160406f96e0949b3cc6f10 ]

When TCP Small Queues was added, we used a sysctl to limit the amount of
packets queued on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.
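
As a rough illustration of the new limit, here is a userspace sketch of
the expression used in the diff of this patch,
max(skb->truesize, sk->sk_pacing_rate >> 10). The helper names are
hypothetical; shifting a bytes-per-second rate right by 10 divides it by
1024, which is close to one millisecond's worth of data.

```c
#include <assert.h>
#include <stdint.h>

/* Byte budget allowed in qdisc/device queues for one flow, mirroring
 * limit = max(skb->truesize, sk->sk_pacing_rate >> 10) from the patch. */
static uint32_t tsq_limit(uint32_t skb_truesize, uint32_t pacing_rate)
{
    uint32_t per_ms = pacing_rate >> 10;   /* rate / 1024: about 1 ms of bytes */

    return skb_truesize > per_ms ? skb_truesize : per_ms;
}

/* The flow is throttled once the bytes already queued exceed the limit. */
static int tsq_throttled(uint32_t wmem_alloc, uint32_t limit)
{
    return wmem_alloc > limit;
}
```

With a pacing rate of 125000 bytes/s the rate term is only 122 bytes, so
the skb's own truesize sets the limit; at 1250000000 bytes/s the limit
grows to 1220703 bytes, which is how fast flows get more than two
packets in flight.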

Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on a 10Gbps link controlled by FQ/pacing:

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 1p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet 
Cc: Wei Liu 
Cc: Cong Wang 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_output.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -892,8 +892,7 @@ static int tcp_transmit_skb(struct sock
 
skb_orphan(skb);
skb->sk = sk;
-   skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
- tcp_wfree : sock_wfree;
+   skb->destructor = tcp_wfree;
atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
/* Build TCP header and checksum it. */
@@ -1837,7 +1836,6 @@ static bool tcp_write_xmit(struct sock *
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
 
-
tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
BUG_ON(!tso_segs);
 
@@ -1866,13 +1864,20 @@ static bool tcp_write_xmit(struct sock *
break;
}
 
-   /* TSQ : sk_wmem_alloc accounts skb truesize,
-* including skb overhead. But thats OK.
+   /* TCP Small Queues :
+* Control number of packets in qdisc/devices to two packets / or ~1 ms.
+* This allows for :
+*  - better RTT estimation and ACK scheduling
+*  - faster recovery
+*  - high rates
 */
-   if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) {
+   limit = max(skb->truesize, sk->sk_pacing_rate >> 10);
+
+   if (atomic_read(&sk->sk_wmem_alloc) > limit) {
set_bit(TSQ_THROTTLED, &tp->tsq_flags);
break;
}
+
limit = mss_now;
if (tso_segs > 1 && !tcp_urg_mode(tp))
limit = tcp_mss_split_point(sk, skb, mss_now,




[PATCH 3.11 01/66] tcp: TSO packets automatic sizing

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commits 6d36824e730f247b602c90e8715a792003e3c5a7,
  02cf4ebd82ff0ac7254b88e466820a290ed8289a, and parts of
  7eec4174ff29cd42f2acfae8112f51c228545d40 ]

After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses a heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt
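
The formula above, and the "one packet every ms" sizing goal, can be
worked through with a small userspace sketch. Everything here is an
assumption made for illustration: the function names are invented, srtt
is taken in microseconds for simplicity (the kernel keeps it in its own
units), and no upper bound on the TSO packet size is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* sk_pacing_rate = 2 * cwnd * mss / srtt, returned in bytes per second.
 * srtt_us is the smoothed RTT in microseconds and must be non-zero. */
static uint64_t pacing_rate(uint32_t cwnd, uint32_t mss, uint32_t srtt_us)
{
    return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/* Segments per TSO packet so that roughly one packet leaves per
 * millisecond, never fewer than min_tso_segs (default 2 per the sysctl). */
static uint32_t tso_segs_goal(uint64_t rate, uint32_t mss, uint32_t min_tso_segs)
{
    uint64_t bytes_per_ms = rate / 1000;
    uint64_t segs = bytes_per_ms / mss;

    return segs > min_tso_segs ? (uint32_t)segs : min_tso_segs;
}
```

For cwnd = 10, mss = 1448 and srtt = 100 ms the rate is 289600 bytes/s,
far less than one MSS per millisecond, so the goal falls back to the
minimum of 2 segments. For cwnd = 100 and srtt = 10 ms the rate is
28960000 bytes/s and the goal becomes 20 segments.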

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Van Jacobson 
Cc: Tom Herbert 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/networking/ip-sysctl.txt |9 
 include/net/sock.h |2 +
 include/net/tcp.h  |1 
 net/core/sock.c|1 
 net/ipv4/sysctl_net_ipv4.c |   10 +
 net/ipv4/tcp.c |   28 ++-
 net/ipv4/tcp_input.c   |   34 -
 net/ipv4/tcp_output.c  |2 -
 8 files changed, 80 insertions(+), 7 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -478,6 +478,15 @@ tcp_syn_retries - INTEGER
 tcp_timestamps - BOOLEAN
Enable timestamps as defined in RFC1323.
 
+tcp_min_tso_segs - INTEGER
+   Minimal number of segments per TSO frame.
+   Since linux-3.12, TCP does an automatic sizing of TSO frames,
+   depending on flow rate, instead of filling 64Kbytes packets.
+   For specific usages, it's possible to force TCP to build big
+   TSO frames. Note that TCP stack might split too big TSO packets
+   if available window is too small.
+   Default: 2
+
 tcp_tso_win_divisor - INTEGER
This allows control over what percentage of the congestion window
can be consumed by a single TSO frame.
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -232,6 +232,7 @@ struct cg_proto;
   *@sk_napi_id: id of the last napi context to receive data for sk
   *@sk_ll_usec: usecs to busypoll when there is no data
   *@sk_allocation: allocation mode
+  *@sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler)
   *@sk_sndbuf: size of send buffer in bytes
   *@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
   *   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -361,6 +362,7 @@ struct sock {
kmemcheck_bitfield_end(flags);
int sk_wmem_queued;
gfp_t   sk_allocation;
+   u32 sk_pacing_rate; /* bytes per second */
netdev_features_t   sk_route_caps;
netdev_features_t   sk_route_nocaps;
int sk_gso_type;
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -284,6 +284,7 @@ extern int sysctl_tcp_thin_dupack;
 extern int sysctl_tcp_early_retrans;
 extern int sysctl_tcp_limit_output_bytes;
 extern int sysctl_tcp_challenge_ack_limit;
+extern int sysctl_tcp_min_tso_segs;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2297,6 +2297,7 @@ void sock_init_data(struct socket *sock,
sk->sk_ll_usec  =   sysctl_net_busy_read;
 #endif
 
+   sk->sk_pacing_rate = ~0U;
/*
 * Before updating sk_refcnt, we must commit prior changes to memory
 * (Documentation/RCU/rculist_nulls.txt for details)
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_

[PATCH 3.11 04/66] tcp: do not forget FIN in tcp_shifted_skb()

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 5e8a402f831dbe7ee831340a91439e46f0d38acd ]

Yuchung found following problem :

 There are bugs in the SACK processing code, in the merging part of
 tcp_shift_skb_data(), that incorrectly reset or ignore the sacked
 skb's FIN flag. When a receiver first SACKs the FIN sequence, and later
 throws away the ofo queue (e.g., sack-reneging), the sender will stop
 retransmitting the FIN flag, and hang forever.

Following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792 
+.000 > S. 0:0(0) ack 1 
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(1) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 
+.050 < . 1:1(0) ack 2001 win 257 
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.
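
The two fixes can be modelled in a few lines of userspace C. This is
only a sketch: struct seg is a simplified stand-in for the skb control
block, the flag values are illustrative, and the payload accounting is
an assumption about how the data bytes are moved before the flags are
merged.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_FIN 0x01   /* illustrative flag bits */
#define FLAG_PSH 0x08

struct seg {            /* simplified stand-in for the skb control block */
    uint32_t seq;
    uint32_t end_seq;
    uint8_t  flags;
};

/* Merge all payload of 'skb' into the preceding segment 'prev'. */
static void merge_into_prev(struct seg *prev, const struct seg *skb,
                            uint32_t payload_len)
{
    prev->end_seq += payload_len;   /* data bytes now belong to prev */
    prev->flags |= skb->flags;      /* fix 1: flags flow from skb to prev */
    if (skb->flags & FLAG_FIN)
        prev->end_seq++;            /* fix 2: FIN occupies one sequence number */
}
```

Using the numbers from the packetdrill script, merging the FIN segment
10001:12001 (end_seq 12002 once the FIN is counted) into a previous
segment ending at 10001 leaves prev covering up to 12002 with FIN set,
so the FIN remains eligible for retransmission.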

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Cc: Ilpo Järvinen 
Acked-by: Ilpo Järvinen 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_input.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1279,7 +1279,10 @@ static bool tcp_shifted_skb(struct sock
tp->lost_cnt_hint -= tcp_skb_pcount(prev);
}
 
-   TCP_SKB_CB(skb)->tcp_flags |= TCP_SKB_CB(prev)->tcp_flags;
+   TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
+   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+   TCP_SKB_CB(prev)->end_seq++;
+
if (skb == tcp_highest_sack(sk))
tcp_advance_highest_sack(sk, skb);
 




[PATCH 3.11 03/66] tcp: must unclone packets before mangling them

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit c52e2421f7368fd36cbe330d2cf41b10452e39a9 ]

TCP stack should make sure it owns skbs before mangling them.

We had various crashes using bnx2x, and it turned out gso_size
was cleared right before the bnx2x driver was populating the TC descriptor
of the _previous_ packet sent. The TCP stack can sometimes retransmit
packets that are still in the Qdisc.

Of course we could make the bnx2x driver more robust (using
ACCESS_ONCE(shinfo->gso_size) for example), but the bug is in the TCP stack.

We have identified two points where skb_unclone() was needed.

This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.
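
The ownership rule can be pictured with a small copy-on-write model in
userspace C. None of these names exist in the kernel; struct shared
stands in for the data two skb clones have in common, and buf_unclone()
mimics the idea of getting a private copy before writing.

```c
#include <assert.h>
#include <stdlib.h>

struct shared {             /* stand-in for data shared between clones */
    int refs;
    unsigned int gso_size;
};

struct buf {                /* stand-in for struct sk_buff */
    struct shared *sh;
};

static int buf_cloned(const struct buf *b)
{
    return b->sh->refs > 1;
}

/* Give 'b' a private copy of the shared data if someone else holds it.
 * Returns 0 on success, -1 if the allocation fails. */
static int buf_unclone(struct buf *b)
{
    struct shared *copy;

    if (!buf_cloned(b))
        return 0;
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    *copy = *b->sh;
    copy->refs = 1;
    b->sh->refs--;          /* drop our reference to the old shared data */
    b->sh = copy;
    return 0;
}

/* Modify gso_size only after making sure we own the data. */
static int buf_set_gso_size(struct buf *b, unsigned int gso_size)
{
    if (buf_unclone(b))
        return -1;
    b->sh->gso_size = gso_size;
    return 0;
}
```

If one clone is still sitting in a queue while the other is being
prepared for retransmission, the write lands in the private copy and the
queued clone keeps the gso_size the driver expects.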

Kudos to Neal for finding the root cause of this bug. It's visible
using small MSS.

Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_output.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -981,6 +981,9 @@ static void tcp_queue_skb(struct sock *s
 static void tcp_set_skb_tso_segs(const struct sock *sk, struct sk_buff *skb,
 unsigned int mss_now)
 {
+   /* Make sure we own this skb before messing gso_size/gso_segs */
+   WARN_ON_ONCE(skb_cloned(skb));
+
if (skb->len <= mss_now || !sk_can_gso(sk) ||
skb->ip_summed == CHECKSUM_NONE) {
/* Avoid the costly divide in the normal
@@ -1062,9 +1065,7 @@ int tcp_fragment(struct sock *sk, struct
if (nsize < 0)
nsize = 0;
 
-   if (skb_cloned(skb) &&
-   skb_is_nonlinear(skb) &&
-   pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+   if (skb_unclone(skb, GFP_ATOMIC))
return -ENOMEM;
 
/* Get a new skb... force flag on. */
@@ -2339,6 +2340,8 @@ int __tcp_retransmit_skb(struct sock *sk
int oldpcount = tcp_skb_pcount(skb);
 
if (unlikely(oldpcount > 1)) {
+   if (skb_unclone(skb, GFP_ATOMIC))
+   return -ENOMEM;
tcp_init_tso_segs(sk, skb, cur_mss);
tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
}




[PATCH 3.10 43/54] gpio/lynxpoint: check if the interrupt is enabled in IRQ handler

2013-11-01 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: Mika Westerberg 

commit 03d152d5582abc8a1c19cb107164c3724bbd4be4 upstream.

Checking LP_INT_STAT is not enough in the interrupt handler because its
contents get updated regardless of whether the pin has interrupt enabled or
not. This causes the driver to loop forever for GPIOs that are pulled up.

Fix this by checking the interrupt enable bit for the pin as well.
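
The endless loop and its fix can be reproduced with simulated registers
in userspace. This is a sketch under stated assumptions: 'stat' models
LP_INT_STAT, 'ena' models the enable register, 'stuck' marks pins whose
status bit comes straight back after being cleared (the pulled-up GPIO
case), and max_iters exists only so the buggy variant terminates here.
All names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; 'w' must be non-zero. */
static unsigned int lowest_bit(uint32_t w)
{
    unsigned int n = 0;

    while (!(w & 1u)) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Service one 32-pin bank and return how many times a pin was handled. */
static unsigned int service_bank(uint32_t stat, uint32_t stuck, uint32_t ena,
                                 int check_enable, unsigned int max_iters)
{
    unsigned int handled = 0;
    uint32_t pending;

    while ((pending = check_enable ? (stat & ena) : stat) != 0 &&
           handled < max_iters) {
        uint32_t mask = 1u << lowest_bit(pending);

        stat &= ~mask;              /* clear before handling */
        stat |= mask & stuck;       /* a held pin sets its status bit again */
        handled++;
    }
    return handled;
}
```

With pin 0 enabled and pending, and pin 2 held active but not enabled,
the masked variant handles exactly one interrupt and exits, while the
unmasked variant keeps spinning on pin 2 until the iteration cap.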

Signed-off-by: Mika Westerberg 
Acked-by: Mathias Nyman 
Signed-off-by: Linus Walleij 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/gpio/gpio-lynxpoint.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/gpio/gpio-lynxpoint.c
+++ b/drivers/gpio/gpio-lynxpoint.c
@@ -248,14 +248,15 @@ static void lp_gpio_irq_handler(unsigned
struct lp_gpio *lg = irq_data_get_irq_handler_data(data);
struct irq_chip *chip = irq_data_get_irq_chip(data);
u32 base, pin, mask;
-   unsigned long reg, pending;
+   unsigned long reg, ena, pending;
unsigned virq;
 
/* check from GPIO controller which pin triggered the interrupt */
for (base = 0; base < lg->chip.ngpio; base += 32) {
reg = lp_gpio_reg(&lg->chip, base, LP_INT_STAT);
+   ena = lp_gpio_reg(&lg->chip, base, LP_INT_ENABLE);
 
-   while ((pending = inl(reg))) {
+   while ((pending = (inl(reg) & inl(ena)))) {
pin = __ffs(pending);
mask = BIT(pin);
/* Clear before handling so we don't lose an edge */




[PATCH 3.11 06/66] net: do not call sock_put() on TIMEWAIT sockets

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ]

commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/inet_hashtables.c  |2 +-
 net/ipv6/inet6_hashtables.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -287,7 +287,7 @@ begintw:
if (unlikely(!INET_TW_MATCH(sk, net, acookie,
saddr, daddr, ports,
dif))) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -116,7 +116,7 @@ begintw:
}
if (unlikely(!INET6_TW_MATCH(sk, net, saddr, daddr,
 ports, dif))) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;




[PATCH 3.11 07/66] batman-adv: set up network coding packet handlers during module init

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Matthias Schiffer 

[ Upstream commit 6c519bad7b19a2c14a075b400edabaa630330123 ]

batman-adv saves its table of packet handlers as a global state, so handlers
must be set up only once (and setting them up a second time will fail).

The recently-added network coding support tries to set up its handler each time
a new softif is registered, which obviously fails when more than one softif is
used (and in consequence, the softif creation fails).

Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called
only once) and batadv_nc_mesh_init (which is called for each softif); in
addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming
consistent.
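
A stripped-down userspace model shows why the handler registration has
to move to one-time init. The table, the packet type value and all
function names below are hypothetical; the only behaviour assumed from
the description is that registering a handler for an already-taken
packet type fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PKT_TYPES 256
#define PKT_CODED 0x07          /* hypothetical packet type value */

typedef int (*recv_handler_t)(const void *pkt);

/* Global handler table: shared by every soft interface. */
static recv_handler_t handlers[PKT_TYPES];

/* Register a handler; fails if the slot is already taken. */
static int handler_register(unsigned char type, recv_handler_t h)
{
    if (handlers[type] != NULL)
        return -EBUSY;
    handlers[type] = h;
    return 0;
}

static int coded_recv(const void *pkt)
{
    (void)pkt;
    return 0;
}

/* One-time, module-wide part: touches the global table. */
static int nc_init(void)
{
    return handler_register(PKT_CODED, coded_recv);
}

/* Per-softif part: would set up only per-interface state. */
static int nc_mesh_init(void)
{
    return 0;
}
```

Calling nc_init() once and nc_mesh_init() once per softif succeeds for
any number of interfaces, whereas a second nc_init() call, which is what
the old per-softif code effectively did, returns -EBUSY.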

Signed-off-by: Matthias Schiffer 
Signed-off-by: Marek Lindner 
Signed-off-by: Antonio Quartulli 
Signed-off-by: Greg Kroah-Hartman 
---
 net/batman-adv/main.c   |5 +++--
 net/batman-adv/network-coding.c |   28 ++--
 net/batman-adv/network-coding.h |   14 ++
 3 files changed, 31 insertions(+), 16 deletions(-)

--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -61,6 +61,7 @@ static int __init batadv_init(void)
batadv_recv_handler_init();
 
batadv_iv_init();
+   batadv_nc_init();
 
batadv_event_workqueue = create_singlethread_workqueue("bat_events");
 
@@ -138,7 +139,7 @@ int batadv_mesh_init(struct net_device *
if (ret < 0)
goto err;
 
-   ret = batadv_nc_init(bat_priv);
+   ret = batadv_nc_mesh_init(bat_priv);
if (ret < 0)
goto err;
 
@@ -163,7 +164,7 @@ void batadv_mesh_free(struct net_device
batadv_vis_quit(bat_priv);
 
batadv_gw_node_purge(bat_priv);
-   batadv_nc_free(bat_priv);
+   batadv_nc_mesh_free(bat_priv);
batadv_dat_free(bat_priv);
batadv_bla_free(bat_priv);
 
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -35,6 +35,20 @@ static int batadv_nc_recv_coded_packet(s
   struct batadv_hard_iface *recv_if);
 
 /**
+ * batadv_nc_init - one-time initialization for network coding
+ */
+int __init batadv_nc_init(void)
+{
+   int ret;
+
+   /* Register our packet type */
+   ret = batadv_recv_handler_register(BATADV_CODED,
+  batadv_nc_recv_coded_packet);
+
+   return ret;
+}
+
+/**
  * batadv_nc_start_timer - initialise the nc periodic worker
  * @bat_priv: the bat priv with all the soft interface information
  */
@@ -45,10 +59,10 @@ static void batadv_nc_start_timer(struct
 }
 
 /**
- * batadv_nc_init - initialise coding hash table and start house keeping
+ * batadv_nc_mesh_init - initialise coding hash table and start house keeping
  * @bat_priv: the bat priv with all the soft interface information
  */
-int batadv_nc_init(struct batadv_priv *bat_priv)
+int batadv_nc_mesh_init(struct batadv_priv *bat_priv)
 {
bat_priv->nc.timestamp_fwd_flush = jiffies;
bat_priv->nc.timestamp_sniffed_purge = jiffies;
@@ -70,11 +84,6 @@ int batadv_nc_init(struct batadv_priv *b
batadv_hash_set_lock_class(bat_priv->nc.coding_hash,
   &batadv_nc_decoding_hash_lock_class_key);
 
-   /* Register our packet type */
-   if (batadv_recv_handler_register(BATADV_CODED,
-batadv_nc_recv_coded_packet) < 0)
-   goto err;
-
INIT_DELAYED_WORK(&bat_priv->nc.work, batadv_nc_worker);
batadv_nc_start_timer(bat_priv);
 
@@ -1721,12 +1730,11 @@ free_nc_packet:
 }
 
 /**
- * batadv_nc_free - clean up network coding memory
+ * batadv_nc_mesh_free - clean up network coding memory
  * @bat_priv: the bat priv with all the soft interface information
  */
-void batadv_nc_free(struct batadv_priv *bat_priv)
+void batadv_nc_mesh_free(struct batadv_priv *bat_priv)
 {
-   batadv_recv_handler_unregister(BATADV_CODED);
cancel_delayed_work_sync(&bat_priv->nc.work);
 
batadv_nc_purge_paths(bat_priv, bat_priv->nc.coding_hash, NULL);
--- a/net/batman-adv/network-coding.h
+++ b/net/batman-adv/network-coding.h
@@ -22,8 +22,9 @@
 
 #ifdef CONFIG_BATMAN_ADV_NC
 
-int batadv_nc_init(struct batadv_priv *bat_priv);
-void batadv_nc_free(struct batadv_priv *bat_priv);
+int batadv_nc_init(void);
+int batadv_nc_mesh_init(struct batadv_priv *bat_priv);
+void batadv_nc_mesh_free(struct batadv_priv *bat_priv);
 void batadv_nc_update_nc_node(struct batadv_priv *bat_priv,
  struct batadv_orig_node *orig_node,
  struct batadv_orig_node *orig_neigh_node,
@@ -46,12 +47,17 @@ int batadv_nc_init_debugfs(struct batadv
 
 #else /* ifdef CONFIG_BATMAN_ADV_NC */
 
-static inline int batadv_nc_init(struct batadv_priv *bat_priv)
+static inline int batadv_nc_init(void)
 {
return 0;
 }
 
-static inline void 

Re: linux-next: manual merge of the dt-rh tree with the powerpc tree

2013-11-01 Thread Stephen Rothwell
Hi Rob,

On Fri, 01 Nov 2013 17:24:42 -0500 Rob Herring  wrote:
>
> On 11/01/2013 12:20 AM, Stephen Rothwell wrote:
> > 
> > Today's linux-next merge of the dt-rh tree got a conflict in 
> > arch/powerpc/include/asm/prom.h between commit a3e31b458844 ("of:
> > Move definition of of_find_next_cache_node into common code") from
> > the powerpc tree and commit 0c3f061c195c ("of: implement
> > of_node_to_nid as a weak function") from the dt-rh tree.
> 
> Ben, I can pick these 2 patches up instead if you want to drop them
> and avoid the conflict.

The conflict is pretty trivial and so should not require any action.

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




[PATCH 3.11 09/66] l2tp: Fix build warning with ipv6 disabled.

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: "David S. Miller" 

[ Upstream commit 8d8a51e26a6d415e1470759f2cf5f3ee3ee86196 ]

net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]

Create a helper "l2tp_tunnel()" to facilitate this, and as a side
effect get rid of a bunch of unnecessary void pointer casts.
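
The pattern can be shown with a small userspace sketch. The types, the
MODEL_IPV6 switch and the FAMILY_V6 value are assumptions made for this
example; the point is only that an inline accessor used inside the
conditional block leaves no local variable behind when that block is
compiled out, and that the void pointer is converted in one place.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_IPV6 1    /* set to 0 to mimic a build without IPv6 */
#define FAMILY_V6  10   /* illustrative address family value */

struct tunnel_model { int v4mapped; };
struct sock_model { int family; void *user_data; };

/* Typed accessor: the only place that converts the void pointer. */
static inline struct tunnel_model *tunnel_of(const struct sock_model *sk)
{
    return sk->user_data;
}

/* Returns 1 when the IPv6 checksum rules apply to this socket. */
static int uses_v6_checksum(const struct sock_model *sk)
{
#if MODEL_IPV6
    if (sk->family == FAMILY_V6 && !tunnel_of(sk)->v4mapped)
        return 1;
#else
    (void)sk;
#endif
    return 0;
}
```

Had the function instead declared a local tunnel pointer at the top and
used it only inside the #if block, building with MODEL_IPV6 set to 0
would produce the same kind of unused-variable warning quoted above.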

Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/l2tp/l2tp_core.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -115,6 +115,11 @@ struct l2tp_net {
 static void l2tp_session_set_header_len(struct l2tp_session *session, int version);
 static void l2tp_tunnel_free(struct l2tp_tunnel *tunnel);
 
+static inline struct l2tp_tunnel *l2tp_tunnel(struct sock *sk)
+{
+	return sk->sk_user_data;
+}
+
 static inline struct l2tp_net *l2tp_pernet(struct net *net)
 {
 	BUG_ON(!net);
@@ -496,7 +501,6 @@ out:
 static inline int l2tp_verify_udp_checksum(struct sock *sk,
 					   struct sk_buff *skb)
 {
-	struct l2tp_tunnel *tunnel = (struct l2tp_tunnel *)sk->sk_user_data;
 	struct udphdr *uh = udp_hdr(skb);
 	u16 ulen = ntohs(uh->len);
 	__wsum psum;
@@ -505,7 +509,7 @@ static inline int l2tp_verify_udp_checks
 		return 0;
 
 #if IS_ENABLED(CONFIG_IPV6)
-	if (sk->sk_family == PF_INET6 && !tunnel->v4mapped) {
+	if (sk->sk_family == PF_INET6 && !l2tp_tunnel(sk)->v4mapped) {
 		if (!uh->check) {
 			LIMIT_NETDEBUG(KERN_INFO "L2TP: IPv6: checksum is 0\n");
 			return 1;
@@ -1305,10 +1309,9 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
  */
 static void l2tp_tunnel_destruct(struct sock *sk)
 {
-	struct l2tp_tunnel *tunnel;
+	struct l2tp_tunnel *tunnel = l2tp_tunnel(sk);
 	struct l2tp_net *pn;
 
-	tunnel = sk->sk_user_data;
 	if (tunnel == NULL)
 		goto end;
 
@@ -1676,7 +1679,7 @@ int l2tp_tunnel_create(struct net *net,
 	}
 
 	/* Check if this socket has already been prepped */
-	tunnel = (struct l2tp_tunnel *)sk->sk_user_data;
+	tunnel = l2tp_tunnel(sk);
 	if (tunnel != NULL) {
 		/* This socket has already been prepped */
 		err = -EBUSY;
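
For readers following the review, here is a minimal user-space sketch of
the pattern this patch relies on: a typed accessor called at the point of
use, so no local variable is left unused when its only consumer is
compiled out. All names (demo_sock, demo_tunnel, DEMO_IPV6, DEMO_PF_INET6)
are hypothetical stand-ins chosen for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Remove this define to simulate a build with IPv6 disabled. */
#define DEMO_IPV6 1

#define DEMO_PF_INET6 10	/* arbitrary demo value, not the kernel constant */

/* Hypothetical stand-in for struct l2tp_tunnel. */
struct demo_tunnel {
	bool v4mapped;
};

/* Hypothetical stand-in for struct sock. */
struct demo_sock {
	int family;		/* plays the role of sk->sk_family */
	void *user_data;	/* plays the role of sk->sk_user_data */
};

/* Typed accessor: void * converts implicitly in C, so no cast is needed. */
static inline struct demo_tunnel *demo_tunnel(const struct demo_sock *sk)
{
	return sk->user_data;
}

/*
 * Returns true when the IPv6-specific checksum rules would apply.
 * With DEMO_IPV6 undefined, nothing in this function declares a
 * variable that could trigger -Wunused-variable.
 */
static bool demo_needs_ipv6_checksum(const struct demo_sock *sk)
{
	assert(sk != NULL);
#ifdef DEMO_IPV6
	if (sk->family == DEMO_PF_INET6 && !demo_tunnel(sk)->v4mapped)
		return true;
#endif
	return false;
}
```

The accessor is only evaluated inside the conditional block, which is
presumably why the helper form was preferred over wrapping the
declaration in a second #if.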




[PATCH 3.11 10/66] net: mv643xx_eth: update statistics timer from timer context only

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Sebastian Hesselbarth 

[ Upstream commit 041b4ddb84989f06ff1df0ca869b950f1ee3cb1c ]

Each port driver installs a periodic timer to update port statistics
by calling mib_counters_update. As mib_counters_update is also called
from non-timer context, we should not reschedule the timer there but
rather move it to timer-only context.

Signed-off-by: Sebastian Hesselbarth 
Acked-by: Jason Cooper 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/ethernet/marvell/mv643xx_eth.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1131,15 +1131,13 @@ static void mib_counters_update(struct m
 	p->rx_discard += rdlp(mp, RX_DISCARD_FRAME_CNT);
 	p->rx_overrun += rdlp(mp, RX_OVERRUN_FRAME_CNT);
 	spin_unlock_bh(&mp->mib_counters_lock);
-
-	mod_timer(&mp->mib_counters_timer, jiffies + 30 * HZ);
 }
 
 static void mib_counters_timer_wrapper(unsigned long _mp)
 {
 	struct mv643xx_eth_private *mp = (void *)_mp;
-
 	mib_counters_update(mp);
+	mod_timer(&mp->mib_counters_timer, jiffies + 30 * HZ);
 }
 }
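
As a rough illustration of the split described in the changelog, the
following user-space sketch keeps the update routine free of any
rescheduling and leaves that to the timer callback alone. The struct,
the field names and DEMO_PERIOD are assumptions made for the example;
they do not model the real driver or the kernel timer API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PERIOD 30UL	/* hypothetical rearm interval, in demo ticks */

/* Hypothetical stand-in for the per-port private data. */
struct demo_port {
	unsigned long rx_discard;	/* accumulated statistic */
	unsigned long hw_counter;	/* stand-in for a clear-on-read register */
	unsigned long next_expiry;	/* stand-in for the timer expiry time */
	unsigned int rearm_count;	/* how often the timer was rescheduled */
};

/* Callable from any context: only folds the hardware count into the total. */
static void demo_counters_update(struct demo_port *p)
{
	assert(p != NULL);
	p->rx_discard += p->hw_counter;
	p->hw_counter = 0;
}

/* Timer callback: the only place that reschedules the timer. */
static void demo_timer_wrapper(struct demo_port *p, unsigned long now)
{
	demo_counters_update(p);
	p->next_expiry = now + DEMO_PERIOD;
	p->rearm_count++;
}

/* Non-timer caller, for example a statistics query. */
static unsigned long demo_get_stats(struct demo_port *p)
{
	demo_counters_update(p);
	return p->rx_discard;
}
```

With this shape, a statistics query refreshes the totals but never
touches the expiry time, so the period is driven by the timer alone.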
 
 




[PATCH 3.11 15/66] ipv4: fix ineffective source address selection

2013-11-01 Thread Greg Kroah-Hartman
3.11-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Benc 

[ Upstream commit 0a7e22609067ff524fc7bbd45c6951dd08561667 ]

When sending out multicast messages, the source address in inet->mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").

Signed-off-by: Jiri Benc 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/route.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2074,7 +2074,7 @@ struct rtable *__ip_route_output_key(str
 							      RT_SCOPE_LINK);
 			goto make_route;
 		}
-		if (fl4->saddr) {
+		if (!fl4->saddr) {
 			if (ipv4_is_multicast(fl4->daddr))
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      fl4->flowi4_scope);
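
To make the effect of the inverted test concrete, here is a small
user-space sketch of the corrected logic: a source address is
autoselected only when the caller left it unset. The demo_flow struct
and helper names are hypothetical, addresses are kept in host byte order
for simplicity (the kernel uses network order), and only the multicast
branch visible in the hunk above is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saddr/daddr part of struct flowi4. */
struct demo_flow {
	uint32_t saddr;	/* 0 means "not chosen by the caller" */
	uint32_t daddr;
};

/* 224.0.0.0/4 test on a host-byte-order address (demo assumption). */
static int demo_is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/*
 * Corrected check: autoselect only if no source address was supplied.
 * With the typo (testing saddr instead of !saddr) a caller-provided
 * source would have been overwritten here.
 */
static void demo_pick_source(struct demo_flow *fl, uint32_t autoselected)
{
	assert(fl != NULL);
	if (!fl->saddr) {
		if (demo_is_multicast(fl->daddr))
			fl->saddr = autoselected;
	}
}
```

A flow that already carries a source address passes through unchanged,
which matches the behaviour the changelog says was lost.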



