date:20170407

Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions

2017-04-07 Thread kbuild test robot

Hi Florian,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-dsa-Receive-path-simplifications/20170408-074503
config: tile-allmodconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

>> net//dsa/tag_mtk.c:117:2: warning: initialization from incompatible pointer 
>> type [enabled by default]
   net//dsa/tag_mtk.c:117:2: warning: (near initialization for 
'mtk_netdev_ops.rcv') [enabled by default]

vim +117 net//dsa/tag_mtk.c

5cd8985a Sean Wang 2017-04-07  101  
5cd8985a Sean Wang 2017-04-07  102  skb->dev->stats.rx_packets++;
5cd8985a Sean Wang 2017-04-07  103  skb->dev->stats.rx_bytes += skb->len;
5cd8985a Sean Wang 2017-04-07  104  
5cd8985a Sean Wang 2017-04-07  105  netif_receive_skb(skb);
5cd8985a Sean Wang 2017-04-07  106  
5cd8985a Sean Wang 2017-04-07  107  return 0;
5cd8985a Sean Wang 2017-04-07  108  
5cd8985a Sean Wang 2017-04-07  109  out_drop:
5cd8985a Sean Wang 2017-04-07  110  kfree_skb(skb);
5cd8985a Sean Wang 2017-04-07  111  out:
5cd8985a Sean Wang 2017-04-07  112  return 0;
5cd8985a Sean Wang 2017-04-07  113  }
5cd8985a Sean Wang 2017-04-07  114  
5cd8985a Sean Wang 2017-04-07  115  const struct dsa_device_ops mtk_netdev_ops 
= {
5cd8985a Sean Wang 2017-04-07  116  .xmit   = mtk_tag_xmit,
5cd8985a Sean Wang 2017-04-07 @117  .rcv= mtk_tag_rcv,
5cd8985a Sean Wang 2017-04-07  118  };

:: The code at line 117 was first introduced by commit
:: 5cd8985a19090f2b0ce8700ae3ec19e06a4fc5e9 net-next: dsa: add Mediatek tag 
RX/TX handler

:: TO: Sean Wang 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions

2017-04-07 Thread kbuild test robot

Hi Florian,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-dsa-Receive-path-simplifications/20170408-074503
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

>> net//dsa/tag_mtk.c:117:9: error: initialization from incompatible pointer 
>> type [-Werror=incompatible-pointer-types]
 .rcv = mtk_tag_rcv,
^~~
   net//dsa/tag_mtk.c:117:9: note: (near initialization for 
'mtk_netdev_ops.rcv')
   cc1: some warnings being treated as errors

vim +117 net//dsa/tag_mtk.c

5cd8985a Sean Wang 2017-04-07  101  
5cd8985a Sean Wang 2017-04-07  102  skb->dev->stats.rx_packets++;
5cd8985a Sean Wang 2017-04-07  103  skb->dev->stats.rx_bytes += skb->len;
5cd8985a Sean Wang 2017-04-07  104  
5cd8985a Sean Wang 2017-04-07  105  netif_receive_skb(skb);
5cd8985a Sean Wang 2017-04-07  106  
5cd8985a Sean Wang 2017-04-07  107  return 0;
5cd8985a Sean Wang 2017-04-07  108  
5cd8985a Sean Wang 2017-04-07  109  out_drop:
5cd8985a Sean Wang 2017-04-07  110  kfree_skb(skb);
5cd8985a Sean Wang 2017-04-07  111  out:
5cd8985a Sean Wang 2017-04-07  112  return 0;
5cd8985a Sean Wang 2017-04-07  113  }
5cd8985a Sean Wang 2017-04-07  114  
5cd8985a Sean Wang 2017-04-07  115  const struct dsa_device_ops mtk_netdev_ops 
= {
5cd8985a Sean Wang 2017-04-07  116  .xmit   = mtk_tag_xmit,
5cd8985a Sean Wang 2017-04-07 @117  .rcv= mtk_tag_rcv,
5cd8985a Sean Wang 2017-04-07  118  };

:: The code at line 117 was first introduced by commit
:: 5cd8985a19090f2b0ce8700ae3ec19e06a4fc5e9 net-next: dsa: add Mediatek tag 
RX/TX handler

:: TO: Sean Wang 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe

2017-04-07 Thread Deepa Dinamani

>> - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld",
>> + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu 
>> ts:%lld.%09ld",
>>field->seqnum,
>>field->duration,
>>field->outer_duration,
>> -  field->timestamp.tv_sec,
>> +  (long long)field->timestamp.tv_sec,
>
> Refresh my memory. We need the cast because on 64 bit boxes
> timestamp.tv_sec is just a long?

This is only required until we change the definition of timespec64.
Right now it is defined as

#if __BITS_PER_LONG == 64
# define timespec64 timespec
#else
struct timespec64 {
time64_t tv_sec;
long tv_nsec;
};
#endif

And timespec.tv_sec is just long int on 64 bit machines.
This is why we need the cast now.

We will probably change this and only define __kernel_timespec instead
of timespec, leaving only one definition of timespec64.
At that time, we will not need this.

-Deepa

Re: [PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe

2017-04-07 Thread Steven Rostedt

On Fri,  7 Apr 2017 17:57:00 -0700
Deepa Dinamani  wrote:

> struct timespec is not y2038 safe on 32 bit machines
> and needs to be replaced by struct timespec64
> in order to represent times beyond year 2038 on such
> machines.
> 
> Fix all the timestamp representation in struct trace_hwlat
> and all the corresponding implementations.
> 

> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 02a4aeb..08f9bab 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -4,7 +4,6 @@
>   * Copyright (C) 2008 Red Hat Inc, Steven Rostedt 
>   *
>   */
> -
>  #include 
>  #include 
>  #include 
> @@ -1161,11 +1160,11 @@ trace_hwlat_print(struct trace_iterator *iter, int 
> flags,
>  
>   trace_assign_type(field, entry);
>  
> - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld",
> + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%lld.%09ld",
>field->seqnum,
>field->duration,
>field->outer_duration,
> -  field->timestamp.tv_sec,
> +  (long long)field->timestamp.tv_sec,

Refresh my memory. We need the cast because on 64 bit boxes
timestamp.tv_sec is just a long?

Other than that.

Reviewed-by: Steven Rostedt (VMware) 

-- Steve

>field->timestamp.tv_nsec);
>  
>   if (field->nmi_count) {
> @@ -1195,10 +1194,10 @@ trace_hwlat_raw(struct trace_iterator *iter, int 
> flags,
>  
>   trace_assign_type(field, iter->ent);
>  
> - trace_seq_printf(s, "%llu %lld %ld %09ld %u\n",
> + trace_seq_printf(s, "%llu %lld %lld %09ld %u\n",
>field->duration,
>field->outer_duration,
> -  field->timestamp.tv_sec,
> +  (long long)field->timestamp.tv_sec,
>field->timestamp.tv_nsec,
>field->seqnum);
>

[PATCH 05/12] fs: ufs: Use ktime_get_real_ts64() for birthtime

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME is not y2038 safe.
Replace it with ktime_get_real_ts64().
Inode time formats are already 64 bit long and
accommodates time64_t.

Signed-off-by: Deepa Dinamani 
---
 fs/ufs/ialloc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ufs/ialloc.c b/fs/ufs/ialloc.c
index 9774555..d1dd8cc 100644
--- a/fs/ufs/ialloc.c
+++ b/fs/ufs/ialloc.c
@@ -176,6 +176,7 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode)
struct ufs_cg_private_info * ucpi;
struct ufs_cylinder_group * ucg;
struct inode * inode;
+   struct timespec64 ts;
unsigned cg, bit, i, j, start;
struct ufs_inode_info *ufsi;
int err = -ENOSPC;
@@ -323,8 +324,9 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode)
lock_buffer(bh);
ufs2_inode = (struct ufs2_inode *)bh->b_data;
ufs2_inode += ufs_inotofsbo(inode->i_ino);
-   ufs2_inode->ui_birthtime = cpu_to_fs64(sb, CURRENT_TIME.tv_sec);
-   ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, 
CURRENT_TIME.tv_nsec);
+   ktime_get_real_ts64(&ts);
+   ufs2_inode->ui_birthtime = cpu_to_fs64(sb, ts.tv_sec);
+   ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, ts.tv_nsec);
mark_buffer_dirty(bh);
unlock_buffer(bh);
if (sb->s_flags & MS_SYNCHRONOUS)
-- 
2.7.4

[PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe

2017-04-07 Thread Deepa Dinamani

struct timespec is not y2038 safe on 32 bit machines
and needs to be replaced by struct timespec64
in order to represent times beyond year 2038 on such
machines.

Fix all the timestamp representation in struct trace_hwlat
and all the corresponding implementations.

Signed-off-by: Deepa Dinamani 
---
 kernel/trace/trace_entries.h |  6 +++---
 kernel/trace/trace_hwlat.c   | 14 +++---
 kernel/trace/trace_output.c  |  9 -
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index c203ac4..adcdbbe 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -348,14 +348,14 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
__field(u64,duration)
__field(u64,outer_duration  )
__field(u64,nmi_total_ts)
-   __field_struct( struct timespec,timestamp   )
-   __field_desc(   long,   timestamp,  tv_sec  )
+   __field_struct( struct timespec64,  timestamp   )
+   __field_desc(   s64,timestamp,  tv_sec  )
__field_desc(   long,   timestamp,  tv_nsec )
__field(unsigned int,   nmi_count   )
__field(unsigned int,   seqnum  )
),
 
-   
F_printk("cnt:%u\tts:%010lu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n",
+   
F_printk("cnt:%u\tts:%010llu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n",
 __entry->seqnum,
 __entry->tv_sec,
 __entry->tv_nsec,
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 21ea6ae..d7c8e4e 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -79,12 +79,12 @@ static u64 last_tracing_thresh = DEFAULT_LAT_THRESHOLD * 
NSEC_PER_USEC;
 
 /* Individual latency samples are stored here when detected. */
 struct hwlat_sample {
-   u64 seqnum; /* unique sequence */
-   u64 duration;   /* delta */
-   u64 outer_duration; /* delta (outer loop) */
-   u64 nmi_total_ts;   /* Total time spent in NMIs */
-   struct timespec timestamp;  /* wall time */
-   int nmi_count;  /* # NMIs during this sample */
+   u64 seqnum; /* unique sequence */
+   u64 duration;   /* delta */
+   u64 outer_duration; /* delta (outer loop) */
+   u64 nmi_total_ts;   /* Total time spent in NMIs */
+   struct timespec64   timestamp;  /* wall time */
+   int nmi_count;  /* # NMIs during this sample */
 };
 
 /* keep the global state somewhere. */
@@ -250,7 +250,7 @@ static int get_sample(void)
s.seqnum = hwlat_data.count;
s.duration = sample;
s.outer_duration = outer_sample;
-   s.timestamp = CURRENT_TIME;
+   ktime_get_real_ts64(&s.timestamp);
s.nmi_total_ts = nmi_total_ts;
s.nmi_count = nmi_count;
trace_hwlat_sample(&s);
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 02a4aeb..08f9bab 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -4,7 +4,6 @@
  * Copyright (C) 2008 Red Hat Inc, Steven Rostedt 
  *
  */
-
 #include 
 #include 
 #include 
@@ -1161,11 +1160,11 @@ trace_hwlat_print(struct trace_iterator *iter, int 
flags,
 
trace_assign_type(field, entry);
 
-   trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld",
+   trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%lld.%09ld",
 field->seqnum,
 field->duration,
 field->outer_duration,
-field->timestamp.tv_sec,
+(long long)field->timestamp.tv_sec,
 field->timestamp.tv_nsec);
 
if (field->nmi_count) {
@@ -1195,10 +1194,10 @@ trace_hwlat_raw(struct trace_iterator *iter, int flags,
 
trace_assign_type(field, iter->ent);
 
-   trace_seq_printf(s, "%llu %lld %ld %09ld %u\n",
+   trace_seq_printf(s, "%llu %lld %lld %09ld %u\n",
 field->duration,
 field->outer_duration,
-field->timestamp.tv_sec,
+(long long)field->timestamp.tv_sec,
 field->timestamp.tv_nsec,
 field->seqnum);
 
-- 
2.7.4

[PATCH 10/12] apparmorfs: Replace CURRENT_TIME with current_time()

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by
current_time().

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. current_time() is also planned to be transitioned
to y2038 safe behavior along with this change.

CURRENT_TIME macro will be deleted before merging the
aforementioned change.

Signed-off-by: Deepa Dinamani 
---
 security/apparmor/apparmorfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index be0b498..4f6ac9d 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -1357,7 +1357,7 @@ static int aa_mk_null_file(struct dentry *parent)
 
inode->i_ino = get_next_ino();
inode->i_mode = S_IFCHR | S_IRUGO | S_IWUGO;
-   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
init_special_inode(inode, S_IFCHR | S_IRUGO | S_IWUGO,
   MKDEV(MEM_MAJOR, 3));
d_instantiate(dentry, inode);
-- 
2.7.4

[PATCH 11/12] time: Delete CURRENT_TIME_SEC and CURRENT_TIME

2017-04-07 Thread Deepa Dinamani

All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have
been replaced by other time functions. These macros are
also not y2038 safe.
And, all their use cases can be fulfilled by y2038 safe
ktime_get_* variants.

Signed-off-by: Deepa Dinamani 
Acked-by: John Stultz 
Reviewed-by: Arnd Bergmann 
---
 include/linux/time.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 23f0f5c..c0543f5 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -151,9 +151,6 @@ static inline bool timespec_inject_offset_valid(const 
struct timespec *ts)
return true;
 }
 
-#define CURRENT_TIME   (current_kernel_time())
-#define CURRENT_TIME_SEC   ((struct timespec) { get_seconds(), 0 })
-
 /* Some architectures do not supply their own clocksource.
  * This is mainly the case in architectures that get their
  * inter-tick times by reading the counter on their interval
-- 
2.7.4

[PATCH 09/12] lustre: Replace CURRENT_TIME macro

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by
current_time() for filesystem times, and ktime_get_*
functions for others.

struct timespec is also not y2038 safe.
Retain timespec for timestamp representation here as lustre
uses it internally everywhere.
These references will be changed to use struct timespec64
in a separate patch.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. current_time() is also planned to be transitioned
to y2038 safe behavior along with this change.

CURRENT_TIME macro will be deleted before merging the
aforementioned change.

Signed-off-by: Deepa Dinamani 
---
 drivers/staging/lustre/lustre/llite/llite_lib.c | 6 +++---
 drivers/staging/lustre/lustre/osc/osc_io.c  | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 7b80040..2b4b6b9 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1472,17 +1472,17 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr 
*attr, bool hsm_import)
 
/* We mark all of the fields "set" so MDS/OST does not re-set them */
if (attr->ia_valid & ATTR_CTIME) {
-   attr->ia_ctime = CURRENT_TIME;
+   attr->ia_ctime = current_time(inode);
attr->ia_valid |= ATTR_CTIME_SET;
}
if (!(attr->ia_valid & ATTR_ATIME_SET) &&
(attr->ia_valid & ATTR_ATIME)) {
-   attr->ia_atime = CURRENT_TIME;
+   attr->ia_atime = current_time(inode);
attr->ia_valid |= ATTR_ATIME_SET;
}
if (!(attr->ia_valid & ATTR_MTIME_SET) &&
(attr->ia_valid & ATTR_MTIME)) {
-   attr->ia_mtime = CURRENT_TIME;
+   attr->ia_mtime = current_time(inode);
attr->ia_valid |= ATTR_MTIME_SET;
}
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c 
b/drivers/staging/lustre/lustre/osc/osc_io.c
index f991bee..cbab800 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -216,7 +216,7 @@ static int osc_io_submit(const struct lu_env *env,
struct cl_object *obj = ios->cis_obj;
 
cl_object_attr_lock(obj);
-   attr->cat_mtime = LTIME_S(CURRENT_TIME);
+   attr->cat_mtime = ktime_get_real_seconds();
attr->cat_ctime = attr->cat_mtime;
cl_object_attr_update(env, obj, attr, CAT_MTIME | CAT_CTIME);
cl_object_attr_unlock(obj);
@@ -256,7 +256,7 @@ static void osc_page_touch_at(const struct lu_env *env,
   kms > loi->loi_kms ? "" : "not ", loi->loi_kms, kms,
   loi->loi_lvb.lvb_size);
 
-   attr->cat_ctime = LTIME_S(CURRENT_TIME);
+   attr->cat_ctime = ktime_get_real_seconds();
attr->cat_mtime = attr->cat_ctime;
valid = CAT_MTIME | CAT_CTIME;
if (kms > loi->loi_kms) {
-- 
2.7.4

[PATCH 07/12] fs: btrfs: Use ktime_get_real_ts for root ctime

2017-04-07 Thread Deepa Dinamani

btrfs_root_item maintains the ctime for root updates.
This is not part of vfs_inode.

Since current_time() uses struct inode* as an argument
as Linus suggested, this cannot be used to update root
times unless, we modify the signature to use inode.

Since btrfs uses nanosecond time granularity, it can also
use ktime_get_real_ts directly to obtain timestamp for
the root. It is necessary to use the timespec time api
here because the same btrfs_set_stack_timespec_*() apis
are used for vfs inode times as well. These can be
transitioned to using timespec64 when btrfs internally
changes to use timespec64 as well.

Signed-off-by: Deepa Dinamani 
Acked-by: David Sterba 
Reviewed-by: Arnd Bergmann 
---
 fs/btrfs/root-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index a08224e..7d6bc30 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -501,8 +501,9 @@ void btrfs_update_root_times(struct btrfs_trans_handle 
*trans,
 struct btrfs_root *root)
 {
struct btrfs_root_item *item = &root->root_item;
-   struct timespec ct = current_fs_time(root->fs_info->sb);
+   struct timespec ct;
 
+   ktime_get_real_ts(&ct);
spin_lock(&root->root_item_lock);
btrfs_set_root_ctransid(item, trans->transid);
btrfs_set_stack_timespec_sec(&item->ctime, ct.tv_sec);
-- 
2.7.4

[PATCH 12/12] time: Delete current_fs_time() function

2017-04-07 Thread Deepa Dinamani

All uses of the current_fs_time() function have been
replaced by other time interfaces.

And, its use cases can be fulfilled by current_time()
or ktime_get_* variants.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
---
 include/linux/fs.h |  1 -
 kernel/time/time.c | 14 --
 2 files changed, 15 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f1d7347..cce6c57 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1430,7 +1430,6 @@ static inline void i_gid_write(struct inode *inode, gid_t 
gid)
inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid);
 }
 
-extern struct timespec current_fs_time(struct super_block *sb);
 extern struct timespec current_time(struct inode *inode);
 
 /*
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 25bdd25..cf69cca 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -230,20 +230,6 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p)
return copy_to_user(txc_p, &txc, sizeof(struct timex)) ? -EFAULT : ret;
 }
 
-/**
- * current_fs_time - Return FS time
- * @sb: Superblock.
- *
- * Return the current time truncated to the time granularity supported by
- * the fs.
- */
-struct timespec current_fs_time(struct super_block *sb)
-{
-   struct timespec now = current_kernel_time();
-   return timespec_trunc(now, sb->s_time_gran);
-}
-EXPORT_SYMBOL(current_fs_time);
-
 /*
  * Convert jiffies to milliseconds and back.
  *
-- 
2.7.4

[PATCH 08/12] fs: ubifs: Replace CURRENT_TIME_SEC with current_time

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME_SEC is not y2038 safe. current_time() will
be transitioned to use 64 bit time along with vfs in a
separate patch.
There is no plan to transition CURRENT_TIME_SEC to use
y2038 safe time interfaces.

current_time() returns timestamps according to the
granularities set in the inode's super_block.
The granularity check to call current_fs_time() or
CURRENT_TIME_SEC is not required.

Use current_time() directly to update inode timestamp.
Use timespec_trunc during file system creation, before
the first inode is created.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
---
 fs/ubifs/dir.c   | 12 ++--
 fs/ubifs/file.c  | 12 ++--
 fs/ubifs/ioctl.c |  2 +-
 fs/ubifs/misc.h  | 10 --
 fs/ubifs/sb.c| 14 ++
 fs/ubifs/xattr.c |  6 +++---
 6 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 30825d88..8510d79 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -121,7 +121,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct 
inode *dir,
 
inode_init_owner(inode, dir, mode);
inode->i_mtime = inode->i_atime = inode->i_ctime =
-ubifs_current_time(inode);
+current_time(inode);
inode->i_mapping->nrpages = 0;
 
switch (mode & S_IFMT) {
@@ -750,7 +750,7 @@ static int ubifs_link(struct dentry *old_dentry, struct 
inode *dir,
lock_2_inodes(dir, inode);
inc_nlink(inode);
ihold(inode);
-   inode->i_ctime = ubifs_current_time(inode);
+   inode->i_ctime = current_time(inode);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
dir->i_mtime = dir->i_ctime = inode->i_ctime;
@@ -823,7 +823,7 @@ static int ubifs_unlink(struct inode *dir, struct dentry 
*dentry)
}
 
lock_2_inodes(dir, inode);
-   inode->i_ctime = ubifs_current_time(dir);
+   inode->i_ctime = current_time(dir);
drop_nlink(inode);
dir->i_size -= sz_change;
dir_ui->ui_size = dir->i_size;
@@ -927,7 +927,7 @@ static int ubifs_rmdir(struct inode *dir, struct dentry 
*dentry)
}
 
lock_2_inodes(dir, inode);
-   inode->i_ctime = ubifs_current_time(dir);
+   inode->i_ctime = current_time(dir);
clear_nlink(inode);
drop_nlink(dir);
dir->i_size -= sz_change;
@@ -1405,7 +1405,7 @@ static int do_rename(struct inode *old_dir, struct dentry 
*old_dentry,
 * Like most other Unix systems, set the @i_ctime for inodes on a
 * rename.
 */
-   time = ubifs_current_time(old_dir);
+   time = current_time(old_dir);
old_inode->i_ctime = time;
 
/* We must adjust parent link count when renaming directories */
@@ -1578,7 +1578,7 @@ static int ubifs_xrename(struct inode *old_dir, struct 
dentry *old_dentry,
 
lock_4_inodes(old_dir, new_dir, NULL, NULL);
 
-   time = ubifs_current_time(old_dir);
+   time = current_time(old_dir);
fst_inode->i_ctime = time;
snd_inode->i_ctime = time;
old_dir->i_mtime = old_dir->i_ctime = time;
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index d9ae86f..2cda3d6 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1196,7 +1196,7 @@ static int do_truncation(struct ubifs_info *c, struct 
inode *inode,
mutex_lock(&ui->ui_mutex);
ui->ui_size = inode->i_size;
/* Truncation changes inode [mc]time */
-   inode->i_mtime = inode->i_ctime = ubifs_current_time(inode);
+   inode->i_mtime = inode->i_ctime = current_time(inode);
/* Other attributes may be changed at the same time as well */
do_attr_changes(inode, attr);
err = ubifs_jnl_truncate(c, inode, old_size, new_size);
@@ -1243,7 +1243,7 @@ static int do_setattr(struct ubifs_info *c, struct inode 
*inode,
mutex_lock(&ui->ui_mutex);
if (attr->ia_valid & ATTR_SIZE) {
/* Truncation changes inode [mc]time */
-   inode->i_mtime = inode->i_ctime = ubifs_current_time(inode);
+   inode->i_mtime = inode->i_ctime = current_time(inode);
/* 'truncate_setsize()' changed @i_size, update @ui_size */
ui->ui_size = inode->i_size;
}
@@ -1420,7 +1420,7 @@ int ubifs_update_time(struct inode *inode, struct 
timespec *time,
  */
 static int update_mctime(struct inode *inode)
 {
-   struct timespec now = ubifs_current_time(inode);
+   struct timespec now = current_time(inode);
struct ubifs_inode *ui = ubifs_inode(inode);
struct ubifs_info *c = inode->i_sb->s_fs_info;
 
@@ -1434,7 +1434,7 @@ static int update_mctime(struct inode *inode)
return err;
 
mutex_lock(&ui->ui_mutex);
-   inode->i_mtime = inode->i_ctime = ubifs_current_time(inode);
+   inode->i_mtime = inode->i_ctime = current_time(inode);
release = ui->dirty;
mark_inode_dirty_sync(ino

[PATCH 03/12] fs: cifs: Replace CURRENT_TIME by other appropriate apis

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by
current_time() for filesystem times, and ktime_get_*
functions for authentication timestamps and timezone
calculations.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe.

CURRENT_TIME macro will be deleted before merging the
aforementioned change.

The inode timestamps read from the server are assumed
to have correct granularity and range.

The patch also assumes that the difference between server and
client times lie in the range INT_MIN..INT_MAX. This is valid
because this is the difference between current times between
server and client, and the largest timezone difference is in the
range of one day.

All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned
to use timespec64.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
---
 fs/cifs/cifsencrypt.c |  4 +++-
 fs/cifs/cifssmb.c | 10 +-
 fs/cifs/inode.c   | 28 +++-
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 058ac9b..68abbb0 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -478,6 +478,7 @@ find_timestamp(struct cifs_ses *ses)
unsigned char *blobptr;
unsigned char *blobend;
struct ntlmssp2_name *attrptr;
+   struct timespec ts;
 
if (!ses->auth_key.len || !ses->auth_key.response)
return 0;
@@ -502,7 +503,8 @@ find_timestamp(struct cifs_ses *ses)
blobptr += attrsize; /* advance attr value */
}
 
-   return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
+   ktime_get_real_ts(&ts);
+   return cpu_to_le64(cifs_UnixTimeToNT(ts));
 }
 
 static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 0669506..2f279b7 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -478,14 +478,14 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, 
NEGOTIATE_RSP *pSMBr)
 * this requirement.
 */
int val, seconds, remain, result;
-   struct timespec ts, utc;
-   utc = CURRENT_TIME;
+   struct timespec ts;
+   unsigned long utc = ktime_get_real_seconds();
ts = cnvrtDosUnixTm(rsp->SrvTime.Date,
rsp->SrvTime.Time, 0);
cifs_dbg(FYI, "SrvTime %d sec since 1970 (utc: %d) diff: %d\n",
-(int)ts.tv_sec, (int)utc.tv_sec,
-(int)(utc.tv_sec - ts.tv_sec));
-   val = (int)(utc.tv_sec - ts.tv_sec);
+(int)ts.tv_sec, (int)utc,
+(int)(utc - ts.tv_sec));
+   val = (int)(utc - ts.tv_sec);
seconds = abs(val);
result = (seconds / MIN_TZ_ADJ) * MIN_TZ_ADJ;
remain = seconds % MIN_TZ_ADJ;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index b261db3..c3b2fa0 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -322,9 +322,9 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct 
super_block *sb)
fattr->cf_mode = S_IFDIR | S_IXUGO | S_IRWXU;
fattr->cf_uid = cifs_sb->mnt_uid;
fattr->cf_gid = cifs_sb->mnt_gid;
-   fattr->cf_atime = CURRENT_TIME;
-   fattr->cf_ctime = CURRENT_TIME;
-   fattr->cf_mtime = CURRENT_TIME;
+   ktime_get_real_ts(&fattr->cf_mtime);
+   fattr->cf_mtime = timespec_trunc(fattr->cf_mtime, sb->s_time_gran);
+   fattr->cf_atime = fattr->cf_ctime = fattr->cf_mtime;
fattr->cf_nlink = 2;
fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
 }
@@ -586,9 +586,10 @@ static int cifs_sfu_mode(struct cifs_fattr *fattr, const 
unsigned char *path,
 /* Fill a cifs_fattr struct with info from FILE_ALL_INFO */
 static void
 cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
-  struct cifs_sb_info *cifs_sb, bool adjust_tz,
+  struct super_block *sb, bool adjust_tz,
   bool symlink)
 {
+   struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
 
memset(fattr, 0, sizeof(*fattr));
@@ -598,8 +599,10 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, 
FILE_ALL_INFO *info,
 
if (info->LastAccessTime)
fattr->cf_atime = cifs_NTtimeToUnix(info->LastAccessTime);
-   else
-   fattr->cf_atime = CURRENT_TIME;
+   else {
+   ktime_get_real_ts(&fattr->cf_atime);
+   fattr->cf_atime = timespec_trunc(fattr->cf_atime, 
sb->s_time_gran);
+   }
 
fattr->cf_ctime = cifs_NTtimeToUnix(info->

[PATCH 06/12] audit: Use timespec64 to represent audit timestamps

2017-04-07 Thread Deepa Dinamani

struct timespec is not y2038 safe.
Audit timestamps are recorded in string format into
an audit buffer for a given context.
These mark the entry timestamps for the syscalls.
Use y2038 safe struct timespec64 to represent the times.
The log strings can handle this transition as strings can
hold upto 1024 characters.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
Acked-by: Paul Moore 
Acked-by: Richard Guy Briggs 
---
 include/linux/audit.h |  4 ++--
 kernel/audit.c| 10 +-
 kernel/audit.h|  2 +-
 kernel/auditsc.c  |  6 +++---
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 6fdfefc..f830508 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -332,7 +332,7 @@ static inline void audit_ptrace(struct task_struct *t)
/* Private API (for audit.c only) */
 extern unsigned int audit_serial(void);
 extern int auditsc_get_stamp(struct audit_context *ctx,
- struct timespec *t, unsigned int *serial);
+ struct timespec64 *t, unsigned int *serial);
 extern int audit_set_loginuid(kuid_t loginuid);
 
 static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
@@ -511,7 +511,7 @@ static inline void __audit_seccomp(unsigned long syscall, 
long signr, int code)
 static inline void audit_seccomp(unsigned long syscall, long signr, int code)
 { }
 static inline int auditsc_get_stamp(struct audit_context *ctx,
- struct timespec *t, unsigned int *serial)
+ struct timespec64 *t, unsigned int *serial)
 {
return 0;
 }
diff --git a/kernel/audit.c b/kernel/audit.c
index 2f4964c..fcbf377 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1625,10 +1625,10 @@ unsigned int audit_serial(void)
 }
 
 static inline void audit_get_stamp(struct audit_context *ctx,
-  struct timespec *t, unsigned int *serial)
+  struct timespec64 *t, unsigned int *serial)
 {
if (!ctx || !auditsc_get_stamp(ctx, t, serial)) {
-   *t = CURRENT_TIME;
+   ktime_get_real_ts64(t);
*serial = audit_serial();
}
 }
@@ -1652,7 +1652,7 @@ struct audit_buffer *audit_log_start(struct audit_context 
*ctx, gfp_t gfp_mask,
 int type)
 {
struct audit_buffer *ab;
-   struct timespec t;
+   struct timespec64 t;
unsigned int uninitialized_var(serial);
 
if (audit_initialized != AUDIT_INITIALIZED)
@@ -1705,8 +1705,8 @@ struct audit_buffer *audit_log_start(struct audit_context 
*ctx, gfp_t gfp_mask,
}
 
audit_get_stamp(ab->ctx, &t, &serial);
-   audit_log_format(ab, "audit(%lu.%03lu:%u): ",
-t.tv_sec, t.tv_nsec/100, serial);
+   audit_log_format(ab, "audit(%llu.%03lu:%u): ",
+(unsigned long long)t.tv_sec, t.tv_nsec/100, 
serial);
 
return ab;
 }
diff --git a/kernel/audit.h b/kernel/audit.h
index 0f1cf6d..cdf96f4 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -112,7 +112,7 @@ struct audit_context {
enum audit_statestate, current_state;
unsigned intserial; /* serial number for record */
int major;  /* syscall number */
-   struct timespec ctime;  /* time of syscall entry */
+   struct timespec64   ctime;  /* time of syscall entry */
unsigned long   argv[4];/* syscall arguments */
longreturn_code;/* syscall return code */
u64 prio;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index e59ffc7..a2d9217 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1532,7 +1532,7 @@ void __audit_syscall_entry(int major, unsigned long a1, 
unsigned long a2,
return;
 
context->serial = 0;
-   context->ctime  = CURRENT_TIME;
+   ktime_get_real_ts64(&context->ctime);
context->in_syscall = 1;
context->current_state  = state;
context->ppid   = 0;
@@ -1941,13 +1941,13 @@ EXPORT_SYMBOL_GPL(__audit_inode_child);
 /**
  * auditsc_get_stamp - get local copies of audit_context values
  * @ctx: audit_context for the task
- * @t: timespec to store time recorded in the audit_context
+ * @t: timespec64 to store time recorded in the audit_context
  * @serial: serial value that is recorded in the audit_context
  *
  * Also sets the context as auditable.
  */
 int auditsc_get_stamp(struct audit_context *ctx,
-  struct timespec *t, unsigned int *serial)
+  struct timespec64 *t, unsigned int *serial)
 {
if (!ctx->in_syscall)
return 0;
-- 
2.7.4

[PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME is not y2038 safe.
The macro will be deleted and all the references to it
will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.
Retain timespec for timestamp representation here as ceph
uses it internally everywhere.
These references will be changed to use struct timespec64
in a separate patch.

The current_fs_time() api is being changed to use vfs
struct inode* as an argument instead of struct super_block*.

Set the new mds client request r_stamp field using
ktime_get_real_ts() instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use
timespec_trunc() to truncate the timestamp, using the right
granularity from the superblock.

This api will be transitioned to be y2038 safe along
with vfs.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
---
 drivers/block/rbd.c   | 2 +-
 fs/ceph/mds_client.c  | 4 +++-
 net/ceph/messenger.c  | 6 --
 net/ceph/osd_client.c | 4 ++--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 517838b..77204da 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct 
rbd_obj_request *obj_request)
 {
struct ceph_osd_request *osd_req = obj_request->osd_req;
 
-   osd_req->r_mtime = CURRENT_TIME;
+   ktime_get_real_ts(&osd_req->r_mtime);
osd_req->r_data_offset = obj_request->offset;
 }
 
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index c681762..1d3fa90 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1666,6 +1666,7 @@ struct ceph_mds_request *
 ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 {
struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
+   struct timespec ts;
 
if (!req)
return ERR_PTR(-ENOMEM);
@@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, 
int op, int mode)
init_completion(&req->r_safe_completion);
INIT_LIST_HEAD(&req->r_unsafe_item);
 
-   req->r_stamp = current_fs_time(mdsc->fsc->sb);
+   ktime_get_real_ts(&ts);
+   req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);
 
req->r_op = op;
req->r_direct_mode = mode;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index f76bb33..5766a6c 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1386,8 +1386,9 @@ static void prepare_write_keepalive(struct 
ceph_connection *con)
dout("prepare_write_keepalive %p\n", con);
con_out_kvec_reset(con);
if (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2) {
-   struct timespec now = CURRENT_TIME;
+   struct timespec now;
 
+   ktime_get_real_ts(&now);
con_out_kvec_add(con, sizeof(tag_keepalive2), &tag_keepalive2);
ceph_encode_timespec(&con->out_temp_keepalive2, &now);
con_out_kvec_add(con, sizeof(con->out_temp_keepalive2),
@@ -3176,8 +3177,9 @@ bool ceph_con_keepalive_expired(struct ceph_connection 
*con,
 {
if (interval > 0 &&
(con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2)) {
-   struct timespec now = CURRENT_TIME;
+   struct timespec now;
struct timespec ts;
+   ktime_get_real_ts(&now);
jiffies_to_timespec(interval, &ts);
ts = timespec_add(con->last_keepalive_ack, ts);
return timespec_compare(&now, &ts) >= 0;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index e15ea9e..242d7c0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -3574,7 +3574,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc,
ceph_oid_copy(&lreq->t.base_oid, oid);
ceph_oloc_copy(&lreq->t.base_oloc, oloc);
lreq->t.flags = CEPH_OSD_FLAG_WRITE;
-   lreq->mtime = CURRENT_TIME;
+   ktime_get_real_ts(&lreq->mtime);
 
lreq->reg_req = alloc_linger_request(lreq);
if (!lreq->reg_req) {
@@ -3632,7 +3632,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc,
ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid);
ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc);
req->r_flags = CEPH_OSD_FLAG_WRITE;
-   req->r_mtime = CURRENT_TIME;
+   ktime_get_real_ts(&req->r_mtime);
osd_req_op_watch_init(req, 0, lreq->linger_id,
  CEPH_OSD_WATCH_OP_UNWATCH);
 
-- 
2.7.4

[PATCH 01/12] fs: f2fs: Use ktime_get_real_seconds for sit_info times

2017-04-07 Thread Deepa Dinamani

CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds
in segment timestamps used by GC algorithm including the
segment mtime timestamps.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
---
 fs/f2fs/segment.c | 2 +-
 fs/f2fs/segment.h | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 010324c..0531500 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2678,7 +2678,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
sit_i->dirty_sentries = 0;
sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
-   sit_i->mounted_time = CURRENT_TIME_SEC.tv_sec;
+   sit_i->mounted_time = ktime_get_real_seconds();
mutex_init(&sit_i->sentry_lock);
return 0;
 }
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 57e36c1..156afc3 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -692,8 +692,9 @@ static inline void set_to_next_sit(struct sit_info *sit_i, 
unsigned int start)
 static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
 {
struct sit_info *sit_i = SIT_I(sbi);
-   return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec -
-   sit_i->mounted_time;
+   time64_t now = ktime_get_real_seconds();
+
+   return sit_i->elapsed_time + now - sit_i->mounted_time;
 }
 
 static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
-- 
2.7.4

[PATCH 00/12] Delete CURRENT_TIME, CURRENT_TIME_SEC and current_fs_time

2017-04-07 Thread Deepa Dinamani

The series contains the last unmerged uses of CURRENT_TIME,
CURRENT_TIME_SEC, and current_fs_time().
The series also deletes these apis.

All the patches except [PATCH 9/12] and [PATCH 10/12] are resend patches.
These patches fix new instances of CURRENT_TIME.
cifs and ceph patches have been squashed so that we have one patch per
filesystem.

We want to get these merged onto 4.12 release so that I can post the series
that changes vfs timestamps to use 64 bits for 4.13 release.

I'm proposing these to be merged through Andrew's tree.

Filesystem maintainers, please let Andrew know if you will be picking up
the patch in your trees.

Let me know if anybody has other preferences for merging.

Deepa Dinamani (12):
  fs: f2fs: Use ktime_get_real_seconds for sit_info times
  trace: Make trace_hwlat timestamp y2038 safe
  fs: cifs: Replace CURRENT_TIME by other appropriate apis
  fs: ceph: CURRENT_TIME with ktime_get_real_ts()
  fs: ufs: Use ktime_get_real_ts64() for birthtime
  audit: Use timespec64 to represent audit timestamps
  fs: btrfs: Use ktime_get_real_ts for root ctime
  fs: ubifs: Replace CURRENT_TIME_SEC with current_time
  lustre: Replace CURRENT_TIME macro
  apparmorfs: Replace CURRENT_TIME with current_time()
  time: Delete CURRENT_TIME_SEC and CURRENT_TIME
  time: Delete current_fs_time() function

 drivers/block/rbd.c |  2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c |  6 +++---
 drivers/staging/lustre/lustre/osc/osc_io.c  |  4 ++--
 fs/btrfs/root-tree.c|  3 ++-
 fs/ceph/mds_client.c|  4 +++-
 fs/cifs/cifsencrypt.c   |  4 +++-
 fs/cifs/cifssmb.c   | 10 -
 fs/cifs/inode.c | 28 +
 fs/f2fs/segment.c   |  2 +-
 fs/f2fs/segment.h   |  5 +++--
 fs/ubifs/dir.c  | 12 +--
 fs/ubifs/file.c | 12 +--
 fs/ubifs/ioctl.c|  2 +-
 fs/ubifs/misc.h | 10 -
 fs/ubifs/sb.c   | 14 +
 fs/ubifs/xattr.c|  6 +++---
 fs/ufs/ialloc.c |  6 --
 include/linux/audit.h   |  4 ++--
 include/linux/fs.h  |  1 -
 include/linux/time.h|  3 ---
 kernel/audit.c  | 10 -
 kernel/audit.h  |  2 +-
 kernel/auditsc.c|  6 +++---
 kernel/time/time.c  | 14 -
 kernel/trace/trace_entries.h|  6 +++---
 kernel/trace/trace_hwlat.c  | 14 ++---
 kernel/trace/trace_output.c |  9 
 net/ceph/messenger.c|  6 --
 net/ceph/osd_client.c   |  4 ++--
 security/apparmor/apparmorfs.c  |  2 +-
 30 files changed, 100 insertions(+), 111 deletions(-)

-- 
2.7.4

Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2017-04-07 Thread Grygorii Strashko

Hi Sebastian,

On 09/09/2016 07:46 AM, Grygorii Strashko wrote:
> On 09/08/2016 07:00 PM, Sebastian Andrzej Siewior wrote:
>> On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote:
>>> I've collected trace before first occurrence of  "NOHZ: 
>>> local_softirq_pending 80"
>>>
> 
>>
>>>  irq/354-4848400-85[000]90.642393: softirq_exit: vec=3 
>>> [action=NET_RX]
>>>  irq/354-4848400-85[000]90.642419: sched_switch: 
>>> irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
>>
>> We don't serve TIMER & SCHED because those two are pushed to the
>> ksoftirq thread(s). So we keep mostly doing NET_RX and now we switch to
>> the next best thing which is RCU.
>>
>>>   rcuc/0-11[000]90.642430: irq_handler_entry:irq=354 
>>> name=48484000.ethernet
>> but not for long.
>>
>>>   rcuc/0-11[000]90.642432: irq_handler_exit: irq=354 
>>> ret=handled
>>>   rcuc/0-11[000]90.642435: sched_waking: 
>>> comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
>>>   rcuc/0-11[000]90.642442: sched_migrate_task:   
>>> comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
>>>   rcuc/0-11[000]90.642453: sched_wakeup: 
>>> irq/354-4848400:85 [49] success=1 CPU:001
>>>iperf-1284  [001]90.642462: sched_stat_runtime:   comm=iperf 
>>> pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
>>>   rcuc/0-11[000]90.642464: irq_handler_entry:irq=355 
>>> name=48484000.ethernet
>>>   rcuc/0-11[000]90.642466: irq_handler_exit: irq=355 
>>> ret=handled
>>>   rcuc/0-11[000]90.642469: sched_waking: 
>>> comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
>>>iperf-1284  [001]90.642473: sched_switch: iperf:1284 
>>> [120] R ==> irq/354-4848400:85 [49]
>>>  irq/354-4848400-85[001]90.642481: softirq_raise:vec=3 
>>> [action=NET_RX]
>>>   rcuc/0-11[000]90.642483: sched_wakeup: 
>>> irq/355-4848400:86 [49] success=1 CPU:001
>>>  irq/354-4848400-85[001]90.642493: softirq_entry:vec=3 
>>> [action=NET_RX]
>>>   rcuc/0-11[000]90.642497: sched_migrate_task:   
>>> comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
>> ach that IRQ thread no pinned. Good. We migrate.
>>
> 
> It looks like scheduler playing ping-pong between CPUs with threaded irqs 
> irq/354-355.
> And seems this might be the case - if I pin both threaded IRQ handlers to CPU0
> I can see better latency and netperf improvement
> cyclictest -m -Sp98 -q  -D4m
> T: 0 ( 1318) P:98 I:1000 C: 24 Min:  9 Act:   14 Avg:   15 Max:  
> 42
> T: 1 ( 1319) P:98 I:1500 C: 159909 Min:  9 Act:   14 Avg:   16 Max:  
> 39
> 
> if I arrange hwirqs  and pin pin both threaded IRQ handlers on CPU1 
> I can observe more less similar results as with this patch. 
> 
> 
>>>   rcuc/0-11[000]90.642515: irq_handler_entry:irq=354 
>>> name=48484000.ethernet
>>>   rcuc/0-11[000]90.642516: irq_handler_exit: irq=354 
>>> ret=handled
> 
>>
>> As you see ksoftirqd left the CPU with a D so I would assume it is
>> blocked on a lock and waits. 
>> NET_RX is in progress but scheduled out due to RCUC which is also
>> scheduled out.
>>
>> I assume we got to softirq because nothing else can run. It will see
>> that NET_RX is pending and tries it but blocks on the lock
>> (lock_softirq()). It schedules out. Nothing left -> idle.
>>
>> The idle code checks to see if a softirq is pending and in fact there is
>> SCHED on the list and ksoftirq was about to handle it but due to
>> ordering complication (NET_RX before SCHED) it can't. And we have the
>> warning.
>>
>> This 
>>
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -105,6 +105,7 @@ void softirq_check_pending_idle(void)
>>  {
>>  static int rate_limit;
>>  struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
>> +struct task_struct *ksoft_tsk = __this_cpu_read(ksoftirqd);
>>  u32 warnpending;
>>  int i;
>>  
>> @@ -112,7 +113,7 @@ void softirq_check_pending_idle(void)
>>  return;
>>  
>>  warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
>> -for (i = 0; i < NR_SOFTIRQS; i++) {
>> +for (i = 0; (i < NR_SOFTIRQS) && warnpending; i++) {
>>  struct task_struct *tsk = sr->runner[i];
>>  
>>  /*
>> @@ -132,6 +133,15 @@ void softirq_check_pending_idle(void)
>>  }
>>  }
>>  
>> +if (warnpending && ksoft_tsk) {
>> +raw_spin_lock(&ksoft_tsk->pi_lock);
>> +if (ksoft_tsk->pi_blocked_on || ksoft_tsk->state == 
>> TASK_RUNNING) {
>> +/* Clear all bits pending in that task */
>> +warnpending &= ~(ksoft_tsk->softirqs_raised);
>> +}
>> +raw_spin_unlock(&ksoft_tsk->pi_lock);
>> +}
>> +
>>  if (warnpending) {
>>  printk(K

Re: [PATCH 02/12] ftgmac100: Move ftgmac100_hard_start_xmit() around

2017-04-07 Thread Benjamin Herrenschmidt

On Fri, 2017-04-07 at 05:49 -0700, David Miller wrote:
> >    /* Tx ring */
> > + struct sk_buff *tx_skbs[TX_QUEUE_ENTRIES];
> >    unsigned int tx_clean_pointer;
> >    unsigned int tx_pointer;
> >    unsigned int tx_pending;
> 
> Add this only in patch #6 or wherever it is that you start using it,
> not this patch

Yup, just an accident, funny that I didn't notice it (I did review my
own series ;-)

Thanks !

Ben.

Re: [PATCH 11/12] ftgmac100: Add support for fragmented tx

2017-04-07 Thread Benjamin Herrenschmidt

On Fri, 2017-04-07 at 06:26 -0700, Florian Fainelli wrote:
> 
> On 04/06/2017 08:31 PM, Benjamin Herrenschmidt wrote:
> > Add NETIF_F_SG and create multiple TX ring entries for skb fragments.
> > 
> > On reclaim, the skb is only freed on the segment marked as "last".
> > 
> > > > Signed-off-by: Benjamin Herrenschmidt 
> > 
> 
> [snip]
> >  
> > > > -   dma_unmap_single(priv->dev, map, skb_headlen(skb), 
> > > > DMA_TO_DEVICE);
> > > > +   if (skb_shinfo(skb)->nr_frags == 0 && len < ETH_ZLEN)
> > +   len = ETH_ZLEN;
> 
> This is where skb_put_padto() would help you eliminate this test since
> you'd be dealing skb->len >= ETH_ZLEN.

Ok, thanks.

> > +   dma_unmap_single(priv->dev, map, len, DMA_TO_DEVICE);
> > > > +   } else {
> > > > +   dma_unmap_page(priv->dev, map,
> > > > +      ftgmac100_txdes_get_buffer_size(txdes),
> > > > +      DMA_TO_DEVICE);
> > > > +   }
> >  
> > > > -   dev_kfree_skb(skb);
> > > > +   if (ftgmac100_txdes_get_last_segment(txdes))
> > +   dev_kfree_skb(skb);
> 
> This makes you do an uncached access to the descriptor, right? is there
> a way you could use bookeeping information to free the last fragment?

Not a big deal, it's handled in a subsequent patch. I have to read the
descriptor first word anyway to know the packet has been completed, 
a further patch I just pass that info along to ftgmac100_free_tx_packet

> >     priv->tx_skbs[pointer] = NULL;
> >  
> > > >     /* Clear txdes0 except end of ring bit, clear txdes1 as we
> > @@ -623,10 +642,9 @@ static void ftgmac100_tx_complete(struct ftgmac100 
> > *priv)
> >  static int ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > > >      struct net_device *netdev)
> >  {
> > > > -   unsigned int len = (skb->len < ETH_ZLEN) ? ETH_ZLEN : skb->len;
> > > >     struct ftgmac100 *priv = netdev_priv(netdev);
> > > > -   struct ftgmac100_txdes *txdes;
> > > > -   unsigned int pointer;
> > > > +   struct ftgmac100_txdes *txdes, *first;
> > > > +   unsigned int pointer, nfrags, len, i, j;
> > > >     dma_addr_t map;
> >  
> > > >     /* The HW doesn't pad small frames */
> > @@ -642,26 +660,35 @@ static int ftgmac100_hard_start_xmit(struct sk_buff 
> > *skb,
> > > >     goto drop;
> > > >     }
> >  
> > > > -   map = dma_map_single(priv->dev, skb->data, skb_headlen(skb), 
> > > > DMA_TO_DEVICE);
> > > > -   if (unlikely(dma_mapping_error(priv->dev, map))) {
> > > > -   /* drop packet */
> > > > +   /* Do we have a limit on #fragments ? I yet have to get a reply
> > > > +    * from Aspeed. If there's one I haven't hit it.
> > > > +    */
> > > > +   nfrags = skb_shinfo(skb)->nr_frags;
> > +
> > > > +   /* Get header len and pad for non-fragmented packets */
> > > > +   len = skb_headlen(skb);
> > > > +   if (nfrags == 0 && len < ETH_ZLEN)
> > +   len = ETH_ZLEN;
> 
> Same here skb_put_padto() would eliminate the test.

Yup, I'll fix that, thx.

> [snip]
> 
> >  
> > + dma_err:
> > > > +   if (net_ratelimit())
> > +   netdev_err(netdev, "map tx fragment failed\n");
> 
> You may consider adding a software counter that tracks mapping failures
> (few drivers do that) in a subsequent set of changes.

Ok. I want to add a bunch of SW counters for other things too so
I'll add to the list.

Cheers,
Ben.

Re: [PATCH 05/12] ftgmac100: Pad small frames properly

2017-04-07 Thread Benjamin Herrenschmidt

On Fri, 2017-04-07 at 06:05 -0700, Florian Fainelli wrote:
> You may want to use skb_put_padto() which also takes care of bumping
> skb->len accordingly, in case that makes a difference for the
> ftgmac100 hardware.

Subsequent patch adds fragments, so it needs to bump headlen. I'll have
a look this week-end, thanks.

Cheers,
Ben.

Re: [PATCH 02/12] ftgmac100: Move ftgmac100_hard_start_xmit() around

2017-04-07 Thread Benjamin Herrenschmidt

On Fri, 2017-04-07 at 13:09 +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 4/7/2017 6:30 AM, Benjamin Herrenschmidt wrote:
> 
> > Move it below ftgmac100_xmit() and the rest of the tx path
> > 
> > No code change.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> >  drivers/net/ethernet/faraday/ftgmac100.c | 59 --
> > --
> >  1 file changed, 30 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > b/drivers/net/ethernet/faraday/ftgmac100.c
> > index f4b719214..3966c7a 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -63,6 +63,7 @@ struct ftgmac100 {
> >     u32 rxdes0_edorr_mask;
> > 
> >     /* Tx ring */
> > +   struct sk_buff *tx_skbs[TX_QUEUE_ENTRIES];
> 
> "No code change", really?

Patch splitting accident, thanks for pointing it out. I'll fix it up in
the next round.

> 
> >     unsigned int tx_clean_pointer;
> >     unsigned int tx_pointer;
> >     unsigned int tx_pending;
> 
> [...]
> 
> MBR, Sergei

[PATCH net-next] gso: Support frag_list splitting with head_frag

2017-04-07 Thread ilant

From: Ilan Tayari 

A driver may use build_skb() for received packets.
These SKBs then have a head_frag.

Since commit d7e8883cfcf4 ("net: make GRO aware of
skb->head_frag"), GRO may build frag_list SKBs out of
head_frag received SKBs.
In such a case, the chained SKBs end up with a head_frag.

Commit 07b26c9454a2 ("gso: Support partial splitting at
the frag_list pointer") adds partial segmentation of frag_list
SKB chains into individual SKBs.
However, this is not done if the chained SKBs have any
linear part, because the device may not be able to DMA
the private linear buffer.

A chained frag_list SKB with head_frag is wrongfully
detected in this case as having a private linear part
and thus falls back to software GSO, while in fact the
linear part is backed by a DMA page just like any other frag.

This causes low performance when forwarding those packets
that were built with build_skb()

Allow partial segmentation at the frag_list pointer for
chained SKBs with head_frag.

Note that such SKBs can only be created by GRO, when applied
to received packets with head_frag.
Also note that this change only affects the data path that
performs the partial segmentation at frag_list pointer, and
not any of the other more common data paths.

Signed-off-by: Ilan Tayari 
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f781092fda9..5d9a11eafbf5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3093,7 +3093,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 * containing the same amount of data.
 */
skb_walk_frags(head_skb, iter) {
-   if (skb_headlen(iter))
+   if (skb_headlen(iter) && !iter->head_frag)
goto normal;
 
len -= iter->len;
-- 
2.11.0

[PATCH net-next 4/8] rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event

2017-04-07 Thread David Ahern

Changing the master device for a link generates many messages; the one
generated for POST_TYPE_CHANGE is redundant:

[LINK]11: dummy1:  mtu 1500 qdisc noqueue master 
br1 state UNKNOWN group default event POST_TYPE_CHANGE
link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff
[LINK]11: dummy1:  mtu 1500 qdisc noqueue master 
br1 state UNKNOWN group default
link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff

Remove POST_TYPE_CHANGE from the list of notifiers that generate
notifications.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index df5ade1bc684..80f3574a4630 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -906,7 +906,6 @@ enum {
IFLA_EVENT_CHANGE_NAME,
IFLA_EVENT_FEAT_CHANGE,
IFLA_EVENT_BONDING_FAILOVER,
-   IFLA_EVENT_POST_TYPE_CHANGE,
IFLA_EVENT_NOTIFY_PEERS,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1503138ebfe1..739b06ac3e7f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1294,9 +1294,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_BONDING_FAILOVER:
rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
break;
-   case NETDEV_POST_TYPE_CHANGE:
-   rtnl_event = IFLA_EVENT_POST_TYPE_CHANGE;
-   break;
case NETDEV_NOTIFY_PEERS:
rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
break;
@@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
case NETDEV_BONDING_FAILOVER:
-   case NETDEV_POST_TYPE_CHANGE:
case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
-- 
2.1.4

[PATCH net-next 8/8] rtnetlink: Do not generate notifications for CHANGELOWERSTATE event

2017-04-07 Thread David Ahern

CHANGELOWERSTATE is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8f23e9dde667..37e19966fbf3 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -908,7 +908,6 @@ enum {
IFLA_EVENT_BONDING_FAILOVER,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
-   IFLA_EVENT_CHANGE_LOWER_STATE,
IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2587aa339c4..21c03861aaf3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1300,9 +1300,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_RESEND_IGMP:
rtnl_event = IFLA_EVENT_RESEND_IGMP;
break;
-   case NETDEV_CHANGELOWERSTATE:
-   rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE;
-   break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
rtnl_event = IFLA_EVENT_CHANGE_TX_QUEUE_LEN;
break;
@@ -4172,7 +4169,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_BONDING_FAILOVER:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
-   case NETDEV_CHANGELOWERSTATE:
case NETDEV_CHANGE_TX_QUEUE_LEN:
rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL);
break;
-- 
2.1.4

[PATCH net-next 5/8] rtnetlink: Remove NETDEV_CHANGEINFODATA netdev event

2017-04-07 Thread David Ahern

Nobody cares about the event, so don't send it.

Signed-off-by: David Ahern 
---
 drivers/net/bonding/bond_options.c | 2 --
 include/linux/netdevice.h  | 1 -
 include/uapi/linux/if_link.h   | 1 -
 net/core/rtnetlink.c   | 4 
 4 files changed, 8 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 1bcbb8913e17..533518a64496 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -673,8 +673,6 @@ int __bond_opt_set(struct bonding *bond,
 out:
if (ret)
bond_opt_error_interpret(bond, opt, ret, val);
-   else if (bond->dev->reg_state == NETREG_REGISTERED)
-   call_netdevice_notifiers(NETDEV_CHANGEINFODATA, bond->dev);
 
return ret;
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cc07c3be2705..bd0d26fbec2b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2279,7 +2279,6 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_CHANGEUPPER 0x0015
 #define NETDEV_RESEND_IGMP 0x0016
 #define NETDEV_PRECHANGEMTU0x0017 /* notify before mtu change happened */
-#define NETDEV_CHANGEINFODATA  0x0018
 #define NETDEV_BONDING_INFO0x0019
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 80f3574a4630..a500a12d0131 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -909,7 +909,6 @@ enum {
IFLA_EVENT_NOTIFY_PEERS,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
-   IFLA_EVENT_CHANGE_INFO_DATA,
IFLA_EVENT_PRE_CHANGE_UPPER,
IFLA_EVENT_CHANGE_LOWER_STATE,
IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 739b06ac3e7f..702723c5db58 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1303,9 +1303,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_RESEND_IGMP:
rtnl_event = IFLA_EVENT_RESEND_IGMP;
break;
-   case NETDEV_CHANGEINFODATA:
-   rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
-   break;
case NETDEV_PRECHANGEUPPER:
rtnl_event = IFLA_EVENT_PRE_CHANGE_UPPER;
break;
@@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
-   case NETDEV_CHANGEINFODATA:
case NETDEV_PRECHANGEUPPER:
case NETDEV_CHANGELOWERSTATE:
case NETDEV_CHANGE_TX_QUEUE_LEN:
-- 
2.1.4

[PATCH net-next 7/8] rtnetlink: Do not generate notifications for NOTIFY_PEERS event

2017-04-07 Thread David Ahern

NOTIFY_PEERS is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4fa3bf3eb21d..8f23e9dde667 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -906,7 +906,6 @@ enum {
IFLA_EVENT_CHANGE_NAME,
IFLA_EVENT_FEAT_CHANGE,
IFLA_EVENT_BONDING_FAILOVER,
-   IFLA_EVENT_NOTIFY_PEERS,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
IFLA_EVENT_CHANGE_LOWER_STATE,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e2b0ec5174e7..d2587aa339c4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1294,9 +1294,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_BONDING_FAILOVER:
rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
break;
-   case NETDEV_NOTIFY_PEERS:
-   rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
-   break;
case NETDEV_CHANGEUPPER:
rtnl_event = IFLA_EVENT_CHANGE_UPPER;
break;
@@ -4173,7 +4170,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
case NETDEV_BONDING_FAILOVER:
-   case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
case NETDEV_CHANGELOWERSTATE:
-- 
2.1.4

[PATCH net-next 6/8] rtnetlink: Do not generate notifications for PRECHANGEUPPER event

2017-04-07 Thread David Ahern

PRECHANGEUPPER is an internal event; do not generate userspace
notifications.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a500a12d0131..4fa3bf3eb21d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -909,7 +909,6 @@ enum {
IFLA_EVENT_NOTIFY_PEERS,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
-   IFLA_EVENT_PRE_CHANGE_UPPER,
IFLA_EVENT_CHANGE_LOWER_STATE,
IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 702723c5db58..e2b0ec5174e7 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1303,9 +1303,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_RESEND_IGMP:
rtnl_event = IFLA_EVENT_RESEND_IGMP;
break;
-   case NETDEV_PRECHANGEUPPER:
-   rtnl_event = IFLA_EVENT_PRE_CHANGE_UPPER;
-   break;
case NETDEV_CHANGELOWERSTATE:
rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE;
break;
@@ -4179,7 +4176,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
-   case NETDEV_PRECHANGEUPPER:
case NETDEV_CHANGELOWERSTATE:
case NETDEV_CHANGE_TX_QUEUE_LEN:
rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL);
-- 
2.1.4

[PATCH net-next 2/8] rtnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO

2017-04-07 Thread David Ahern

NETDEV_UDP_TUNNEL_PUSH_INFO is an internal notifier; nothing userspace
can do so don't generate a netlink notification.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e8b7e9342cc0..064218a840b7 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -914,7 +914,6 @@ enum {
IFLA_EVENT_CHANGE_INFO_DATA,
IFLA_EVENT_PRE_CHANGE_UPPER,
IFLA_EVENT_CHANGE_LOWER_STATE,
-   IFLA_EVENT_UDP_TUNNEL_PUSH_INFO,
IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 72d365ae14b3..1a46be074fd6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1318,9 +1318,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_CHANGELOWERSTATE:
rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE;
break;
-   case NETDEV_UDP_TUNNEL_PUSH_INFO:
-   rtnl_event = IFLA_EVENT_UDP_TUNNEL_PUSH_INFO;
-   break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
rtnl_event = IFLA_EVENT_CHANGE_TX_QUEUE_LEN;
break;
@@ -4196,7 +4193,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_CHANGEINFODATA:
case NETDEV_PRECHANGEUPPER:
case NETDEV_CHANGELOWERSTATE:
-   case NETDEV_UDP_TUNNEL_PUSH_INFO:
case NETDEV_CHANGE_TX_QUEUE_LEN:
rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL);
break;
-- 
2.1.4

[PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events

2017-04-07 Thread David Ahern

Changing MTU on a link currently causes 3 messages to be sent to userspace:

[LINK]11: dummy1:  mtu 1490 qdisc noqueue state 
UNKNOWN group default event PRE_CHANGE_MTU
link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1:  mtu 1500 qdisc noqueue state 
UNKNOWN group default event CHANGE_MTU
link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1:  mtu 1500 qdisc noqueue state 
UNKNOWN group default
link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff

Remove the PRE_CHANGE_MTU and CHANGE_MTU messages.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 2 --
 net/core/rtnetlink.c | 8 
 2 files changed, 10 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 97f6d302f627..e8b7e9342cc0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -903,7 +903,6 @@ enum {
 enum {
IFLA_EVENT_UNSPEC,
IFLA_EVENT_REBOOT,
-   IFLA_EVENT_CHANGE_MTU,
IFLA_EVENT_CHANGE_ADDR,
IFLA_EVENT_CHANGE_NAME,
IFLA_EVENT_FEAT_CHANGE,
@@ -912,7 +911,6 @@ enum {
IFLA_EVENT_NOTIFY_PEERS,
IFLA_EVENT_CHANGE_UPPER,
IFLA_EVENT_RESEND_IGMP,
-   IFLA_EVENT_PRE_CHANGE_MTU,
IFLA_EVENT_CHANGE_INFO_DATA,
IFLA_EVENT_PRE_CHANGE_UPPER,
IFLA_EVENT_CHANGE_LOWER_STATE,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b2bd4c9ee860..72d365ae14b3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1285,9 +1285,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_REBOOT:
rtnl_event = IFLA_EVENT_REBOOT;
break;
-   case NETDEV_CHANGEMTU:
-   rtnl_event = IFLA_EVENT_CHANGE_MTU;
-   break;
case NETDEV_CHANGEADDR:
rtnl_event = IFLA_EVENT_CHANGE_ADDR;
break;
@@ -1312,9 +1309,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_RESEND_IGMP:
rtnl_event = IFLA_EVENT_RESEND_IGMP;
break;
-   case NETDEV_PRECHANGEMTU:
-   rtnl_event = IFLA_EVENT_PRE_CHANGE_MTU;
-   break;
case NETDEV_CHANGEINFODATA:
rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
break;
@@ -4191,7 +4185,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
 
switch (event) {
case NETDEV_REBOOT:
-   case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
@@ -4200,7 +4193,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
-   case NETDEV_PRECHANGEMTU:
case NETDEV_CHANGEINFODATA:
case NETDEV_PRECHANGEUPPER:
case NETDEV_CHANGELOWERSTATE:
-- 
2.1.4

[PATCH net-next 3/8] rtnetlink: Do not generate notifications for CHANGEADDR event

2017-04-07 Thread David Ahern

Changing hardware address generates redundant messages:

[LINK]11: dummy1:  mtu 1500 qdisc noqueue state 
UNKNOWN group default event CHANGE_ADDR
link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff

[LINK]11: dummy1:  mtu 1500 qdisc noqueue state 
UNKNOWN group default
link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff

Do not send a notification for the CHANGEADDR notifier.

Signed-off-by: David Ahern 
---
 include/uapi/linux/if_link.h | 1 -
 net/core/rtnetlink.c | 4 
 2 files changed, 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 064218a840b7..df5ade1bc684 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -903,7 +903,6 @@ enum {
 enum {
IFLA_EVENT_UNSPEC,
IFLA_EVENT_REBOOT,
-   IFLA_EVENT_CHANGE_ADDR,
IFLA_EVENT_CHANGE_NAME,
IFLA_EVENT_FEAT_CHANGE,
IFLA_EVENT_BONDING_FAILOVER,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1a46be074fd6..1503138ebfe1 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1285,9 +1285,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, 
unsigned long event)
case NETDEV_REBOOT:
rtnl_event = IFLA_EVENT_REBOOT;
break;
-   case NETDEV_CHANGEADDR:
-   rtnl_event = IFLA_EVENT_CHANGE_ADDR;
-   break;
case NETDEV_CHANGENAME:
rtnl_event = IFLA_EVENT_CHANGE_NAME;
break;
@@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
 
switch (event) {
case NETDEV_REBOOT:
-   case NETDEV_CHANGEADDR:
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
case NETDEV_BONDING_FAILOVER:
-- 
2.1.4

[PATCH net-next 0/8] rtnetlink: Cleanup user notifications for netdev events

2017-04-07 Thread David Ahern

Vlad's recent patch to add the event type to rtnetlink notifications
points out a number of redundant or unnecessary notifications sent to
userspace for events that are essentially internal to the kernel. Trim
the list before the new IFLA attributes become part of the UAPI.

David Ahern (8):
  rtnetlink: Do not generate notifications for MTU events
  rtnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO
  rtnetlink: Do not generate notifications for CHANGEADDR event
  rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event
  rtnetlink: Remove NETDEV_CHANGEINFODATA netdev event
  rtnetlink: Do not generate notifications for PRECHANGEUPPER event
  rtnetlink: Do not generate notifications for NOTIFY_PEERS event
  rtnetlink: Do not generate notifications for CHANGELOWERSTATE event

 drivers/net/bonding/bond_options.c |  2 --
 include/linux/netdevice.h  |  1 -
 include/uapi/linux/if_link.h   |  9 -
 net/core/rtnetlink.c   | 36 
 4 files changed, 48 deletions(-)

-- 
2.1.4

Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications

2017-04-07 Thread David Miller

From: Florian Fainelli 
Date: Fri,  7 Apr 2017 13:41:51 -0700

> This patch series does factor the common code found in all tag implementations
> into dsa_switch_rcv(). The original motivation was to add GRO support, but 
> this
> may be a lot of work with unclear benefits at this point.

Let's get the other new DSA driver merged and then you can respin this and
make sure all the new tag drivers get handled properly.

Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi

2017-04-07 Thread Willem de Bruijn

On Fri, Apr 7, 2017 at 5:10 PM, Michael S. Tsirkin  wrote:
> On Fri, Apr 07, 2017 at 04:59:58PM -0400, Willem de Bruijn wrote:
>> On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin  wrote:
>> > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote:
>> >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin  
>> >> wrote:
>> >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote:
>> >> >> From: Willem de Bruijn 
>> >> >>
>> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work
>> >> >> on reception of a receive interrupt if tx napi is enabled. With
>> >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
>> >> >> interrupts for bidirectional workloads.
>> >> >>
>> >> >> Signed-off-by: Willem de Bruijn 
>> >
>> > This is a popular approach, but I think this will only work well if tx
>> > and rx interrupts are processed on the same CPU and if tx queue is per
>> > cpu.  If they target different CPUs or if tx queue is used from multiple
>> > CPUs they will conflict on the shared locks.
>>
>> Yes. As a result of this discussion I started running a few vcpu affinity 
>> tests.
>>
>> > This can even change dynamically as CPUs/queues are reconfigured.
>> > How about adding a flag and skipping the tx poll if there's no match?
>>
>> I suspect that even with the cache invalidations this optimization
>> will be an improvement over handling all tx interrupts in the tx napi
>> handler. I will get the datapoint for that.
>>
>> That said, we can make this conditional. What flag exactly do you
>> propose? Compare raw_smp_processor_id() in the rx softint with one
>> previously stored in the napi tx callback?
>
> I'm not sure. Another idea is to check vi->affinity_hint_set.
> If set we know rq and sq are on the same CPU.

I was not aware of that flag, thanks. Yes, that looks like it should work.

Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi

2017-04-07 Thread Michael S. Tsirkin

On Fri, Apr 07, 2017 at 04:59:58PM -0400, Willem de Bruijn wrote:
> On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin  wrote:
> > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote:
> >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin  
> >> wrote:
> >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote:
> >> >> From: Willem de Bruijn 
> >> >>
> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work
> >> >> on reception of a receive interrupt if tx napi is enabled. With
> >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
> >> >> interrupts for bidirectional workloads.
> >> >>
> >> >> Signed-off-by: Willem de Bruijn 
> >
> > This is a popular approach, but I think this will only work well if tx
> > and rx interrupts are processed on the same CPU and if tx queue is per
> > cpu.  If they target different CPUs or if tx queue is used from multiple
> > CPUs they will conflict on the shared locks.
> 
> Yes. As a result of this discussion I started running a few vcpu affinity 
> tests.
> 
> The data is not complete. In particular, I don't have the data yet to
> compare having tx and rx irq on the same cpu (0,0) vs on different
> (0,2) for this patchset. Which is the relevant data to your point.
> 
> Initial results for unmodified upstream driver at {1, 10, 100}x
> TCP_STREAM, for irq cpu affinity (rx,tx). Process is always pinned to cpu
> 1. This is a 4 vcpu system pinned by the host to 4 cores on the same
> socket. The previously reported results were obtained with txq, rtx
> and process on different vcpus (0,2). Running all on the same vcpu
> lower cycle count considerably:
> 
> irq 0,0
> 1throughput_Mbps=29767.14  391,488,924,526  cycles
> 10  throughput_Mbps=40808.64  424,530,251,896  cycles
> 100 throughput_Mbps=33475.13  414,622,071,167  cycles
> 
> irq 0,2
> 1   throughput_Mbps=30176.05  395,673,200,747  cycles
> 10 throughput_Mbps=40729.26  433,948,374,991  cycles
> 100 throughput_Mbps=33758.68 436,291,949,393  cycles
> 
> irq 1,1
> 1throughput_Mbps=26635.20 269,071,002,844  cycles
> 10  throughput_Mbps=42385.05 299,945,944,516  cycles
> 100 throughput_Mbps=33580.98 283,272,895,507  cycles
> 
> With this patch set applied, cpu (1,1)
> 
> 1 throughput_Mbps=34980.76  276,504,805,414  cycles
> 10   throughput_Mbps=42519.92 298,105,889,785  cycles
> 100 throughput_Mbps=35268.86 296,670,598,712  cycles
> 
> I will need to get data for (0,2) vs (0,0).
> 
> > This can even change dynamically as CPUs/queues are reconfigured.
> > How about adding a flag and skipping the tx poll if there's no match?
> 
> I suspect that even with the cache invalidations this optimization
> will be an improvement over handling all tx interrupts in the tx napi
> handler. I will get the datapoint for that.
> 
> That said, we can make this conditional. What flag exactly do you
> propose? Compare raw_smp_processor_id() in the rx softint with one
> previously stored in the napi tx callback?

I'm not sure. Another idea is to check vi->affinity_hint_set.
If set we know rq and sq are on the same CPU.

-- 
MST

Re: [PATCH] net: thunderx: Enable TSO and checksum offloads for ipv6

2017-04-07 Thread David Miller

From: dev.srinivas...@gmail.com
Date: Thu,  6 Apr 2017 16:12:26 +0530

> From: Thanneeru Srinivasulu 
> 
> Adding support for TSO and checksum hardware offloads for ipv6.
> 
> Signed-off-by: Thanneeru Srinivasulu 
> Signed-off-by: Sunil Goutham 

Applied to net-next, thanks.

Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi

2017-04-07 Thread Willem de Bruijn

On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin  wrote:
> On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote:
>> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin  wrote:
>> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote:
>> >> From: Willem de Bruijn 
>> >>
>> >> Amortize the cost of virtual interrupts by doing both rx and tx work
>> >> on reception of a receive interrupt if tx napi is enabled. With
>> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
>> >> interrupts for bidirectional workloads.
>> >>
>> >> Signed-off-by: Willem de Bruijn 
>
> This is a popular approach, but I think this will only work well if tx
> and rx interrupts are processed on the same CPU and if tx queue is per
> cpu.  If they target different CPUs or if tx queue is used from multiple
> CPUs they will conflict on the shared locks.

Yes. As a result of this discussion I started running a few vcpu affinity tests.

The data is not complete. In particular, I don't have the data yet to
compare having tx and rx irq on the same cpu (0,0) vs on different
(0,2) for this patchset. Which is the relevant data to your point.

Initial results for unmodified upstream driver at {1, 10, 100}x
TCP_STREAM, for irq cpu affinity (rx,tx). Process is always pinned to cpu
1. This is a 4 vcpu system pinned by the host to 4 cores on the same
socket. The previously reported results were obtained with txq, rtx
and process on different vcpus (0,2). Running all on the same vcpu
lower cycle count considerably:

irq 0,0
1throughput_Mbps=29767.14  391,488,924,526  cycles
10  throughput_Mbps=40808.64  424,530,251,896  cycles
100 throughput_Mbps=33475.13  414,622,071,167  cycles

irq 0,2
1   throughput_Mbps=30176.05  395,673,200,747  cycles
10 throughput_Mbps=40729.26  433,948,374,991  cycles
100 throughput_Mbps=33758.68 436,291,949,393  cycles

irq 1,1
1throughput_Mbps=26635.20 269,071,002,844  cycles
10  throughput_Mbps=42385.05 299,945,944,516  cycles
100 throughput_Mbps=33580.98 283,272,895,507  cycles

With this patch set applied, cpu (1,1)

1 throughput_Mbps=34980.76  276,504,805,414  cycles
10   throughput_Mbps=42519.92 298,105,889,785  cycles
100 throughput_Mbps=35268.86 296,670,598,712  cycles

I will need to get data for (0,2) vs (0,0).

> This can even change dynamically as CPUs/queues are reconfigured.
> How about adding a flag and skipping the tx poll if there's no match?

I suspect that even with the cache invalidations this optimization
will be an improvement over handling all tx interrupts in the tx napi
handler. I will get the datapoint for that.

That said, we can make this conditional. What flag exactly do you
propose? Compare raw_smp_processor_id() in the rx softint with one
previously stored in the napi tx callback?

Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications

2017-04-07 Thread Andrew Lunn

> Yes, that's a good point, this just became slightly obsolete since the
> mediatek tag code got included.

Yes, rebase, add the received tags, and Dave can merge it before the
next talk ends :-)

 Andrew

Re: [PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()

2017-04-07 Thread Andrew Lunn

On Fri, Apr 07, 2017 at 01:41:53PM -0700, Florian Fainelli wrote:
> All DSA tag receive functions need to unshare the skb before mangling it, move
> this to the generic dsa_switch_rcv() function which will allow us to make the
> tag receive function return their mangled skb without caring about freeing a
> NULL skb.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()

2017-04-07 Thread Vivien Didelot

Florian Fainelli  writes:

> All DSA tag receive functions need to unshare the skb before mangling it, move
> this to the generic dsa_switch_rcv() function which will allow us to make the
> tag receive function return their mangled skb without caring about freeing a
> NULL skb.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot

Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions

2017-04-07 Thread Vivien Didelot

Florian Fainelli  writes:

> All DSA tag receive functions do strictly the same thing after they have 
> located
> the originating source port from their tag specific protocol:
>
> - push ETH_HLEN bytes
> - set pkt_type to PACKET_HOST
> - call eth_type_trans()
> - bump up counters
> - call netif_receive_skb()
>
> Factor all of that into dsa_switch_rcv(). This also makes us return a pointer 
> to
> a sk_buff, which makes us symetric with the xmit function.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot

Re: [PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers

2017-04-07 Thread Vivien Didelot

Florian Fainelli  writes:

> dsa_switch_rcv() already tests for dst == NULL, so there is no need to 
> duplicate
> the same check within the tag receive functions.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot

Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications

2017-04-07 Thread Florian Fainelli



On 04/07/2017 01:53 PM, Andrew Lunn wrote:
> On Fri, Apr 07, 2017 at 01:41:51PM -0700, Florian Fainelli wrote:
>> Hi all,
>>
>> This patch series does factor the common code found in all tag 
>> implementations
>> into dsa_switch_rcv(). The original motivation was to add GRO support, but 
>> this
>> may be a lot of work with unclear benefits at this point.
>>
>> Florian Fainelli (3):
>>   net: dsa: Do not check for NULL dst in tag parsers
>>   net: dsa: Move skb_unshare() to dsa_switch_rcv()
>>   net: dsa: Factor bottom tag receive functions
> 
> Hey, are you looking in my git repo and stealing my patches? I have a
> pretty much identical set :-)

Actually, I did not at all, good thing we are at the same conference and
not talking about that :)

> 
> I was holding off for the microchip and mediatek drivers to land, so
> to not cause conflicts.

Yes, that's a good point, this just became slightly obsolete since the
mediatek tag code got included.
-- 
Florian

Re: [PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers

2017-04-07 Thread Andrew Lunn

On Fri, Apr 07, 2017 at 01:41:52PM -0700, Florian Fainelli wrote:
> dsa_switch_rcv() already tests for dst == NULL, so there is no need to 
> duplicate
> the same check within the tag receive functions.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications

2017-04-07 Thread Andrew Lunn

On Fri, Apr 07, 2017 at 01:41:51PM -0700, Florian Fainelli wrote:
> Hi all,
> 
> This patch series does factor the common code found in all tag implementations
> into dsa_switch_rcv(). The original motivation was to add GRO support, but 
> this
> may be a lot of work with unclear benefits at this point.
> 
> Florian Fainelli (3):
>   net: dsa: Do not check for NULL dst in tag parsers
>   net: dsa: Move skb_unshare() to dsa_switch_rcv()
>   net: dsa: Factor bottom tag receive functions

Hey, are you looking in my git repo and stealing my patches? I have a
pretty much identical set :-)

I was holding off for the microchip and mediatek drivers to land, so
to not cause conflicts.

   Andrew

Re: [PATCH net-next v4 0/5] net-next: dsa: add Mediatek MT7530 support

2017-04-07 Thread David Miller

From: 
Date: Fri, 7 Apr 2017 16:45:04 +0800

> MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on
> Mediatek router platforms such as MT7623A or MT7623N which includes 7-port
> Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. Among these ports,
> The port from 0 to 4 are the user ports connecting with the remote devices
> while the port 5 and 6 are the CPU ports connecting into Mediatek Ethernet
> GMAC.
> 
> The patch series integrated Mediatek MT7530 into DSA support which
> includes the most of the essential callbacks such as tag insertion for
> port distinguishing, port control, bridge offloading, STP setup and
> ethtool operations to allow DSA to model each user port into independently
> standalone netdevice as the other DSA driver had done.

Series applied, thanks Sean.

Re: [PATCH RFC v2 0/2] phylib non-autoneg speed setting

2017-04-07 Thread Florian Fainelli



On 04/06/2017 03:00 AM, Russell King - ARM Linux wrote:
> This set of two patches fixes and cleans up the phylib code associated
> with selecting the fixed mode.
> 
> phylib currently assumes that all PHYs will support the 10baseT/Half mode
> of operation irrespective of the supported bitmask, because of the way
> phy_find_valid() and phy_find_setting() operate.
> 
> This series resolves that by reimplementing the table lookup, filtering
> out all unsupported modes.  This results in an unsupported speed setting
> always choosing one of the speeds that are supported by the PHY.  Full
> details are in the patch commit log.
> 
> The second patch cleans up the loop in phy_supported_speeds().

This looks totally fine to me, do you mind resubmitting as non RFC?

> 
>  drivers/net/phy/phy.c | 119 
> --
>  1 file changed, 68 insertions(+), 51 deletions(-)
> 
> v1->v2: fixed 0-day build errors
> 

-- 
Florian

[PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()

2017-04-07 Thread Florian Fainelli

All DSA tag receive functions need to unshare the skb before mangling it, move
this to the generic dsa_switch_rcv() function which will allow us to make the
tag receive function return their mangled skb without caring about freeing a
NULL skb.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c | 4 
 net/dsa/tag_brcm.c| 5 -
 net/dsa/tag_dsa.c | 5 -
 net/dsa/tag_edsa.c| 5 -
 net/dsa/tag_qca.c | 5 -
 net/dsa/tag_trailer.c | 5 -
 6 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 95d1a756202c..c20da7bfc771 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -903,6 +903,10 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct 
net_device *dev,
return 0;
}
 
+   skb = skb_unshare(skb, GFP_ATOMIC);
+   if (!skb)
+   return 0;
+
return dst->rcv(skb, dev, pt, orig_dev);
 }
 
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index 68d4feef96d4..263941769c88 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -102,10 +102,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
ds = dst->cpu_switch;
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (skb == NULL)
-   goto out;
-
if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
goto out_drop;
 
@@ -151,7 +147,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
 out_drop:
kfree_skb(skb);
-out:
return 0;
 }
 
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 377569c0e4f7..b7032699eaad 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -77,10 +77,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
int source_device;
int source_port;
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (skb == NULL)
-   goto out;
-
if (unlikely(!pskb_may_pull(skb, DSA_HLEN)))
goto out_drop;
 
@@ -175,7 +171,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
 
 out_drop:
kfree_skb(skb);
-out:
return 0;
 }
 
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 30520ff9c9a2..b87009672b40 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -90,10 +90,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
int source_device;
int source_port;
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (skb == NULL)
-   goto out;
-
if (unlikely(!pskb_may_pull(skb, EDSA_HLEN)))
goto out_drop;
 
@@ -194,7 +190,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
 
 out_drop:
kfree_skb(skb);
-out:
return 0;
 }
 
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 6579d6db1bc6..d1324649808c 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -75,10 +75,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
int port;
__be16 *phdr, hdr;
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (!skb)
-   goto out;
-
if (unlikely(!pskb_may_pull(skb, QCA_HDR_LEN)))
goto out_drop;
 
@@ -126,7 +122,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
 out_drop:
kfree_skb(skb);
-out:
return 0;
 }
 
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index f5c764ee2968..1fc0b221a70f 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -68,10 +68,6 @@ static int trailer_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
ds = dst->cpu_switch;
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (skb == NULL)
-   goto out;
-
if (skb_linearize(skb))
goto out_drop;
 
@@ -100,7 +96,6 @@ static int trailer_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
 out_drop:
kfree_skb(skb);
-out:
return 0;
 }
 
-- 
2.9.3

[PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers

2017-04-07 Thread Florian Fainelli

dsa_switch_rcv() already tests for dst == NULL, so there is no need to duplicate
the same check within the tag receive functions.

Signed-off-by: Florian Fainelli 
---
 net/dsa/tag_brcm.c| 3 ---
 net/dsa/tag_dsa.c | 3 ---
 net/dsa/tag_edsa.c| 3 ---
 net/dsa/tag_qca.c | 3 ---
 net/dsa/tag_trailer.c | 2 --
 5 files changed, 14 deletions(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index e2ed6cf68261..68d4feef96d4 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -100,9 +100,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
int source_port;
u8 *brcm_tag;
 
-   if (unlikely(dst == NULL))
-   goto out_drop;
-
ds = dst->cpu_switch;
 
skb = skb_unshare(skb, GFP_ATOMIC);
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index e42ba906100c..377569c0e4f7 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -77,9 +77,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
int source_device;
int source_port;
 
-   if (unlikely(dst == NULL))
-   goto out_drop;
-
skb = skb_unshare(skb, GFP_ATOMIC);
if (skb == NULL)
goto out;
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 6a9b7a9e4e15..30520ff9c9a2 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -90,9 +90,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device 
*dev,
int source_device;
int source_port;
 
-   if (unlikely(dst == NULL))
-   goto out_drop;
-
skb = skb_unshare(skb, GFP_ATOMIC);
if (skb == NULL)
goto out;
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 4e0dad759d04..6579d6db1bc6 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -75,9 +75,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct net_device 
*dev,
int port;
__be16 *phdr, hdr;
 
-   if (unlikely(!dst))
-   goto out_drop;
-
skb = skb_unshare(skb, GFP_ATOMIC);
if (!skb)
goto out;
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index 74c948512550..f5c764ee2968 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -66,8 +66,6 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device 
*dev,
u8 *trailer;
int source_port;
 
-   if (unlikely(dst == NULL))
-   goto out_drop;
ds = dst->cpu_switch;
 
skb = skb_unshare(skb, GFP_ATOMIC);
-- 
2.9.3

[PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions

2017-04-07 Thread Florian Fainelli

All DSA tag receive functions do strictly the same thing after they have located
the originating source port from their tag specific protocol:

- push ETH_HLEN bytes
- set pkt_type to PACKET_HOST
- call eth_type_trans()
- bump up counters
- call netif_receive_skb()

Factor all of that into dsa_switch_rcv(). This also makes us return a pointer to
a sk_buff, which makes us symetric with the xmit function.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h |  2 +-
 net/dsa/dsa.c | 20 +++-
 net/dsa/dsa_priv.h|  5 +++--
 net/dsa/tag_brcm.c| 18 +-
 net/dsa/tag_dsa.c | 18 +-
 net/dsa/tag_edsa.c| 18 +-
 net/dsa/tag_qca.c | 18 +-
 net/dsa/tag_trailer.c | 18 +-
 8 files changed, 48 insertions(+), 69 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ffe56cc338fe..671a00aa35de 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -123,7 +123,7 @@ struct dsa_switch_tree {
 * protocol to use.
 */
struct net_device   *master_netdev;
-   int (*rcv)(struct sk_buff *skb,
+   struct sk_buff *(*rcv)(struct sk_buff *skb,
   struct net_device *dev,
   struct packet_type *pt,
   struct net_device *orig_dev);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index c20da7bfc771..21f960ef8138 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "dsa_priv.h"
 
@@ -897,6 +898,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct 
net_device *dev,
  struct packet_type *pt, struct net_device *orig_dev)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
+   struct sk_buff *nskb = NULL;
 
if (unlikely(dst == NULL)) {
kfree_skb(skb);
@@ -907,7 +909,23 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct 
net_device *dev,
if (!skb)
return 0;
 
-   return dst->rcv(skb, dev, pt, orig_dev);
+   nskb = dst->rcv(skb, dev, pt, orig_dev);
+   if (!nskb) {
+   kfree_skb(skb);
+   return 0;
+   }
+
+   skb = nskb;
+   skb_push(skb, ETH_HLEN);
+   skb->pkt_type = PACKET_HOST;
+   skb->protocol = eth_type_trans(skb, skb->dev);
+
+   skb->dev->stats.rx_packets++;
+   skb->dev->stats.rx_bytes += skb->len;
+
+   netif_receive_skb(skb);
+
+   return 0;
 }
 
 static struct packet_type dsa_pack_type __read_mostly = {
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 0706a511244e..f4ac8d5fb8d5 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -17,8 +17,9 @@
 
 struct dsa_device_ops {
struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
-   int (*rcv)(struct sk_buff *skb, struct net_device *dev,
-  struct packet_type *pt, struct net_device *orig_dev);
+   struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
+  struct packet_type *pt,
+  struct net_device *orig_dev);
 };
 
 struct dsa_slave_priv {
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index 263941769c88..2a9b52c5af86 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -92,8 +92,9 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
struct net_device *dev
return NULL;
 }
 
-static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
-   struct packet_type *pt, struct net_device *orig_dev)
+static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device 
*dev,
+   struct packet_type *pt,
+   struct net_device *orig_dev)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
struct dsa_switch *ds;
@@ -133,21 +134,12 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
skb->data - ETH_HLEN - BRCM_TAG_LEN,
2 * ETH_ALEN);
 
-   skb_push(skb, ETH_HLEN);
-   skb->pkt_type = PACKET_HOST;
skb->dev = ds->ports[source_port].netdev;
-   skb->protocol = eth_type_trans(skb, skb->dev);
 
-   skb->dev->stats.rx_packets++;
-   skb->dev->stats.rx_bytes += skb->len;
-
-   netif_receive_skb(skb);
-
-   return 0;
+   return skb;
 
 out_drop:
-   kfree_skb(skb);
-   return 0;
+   return NULL;
 }
 
 const struct dsa_device_ops brcm_netdev_ops = {
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index b7032699eaad..1c6633f0de01 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -68,8 +68,9 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NULL;
 }
 
-static int dsa_rcv(struct sk_buff *skb, struct net_device *dev,
-  struct pa

[PATCH net-next 0/3] net: dsa: Receive path simplifications

2017-04-07 Thread Florian Fainelli

Hi all,

This patch series does factor the common code found in all tag implementations
into dsa_switch_rcv(). The original motivation was to add GRO support, but this
may be a lot of work with unclear benefits at this point.

Florian Fainelli (3):
  net: dsa: Do not check for NULL dst in tag parsers
  net: dsa: Move skb_unshare() to dsa_switch_rcv()
  net: dsa: Factor bottom tag receive functions

 include/net/dsa.h |  2 +-
 net/dsa/dsa.c | 24 +++-
 net/dsa/dsa_priv.h|  5 +++--
 net/dsa/tag_brcm.c| 26 +-
 net/dsa/tag_dsa.c | 26 +-
 net/dsa/tag_edsa.c| 26 +-
 net/dsa/tag_qca.c | 26 +-
 net/dsa/tag_trailer.c | 25 +
 8 files changed, 52 insertions(+), 108 deletions(-)

-- 
2.9.3

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 12:55 -0700, David Miller wrote:
> 
> One idea.  We use the macro thing to generate a "netlink.pot" file
> and then some userland tree can contain the latest netlink.pot and
> the translations.

Right. For the record - since we just talked about it - I was thinking
of putting it into a special section so that we could easily write a
script to generate the pot file from the kernel binary after doing
"allyesconfig" or collecting it from modules or so.

Thinking of modules though - do we need to adjust the module loading
code to load a new section? I'll have to look into that.

johannes

[vhost:linux-next 13/26] drivers//virtio/virtio_pci_common.c:186:7: error: too few arguments to function 'vp_dev->setup_vq'

2017-04-07 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
head:   4293ed1476ec42e45e54f812341058d812d820a5
commit: d5edad95c2f89cced19a23713f752442b620f0e1 [13/26] virtio: add context 
flag to find vqs
config: x86_64-randconfig-x012-201714 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout d5edad95c2f89cced19a23713f752442b620f0e1
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers//virtio/virtio_pci_common.c: In function 'vp_setup_vq':
>> drivers//virtio/virtio_pci_common.c:186:7: error: too few arguments to 
>> function 'vp_dev->setup_vq'
 vq = vp_dev->setup_vq(vp_dev, info, index, callback, name,
  ^~
   drivers//virtio/virtio_pci_common.c: In function 'vp_find_vqs_msix':
>> drivers//virtio/virtio_pci_common.c:318:12: error: too many arguments to 
>> function 'vp_setup_vq'
  vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
   ^~~
   drivers//virtio/virtio_pci_common.c:172:26: note: declared here
static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned 
index,
 ^~~
   drivers//virtio/virtio_pci_common.c: In function 'vp_find_vqs_intx':
   drivers//virtio/virtio_pci_common.c:371:12: error: too many arguments to 
function 'vp_setup_vq'
  vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
   ^~~
   drivers//virtio/virtio_pci_common.c:172:26: note: declared here
static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned 
index,
 ^~~

vim +186 drivers//virtio/virtio_pci_common.c

f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  180 
unsigned long flags;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  181 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  182 
/* fill out our structure that represents an active queue */
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  183 
if (!info)
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  184 
return ERR_PTR(-ENOMEM);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  185 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 @186 
vq = vp_dev->setup_vq(vp_dev, info, index, callback, name,
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  187 
  msix_vec);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  188 
if (IS_ERR(vq))
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  189 
goto out_info;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  190 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  191 
info->vq = vq;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  192 
if (callback) {
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  193 
spin_lock_irqsave(&vp_dev->lock, flags);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  194 
list_add(&info->node, &vp_dev->virtqueues);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  195 
spin_unlock_irqrestore(&vp_dev->lock, flags);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  196 
} else {
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  197 
INIT_LIST_HEAD(&info->node);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  198 
}
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  199 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  200 
vp_dev->vqs[index] = info;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  201 
return vq;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  202 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  203 
 out_info:
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  204 
kfree(info);
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  205 
return vq;
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  206 
 }
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  207 
 
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  208 
 static void vp_del_vq(struct virtqueue *vq)
f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04  209 
 {
f6813487d drivers/virtio/virtio_pci_common.c

Re: [PATCH v2 3/4] bluetooth: hci_uart: add LL protocol serdev driver support

2017-04-07 Thread Dan Williams

On Fri, 2017-04-07 at 13:48 -0500, Rob Herring wrote:
> On Fri, Apr 7, 2017 at 12:09 PM, Dan Williams 
> wrote:
> > On Fri, 2017-04-07 at 09:35 -0500, Rob Herring wrote:
> > > Turns out that the LL protocol and the TI-ST are the same thing
> > > AFAICT.
> > > The TI-ST adds firmware loading, GPIO control, and shared access
> > > for
> > > NFC, FM radio, etc. For now, we're only implementing what is
> > > needed
> > > for
> > > BT. This mirrors other drivers like BCM and Intel, but uses the
> > > new
> > > serdev bus.
> > > 
> > > The firmware loading is greatly simplified by using existing
> > > infrastructure to send commands. It may be a bit slower than the
> > > original code using synchronous functions, but the real
> > > bottleneck is
> > > likely doing firmware load at 115.2kbps.
> > 
> > Is there no way to put the TI-specific stuff into a TI UART module
> > rather than building it into the generic one?
> 
> In case it's not clear, all of HCI_LL is the TI specific part, not
> just what I'm adding. So you are talking about putting each UART BT
> protocol into a separate module. I'd assume that is doable, but seems
> orthogonal to this patch set. I'd also assume there was some reason
> that was not done already.

Ok, thanks for the explanation.  Wasn't clear at all from the file
paths in the source tree; it looked generic.

Dan

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Johannes Berg 
Date: Fri, 07 Apr 2017 21:46:46 +0200

> On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote:
>> I suspect that someone will eventually give us a hard time about the
>> strings wrt. internationalization. :-) It's solvable at least
>> partially in userspace I suppose.
> 
> I tend to think of the strings more of a debugging aid, but that's a
> good point.
> 
> Perhaps we should have a macro that has the strings - similar to the
> inline function I put into netlink - but if we make that a macro we can
> put the strings into a separate section to be able to find them more
> easily for later translation, and optionally even omit them entirely
> (to satisfy the kernel tinification folks)?

One idea.  We use the macro thing to generate a "netlink.pot" file and
then some userland tree can contain the latest netlink.pot and the
translations.

I'm suggesting it this way because whilst putting the netlink.pot file
in the kernel tree is probably OK, putting all the translation there
might not be.

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 21:45 +0200, Pablo Neira Ayuso wrote:
> 
> We only need to store the pointer to the attribute in the error
> container structure. We can calculate the offset from nl_err() by
> pasing the skbuff as parameter there, right?

Ah, that's a good point, we could store the nlattr pointer and
calculate the offset later. I'll need to try this, but yeah, good idea!

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote:
> 
> I have no strong opinions about string length.
> 
> In my opinion, I would like to believe that if someone tried to get a
> networking patch applied that emitted a rediculously long string then
> we would give them feedback about how that is not acceptable.  Right?

Agree with this, I don't really see much point in adding an extra
warning. There's an implicit (multi-KB though) limit as well in the
message size ;-)

> I suspect that someone will eventually give us a hard time about the
> strings wrt. internationalization. :-) It's solvable at least
> partially in userspace I suppose.

I tend to think of the strings more of a debugging aid, but that's a
good point.

Perhaps we should have a macro that has the strings - similar to the
inline function I put into netlink - but if we make that a macro we can
put the strings into a separate section to be able to find them more
easily for later translation, and optionally even omit them entirely
(to satisfy the kernel tinification folks)?

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 09:29:17PM +0200, Johannes Berg wrote:
> On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote:
> > I think the most flexible way is to pass the container error
> > structure to nla_parse() so it sets this for you. This would also
> > save tons of "malformed attribute" strings.
> 
> Yes, for sure. The thing is we'll probalby have to pass down the
> request skb *and* the error struct so that we can get the offset, and
> this seems like the generic thing that we really should try to get the
> most information generated.

We only need to store the pointer to the attribute in the error
container structure. We can calculate the offset from nl_err() by
pasing the skbuff as parameter there, right?

Re: [PATCH net-next v4 1/5] dt-bindings: net: dsa: add Mediatek MT7530 binding

2017-04-07 Thread Florian Fainelli



On 04/07/2017 01:45 AM, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> Add device-tree binding for Mediatek MT7530 switch.
> 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Sean Wang 
> Acked-by: Rob Herring 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next v4 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-04-07 Thread Florian Fainelli



On 04/07/2017 01:45 AM, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on
> Mediatek router platforms such as MT7623A or MT7623N platform which
> includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY.
> Among these ports, The port from 0 to 4 are the user ports connecting
> with the remote devices while the port 5 and 6 are the CPU ports
> connecting into Mediatek Ethernet GMAC.
> 
> For port 6, it can communicate with the CPU via Mediatek Ethernet GMAC
> through either the TRGMII or RGMII which could be controlled by phy-mode
> in the dt-bindings to specify which mode is preferred to use. And for
> port 5, only RGMII can be specified. However, currently, only port 6 is
> being supported in this DSA driver.
> 
> The driver is made with the reference to qca8k and other existing DSA
> driver. The most of the essential callbacks of the DSA are already
> support in the driver, including tag insert for user port distinguishing,
> port control, bridge offloading, STP setup and ethtool operation to allow
> DSA to model each user port into a standalone netdevice as the other DSA
> driver had done.
> 
> Signed-off-by: Sean Wang 
> Signed-off-by: Landen Chao 

Reviewed-by: Florian Fainelli 
-- 
Florian

[RFC] netlink: send exterr cookie on success

2017-04-07 Thread Johannes Berg

From: Johannes Berg 

This is for Jamal, it allows the subsystem to return an arbitrary
cookie to identify a new object/operation, perhaps to delete or
cancel it later.

This is actually something we use a lot in wifi too, though I'm
not sure how we could do the backport necessary for that (since
we sadly need to ship to all kinds of old kernels)

Also contains the missing (err) check I pointed out - I'll squash
this patch into the first of the exterr after I test it.

Signed-off-by: Johannes Berg 
---
 include/linux/netlink.h  |  2 ++
 include/uapi/linux/netlink.h |  4 
 net/netlink/af_netlink.c | 28 +++-
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 662f343dc68b..76a0b0a22349 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -67,6 +67,8 @@ struct netlink_ext_err {
u32 ext_code;
u32 msg_offset;
u16 attr;
+   const u8 *cookie;
+   u8 cookie_len;
 };
 
 extern void netlink_kernel_release(struct sock *sk);
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 0ef970e6f9f5..72a398e0f332 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -119,6 +119,9 @@ struct nlmsgerr {
  * @NLMSGERR_ATTR_CODE: extended per-subsystem error code (u32)
  * @NLMSGERR_ATTR_ATTR: top-level attribute that caused the error
  * (or is missing, u16)
+ * @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to
+ * be used - in the success case - to identify a created
+ * object or operation or similar (binary)
  */
 enum nlmsgerr_attrs {
NLMSGERR_ATTR_UNUSED,
@@ -126,6 +129,7 @@ enum nlmsgerr_attrs {
NLMSGERR_ATTR_OFFS,
NLMSGERR_ATTR_CODE,
NLMSGERR_ATTR_ATTR,
+   NLMSGERR_ATTR_COOKIE,
 };
 
 #define NETLINK_ADD_MEMBERSHIP 1
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 532d906af3be..6b256e540bc1 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2277,6 +2277,9 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr 
*nlh, int err,
if (exterr && exterr->attr)
payload += nla_total_size(sizeof(u16));
}
+   } else if (nlk->flags & NETLINK_F_EXT_ACK) {
+   if (exterr && exterr->cookie_len)
+   payload += nla_total_size(exterr->cookie_len);
}
 
skb = nlmsg_new(payload, GFP_KERNEL);
@@ -2301,15 +2304,22 @@ void netlink_ack(struct sk_buff *in_skb, struct 
nlmsghdr *nlh, int err,
memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : 
sizeof(*nlh));
 
if (nlk->flags & NETLINK_F_EXT_ACK) {
-   if (exterr && exterr->msg)
-   WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG,
-  exterr->msg));
-   if (exterr && exterr->msg_offset)
-   WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS,
-   exterr->msg_offset));
-   if (exterr && exterr->attr)
-   WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR,
-   exterr->attr));
+   if (err) {
+   if (exterr && exterr->msg)
+   WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG,
+  exterr->msg));
+   if (exterr && exterr->msg_offset)
+   WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS,
+   exterr->msg_offset));
+   if (exterr && exterr->attr)
+   WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR,
+   exterr->attr));
+   } else {
+   if (exterr && exterr->attr)
+   WARN_ON(nla_put(skb, NLMSGERR_ATTR_COOKIE,
+   exterr->cookie_len,
+   exterr->cookie));
+   }
}
 
netlink_unicast(in_skb->sk, skb, NETLINK_CB(in_skb).portid, 
MSG_DONTWAIT);
-- 
2.11.0

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 7 Apr 2017 21:35:50 +0200

> On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote:
> [...]
>> Let's just discuss the UAPI, since people complain we don't talk
>> about that enough :-)  For those playing at home it is three new
>> attributes returned in a netlink ACK when the application asks
>> for the extended response:
>> 
>>  NLMSGERR_ATTR_MSG   string  Extended error string
>>  NLMSGERR_ATTR_OFFS  u32 Byte offset to netlink element causing 
>> error
>>  NLMSGERR_ATTR_CODE  u32 Subsystem specific error code
>>  NLMSGERR_ATTR_ATTR  u16 Netlink attribute triggering error or 
>> missing
> 
> I think it would be good if we get a definition to cap the maximum
> string length to something reasonable? This can be added in a follow
> up patch BTW. Thus, we get people coming back to us and request larger
> strings with a reason why they need more room for this.
> 
> In general, my main concern with strings is that they can be used in a
> more freely way than these u32 offsets and error codes, and
> specifically how inconsistent these string will look like across
> different netlink subsystems.
> 
> Anyway, as long as this is optional (not every subsystem if forced to
> use strings) I'm fine with it :).

I have no strong opinions about string length.

In my opinion, I would like to believe that if someone tried to get a
networking patch applied that emitted a rediculously long string then
we would give them feedback about how that is not acceptable.  Right?

I suspect that someone will eventually give us a hard time about the
strings wrt. internationalization. :-) It's solvable at least
partially in userspace I suppose.

Re: [RFC 1/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 20:26 +0200, Johannes Berg wrote:
> 
> + if (nlk->flags & NETLINK_F_EXT_ACK) {
> + if (exterr && exterr->msg)
> + WARN_ON(nla_put_string(skb,
> NLMSGERR_ATTR_MSG,
> +    exterr->msg));
> + if (exterr && exterr->msg_offset)
> + WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS,
> + exterr->msg_offset));
> + if (exterr && exterr->attr)
> + WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR,
> + exterr->attr));
> + }

I forgot to check (err != 0) here, which can cause inconsistencies -
noticed that while just adding the error case Jamal wanted. I'll send
out that as a separate patch, and squash it later when I resubmit.

johannes

Re: [PATCH net-next v4 1/5] dt-bindings: net: dsa: add Mediatek MT7530 binding

2017-04-07 Thread Andrew Lunn

On Fri, Apr 07, 2017 at 04:45:05PM +0800, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> Add device-tree binding for Mediatek MT7530 switch.
> 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Sean Wang 
> Acked-by: Rob Herring 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v4 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-04-07 Thread Andrew Lunn

On Fri, Apr 07, 2017 at 04:45:09PM +0800, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on
> Mediatek router platforms such as MT7623A or MT7623N platform which
> includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY.
> Among these ports, The port from 0 to 4 are the user ports connecting
> with the remote devices while the port 5 and 6 are the CPU ports
> connecting into Mediatek Ethernet GMAC.
> 
> For port 6, it can communicate with the CPU via Mediatek Ethernet GMAC
> through either the TRGMII or RGMII which could be controlled by phy-mode
> in the dt-bindings to specify which mode is preferred to use. And for
> port 5, only RGMII can be specified. However, currently, only port 6 is
> being supported in this DSA driver.
> 
> The driver is made with the reference to qca8k and other existing DSA
> driver. The most of the essential callbacks of the DSA are already
> support in the driver, including tag insert for user port distinguishing,
> port control, bridge offloading, STP setup and ethtool operation to allow
> DSA to model each user port into a standalone netdevice as the other DSA
> driver had done.
> 
> Signed-off-by: Sean Wang 
> Signed-off-by: Landen Chao 

Hi Sean

This is looking good now. Thanks

Reviewed-by: Andrew Lunn 

Andrew

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote:
[...]
> Let's just discuss the UAPI, since people complain we don't talk
> about that enough :-)  For those playing at home it is three new
> attributes returned in a netlink ACK when the application asks
> for the extended response:
> 
>   NLMSGERR_ATTR_MSG   string  Extended error string
>   NLMSGERR_ATTR_OFFS  u32 Byte offset to netlink element causing 
> error
>   NLMSGERR_ATTR_CODE  u32 Subsystem specific error code
>   NLMSGERR_ATTR_ATTR  u16 Netlink attribute triggering error or 
> missing

I think it would be good if we get a definition to cap the maximum
string length to something reasonable? This can be added in a follow
up patch BTW. Thus, we get people coming back to us and request larger
strings with a reason why they need more room for this.

In general, my main concern with strings is that they can be used in a
more freely way than these u32 offsets and error codes, and
specifically how inconsistent these string will look like across
different netlink subsystems.

Anyway, as long as this is optional (not every subsystem if forced to
use strings) I'm fine with it :).

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 7 Apr 2017 21:27:14 +0200

> On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote:
>> From: Johannes Berg 
>> Date: Fri, 07 Apr 2017 21:09:45 +0200
>> 
>> > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote:
>> >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
>> >> [...]
>> >> > Heh. I think I really want to solve - at least partially -
>> >> > nla_parse()
>> >> > to see that it can be done this way. It'd be nice to even transform
>> >> > all
>> >> > the callers (I generated half of these patches with spatch anyway)
>> >> > to
>> >> > have at least that.
>> >> 
>> >> We can just have a modified version of nla_parse that deals with
>> >> this.
>> > 
>> > Yes, but we need to figure out a good way to have the offset.
>> > 
>> > We also need to see if we want to *force* having the offset. In some
>> > sense that'd be useful, in another it might be very complicated to fill
>> > it in at all times, if for example errors come from lower layers like
>> > drivers.
>> 
>> It has to be optional, some kinds of errors don't have an exact
>> context per-se.
>> 
>> Also another way to look at this is that we're providing a lot of
>> new power and expressability.  So even if only one aspect of the
>> new error reporting is used it's a positive step forward.
>> 
>> So allow offset "0" meaning "unspecified".
> 
> Instead, we can just not send the offset attribute to userspace if
> it's not specified. So missing attribute means "unspecified".

Agreed, not providing the attribute to indicate this is fine.

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 7 Apr 2017 21:21:34 +0200

> For my usecases in netfilter, the attributes and an specific error
> code should be enough to figure out what is wrong. Will not need
> strings.
> 
> BTW, we may not have an offset, eg. EINVAL because of missing
> attribute. Given we have different requirements, I would leave it to
> each subsystem to decide what netlink error attributes are specified.

Yep, completely agreed.

The use cases for offset and missing attribute I see as follows:

1) Top-level attribute is missing. Here, offset is set to zero
   and the missing attribute number is given as well.

2) Nested attribute is missing.  Offset is set to the location of the
   beginning of nesting, inside of which the missing attribute was
   needed.

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 21:27 +0200, Pablo Neira Ayuso wrote:
> 
> > Also another way to look at this is that we're providing a lot of
> > new power and expressability.  So even if only one aspect of the
> > new error reporting is used it's a positive step forward.

True.

> > So allow offset "0" meaning "unspecified".
> 
> Instead, we can just not send the offset attribute to userspace if
> it's not specified. So missing attribute means "unspecified".
> 
> I'm always a bit worried this "0 means something" semantics :)

Yeah, I have that. If it's 0 internally the attribute will be omitted.

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote:
> 
> For my usecases in netfilter, the attributes and an specific error
> code should be enough to figure out what is wrong. Will not need
> strings.

Fair enough.

> BTW, we may not have an offset, eg. EINVAL because of missing
> attribute. Given we have different requirements, I would leave it to
> each subsystem to decide what netlink error attributes are specified.
> 
> > (It's ultimately always going to be optional since we'll continue
> > returning errors without *any* extended error information - likely
> > indefinitely - but if we have a wrong attribute, should we always
> > have
> > an offset? Would be nice, but could be difficult in practice)
> > 
> > > We can probably use struct nla_policy to place the extended error
> > > code or the string (as we discussed I don't need string errors,
> > > but
> > > I'm fine if other people find it useful).
> > 
> > I don't think for the error strings really would be useful for
> > nla_parse() or a policy - we can return something generic like
> > "malformed attribute" there as a string, and hopefully point to the
> > attribute/offset from there generically. I just really want to see
> > how
> > to actually do that.
> 
> I think the most flexible way is to pass the container error
> structure to nla_parse() so it sets this for you. This would also
> save tons of "malformed attribute" strings.

Yes, for sure. The thing is we'll probalby have to pass down the
request skb *and* the error struct so that we can get the offset, and
this seems like the generic thing that we really should try to get the
most information generated.

johannes

Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi

2017-04-07 Thread Michael S. Tsirkin

On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote:
> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin  wrote:
> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote:
> >> From: Willem de Bruijn 
> >>
> >> Amortize the cost of virtual interrupts by doing both rx and tx work
> >> on reception of a receive interrupt if tx napi is enabled. With
> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
> >> interrupts for bidirectional workloads.
> >>
> >> Signed-off-by: Willem de Bruijn 

This is a popular approach, but I think this will only work well if tx
and rx interrupts are processed on the same CPU and if tx queue is per
cpu.  If they target different CPUs or if tx queue is used from multiple
CPUs they will conflict on the shared locks.

This can even change dynamically as CPUs/queues are reconfigured.
How about adding a flag and skipping the tx poll if there's no match?

-- 
MST

Re: [RFC 2/3] genetlink: pass extended error report down

2017-04-07 Thread Ben Greear


On 04/07/2017 12:12 PM, Johannes Berg wrote:

On Fri, 2017-04-07 at 11:37 -0700, Ben Greear wrote:


I guess the error string must be constant and always available in
memory in this implementation?


Yes.


I think it would be nice to dynamically create strings (malloc,
snprintf, etc) and have the err_str logic free it when done?


We can think about that later, but I don't actually think it makes a
lot of sense - if we point to the attribute and/or offset you really
ought to have enough info to figure out what's up.


We can think about it later, but lots of things in the wifi stack
could use a descriptive message specific to the failure.  Often these
messages are much more useful if you explain why the failure conflicts
with regulatory, channel, virtual-dev combination, etc info, so that needs
to be dynamic.  The code that is failing knows, so I'd like to pass it
back to user-space.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote:
> From: Johannes Berg 
> Date: Fri, 07 Apr 2017 21:09:45 +0200
> 
> > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote:
> >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
> >> [...]
> >> > Heh. I think I really want to solve - at least partially -
> >> > nla_parse()
> >> > to see that it can be done this way. It'd be nice to even transform
> >> > all
> >> > the callers (I generated half of these patches with spatch anyway)
> >> > to
> >> > have at least that.
> >> 
> >> We can just have a modified version of nla_parse that deals with
> >> this.
> > 
> > Yes, but we need to figure out a good way to have the offset.
> > 
> > We also need to see if we want to *force* having the offset. In some
> > sense that'd be useful, in another it might be very complicated to fill
> > it in at all times, if for example errors come from lower layers like
> > drivers.
> 
> It has to be optional, some kinds of errors don't have an exact
> context per-se.
> 
> Also another way to look at this is that we're providing a lot of
> new power and expressability.  So even if only one aspect of the
> new error reporting is used it's a positive step forward.
> 
> So allow offset "0" meaning "unspecified".

Instead, we can just not send the offset attribute to userspace if
it's not specified. So missing attribute means "unspecified".

I'm always a bit worried this "0 means something" semantics :)

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 12:20 -0700, David Miller wrote:

> But what it lacks fundamentally is context.  Therefore it can't be
> used to provide the offset or the bad attribute number.  So it can't
> meet our requirements.

Yes, it doesn't address the requirements here, and in a sense I suspect
this will be true anywhere else too - perhaps that's the reason why it
was never applied even over in perf. I've not gotten a good answer for
that question yet :)

> We've passed down new arguments into core method call chains for much
> less useful facilitites than this.  So that doesn't bother me.  And
> as that is a kernel internal matter, we can refine it later.

Yes, both are true.

> Wrt. nla_parse(), we can solve that problem as follow-on patches too.

Yes. I just want to be sure we can solve it, but OTOH we're not locking
down anything but the UAPI so whatever way we find to solve it, even if
it requires a complete rearchitecture internally, it still doesn't
matter.

> So I consider this change ready as far as the implementation is
> concerned.

Fair enough. I still do need to test it though :)

> Ok, Jiri, start reading.  I will try to make you happy here.
> 
> Let's just discuss the UAPI, since people complain we don't talk
> about that enough :-)  For those playing at home it is three new
> attributes returned in a netlink ACK when the application asks
> for the extended response:
> 
> NLMSGERR_ATTR_MSG   string  Extended error string
> NLMSGERR_ATTR_OFFS  u32 Byte offset to netlink
> element causing error
> NLMSGERR_ATTR_CODE  u32 Subsystem specific error code
> NLMSGERR_ATTR_ATTR  u16 Netlink attribute triggering
> error or missing

That's four ;-)

Since you're carefully listing it here, you should also say that

 * the new behaviour is entirely optional, enabled by a new netlink
   socket option
 * the ACK message format ends up being
   [nlmsg header]
   [ack header including errno]
   [request message nlmsg hdr]
   [request message payload - optional, controlled by socket option,
aligned to 4 byte length if present]
   [new TLVs - optional, controlled by socket option]

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Johannes Berg 
Date: Fri, 07 Apr 2017 21:09:45 +0200

> On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote:
>> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
>> [...]
>> > Heh. I think I really want to solve - at least partially -
>> > nla_parse()
>> > to see that it can be done this way. It'd be nice to even transform
>> > all
>> > the callers (I generated half of these patches with spatch anyway)
>> > to
>> > have at least that.
>> 
>> We can just have a modified version of nla_parse that deals with
>> this.
> 
> Yes, but we need to figure out a good way to have the offset.
> 
> We also need to see if we want to *force* having the offset. In some
> sense that'd be useful, in another it might be very complicated to fill
> it in at all times, if for example errors come from lower layers like
> drivers.

It has to be optional, some kinds of errors don't have an exact
context per-se.

Also another way to look at this is that we're providing a lot of
new power and expressability.  So even if only one aspect of the
new error reporting is used it's a positive step forward.

So allow offset "0" meaning "unspecified".

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Johannes Berg 
Date: Fri, 07 Apr 2017 20:59:12 +0200

> Alexander Shishkin's patch
> https://patchwork.kernel.org/patch/7162421/
> 
> was nice in a way because you could get away without passing the error
> structure down all the time, but like I said it doesn't deal with
> dynamic errors (even the offset/attr from nla_parse) and if it's not
> done *everywhere* it risks leaking those exterr numbers (-1024..-4095)
> to userspace through syscalls if it's used in a non-netlink path for
> some reason (e.g. a single driver call being called through both
> netlink and ioctl, and trying to return an extended error)

Alexander's patch is cute in that it annotates errors using the special
section blobs.

But what it lacks fundamentally is context.  Therefore it can't be
used to provide the offset or the bad attribute number.  So it can't
meet our requirements.

We've passed down new arguments into core method call chains for much
less useful facilitites than this.  So that doesn't bother me.  And
as that is a kernel internal matter, we can refine it later.

Wrt. nla_parse(), we can solve that problem as follow-on patches too.

So I consider this change ready as far as the implementation is
concerned.

Ok, Jiri, start reading.  I will try to make you happy here.

Let's just discuss the UAPI, since people complain we don't talk
about that enough :-)  For those playing at home it is three new
attributes returned in a netlink ACK when the application asks
for the extended response:

NLMSGERR_ATTR_MSG   string  Extended error string
NLMSGERR_ATTR_OFFS  u32 Byte offset to netlink element causing 
error
NLMSGERR_ATTR_CODE  u32 Subsystem specific error code
NLMSGERR_ATTR_ATTR  u16 Netlink attribute triggering error or 
missing

So please consider this interface carefully.

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 09:09:45PM +0200, Johannes Berg wrote:
> On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote:
> > On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
> > [...]
> > > Heh. I think I really want to solve - at least partially -
> > > nla_parse()
> > > to see that it can be done this way. It'd be nice to even transform
> > > all
> > > the callers (I generated half of these patches with spatch anyway)
> > > to
> > > have at least that.
> > 
> > We can just have a modified version of nla_parse that deals with
> > this.
> 
> Yes, but we need to figure out a good way to have the offset.
> 
> We also need to see if we want to *force* having the offset. In some
> sense that'd be useful, in another it might be very complicated to fill
> it in at all times, if for example errors come from lower layers like
> drivers.

For my usecases in netfilter, the attributes and an specific error
code should be enough to figure out what is wrong. Will not need
strings.

BTW, we may not have an offset, eg. EINVAL because of missing
attribute. Given we have different requirements, I would leave it to
each subsystem to decide what netlink error attributes are specified.

> (It's ultimately always going to be optional since we'll continue
> returning errors without *any* extended error information - likely
> indefinitely - but if we have a wrong attribute, should we always have
> an offset? Would be nice, but could be difficult in practice)
> 
> > We can probably use struct nla_policy to place the extended error
> > code or the string (as we discussed I don't need string errors, but
> > I'm fine if other people find it useful).
> 
> I don't think for the error strings really would be useful for
> nla_parse() or a policy - we can return something generic like
> "malformed attribute" there as a string, and hopefully point to the
> attribute/offset from there generically. I just really want to see how
> to actually do that.

I think the most flexible way is to pass the container error structure
to nla_parse() so it sets this for you. This would also save tons of
"malformed attribute" strings.

Re: [RFC 2/3] genetlink: pass extended error report down

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 11:37 -0700, Ben Greear wrote:
> 
> I guess the error string must be constant and always available in
> memory in this implementation?

Yes.

> I think it would be nice to dynamically create strings (malloc,
> snprintf, etc) and have the err_str logic free it when done?

We can think about that later, but I don't actually think it makes a
lot of sense - if we point to the attribute and/or offset you really
ought to have enough info to figure out what's up.

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
> [...]
> > Heh. I think I really want to solve - at least partially -
> > nla_parse()
> > to see that it can be done this way. It'd be nice to even transform
> > all
> > the callers (I generated half of these patches with spatch anyway)
> > to
> > have at least that.
> 
> We can just have a modified version of nla_parse that deals with
> this.

Yes, but we need to figure out a good way to have the offset.

We also need to see if we want to *force* having the offset. In some
sense that'd be useful, in another it might be very complicated to fill
it in at all times, if for example errors come from lower layers like
drivers.

(It's ultimately always going to be optional since we'll continue
returning errors without *any* extended error information - likely
indefinitely - but if we have a wrong attribute, should we always have
an offset? Would be nice, but could be difficult in practice)

> We can probably use struct nla_policy to place the extended error
> code or the string (as we discussed I don't need string errors, but
> I'm fine if other people find it useful).

I don't think for the error strings really would be useful for
nla_parse() or a policy - we can return something generic like
"malformed attribute" there as a string, and hopefully point to the
attribute/offset from there generically. I just really want to see how
to actually do that.

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 21:02 +0200, Pablo Neira Ayuso wrote:
> 
> > I'm tempted to apply this as-is just to show that person that
> > things do in fact happen eventually :-)
> 
> We can just send follow up patches to refine, I think it's a good
> start, Johannes?

I guess we can. Like I just said though - I do have a plan to make
nla_parse() work but the week's been exhausting enough that I don't
feel entirely comfortable saying that will definitely work.

OTOH, arguably this is something that seems to work - as evidenced by
the nl80211 changes - for at least some error reporting, so I guess we
can continue to refine it, and the only thing we're really locking in
to is the ABI, which is TLVs now, so that should be safe.

That said, the nl80211 patch really should be bigger, and I've not even
tested this at all yet, so I really need to do that first.

Also, I'd like to point to the genl_info change, does that seem like a
reasonable way to pass it around in genetlink? I'm much less familiar
with the "regular" netlink to say how to pass it around there beyond
what I already did.

johannes

Re: [Intel-wired-lan] [PATCH v2] igb: Allow to remove administratively set MAC on VFs

2017-04-07 Thread Jeff Kirsher

On Wed, 2017-04-05 at 15:46 +0200, Corinna Vinschen wrote:
>   Before libvirt modifies the MAC address and vlan tag for an SRIOV
> VF
>   for use by a virtual machine (either using vfio device assignment
> or
>   macvtap passthru mode), it saves the current MAC address and vlan
> tag
>   so that it can reset them to their original value when the guest is
>   done.  Libvirt can't leave the VF MAC set to the value used by the
>   now-defunct guest since it may be started again later using a
>   different VF, but it certainly shouldn't just pick any random
> value,
>   either. So it saves the state of everything prior to using the VF,
> and
>   resets it to that.
> 
>   The igb driver initializes the MAC addresses of all VFs to
>   00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
>   netlink message, also visible in the list of VFs in the output of
> "ip
>   link show"). But when libvirt attempts to restore the MAC address
> back
>   to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the
> kernel
>   responds with "Invalid argument".
> 
>   Forbidding a reset back to the original value leaves the VF MAC at
> the
>   value set for the now-defunct virtual machine. Especially on a
> system
>   with NetworkManager enabled, this has very bad consequences, since
>   NetworkManager forces all interfacess to be IFF_UP all the time -
> if
>   the same virtual machine is restarted using a different VF (or even
> on
>   a different host), there will be multiple interfaces watching for
>   traffic with the same MAC address.
> 
>   To allow libvirt to revert to the original state, we need a way to
>   remove the administrative set MAC on a VF, to allow normal host
>   operation again, and to reset/overwrite the VF MAC via VF netdev.
> 
>   This patch implements the outlined scenario by allowing to set the
>   VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
>   igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
>   so it's possible to reset the VF MAC back to the original value via
>   the VF netdev.
> 
>   Note: Recent patches to libvirt allow for a workaround if the NIC
>   isn't capable of resetting the administrative MAC back to all 0,
> but
>   in theory the NIC should allow resetting the MAC in the first
> place.
> 
> Signed-off-by: Corinna Vinschen 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 42
> +++
>  1 file changed, 31 insertions(+), 11 deletions(-)

This patch does not apply (not even close).  Please make sure to base
you patch off my dev-queue branch of my next-queue tree on kernel.org.

signature.asc
Description: This is a digitally signed message part

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote:
[...]
> Heh. I think I really want to solve - at least partially - nla_parse()
> to see that it can be done this way. It'd be nice to even transform all
> the callers (I generated half of these patches with spatch anyway) to
> have at least that.

We can just have a modified version of nla_parse that deals with this.
We can probably use struct nla_policy to place the extended error code
or the string (as we discussed I don't need string errors, but I'm
fine if other people find it useful).

Thanks!

Re: [PATCH v2 1/2] bpf: remove struct bpf_prog_type_list

2017-04-07 Thread Johannes Berg

So two things about this, and they apply to the other patch as well:

> +extern const struct bpf_verifier_ops sk_filter_prog_ops;
> +extern const struct bpf_verifier_ops tc_cls_act_prog_ops;
> +extern const struct bpf_verifier_ops xdp_prog_ops;
> +extern const struct bpf_verifier_ops cg_skb_prog_ops;
> +extern const struct bpf_verifier_ops cg_sock_prog_ops;
> +extern const struct bpf_verifier_ops lwt_inout_prog_ops;
> +extern const struct bpf_verifier_ops lwt_xmit_prog_ops;
> +extern const struct bpf_verifier_ops kprobe_prog_ops;
> +extern const struct bpf_verifier_ops tracepoint_prog_ops;
> +extern const struct bpf_verifier_ops perf_event_prog_ops;

I'm not super happy with having to list it here - and maybe I should
add ifdefs here as well.

+static const struct bpf_verifier_ops * const bpf_prog_types[] = {
> +#ifdef CONFIG_NET
> + [BPF_PROG_TYPE_SOCKET_FILTER] = &sk_filter_prog_ops,
> + [BPF_PROG_TYPE_SCHED_CLS] = &tc_cls_act_prog_ops,
> + [BPF_PROG_TYPE_SCHED_ACT] = &tc_cls_act_prog_ops,
> + [BPF_PROG_TYPE_XDP] = &xdp_prog_ops,
> + [BPF_PROG_TYPE_CGROUP_SKB] = &cg_skb_prog_ops,
> + [BPF_PROG_TYPE_CGROUP_SOCK] = &cg_sock_prog_ops,
> + [BPF_PROG_TYPE_LWT_IN] = &lwt_inout_prog_ops,
> + [BPF_PROG_TYPE_LWT_OUT] = &lwt_inout_prog_ops,
> + [BPF_PROG_TYPE_LWT_XMIT] = &lwt_xmit_prog_ops,
> +#endif
> +#ifdef CONFIG_BPF_EVENTS
> + [BPF_PROG_TYPE_KPROBE] = &kprobe_prog_ops,
> + [BPF_PROG_TYPE_TRACEPOINT] = &tracepoint_prog_ops,
> + [BPF_PROG_TYPE_PERF_EVENT] = &perf_event_prog_ops,
> +#endif

And here I list it again.

Something I considered was to add the prog_type to the ops struct, and
use the linker (putting this into a special section) to assemble an
array. But that
 * makes the lwt_inout_prog_ops that are shared not work without
   duplicating
 * requires more complicated search code

This seems optimal for the resulting binary, but it's a bunch more
typing.

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Pablo Neira Ayuso

On Fri, Apr 07, 2017 at 11:53:15AM -0700, David Miller wrote:
> From: Johannes Berg 
> Date: Fri,  7 Apr 2017 20:26:17 +0200
> 
> > So this is my first draft of what we'd talked about at netconf.
> > I'm not super happy with the way we have to pass the extended
> > error struct, but I don't see a way to implement reporting any
> > dynamic information (like error offsets) in any other way.
> > 
> > Alexander Shishkin had a nice way of reporting static extended
> > error data, but that isn't really suitable for reporting the
> > offset or even reporting the broken attribute from nla_parse().
> > 
> > Speaking of nla_parse(), that'll be somewhat complicated to do
> > since we'll have to track the offsets of where we're parsing,
> > but it might be possible since the nlattrs are just pointers
> > into the message, so (optionally?) passing the skb as well can
> > allow us to fill the offset information.
> 
> I like it, nice work.
> 
> I know people want dynamically generated strings and stuff, and we can
> get there, but I prefer that the first thing we commit is super simple.
> 
> Someone gave me a hard time about the fact that we've been talking
> about this idea for years but nothing ever happens.
> 
> I'm tempted to apply this as-is just to show that person that things
> do in fact happen eventually :-)

We can just send follow up patches to refine, I think it's a good
start, Johannes?

BTW, for this co-authored effort in designing this:

Signed-off-by: Pablo Neira Ayuso 

Thanks!

[PATCH v2 2/2] bpf: remove struct bpf_map_type_list

2017-04-07 Thread Johannes Berg

From: Johannes Berg 

There's no need to have struct bpf_map_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

Signed-off-by: Johannes Berg 
---
 include/linux/bpf.h   | 18 +--
 kernel/bpf/arraymap.c | 64 ---
 kernel/bpf/hashtab.c  | 38 --
 kernel/bpf/lpm_trie.c | 14 +--
 kernel/bpf/stackmap.c | 14 +--
 kernel/bpf/syscall.c  | 42 ++---
 6 files changed, 46 insertions(+), 144 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 75cea2890751..4a95a25186e5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,12 +51,6 @@ struct bpf_map {
atomic_t usercnt;
 };
 
-struct bpf_map_type_list {
-   struct list_head list_node;
-   const struct bpf_map_ops *ops;
-   enum bpf_map_type type;
-};
-
 /* function argument constraints */
 enum bpf_arg_type {
ARG_DONTCARE = 0,   /* unused argument in helper function */
@@ -239,7 +233,17 @@ extern const struct bpf_verifier_ops kprobe_prog_ops;
 extern const struct bpf_verifier_ops tracepoint_prog_ops;
 extern const struct bpf_verifier_ops perf_event_prog_ops;
 
-void bpf_register_map_type(struct bpf_map_type_list *tl);
+extern const struct bpf_map_ops array_map_ops;
+extern const struct bpf_map_ops percpu_array_map_ops;
+extern const struct bpf_map_ops prog_array_map_ops;
+extern const struct bpf_map_ops perf_event_array_map_ops;
+extern const struct bpf_map_ops cgroup_array_map_ops;
+extern const struct bpf_map_ops htab_map_ops;
+extern const struct bpf_map_ops htab_lru_map_ops;
+extern const struct bpf_map_ops htab_percpu_map_ops;
+extern const struct bpf_map_ops htab_lru_percpu_map_ops;
+extern const struct bpf_map_ops trie_map_ops;
+extern const struct bpf_map_ops stack_map_ops;
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 6b6f41f0b211..376afd14cf32 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -260,7 +260,7 @@ static void array_map_free(struct bpf_map *map)
bpf_map_area_free(array);
 }
 
-static const struct bpf_map_ops array_ops = {
+const struct bpf_map_ops array_map_ops = {
.map_alloc = array_map_alloc,
.map_free = array_map_free,
.map_get_next_key = array_map_get_next_key,
@@ -269,12 +269,7 @@ static const struct bpf_map_ops array_ops = {
.map_delete_elem = array_map_delete_elem,
 };
 
-static struct bpf_map_type_list array_type __ro_after_init = {
-   .ops = &array_ops,
-   .type = BPF_MAP_TYPE_ARRAY,
-};
-
-static const struct bpf_map_ops percpu_array_ops = {
+const struct bpf_map_ops percpu_array_map_ops = {
.map_alloc = array_map_alloc,
.map_free = array_map_free,
.map_get_next_key = array_map_get_next_key,
@@ -283,19 +278,6 @@ static const struct bpf_map_ops percpu_array_ops = {
.map_delete_elem = array_map_delete_elem,
 };
 
-static struct bpf_map_type_list percpu_array_type __ro_after_init = {
-   .ops = &percpu_array_ops,
-   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
-};
-
-static int __init register_array_map(void)
-{
-   bpf_register_map_type(&array_type);
-   bpf_register_map_type(&percpu_array_type);
-   return 0;
-}
-late_initcall(register_array_map);
-
 static struct bpf_map *fd_array_map_alloc(union bpf_attr *attr)
 {
/* only file descriptors can be stored in this type of map */
@@ -399,7 +381,7 @@ void bpf_fd_array_map_clear(struct bpf_map *map)
fd_array_map_delete_elem(map, &i);
 }
 
-static const struct bpf_map_ops prog_array_ops = {
+const struct bpf_map_ops prog_array_map_ops = {
.map_alloc = fd_array_map_alloc,
.map_free = fd_array_map_free,
.map_get_next_key = array_map_get_next_key,
@@ -409,18 +391,6 @@ static const struct bpf_map_ops prog_array_ops = {
.map_fd_put_ptr = prog_fd_array_put_ptr,
 };
 
-static struct bpf_map_type_list prog_array_type __ro_after_init = {
-   .ops = &prog_array_ops,
-   .type = BPF_MAP_TYPE_PROG_ARRAY,
-};
-
-static int __init register_prog_array_map(void)
-{
-   bpf_register_map_type(&prog_array_type);
-   return 0;
-}
-late_initcall(register_prog_array_map);
-
 static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
   struct file *map_file)
 {
@@ -511,7 +481,7 @@ static void perf_event_fd_array_release(struct bpf_map *map,
rcu_read_unlock();
 }
 
-static const struct bpf_map_ops perf_event_array_ops = {
+const struct bpf_map_ops perf_event_array_map_ops = {
.map_alloc = fd_array

[PATCH v2 1/2] bpf: remove struct bpf_prog_type_list

2017-04-07 Thread Johannes Berg

From: Johannes Berg 

There's no need to have struct bpf_prog_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

Signed-off-by: Johannes Berg 
---
 include/linux/bpf.h  | 22 +++---
 kernel/bpf/syscall.c | 39 ++---
 kernel/trace/bpf_trace.c | 30 ++-
 net/core/filter.c| 75 +---
 4 files changed, 44 insertions(+), 122 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 909fc033173a..75cea2890751 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -169,12 +169,6 @@ struct bpf_verifier_ops {
  struct bpf_prog *prog);
 };
 
-struct bpf_prog_type_list {
-   struct list_head list_node;
-   const struct bpf_verifier_ops *ops;
-   enum bpf_prog_type type;
-};
-
 struct bpf_prog_aux {
atomic_t refcnt;
u32 used_map_cnt;
@@ -234,7 +228,17 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void 
*meta, u64 meta_size,
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 
-void bpf_register_prog_type(struct bpf_prog_type_list *tl);
+extern const struct bpf_verifier_ops sk_filter_prog_ops;
+extern const struct bpf_verifier_ops tc_cls_act_prog_ops;
+extern const struct bpf_verifier_ops xdp_prog_ops;
+extern const struct bpf_verifier_ops cg_skb_prog_ops;
+extern const struct bpf_verifier_ops cg_sock_prog_ops;
+extern const struct bpf_verifier_ops lwt_inout_prog_ops;
+extern const struct bpf_verifier_ops lwt_xmit_prog_ops;
+extern const struct bpf_verifier_ops kprobe_prog_ops;
+extern const struct bpf_verifier_ops tracepoint_prog_ops;
+extern const struct bpf_verifier_ops perf_event_prog_ops;
+
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
@@ -295,10 +299,6 @@ static inline void bpf_long_memcpy(void *dst, const void 
*src, u32 size)
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 #else
-static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
-{
-}
-
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
return ERR_PTR(-EOPNOTSUPP);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7af0dcc5d755..421aa81ef5d4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -564,26 +564,33 @@ static int map_get_next_key(union bpf_attr *attr)
return err;
 }
 
-static LIST_HEAD(bpf_prog_types);
+static const struct bpf_verifier_ops * const bpf_prog_types[] = {
+#ifdef CONFIG_NET
+   [BPF_PROG_TYPE_SOCKET_FILTER] = &sk_filter_prog_ops,
+   [BPF_PROG_TYPE_SCHED_CLS] = &tc_cls_act_prog_ops,
+   [BPF_PROG_TYPE_SCHED_ACT] = &tc_cls_act_prog_ops,
+   [BPF_PROG_TYPE_XDP] = &xdp_prog_ops,
+   [BPF_PROG_TYPE_CGROUP_SKB] = &cg_skb_prog_ops,
+   [BPF_PROG_TYPE_CGROUP_SOCK] = &cg_sock_prog_ops,
+   [BPF_PROG_TYPE_LWT_IN] = &lwt_inout_prog_ops,
+   [BPF_PROG_TYPE_LWT_OUT] = &lwt_inout_prog_ops,
+   [BPF_PROG_TYPE_LWT_XMIT] = &lwt_xmit_prog_ops,
+#endif
+#ifdef CONFIG_BPF_EVENTS
+   [BPF_PROG_TYPE_KPROBE] = &kprobe_prog_ops,
+   [BPF_PROG_TYPE_TRACEPOINT] = &tracepoint_prog_ops,
+   [BPF_PROG_TYPE_PERF_EVENT] = &perf_event_prog_ops,
+#endif
+};
 
 static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
 {
-   struct bpf_prog_type_list *tl;
-
-   list_for_each_entry(tl, &bpf_prog_types, list_node) {
-   if (tl->type == type) {
-   prog->aux->ops = tl->ops;
-   prog->type = type;
-   return 0;
-   }
-   }
-
-   return -EINVAL;
-}
+   if (type >= ARRAY_SIZE(bpf_prog_types) || !bpf_prog_types[type])
+   return -EINVAL;
 
-void bpf_register_prog_type(struct bpf_prog_type_list *tl)
-{
-   list_add(&tl->list_node, &bpf_prog_types);
+   prog->aux->ops = bpf_prog_types[type];
+   prog->type = type;
+   return 0;
 }
 
 /* fixup insn->imm field of bpf_call instructions:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cee9802cf3e0..8a4efac28710 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -501,16 +501,11 @@ static bool kprobe_prog_is_valid_access(int off, int 
size, enum bpf_access_type
return true;
 }
 
-static const struct bpf_verifier_ops kprobe_prog_ops = {
+const struct bpf_verifier_ops kprobe_prog_ops = {
.get_func_proto  = kprobe_prog_func_proto,
.is_valid_access = kprobe_prog_is_valid_access,
 };
 
-static struct bpf_prog_type_list kprobe_tl __ro_after_init = {
-   .ops= &kprobe_prog_ops,
-   .type   = BPF_PROG_TYPE_KPROBE,
-};
-
 BPF_CALL_5(bpf_perf_event_o

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread Johannes Berg

On Fri, 2017-04-07 at 11:53 -0700, David Miller wrote:

> > Alexander Shishkin had a nice way of reporting static extended
> > error data, but that isn't really suitable for reporting the
> > offset or even reporting the broken attribute from nla_parse().
> > 
> > Speaking of nla_parse(), that'll be somewhat complicated to do
> > since we'll have to track the offsets of where we're parsing,
> > but it might be possible since the nlattrs are just pointers
> > into the message, so (optionally?) passing the skb as well can
> > allow us to fill the offset information.
> 
> I like it, nice work.

I don't like it all that much ;-)

Alexander Shishkin's patch
https://patchwork.kernel.org/patch/7162421/

was nice in a way because you could get away without passing the error
structure down all the time, but like I said it doesn't deal with
dynamic errors (even the offset/attr from nla_parse) and if it's not
done *everywhere* it risks leaking those exterr numbers (-1024..-4095)
to userspace through syscalls if it's used in a non-netlink path for
some reason (e.g. a single driver call being called through both
netlink and ioctl, and trying to return an extended error)

> I know people want dynamically generated strings and stuff, and we
> can get there, but I prefer that the first thing we commit is super
> simple.
> 
> Someone gave me a hard time about the fact that we've been talking
> about this idea for years but nothing ever happens.
> 
> I'm tempted to apply this as-is just to show that person that things
> do in fact happen eventually :-)

Heh. I think I really want to solve - at least partially - nla_parse()
to see that it can be done this way. It'd be nice to even transform all
the callers (I generated half of these patches with spatch anyway) to
have at least that.

johannes

Re: [RFC 0/3] netlink: extended error reporting

2017-04-07 Thread David Miller

From: Johannes Berg 
Date: Fri,  7 Apr 2017 20:26:17 +0200

> So this is my first draft of what we'd talked about at netconf.
> I'm not super happy with the way we have to pass the extended
> error struct, but I don't see a way to implement reporting any
> dynamic information (like error offsets) in any other way.
> 
> Alexander Shishkin had a nice way of reporting static extended
> error data, but that isn't really suitable for reporting the
> offset or even reporting the broken attribute from nla_parse().
> 
> Speaking of nla_parse(), that'll be somewhat complicated to do
> since we'll have to track the offsets of where we're parsing,
> but it might be possible since the nlattrs are just pointers
> into the message, so (optionally?) passing the skb as well can
> allow us to fill the offset information.

I like it, nice work.

I know people want dynamically generated strings and stuff, and we can
get there, but I prefer that the first thing we commit is super simple.

Someone gave me a hard time about the fact that we've been talking
about this idea for years but nothing ever happens.

I'm tempted to apply this as-is just to show that person that things
do in fact happen eventually :-)

Re: [PATCH v2 3/4] bluetooth: hci_uart: add LL protocol serdev driver support

2017-04-07 Thread Rob Herring

On Fri, Apr 7, 2017 at 12:09 PM, Dan Williams  wrote:
> On Fri, 2017-04-07 at 09:35 -0500, Rob Herring wrote:
>> Turns out that the LL protocol and the TI-ST are the same thing
>> AFAICT.
>> The TI-ST adds firmware loading, GPIO control, and shared access for
>> NFC, FM radio, etc. For now, we're only implementing what is needed
>> for
>> BT. This mirrors other drivers like BCM and Intel, but uses the new
>> serdev bus.
>>
>> The firmware loading is greatly simplified by using existing
>> infrastructure to send commands. It may be a bit slower than the
>> original code using synchronous functions, but the real bottleneck is
>> likely doing firmware load at 115.2kbps.
>
> Is there no way to put the TI-specific stuff into a TI UART module
> rather than building it into the generic one?

In case it's not clear, all of HCI_LL is the TI specific part, not
just what I'm adding. So you are talking about putting each UART BT
protocol into a separate module. I'd assume that is doable, but seems
orthogonal to this patch set. I'd also assume there was some reason
that was not done already.

Rob

Re: [PATCH net] bridge: netlink: register netdevice before executing changelink

2017-04-07 Thread Stephen Hemminger

On Fri, 7 Apr 2017 21:16:49 +0300
Ido Schimmel  wrote:

> On Fri, Apr 07, 2017 at 01:37:53PM -0400, Stephen Hemminger wrote:
> > Fix it right now. The patch you propose is too half baked.  
> 
> Given this patch needs to be back ported to three different stable
> kernels I tried to keep the amount of changes to a minimum while fixing
> the bug.
> 
> I'll work on the ndo_uninit() patch and submit it. Then people can
> decide which course of action they're more comfortable with.

Other option for stable is to revert b6677449dff6 ("bridge: netlink: call 
br_changelink() during br_dev_newlink()")

Re: [PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes

2017-04-07 Thread David Miller

From: Yuchung Cheng 
Date: Fri,  7 Apr 2017 11:42:05 -0700

> The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO
> to catch more spurious timeouts") interacts badly with certain
> broken middle-boxes.  These broken boxes modify and falsely raise
> the receive window on the ACKs. During a timeout induced recovery,
> F-RTO would send new data packets to probe if the timeout is false
> or not. Since the receive window is falsely raised, the receiver
> would silently drop these F-RTO packets. The recovery would take N
> (exponentially backoff) timeouts to repair N packet losses.  A TCP
> performance killer.
> 
> Due to this unfortunate situation, this patch removes this extension
> to revert F-RTO back to the RFC specification.
> 
> Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 

Applied, thank you.

Re: [PATCH net] bridge: netlink: register netdevice before executing changelink

2017-04-07 Thread Stephen Hemminger

On Fri, 7 Apr 2017 21:16:49 +0300
Ido Schimmel  wrote:

> On Fri, Apr 07, 2017 at 01:37:53PM -0400, Stephen Hemminger wrote:
> > Fix it right now. The patch you propose is too half baked.  
> 
> Given this patch needs to be back ported to three different stable
> kernels I tried to keep the amount of changes to a minimum while fixing
> the bug.
> 
> I'll work on the ndo_uninit() patch and submit it. Then people can
> decide which course of action they're more comfortable with.

I don't want enterprise distros to ship without error reporting.
At a minimum just return correct errno and leave bridge in stuck/broken state.

[PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes

2017-04-07 Thread Yuchung Cheng

The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO
to catch more spurious timeouts") interacts badly with certain
broken middle-boxes.  These broken boxes modify and falsely raise
the receive window on the ACKs. During a timeout induced recovery,
F-RTO would send new data packets to probe if the timeout is false
or not. Since the receive window is falsely raised, the receiver
would silently drop these F-RTO packets. The recovery would take N
(exponentially backoff) timeouts to repair N packet losses.  A TCP
performance killer.

Due to this unfortunate situation, this patch removes this extension
to revert F-RTO back to the RFC specification.

Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c1f59386a7b..659d1baefb2b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1935,6 +1935,7 @@ void tcp_enter_loss(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
struct sk_buff *skb;
+   bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg;  /* is receiver reneging on SACKs? */
bool mark_lost;
 
@@ -1994,15 +1995,18 @@ void tcp_enter_loss(struct sock *sk)
tp->high_seq = tp->snd_nxt;
tcp_ecn_queue_cwr(tp);
 
-   /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO
-* if a previous recovery is underway, otherwise it may incorrectly
-* call a timeout spurious if some previously retransmitted packets
-* are s/acked (sec 3.2). We do not apply that retriction since
-* retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS
-* so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO
-* on PTMU discovery to avoid sending new data.
+   /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
+* loss recovery is underway except recurring timeout(s) on
+* the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
+*
+* In theory F-RTO can be used repeatedly during loss recovery.
+* In practice this interacts badly with broken middle-boxes that
+* falsely raise the receive window, which results in repeated
+* timeouts and stop-and-go behavior.
 */
-   tp->frto = sysctl_tcp_frto && !inet_csk(sk)->icsk_mtup.probe_size;
+   tp->frto = sysctl_tcp_frto &&
+  (new_recovery || icsk->icsk_retransmits) &&
+  !inet_csk(sk)->icsk_mtup.probe_size;
 }
 
 /* If ACK arrived pointing to a remembered SACK, it means that our
-- 
2.12.2.715.g7642488e1d-goog

Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

2017-04-07 Thread David Miller

From: Eric Dumazet 
Date: Fri, 07 Apr 2017 11:38:49 -0700

> I do not really see what could be wrong, the code should run just fine
> on UP.

One theory is that the interrupt masking isn't working properly
and interrupts are still arriving and hitting the NAPI state even
when we are actively polling NAPI.

And this problem was masked by the locking done here.

[PATCH net-next] netvsc: use napi_consume_skb

2017-04-07 Thread Stephen Hemminger

This allows using deferred skb freeing and with NAPI. And get buffer
recycling.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index e998e2f7a619..0b4df3f59ecf 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -601,7 +601,8 @@ static inline void netvsc_free_send_slot(struct 
netvsc_device *net_device,
 static void netvsc_send_tx_complete(struct netvsc_device *net_device,
struct vmbus_channel *incoming_channel,
struct hv_device *device,
-   const struct vmpacket_descriptor *desc)
+   const struct vmpacket_descriptor *desc,
+   int budget)
 {
struct sk_buff *skb = (struct sk_buff *)(unsigned long)desc->trans_id;
struct net_device *ndev = hv_get_drvdata(device);
@@ -628,7 +629,7 @@ static void netvsc_send_tx_complete(struct netvsc_device 
*net_device,
tx_stats->bytes += packet->total_bytes;
u64_stats_update_end(&tx_stats->syncp);
 
-   dev_consume_skb_any(skb);
+   napi_consume_skb(skb, budget);
}
 
queue_sends =
@@ -646,7 +647,8 @@ static void netvsc_send_tx_complete(struct netvsc_device 
*net_device,
 static void netvsc_send_completion(struct netvsc_device *net_device,
   struct vmbus_channel *incoming_channel,
   struct hv_device *device,
-  const struct vmpacket_descriptor *desc)
+  const struct vmpacket_descriptor *desc,
+  int budget)
 {
struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
struct net_device *ndev = hv_get_drvdata(device);
@@ -664,7 +666,7 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
 
case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
netvsc_send_tx_complete(net_device, incoming_channel,
-   device, desc);
+   device, desc, budget);
break;
 
default:
@@ -1174,14 +1176,16 @@ static int netvsc_process_raw_pkt(struct hv_device 
*device,
  struct vmbus_channel *channel,
  struct netvsc_device *net_device,
  struct net_device *ndev,
- const struct vmpacket_descriptor *desc)
+ const struct vmpacket_descriptor *desc,
+ int budget)
 {
struct net_device_context *net_device_ctx = netdev_priv(ndev);
struct nvsp_message *nvmsg = hv_pkt_data(desc);
 
switch (desc->type) {
case VM_PKT_COMP:
-   netvsc_send_completion(net_device, channel, device, desc);
+   netvsc_send_completion(net_device, channel, device,
+  desc, budget);
break;
 
case VM_PKT_DATA_USING_XFER_PAGES:
@@ -1230,7 +1234,7 @@ int netvsc_poll(struct napi_struct *napi, int budget)
 
while (nvchan->desc && work_done < budget) {
work_done += netvsc_process_raw_pkt(device, channel, net_device,
-   ndev, nvchan->desc);
+   ndev, nvchan->desc, budget);
nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc);
}
 
-- 
2.11.0

Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

2017-04-07 Thread Eric Dumazet

On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> Hi,
> 
> My old P3 laptop started to die on me in the middle of larger compile
> jobs (using distcc) after v4.11-rc. I bisected the problem
> to 617f01211baf ("8139too: use napi_complete_done()").
> 
> Unfortunately I wasn't able to capture a full oops as the machine doesn't
> have serial and ramoops failed me. I did get one partial oops on vgacon
> which showed rtl8139_poll() being involved (EIP was around
> _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> bisect result.
> 
> So maybe some kind of nasty thing going between the hard irq and
> softirq? Perhaps UP related? I tried to stare at the locking around
> rtl8139_poll() for a while but it looked mostly sane to me.
> 

Thanks a lot for the detective work, I am so sorry for this !

Could you try the following patch ?

I do not really see what could be wrong, the code should run just fine
on UP.

Thanks.

diff --git a/drivers/net/ethernet/realtek/8139too.c 
b/drivers/net/ethernet/realtek/8139too.c
index 
89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673
 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int 
budget)
if (likely(RTL_R16(IntrStatus) & RxAckBits))
work_done += rtl8139_rx(dev, tp, budget);
 
-   if (work_done < budget && napi_complete_done(napi, work_done)) {
+   if (work_done < budget) {
unsigned long flags;
 
spin_lock_irqsave(&tp->lock, flags);
-   RTL_W16_F(IntrMask, rtl8139_intr_mask);
+   if (napi_complete_done(napi, work_done))
+   RTL_W16_F(IntrMask, rtl8139_intr_mask);
spin_unlock_irqrestore(&tp->lock, flags);
}
spin_unlock(&tp->rx_lock);

Re: [RFC 2/3] genetlink: pass extended error report down

2017-04-07 Thread Ben Greear


On 04/07/2017 11:26 AM, Johannes Berg wrote:

From: Johannes Berg 

Signed-off-by: Johannes Berg 
---
 include/net/genetlink.h | 27 +++
 net/netlink/genetlink.c |  6 --
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index a34275be3600..67ad2326cfa6 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -84,6 +84,7 @@ struct nlattr **genl_family_attrbuf(const struct genl_family 
*family);
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
+ * @exterr: extended error report struct
  */
 struct genl_info {
u32 snd_seq;
@@ -94,6 +95,7 @@ struct genl_info {
struct nlattr **attrs;
possible_net_t  _net;
void *  user_ptr[2];
+   struct netlink_ext_err *exterr;
 };

 static inline struct net *genl_info_net(struct genl_info *info)
@@ -106,6 +108,31 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
write_pnet(&info->_net, net);
 }

+static inline int genl_err_str(struct genl_info *info, int err,
+  const char *msg)
+{
+   info->exterr->msg = msg;
+
+   return err;
+}


I guess the error string must be constant and always available in memory
in this implementation?

I think it would be nice to dynamically create strings (malloc, snprintf, etc)
and have the err_str logic free it when done?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

[net-next 7/7] net/mlx5e: Set default RX moderation parameters on driver load

2017-04-07 Thread Saeed Mahameed

RX moderation default parameters shouldn't be set in
mlx5e_build_rx_cq_param since it would reset the values every time on
netdev open/close.  Instead, it should be set in
mlx5e_set_rx_cq_mode_params which is called on driver load only.

Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params")
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 57844ffca37f..8b7b7e604ea0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1944,10 +1944,6 @@ static void mlx5e_build_rx_cq_param(struct mlx5e_priv 
*priv,
}
 
mlx5e_build_common_cq_param(priv, param);
-
-   if (params->rx_am_enabled)
-   params->rx_cq_moderation =
-   mlx5e_am_get_def_profile(params->rx_cq_period_mode);
 }
 
 static void mlx5e_build_tx_cq_param(struct mlx5e_priv *priv,
@@ -3787,6 +3783,10 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params 
*params, u8 cq_period_mode)
params->rx_cq_moderation.usec =
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE;
 
+   if (params->rx_am_enabled)
+   params->rx_cq_moderation =
+   mlx5e_am_get_def_profile(params->rx_cq_period_mode);
+
MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_BASED_MODER,
params->rx_cq_period_mode == 
MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
 }
-- 
2.11.0

[net-next 4/7] net/mlx5e: Change FW sub_minor display to 4 zeros padding

2017-04-07 Thread Saeed Mahameed

From: Eran Ben Elisha 

FW version should be reported as X.Y., add leading zeroes to sub
minor in order to fix it.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index af039b6c0799..167a8379156e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -42,7 +42,7 @@ static void mlx5e_get_drvinfo(struct net_device *dev,
strlcpy(drvinfo->version, DRIVER_VERSION " (" DRIVER_RELDATE ")",
sizeof(drvinfo->version));
snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
-"%d.%d.%d",
+"%d.%d.%04d",
 fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev));
strlcpy(drvinfo->bus_info, pci_name(mdev->pdev),
sizeof(drvinfo->bus_info));
-- 
2.11.0

[net-next 3/7] net/mlx5e: Make mlx5e_modify_rqs_vsd a static function

2017-04-07 Thread Saeed Mahameed

From: Guy Ergas 

Make mlx5e_modify_rqs_vsd a static function and remove from en.h in
order to reduce redundant exposure of functions.

Signed-off-by: Guy Ergas 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a58031da7fad..b7feecfbb5a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -843,8 +843,6 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, 
__always_unused __be16 proto,
 void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv);
 void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
-int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd);
-
 struct mlx5e_redirect_rqt_param {
bool is_rss;
union {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 83796ce17fc3..b57a6e72cc86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2950,7 +2950,7 @@ static int mlx5e_modify_channels_scatter_fcs(struct 
mlx5e_channels *chs, bool en
return 0;
 }
 
-int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd)
+static int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd)
 {
int err = 0;
int i;
-- 
2.11.0

1 2 3 >

1 - 100 of 250 matches

Mail list logo