Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions
Hi Florian, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-dsa-Receive-path-simplifications/20170408-074503 config: tile-allmodconfig (attached as .config) compiler: tilegx-linux-gcc (GCC) 4.6.2 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=tile All warnings (new ones prefixed by >>): >> net//dsa/tag_mtk.c:117:2: warning: initialization from incompatible pointer >> type [enabled by default] net//dsa/tag_mtk.c:117:2: warning: (near initialization for 'mtk_netdev_ops.rcv') [enabled by default] vim +117 net//dsa/tag_mtk.c 5cd8985a Sean Wang 2017-04-07 101 5cd8985a Sean Wang 2017-04-07 102 skb->dev->stats.rx_packets++; 5cd8985a Sean Wang 2017-04-07 103 skb->dev->stats.rx_bytes += skb->len; 5cd8985a Sean Wang 2017-04-07 104 5cd8985a Sean Wang 2017-04-07 105 netif_receive_skb(skb); 5cd8985a Sean Wang 2017-04-07 106 5cd8985a Sean Wang 2017-04-07 107 return 0; 5cd8985a Sean Wang 2017-04-07 108 5cd8985a Sean Wang 2017-04-07 109 out_drop: 5cd8985a Sean Wang 2017-04-07 110 kfree_skb(skb); 5cd8985a Sean Wang 2017-04-07 111 out: 5cd8985a Sean Wang 2017-04-07 112 return 0; 5cd8985a Sean Wang 2017-04-07 113 } 5cd8985a Sean Wang 2017-04-07 114 5cd8985a Sean Wang 2017-04-07 115 const struct dsa_device_ops mtk_netdev_ops = { 5cd8985a Sean Wang 2017-04-07 116 .xmit = mtk_tag_xmit, 5cd8985a Sean Wang 2017-04-07 @117 .rcv= mtk_tag_rcv, 5cd8985a Sean Wang 2017-04-07 118 }; :: The code at line 117 was first introduced by commit :: 5cd8985a19090f2b0ce8700ae3ec19e06a4fc5e9 net-next: dsa: add Mediatek tag RX/TX handler :: TO: Sean Wang :: CC: David S. Miller --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions
Hi Florian, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-dsa-Receive-path-simplifications/20170408-074503 config: ia64-allmodconfig (attached as .config) compiler: ia64-linux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=ia64 All errors (new ones prefixed by >>): >> net//dsa/tag_mtk.c:117:9: error: initialization from incompatible pointer >> type [-Werror=incompatible-pointer-types] .rcv = mtk_tag_rcv, ^~~ net//dsa/tag_mtk.c:117:9: note: (near initialization for 'mtk_netdev_ops.rcv') cc1: some warnings being treated as errors vim +117 net//dsa/tag_mtk.c 5cd8985a Sean Wang 2017-04-07 101 5cd8985a Sean Wang 2017-04-07 102 skb->dev->stats.rx_packets++; 5cd8985a Sean Wang 2017-04-07 103 skb->dev->stats.rx_bytes += skb->len; 5cd8985a Sean Wang 2017-04-07 104 5cd8985a Sean Wang 2017-04-07 105 netif_receive_skb(skb); 5cd8985a Sean Wang 2017-04-07 106 5cd8985a Sean Wang 2017-04-07 107 return 0; 5cd8985a Sean Wang 2017-04-07 108 5cd8985a Sean Wang 2017-04-07 109 out_drop: 5cd8985a Sean Wang 2017-04-07 110 kfree_skb(skb); 5cd8985a Sean Wang 2017-04-07 111 out: 5cd8985a Sean Wang 2017-04-07 112 return 0; 5cd8985a Sean Wang 2017-04-07 113 } 5cd8985a Sean Wang 2017-04-07 114 5cd8985a Sean Wang 2017-04-07 115 const struct dsa_device_ops mtk_netdev_ops = { 5cd8985a Sean Wang 2017-04-07 116 .xmit = mtk_tag_xmit, 5cd8985a Sean Wang 2017-04-07 @117 .rcv= mtk_tag_rcv, 5cd8985a Sean Wang 2017-04-07 118 }; :: The code at line 117 was first introduced by commit :: 5cd8985a19090f2b0ce8700ae3ec19e06a4fc5e9 net-next: dsa: add Mediatek tag RX/TX handler :: TO: Sean Wang :: CC: David S. Miller --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe
>> - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld", >> + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu >> ts:%lld.%09ld", >>field->seqnum, >>field->duration, >>field->outer_duration, >> - field->timestamp.tv_sec, >> + (long long)field->timestamp.tv_sec, > > Refresh my memory. We need the cast because on 64 bit boxes > timestamp.tv_sec is just a long? This is only required until we change the definition of timespec64. Right now it is defined as #if __BITS_PER_LONG == 64 # define timespec64 timespec #else struct timespec64 { time64_t tv_sec; long tv_nsec; }; #endif And timespec.tv_sec is just long int on 64 bit machines. This is why we need the cast now. We will probably change this and only define __kernel_timespec instead of timespec, leaving only one definition of timespec64. At that time, we will not need this. -Deepa
Re: [PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe
On Fri, 7 Apr 2017 17:57:00 -0700 Deepa Dinamani wrote: > struct timespec is not y2038 safe on 32 bit machines > and needs to be replaced by struct timespec64 > in order to represent times beyond year 2038 on such > machines. > > Fix all the timestamp representation in struct trace_hwlat > and all the corresponding implementations. > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c > index 02a4aeb..08f9bab 100644 > --- a/kernel/trace/trace_output.c > +++ b/kernel/trace/trace_output.c > @@ -4,7 +4,6 @@ > * Copyright (C) 2008 Red Hat Inc, Steven Rostedt > * > */ > - > #include > #include > #include > @@ -1161,11 +1160,11 @@ trace_hwlat_print(struct trace_iterator *iter, int > flags, > > trace_assign_type(field, entry); > > - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld", > + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%lld.%09ld", >field->seqnum, >field->duration, >field->outer_duration, > - field->timestamp.tv_sec, > + (long long)field->timestamp.tv_sec, Refresh my memory. We need the cast because on 64 bit boxes timestamp.tv_sec is just a long? Other than that. Reviewed-by: Steven Rostedt (VMware) -- Steve >field->timestamp.tv_nsec); > > if (field->nmi_count) { > @@ -1195,10 +1194,10 @@ trace_hwlat_raw(struct trace_iterator *iter, int > flags, > > trace_assign_type(field, iter->ent); > > - trace_seq_printf(s, "%llu %lld %ld %09ld %u\n", > + trace_seq_printf(s, "%llu %lld %lld %09ld %u\n", >field->duration, >field->outer_duration, > - field->timestamp.tv_sec, > + (long long)field->timestamp.tv_sec, >field->timestamp.tv_nsec, >field->seqnum); >
[PATCH 05/12] fs: ufs: Use ktime_get_real_ts64() for birthtime
CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64(). Inode time formats are already 64 bit long and accommodates time64_t. Signed-off-by: Deepa Dinamani --- fs/ufs/ialloc.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/ufs/ialloc.c b/fs/ufs/ialloc.c index 9774555..d1dd8cc 100644 --- a/fs/ufs/ialloc.c +++ b/fs/ufs/ialloc.c @@ -176,6 +176,7 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode) struct ufs_cg_private_info * ucpi; struct ufs_cylinder_group * ucg; struct inode * inode; + struct timespec64 ts; unsigned cg, bit, i, j, start; struct ufs_inode_info *ufsi; int err = -ENOSPC; @@ -323,8 +324,9 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode) lock_buffer(bh); ufs2_inode = (struct ufs2_inode *)bh->b_data; ufs2_inode += ufs_inotofsbo(inode->i_ino); - ufs2_inode->ui_birthtime = cpu_to_fs64(sb, CURRENT_TIME.tv_sec); - ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, CURRENT_TIME.tv_nsec); + ktime_get_real_ts64(&ts); + ufs2_inode->ui_birthtime = cpu_to_fs64(sb, ts.tv_sec); + ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, ts.tv_nsec); mark_buffer_dirty(bh); unlock_buffer(bh); if (sb->s_flags & MS_SYNCHRONOUS) -- 2.7.4
[PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe
struct timespec is not y2038 safe on 32 bit machines and needs to be replaced by struct timespec64 in order to represent times beyond year 2038 on such machines. Fix all the timestamp representation in struct trace_hwlat and all the corresponding implementations. Signed-off-by: Deepa Dinamani --- kernel/trace/trace_entries.h | 6 +++--- kernel/trace/trace_hwlat.c | 14 +++--- kernel/trace/trace_output.c | 9 - 3 files changed, 14 insertions(+), 15 deletions(-) diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index c203ac4..adcdbbe 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -348,14 +348,14 @@ FTRACE_ENTRY(hwlat, hwlat_entry, __field(u64,duration) __field(u64,outer_duration ) __field(u64,nmi_total_ts) - __field_struct( struct timespec,timestamp ) - __field_desc( long, timestamp, tv_sec ) + __field_struct( struct timespec64, timestamp ) + __field_desc( s64,timestamp, tv_sec ) __field_desc( long, timestamp, tv_nsec ) __field(unsigned int, nmi_count ) __field(unsigned int, seqnum ) ), - F_printk("cnt:%u\tts:%010lu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n", + F_printk("cnt:%u\tts:%010llu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n", __entry->seqnum, __entry->tv_sec, __entry->tv_nsec, diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 21ea6ae..d7c8e4e 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -79,12 +79,12 @@ static u64 last_tracing_thresh = DEFAULT_LAT_THRESHOLD * NSEC_PER_USEC; /* Individual latency samples are stored here when detected. */ struct hwlat_sample { - u64 seqnum; /* unique sequence */ - u64 duration; /* delta */ - u64 outer_duration; /* delta (outer loop) */ - u64 nmi_total_ts; /* Total time spent in NMIs */ - struct timespec timestamp; /* wall time */ - int nmi_count; /* # NMIs during this sample */ + u64 seqnum; /* unique sequence */ + u64 duration; /* delta */ + u64 outer_duration; /* delta (outer loop) */ + u64 nmi_total_ts; /* Total time spent in NMIs */ + struct timespec64 timestamp; /* wall time */ + int nmi_count; /* # NMIs during this sample */ }; /* keep the global state somewhere. */ @@ -250,7 +250,7 @@ static int get_sample(void) s.seqnum = hwlat_data.count; s.duration = sample; s.outer_duration = outer_sample; - s.timestamp = CURRENT_TIME; + ktime_get_real_ts64(&s.timestamp); s.nmi_total_ts = nmi_total_ts; s.nmi_count = nmi_count; trace_hwlat_sample(&s); diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 02a4aeb..08f9bab 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -4,7 +4,6 @@ * Copyright (C) 2008 Red Hat Inc, Steven Rostedt * */ - #include #include #include @@ -1161,11 +1160,11 @@ trace_hwlat_print(struct trace_iterator *iter, int flags, trace_assign_type(field, entry); - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld", + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%lld.%09ld", field->seqnum, field->duration, field->outer_duration, -field->timestamp.tv_sec, +(long long)field->timestamp.tv_sec, field->timestamp.tv_nsec); if (field->nmi_count) { @@ -1195,10 +1194,10 @@ trace_hwlat_raw(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - trace_seq_printf(s, "%llu %lld %ld %09ld %u\n", + trace_seq_printf(s, "%llu %lld %lld %09ld %u\n", field->duration, field->outer_duration, -field->timestamp.tv_sec, +(long long)field->timestamp.tv_sec, field->timestamp.tv_nsec, field->seqnum); -- 2.7.4
[PATCH 10/12] apparmorfs: Replace CURRENT_TIME with current_time()
CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time(). This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. Signed-off-by: Deepa Dinamani --- security/apparmor/apparmorfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index be0b498..4f6ac9d 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -1357,7 +1357,7 @@ static int aa_mk_null_file(struct dentry *parent) inode->i_ino = get_next_ino(); inode->i_mode = S_IFCHR | S_IRUGO | S_IWUGO; - inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); init_special_inode(inode, S_IFCHR | S_IRUGO | S_IWUGO, MKDEV(MEM_MAJOR, 3)); d_instantiate(dentry, inode); -- 2.7.4
[PATCH 11/12] time: Delete CURRENT_TIME_SEC and CURRENT_TIME
All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced by other time functions. These macros are also not y2038 safe. And, all their use cases can be fulfilled by y2038 safe ktime_get_* variants. Signed-off-by: Deepa Dinamani Acked-by: John Stultz Reviewed-by: Arnd Bergmann --- include/linux/time.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/linux/time.h b/include/linux/time.h index 23f0f5c..c0543f5 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -151,9 +151,6 @@ static inline bool timespec_inject_offset_valid(const struct timespec *ts) return true; } -#define CURRENT_TIME (current_kernel_time()) -#define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 }) - /* Some architectures do not supply their own clocksource. * This is mainly the case in architectures that get their * inter-tick times by reading the counter on their interval -- 2.7.4
[PATCH 09/12] lustre: Replace CURRENT_TIME macro
CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time() for filesystem times, and ktime_get_* functions for others. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as lustre uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. Signed-off-by: Deepa Dinamani --- drivers/staging/lustre/lustre/llite/llite_lib.c | 6 +++--- drivers/staging/lustre/lustre/osc/osc_io.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c index 7b80040..2b4b6b9 100644 --- a/drivers/staging/lustre/lustre/llite/llite_lib.c +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c @@ -1472,17 +1472,17 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, bool hsm_import) /* We mark all of the fields "set" so MDS/OST does not re-set them */ if (attr->ia_valid & ATTR_CTIME) { - attr->ia_ctime = CURRENT_TIME; + attr->ia_ctime = current_time(inode); attr->ia_valid |= ATTR_CTIME_SET; } if (!(attr->ia_valid & ATTR_ATIME_SET) && (attr->ia_valid & ATTR_ATIME)) { - attr->ia_atime = CURRENT_TIME; + attr->ia_atime = current_time(inode); attr->ia_valid |= ATTR_ATIME_SET; } if (!(attr->ia_valid & ATTR_MTIME_SET) && (attr->ia_valid & ATTR_MTIME)) { - attr->ia_mtime = CURRENT_TIME; + attr->ia_mtime = current_time(inode); attr->ia_valid |= ATTR_MTIME_SET; } diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c index f991bee..cbab800 100644 --- a/drivers/staging/lustre/lustre/osc/osc_io.c +++ b/drivers/staging/lustre/lustre/osc/osc_io.c @@ -216,7 +216,7 @@ static int osc_io_submit(const struct lu_env *env, struct cl_object *obj = ios->cis_obj; cl_object_attr_lock(obj); - attr->cat_mtime = LTIME_S(CURRENT_TIME); + attr->cat_mtime = ktime_get_real_seconds(); attr->cat_ctime = attr->cat_mtime; cl_object_attr_update(env, obj, attr, CAT_MTIME | CAT_CTIME); cl_object_attr_unlock(obj); @@ -256,7 +256,7 @@ static void osc_page_touch_at(const struct lu_env *env, kms > loi->loi_kms ? "" : "not ", loi->loi_kms, kms, loi->loi_lvb.lvb_size); - attr->cat_ctime = LTIME_S(CURRENT_TIME); + attr->cat_ctime = ktime_get_real_seconds(); attr->cat_mtime = attr->cat_ctime; valid = CAT_MTIME | CAT_CTIME; if (kms > loi->loi_kms) { -- 2.7.4
[PATCH 07/12] fs: btrfs: Use ktime_get_real_ts for root ctime
btrfs_root_item maintains the ctime for root updates. This is not part of vfs_inode. Since current_time() uses struct inode* as an argument as Linus suggested, this cannot be used to update root times unless, we modify the signature to use inode. Since btrfs uses nanosecond time granularity, it can also use ktime_get_real_ts directly to obtain timestamp for the root. It is necessary to use the timespec time api here because the same btrfs_set_stack_timespec_*() apis are used for vfs inode times as well. These can be transitioned to using timespec64 when btrfs internally changes to use timespec64 as well. Signed-off-by: Deepa Dinamani Acked-by: David Sterba Reviewed-by: Arnd Bergmann --- fs/btrfs/root-tree.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index a08224e..7d6bc30 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -501,8 +501,9 @@ void btrfs_update_root_times(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_root_item *item = &root->root_item; - struct timespec ct = current_fs_time(root->fs_info->sb); + struct timespec ct; + ktime_get_real_ts(&ct); spin_lock(&root->root_item_lock); btrfs_set_root_ctransid(item, trans->transid); btrfs_set_stack_timespec_sec(&item->ctime, ct.tv_sec); -- 2.7.4
[PATCH 12/12] time: Delete current_fs_time() function
All uses of the current_fs_time() function have been replaced by other time interfaces. And, its use cases can be fulfilled by current_time() or ktime_get_* variants. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann --- include/linux/fs.h | 1 - kernel/time/time.c | 14 -- 2 files changed, 15 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index f1d7347..cce6c57 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1430,7 +1430,6 @@ static inline void i_gid_write(struct inode *inode, gid_t gid) inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid); } -extern struct timespec current_fs_time(struct super_block *sb); extern struct timespec current_time(struct inode *inode); /* diff --git a/kernel/time/time.c b/kernel/time/time.c index 25bdd25..cf69cca 100644 --- a/kernel/time/time.c +++ b/kernel/time/time.c @@ -230,20 +230,6 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p) return copy_to_user(txc_p, &txc, sizeof(struct timex)) ? -EFAULT : ret; } -/** - * current_fs_time - Return FS time - * @sb: Superblock. - * - * Return the current time truncated to the time granularity supported by - * the fs. - */ -struct timespec current_fs_time(struct super_block *sb) -{ - struct timespec now = current_kernel_time(); - return timespec_trunc(now, sb->s_time_gran); -} -EXPORT_SYMBOL(current_fs_time); - /* * Convert jiffies to milliseconds and back. * -- 2.7.4
[PATCH 08/12] fs: ubifs: Replace CURRENT_TIME_SEC with current_time
CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned to use 64 bit time along with vfs in a separate patch. There is no plan to transition CURRENT_TIME_SEC to use y2038 safe time interfaces. current_time() returns timestamps according to the granularities set in the inode's super_block. The granularity check to call current_fs_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to update inode timestamp. Use timespec_trunc during file system creation, before the first inode is created. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann --- fs/ubifs/dir.c | 12 ++-- fs/ubifs/file.c | 12 ++-- fs/ubifs/ioctl.c | 2 +- fs/ubifs/misc.h | 10 -- fs/ubifs/sb.c| 14 ++ fs/ubifs/xattr.c | 6 +++--- 6 files changed, 26 insertions(+), 30 deletions(-) diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 30825d88..8510d79 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -121,7 +121,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir, inode_init_owner(inode, dir, mode); inode->i_mtime = inode->i_atime = inode->i_ctime = -ubifs_current_time(inode); +current_time(inode); inode->i_mapping->nrpages = 0; switch (mode & S_IFMT) { @@ -750,7 +750,7 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir, lock_2_inodes(dir, inode); inc_nlink(inode); ihold(inode); - inode->i_ctime = ubifs_current_time(inode); + inode->i_ctime = current_time(inode); dir->i_size += sz_change; dir_ui->ui_size = dir->i_size; dir->i_mtime = dir->i_ctime = inode->i_ctime; @@ -823,7 +823,7 @@ static int ubifs_unlink(struct inode *dir, struct dentry *dentry) } lock_2_inodes(dir, inode); - inode->i_ctime = ubifs_current_time(dir); + inode->i_ctime = current_time(dir); drop_nlink(inode); dir->i_size -= sz_change; dir_ui->ui_size = dir->i_size; @@ -927,7 +927,7 @@ static int ubifs_rmdir(struct inode *dir, struct dentry *dentry) } lock_2_inodes(dir, inode); - inode->i_ctime = ubifs_current_time(dir); + inode->i_ctime = current_time(dir); clear_nlink(inode); drop_nlink(dir); dir->i_size -= sz_change; @@ -1405,7 +1405,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry, * Like most other Unix systems, set the @i_ctime for inodes on a * rename. */ - time = ubifs_current_time(old_dir); + time = current_time(old_dir); old_inode->i_ctime = time; /* We must adjust parent link count when renaming directories */ @@ -1578,7 +1578,7 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry, lock_4_inodes(old_dir, new_dir, NULL, NULL); - time = ubifs_current_time(old_dir); + time = current_time(old_dir); fst_inode->i_ctime = time; snd_inode->i_ctime = time; old_dir->i_mtime = old_dir->i_ctime = time; diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index d9ae86f..2cda3d6 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1196,7 +1196,7 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode, mutex_lock(&ui->ui_mutex); ui->ui_size = inode->i_size; /* Truncation changes inode [mc]time */ - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); /* Other attributes may be changed at the same time as well */ do_attr_changes(inode, attr); err = ubifs_jnl_truncate(c, inode, old_size, new_size); @@ -1243,7 +1243,7 @@ static int do_setattr(struct ubifs_info *c, struct inode *inode, mutex_lock(&ui->ui_mutex); if (attr->ia_valid & ATTR_SIZE) { /* Truncation changes inode [mc]time */ - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); /* 'truncate_setsize()' changed @i_size, update @ui_size */ ui->ui_size = inode->i_size; } @@ -1420,7 +1420,7 @@ int ubifs_update_time(struct inode *inode, struct timespec *time, */ static int update_mctime(struct inode *inode) { - struct timespec now = ubifs_current_time(inode); + struct timespec now = current_time(inode); struct ubifs_inode *ui = ubifs_inode(inode); struct ubifs_info *c = inode->i_sb->s_fs_info; @@ -1434,7 +1434,7 @@ static int update_mctime(struct inode *inode) return err; mutex_lock(&ui->ui_mutex); - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); release = ui->dirty; mark_inode_dirty_sync(ino
[PATCH 03/12] fs: cifs: Replace CURRENT_TIME by other appropriate apis
CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time() for filesystem times, and ktime_get_* functions for authentication timestamps and timezone calculations. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. CURRENT_TIME macro will be deleted before merging the aforementioned change. The inode timestamps read from the server are assumed to have correct granularity and range. The patch also assumes that the difference between server and client times lie in the range INT_MIN..INT_MAX. This is valid because this is the difference between current times between server and client, and the largest timezone difference is in the range of one day. All cifs timestamps currently use timespec representation internally. Authentication and timezone timestamps can also be transitioned into using timespec64 when all other timestamps for cifs is transitioned to use timespec64. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann --- fs/cifs/cifsencrypt.c | 4 +++- fs/cifs/cifssmb.c | 10 +- fs/cifs/inode.c | 28 +++- 3 files changed, 23 insertions(+), 19 deletions(-) diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index 058ac9b..68abbb0 100644 --- a/fs/cifs/cifsencrypt.c +++ b/fs/cifs/cifsencrypt.c @@ -478,6 +478,7 @@ find_timestamp(struct cifs_ses *ses) unsigned char *blobptr; unsigned char *blobend; struct ntlmssp2_name *attrptr; + struct timespec ts; if (!ses->auth_key.len || !ses->auth_key.response) return 0; @@ -502,7 +503,8 @@ find_timestamp(struct cifs_ses *ses) blobptr += attrsize; /* advance attr value */ } - return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME)); + ktime_get_real_ts(&ts); + return cpu_to_le64(cifs_UnixTimeToNT(ts)); } static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash, diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 0669506..2f279b7 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -478,14 +478,14 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, NEGOTIATE_RSP *pSMBr) * this requirement. */ int val, seconds, remain, result; - struct timespec ts, utc; - utc = CURRENT_TIME; + struct timespec ts; + unsigned long utc = ktime_get_real_seconds(); ts = cnvrtDosUnixTm(rsp->SrvTime.Date, rsp->SrvTime.Time, 0); cifs_dbg(FYI, "SrvTime %d sec since 1970 (utc: %d) diff: %d\n", -(int)ts.tv_sec, (int)utc.tv_sec, -(int)(utc.tv_sec - ts.tv_sec)); - val = (int)(utc.tv_sec - ts.tv_sec); +(int)ts.tv_sec, (int)utc, +(int)(utc - ts.tv_sec)); + val = (int)(utc - ts.tv_sec); seconds = abs(val); result = (seconds / MIN_TZ_ADJ) * MIN_TZ_ADJ; remain = seconds % MIN_TZ_ADJ; diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index b261db3..c3b2fa0 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -322,9 +322,9 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb) fattr->cf_mode = S_IFDIR | S_IXUGO | S_IRWXU; fattr->cf_uid = cifs_sb->mnt_uid; fattr->cf_gid = cifs_sb->mnt_gid; - fattr->cf_atime = CURRENT_TIME; - fattr->cf_ctime = CURRENT_TIME; - fattr->cf_mtime = CURRENT_TIME; + ktime_get_real_ts(&fattr->cf_mtime); + fattr->cf_mtime = timespec_trunc(fattr->cf_mtime, sb->s_time_gran); + fattr->cf_atime = fattr->cf_ctime = fattr->cf_mtime; fattr->cf_nlink = 2; fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL; } @@ -586,9 +586,10 @@ static int cifs_sfu_mode(struct cifs_fattr *fattr, const unsigned char *path, /* Fill a cifs_fattr struct with info from FILE_ALL_INFO */ static void cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, - struct cifs_sb_info *cifs_sb, bool adjust_tz, + struct super_block *sb, bool adjust_tz, bool symlink) { + struct cifs_sb_info *cifs_sb = CIFS_SB(sb); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); memset(fattr, 0, sizeof(*fattr)); @@ -598,8 +599,10 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, if (info->LastAccessTime) fattr->cf_atime = cifs_NTtimeToUnix(info->LastAccessTime); - else - fattr->cf_atime = CURRENT_TIME; + else { + ktime_get_real_ts(&fattr->cf_atime); + fattr->cf_atime = timespec_trunc(fattr->cf_atime, sb->s_time_gran); + } fattr->cf_ctime = cifs_NTtimeToUnix(info->
[PATCH 06/12] audit: Use timespec64 to represent audit timestamps
struct timespec is not y2038 safe. Audit timestamps are recorded in string format into an audit buffer for a given context. These mark the entry timestamps for the syscalls. Use y2038 safe struct timespec64 to represent the times. The log strings can handle this transition as strings can hold upto 1024 characters. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann Acked-by: Paul Moore Acked-by: Richard Guy Briggs --- include/linux/audit.h | 4 ++-- kernel/audit.c| 10 +- kernel/audit.h| 2 +- kernel/auditsc.c | 6 +++--- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/linux/audit.h b/include/linux/audit.h index 6fdfefc..f830508 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -332,7 +332,7 @@ static inline void audit_ptrace(struct task_struct *t) /* Private API (for audit.c only) */ extern unsigned int audit_serial(void); extern int auditsc_get_stamp(struct audit_context *ctx, - struct timespec *t, unsigned int *serial); + struct timespec64 *t, unsigned int *serial); extern int audit_set_loginuid(kuid_t loginuid); static inline kuid_t audit_get_loginuid(struct task_struct *tsk) @@ -511,7 +511,7 @@ static inline void __audit_seccomp(unsigned long syscall, long signr, int code) static inline void audit_seccomp(unsigned long syscall, long signr, int code) { } static inline int auditsc_get_stamp(struct audit_context *ctx, - struct timespec *t, unsigned int *serial) + struct timespec64 *t, unsigned int *serial) { return 0; } diff --git a/kernel/audit.c b/kernel/audit.c index 2f4964c..fcbf377 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1625,10 +1625,10 @@ unsigned int audit_serial(void) } static inline void audit_get_stamp(struct audit_context *ctx, - struct timespec *t, unsigned int *serial) + struct timespec64 *t, unsigned int *serial) { if (!ctx || !auditsc_get_stamp(ctx, t, serial)) { - *t = CURRENT_TIME; + ktime_get_real_ts64(t); *serial = audit_serial(); } } @@ -1652,7 +1652,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type) { struct audit_buffer *ab; - struct timespec t; + struct timespec64 t; unsigned int uninitialized_var(serial); if (audit_initialized != AUDIT_INITIALIZED) @@ -1705,8 +1705,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, } audit_get_stamp(ab->ctx, &t, &serial); - audit_log_format(ab, "audit(%lu.%03lu:%u): ", -t.tv_sec, t.tv_nsec/100, serial); + audit_log_format(ab, "audit(%llu.%03lu:%u): ", +(unsigned long long)t.tv_sec, t.tv_nsec/100, serial); return ab; } diff --git a/kernel/audit.h b/kernel/audit.h index 0f1cf6d..cdf96f4 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -112,7 +112,7 @@ struct audit_context { enum audit_statestate, current_state; unsigned intserial; /* serial number for record */ int major; /* syscall number */ - struct timespec ctime; /* time of syscall entry */ + struct timespec64 ctime; /* time of syscall entry */ unsigned long argv[4];/* syscall arguments */ longreturn_code;/* syscall return code */ u64 prio; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index e59ffc7..a2d9217 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1532,7 +1532,7 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2, return; context->serial = 0; - context->ctime = CURRENT_TIME; + ktime_get_real_ts64(&context->ctime); context->in_syscall = 1; context->current_state = state; context->ppid = 0; @@ -1941,13 +1941,13 @@ EXPORT_SYMBOL_GPL(__audit_inode_child); /** * auditsc_get_stamp - get local copies of audit_context values * @ctx: audit_context for the task - * @t: timespec to store time recorded in the audit_context + * @t: timespec64 to store time recorded in the audit_context * @serial: serial value that is recorded in the audit_context * * Also sets the context as auditable. */ int auditsc_get_stamp(struct audit_context *ctx, - struct timespec *t, unsigned int *serial) + struct timespec64 *t, unsigned int *serial) { if (!ctx->in_syscall) return 0; -- 2.7.4
[PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
CURRENT_TIME is not y2038 safe. The macro will be deleted and all the references to it will be replaced by ktime_get_* apis. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as ceph uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. The current_fs_time() api is being changed to use vfs struct inode* as an argument instead of struct super_block*. Set the new mds client request r_stamp field using ktime_get_real_ts() instead of using current_fs_time(). Also, since r_stamp is used as mtime on the server, use timespec_trunc() to truncate the timestamp, using the right granularity from the superblock. This api will be transitioned to be y2038 safe along with vfs. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann --- drivers/block/rbd.c | 2 +- fs/ceph/mds_client.c | 4 +++- net/ceph/messenger.c | 6 -- net/ceph/osd_client.c | 4 ++-- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 517838b..77204da 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request) { struct ceph_osd_request *osd_req = obj_request->osd_req; - osd_req->r_mtime = CURRENT_TIME; + ktime_get_real_ts(&osd_req->r_mtime); osd_req->r_data_offset = obj_request->offset; } diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index c681762..1d3fa90 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1666,6 +1666,7 @@ struct ceph_mds_request * ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) { struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS); + struct timespec ts; if (!req) return ERR_PTR(-ENOMEM); @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) init_completion(&req->r_safe_completion); INIT_LIST_HEAD(&req->r_unsafe_item); - req->r_stamp = current_fs_time(mdsc->fsc->sb); + ktime_get_real_ts(&ts); + req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran); req->r_op = op; req->r_direct_mode = mode; diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index f76bb33..5766a6c 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1386,8 +1386,9 @@ static void prepare_write_keepalive(struct ceph_connection *con) dout("prepare_write_keepalive %p\n", con); con_out_kvec_reset(con); if (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2) { - struct timespec now = CURRENT_TIME; + struct timespec now; + ktime_get_real_ts(&now); con_out_kvec_add(con, sizeof(tag_keepalive2), &tag_keepalive2); ceph_encode_timespec(&con->out_temp_keepalive2, &now); con_out_kvec_add(con, sizeof(con->out_temp_keepalive2), @@ -3176,8 +3177,9 @@ bool ceph_con_keepalive_expired(struct ceph_connection *con, { if (interval > 0 && (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2)) { - struct timespec now = CURRENT_TIME; + struct timespec now; struct timespec ts; + ktime_get_real_ts(&now); jiffies_to_timespec(interval, &ts); ts = timespec_add(con->last_keepalive_ack, ts); return timespec_compare(&now, &ts) >= 0; diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index e15ea9e..242d7c0 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3574,7 +3574,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc, ceph_oid_copy(&lreq->t.base_oid, oid); ceph_oloc_copy(&lreq->t.base_oloc, oloc); lreq->t.flags = CEPH_OSD_FLAG_WRITE; - lreq->mtime = CURRENT_TIME; + ktime_get_real_ts(&lreq->mtime); lreq->reg_req = alloc_linger_request(lreq); if (!lreq->reg_req) { @@ -3632,7 +3632,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc, ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid); ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc); req->r_flags = CEPH_OSD_FLAG_WRITE; - req->r_mtime = CURRENT_TIME; + ktime_get_real_ts(&req->r_mtime); osd_req_op_watch_init(req, 0, lreq->linger_id, CEPH_OSD_WATCH_OP_UNWATCH); -- 2.7.4
[PATCH 01/12] fs: f2fs: Use ktime_get_real_seconds for sit_info times
CURRENT_TIME_SEC is not y2038 safe. Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment timestamps used by GC algorithm including the segment mtime timestamps. Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann --- fs/f2fs/segment.c | 2 +- fs/f2fs/segment.h | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 010324c..0531500 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2678,7 +2678,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi) sit_i->dirty_sentries = 0; sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK; sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time); - sit_i->mounted_time = CURRENT_TIME_SEC.tv_sec; + sit_i->mounted_time = ktime_get_real_seconds(); mutex_init(&sit_i->sentry_lock); return 0; } diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 57e36c1..156afc3 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -692,8 +692,9 @@ static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start) static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi) { struct sit_info *sit_i = SIT_I(sbi); - return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec - - sit_i->mounted_time; + time64_t now = ktime_get_real_seconds(); + + return sit_i->elapsed_time + now - sit_i->mounted_time; } static inline void set_summary(struct f2fs_summary *sum, nid_t nid, -- 2.7.4
[PATCH 00/12] Delete CURRENT_TIME, CURRENT_TIME_SEC and current_fs_time
The series contains the last unmerged uses of CURRENT_TIME, CURRENT_TIME_SEC, and current_fs_time(). The series also deletes these apis. All the patches except [PATCH 9/12] and [PATCH 10/12] are resend patches. These patches fix new instances of CURRENT_TIME. cifs and ceph patches have been squashed so that we have one patch per filesystem. We want to get these merged onto 4.12 release so that I can post the series that changes vfs timestamps to use 64 bits for 4.13 release. I'm proposing these to be merged through Andrew's tree. Filesystem maintainers, please let Andrew know if you will be picking up the patch in your trees. Let me know if anybody has other preferences for merging. Deepa Dinamani (12): fs: f2fs: Use ktime_get_real_seconds for sit_info times trace: Make trace_hwlat timestamp y2038 safe fs: cifs: Replace CURRENT_TIME by other appropriate apis fs: ceph: CURRENT_TIME with ktime_get_real_ts() fs: ufs: Use ktime_get_real_ts64() for birthtime audit: Use timespec64 to represent audit timestamps fs: btrfs: Use ktime_get_real_ts for root ctime fs: ubifs: Replace CURRENT_TIME_SEC with current_time lustre: Replace CURRENT_TIME macro apparmorfs: Replace CURRENT_TIME with current_time() time: Delete CURRENT_TIME_SEC and CURRENT_TIME time: Delete current_fs_time() function drivers/block/rbd.c | 2 +- drivers/staging/lustre/lustre/llite/llite_lib.c | 6 +++--- drivers/staging/lustre/lustre/osc/osc_io.c | 4 ++-- fs/btrfs/root-tree.c| 3 ++- fs/ceph/mds_client.c| 4 +++- fs/cifs/cifsencrypt.c | 4 +++- fs/cifs/cifssmb.c | 10 - fs/cifs/inode.c | 28 + fs/f2fs/segment.c | 2 +- fs/f2fs/segment.h | 5 +++-- fs/ubifs/dir.c | 12 +-- fs/ubifs/file.c | 12 +-- fs/ubifs/ioctl.c| 2 +- fs/ubifs/misc.h | 10 - fs/ubifs/sb.c | 14 + fs/ubifs/xattr.c| 6 +++--- fs/ufs/ialloc.c | 6 -- include/linux/audit.h | 4 ++-- include/linux/fs.h | 1 - include/linux/time.h| 3 --- kernel/audit.c | 10 - kernel/audit.h | 2 +- kernel/auditsc.c| 6 +++--- kernel/time/time.c | 14 - kernel/trace/trace_entries.h| 6 +++--- kernel/trace/trace_hwlat.c | 14 ++--- kernel/trace/trace_output.c | 9 net/ceph/messenger.c| 6 -- net/ceph/osd_client.c | 4 ++-- security/apparmor/apparmorfs.c | 2 +- 30 files changed, 100 insertions(+), 111 deletions(-) -- 2.7.4
Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
Hi Sebastian, On 09/09/2016 07:46 AM, Grygorii Strashko wrote: > On 09/08/2016 07:00 PM, Sebastian Andrzej Siewior wrote: >> On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote: >>> I've collected trace before first occurrence of "NOHZ: >>> local_softirq_pending 80" >>> > >> >>> irq/354-4848400-85[000]90.642393: softirq_exit: vec=3 >>> [action=NET_RX] >>> irq/354-4848400-85[000]90.642419: sched_switch: >>> irq/354-4848400:85 [49] S ==> rcuc/0:11 [98] >> >> We don't serve TIMER & SCHED because those two are pushed to the >> ksoftirq thread(s). So we keep mostly doing NET_RX and now we switch to >> the next best thing which is RCU. >> >>> rcuc/0-11[000]90.642430: irq_handler_entry:irq=354 >>> name=48484000.ethernet >> but not for long. >> >>> rcuc/0-11[000]90.642432: irq_handler_exit: irq=354 >>> ret=handled >>> rcuc/0-11[000]90.642435: sched_waking: >>> comm=irq/354-4848400 pid=85 prio=49 target_cpu=000 >>> rcuc/0-11[000]90.642442: sched_migrate_task: >>> comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1 >>> rcuc/0-11[000]90.642453: sched_wakeup: >>> irq/354-4848400:85 [49] success=1 CPU:001 >>>iperf-1284 [001]90.642462: sched_stat_runtime: comm=iperf >>> pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns] >>> rcuc/0-11[000]90.642464: irq_handler_entry:irq=355 >>> name=48484000.ethernet >>> rcuc/0-11[000]90.642466: irq_handler_exit: irq=355 >>> ret=handled >>> rcuc/0-11[000]90.642469: sched_waking: >>> comm=irq/355-4848400 pid=86 prio=49 target_cpu=001 >>>iperf-1284 [001]90.642473: sched_switch: iperf:1284 >>> [120] R ==> irq/354-4848400:85 [49] >>> irq/354-4848400-85[001]90.642481: softirq_raise:vec=3 >>> [action=NET_RX] >>> rcuc/0-11[000]90.642483: sched_wakeup: >>> irq/355-4848400:86 [49] success=1 CPU:001 >>> irq/354-4848400-85[001]90.642493: softirq_entry:vec=3 >>> [action=NET_RX] >>> rcuc/0-11[000]90.642497: sched_migrate_task: >>> comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0 >> ach that IRQ thread no pinned. Good. We migrate. >> > > It looks like scheduler playing ping-pong between CPUs with threaded irqs > irq/354-355. > And seems this might be the case - if I pin both threaded IRQ handlers to CPU0 > I can see better latency and netperf improvement > cyclictest -m -Sp98 -q -D4m > T: 0 ( 1318) P:98 I:1000 C: 24 Min: 9 Act: 14 Avg: 15 Max: > 42 > T: 1 ( 1319) P:98 I:1500 C: 159909 Min: 9 Act: 14 Avg: 16 Max: > 39 > > if I arrange hwirqs and pin pin both threaded IRQ handlers on CPU1 > I can observe more less similar results as with this patch. > > >>> rcuc/0-11[000]90.642515: irq_handler_entry:irq=354 >>> name=48484000.ethernet >>> rcuc/0-11[000]90.642516: irq_handler_exit: irq=354 >>> ret=handled > >> >> As you see ksoftirqd left the CPU with a D so I would assume it is >> blocked on a lock and waits. >> NET_RX is in progress but scheduled out due to RCUC which is also >> scheduled out. >> >> I assume we got to softirq because nothing else can run. It will see >> that NET_RX is pending and tries it but blocks on the lock >> (lock_softirq()). It schedules out. Nothing left -> idle. >> >> The idle code checks to see if a softirq is pending and in fact there is >> SCHED on the list and ksoftirq was about to handle it but due to >> ordering complication (NET_RX before SCHED) it can't. And we have the >> warning. >> >> This >> >> --- a/kernel/softirq.c >> +++ b/kernel/softirq.c >> @@ -105,6 +105,7 @@ void softirq_check_pending_idle(void) >> { >> static int rate_limit; >> struct softirq_runner *sr = this_cpu_ptr(&softirq_runners); >> +struct task_struct *ksoft_tsk = __this_cpu_read(ksoftirqd); >> u32 warnpending; >> int i; >> >> @@ -112,7 +113,7 @@ void softirq_check_pending_idle(void) >> return; >> >> warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK; >> -for (i = 0; i < NR_SOFTIRQS; i++) { >> +for (i = 0; (i < NR_SOFTIRQS) && warnpending; i++) { >> struct task_struct *tsk = sr->runner[i]; >> >> /* >> @@ -132,6 +133,15 @@ void softirq_check_pending_idle(void) >> } >> } >> >> +if (warnpending && ksoft_tsk) { >> +raw_spin_lock(&ksoft_tsk->pi_lock); >> +if (ksoft_tsk->pi_blocked_on || ksoft_tsk->state == >> TASK_RUNNING) { >> +/* Clear all bits pending in that task */ >> +warnpending &= ~(ksoft_tsk->softirqs_raised); >> +} >> +raw_spin_unlock(&ksoft_tsk->pi_lock); >> +} >> + >> if (warnpending) { >> printk(K
Re: [PATCH 02/12] ftgmac100: Move ftgmac100_hard_start_xmit() around
On Fri, 2017-04-07 at 05:49 -0700, David Miller wrote: > > /* Tx ring */ > > + struct sk_buff *tx_skbs[TX_QUEUE_ENTRIES]; > > unsigned int tx_clean_pointer; > > unsigned int tx_pointer; > > unsigned int tx_pending; > > Add this only in patch #6 or wherever it is that you start using it, > not this patch Yup, just an accident, funny that I didn't notice it (I did review my own series ;-) Thanks ! Ben.
Re: [PATCH 11/12] ftgmac100: Add support for fragmented tx
On Fri, 2017-04-07 at 06:26 -0700, Florian Fainelli wrote: > > On 04/06/2017 08:31 PM, Benjamin Herrenschmidt wrote: > > Add NETIF_F_SG and create multiple TX ring entries for skb fragments. > > > > On reclaim, the skb is only freed on the segment marked as "last". > > > > > > Signed-off-by: Benjamin Herrenschmidt > > > > [snip] > > > > > > - dma_unmap_single(priv->dev, map, skb_headlen(skb), > > > > DMA_TO_DEVICE); > > > > + if (skb_shinfo(skb)->nr_frags == 0 && len < ETH_ZLEN) > > + len = ETH_ZLEN; > > This is where skb_put_padto() would help you eliminate this test since > you'd be dealing skb->len >= ETH_ZLEN. Ok, thanks. > > + dma_unmap_single(priv->dev, map, len, DMA_TO_DEVICE); > > > > + } else { > > > > + dma_unmap_page(priv->dev, map, > > > > + ftgmac100_txdes_get_buffer_size(txdes), > > > > + DMA_TO_DEVICE); > > > > + } > > > > > > - dev_kfree_skb(skb); > > > > + if (ftgmac100_txdes_get_last_segment(txdes)) > > + dev_kfree_skb(skb); > > This makes you do an uncached access to the descriptor, right? is there > a way you could use bookeeping information to free the last fragment? Not a big deal, it's handled in a subsequent patch. I have to read the descriptor first word anyway to know the packet has been completed, a further patch I just pass that info along to ftgmac100_free_tx_packet > > priv->tx_skbs[pointer] = NULL; > > > > > > /* Clear txdes0 except end of ring bit, clear txdes1 as we > > @@ -623,10 +642,9 @@ static void ftgmac100_tx_complete(struct ftgmac100 > > *priv) > > static int ftgmac100_hard_start_xmit(struct sk_buff *skb, > > > > struct net_device *netdev) > > { > > > > - unsigned int len = (skb->len < ETH_ZLEN) ? ETH_ZLEN : skb->len; > > > > struct ftgmac100 *priv = netdev_priv(netdev); > > > > - struct ftgmac100_txdes *txdes; > > > > - unsigned int pointer; > > > > + struct ftgmac100_txdes *txdes, *first; > > > > + unsigned int pointer, nfrags, len, i, j; > > > > dma_addr_t map; > > > > > > /* The HW doesn't pad small frames */ > > @@ -642,26 +660,35 @@ static int ftgmac100_hard_start_xmit(struct sk_buff > > *skb, > > > > goto drop; > > > > } > > > > > > - map = dma_map_single(priv->dev, skb->data, skb_headlen(skb), > > > > DMA_TO_DEVICE); > > > > - if (unlikely(dma_mapping_error(priv->dev, map))) { > > > > - /* drop packet */ > > > > + /* Do we have a limit on #fragments ? I yet have to get a reply > > > > + * from Aspeed. If there's one I haven't hit it. > > > > + */ > > > > + nfrags = skb_shinfo(skb)->nr_frags; > > + > > > > + /* Get header len and pad for non-fragmented packets */ > > > > + len = skb_headlen(skb); > > > > + if (nfrags == 0 && len < ETH_ZLEN) > > + len = ETH_ZLEN; > > Same here skb_put_padto() would eliminate the test. Yup, I'll fix that, thx. > [snip] > > > > > + dma_err: > > > > + if (net_ratelimit()) > > + netdev_err(netdev, "map tx fragment failed\n"); > > You may consider adding a software counter that tracks mapping failures > (few drivers do that) in a subsequent set of changes. Ok. I want to add a bunch of SW counters for other things too so I'll add to the list. Cheers, Ben.
Re: [PATCH 05/12] ftgmac100: Pad small frames properly
On Fri, 2017-04-07 at 06:05 -0700, Florian Fainelli wrote: > You may want to use skb_put_padto() which also takes care of bumping > skb->len accordingly, in case that makes a difference for the > ftgmac100 hardware. Subsequent patch adds fragments, so it needs to bump headlen. I'll have a look this week-end, thanks. Cheers, Ben.
Re: [PATCH 02/12] ftgmac100: Move ftgmac100_hard_start_xmit() around
On Fri, 2017-04-07 at 13:09 +0300, Sergei Shtylyov wrote: > Hello! > > On 4/7/2017 6:30 AM, Benjamin Herrenschmidt wrote: > > > Move it below ftgmac100_xmit() and the rest of the tx path > > > > No code change. > > > > Signed-off-by: Benjamin Herrenschmidt > > --- > > drivers/net/ethernet/faraday/ftgmac100.c | 59 -- > > -- > > 1 file changed, 30 insertions(+), 29 deletions(-) > > > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c > > b/drivers/net/ethernet/faraday/ftgmac100.c > > index f4b719214..3966c7a 100644 > > --- a/drivers/net/ethernet/faraday/ftgmac100.c > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c > > @@ -63,6 +63,7 @@ struct ftgmac100 { > > u32 rxdes0_edorr_mask; > > > > /* Tx ring */ > > + struct sk_buff *tx_skbs[TX_QUEUE_ENTRIES]; > > "No code change", really? Patch splitting accident, thanks for pointing it out. I'll fix it up in the next round. > > > unsigned int tx_clean_pointer; > > unsigned int tx_pointer; > > unsigned int tx_pending; > > [...] > > MBR, Sergei
[PATCH net-next] gso: Support frag_list splitting with head_frag
From: Ilan Tayari A driver may use build_skb() for received packets. These SKBs then have a head_frag. Since commit d7e8883cfcf4 ("net: make GRO aware of skb->head_frag"), GRO may build frag_list SKBs out of head_frag received SKBs. In such a case, the chained SKBs end up with a head_frag. Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") adds partial segmentation of frag_list SKB chains into individual SKBs. However, this is not done if the chained SKBs have any linear part, because the device may not be able to DMA the private linear buffer. A chained frag_list SKB with head_frag is wrongfully detected in this case as having a private linear part and thus falls back to software GSO, while in fact the linear part is backed by a DMA page just like any other frag. This causes low performance when forwarding those packets that were built with build_skb() Allow partial segmentation at the frag_list pointer for chained SKBs with head_frag. Note that such SKBs can only be created by GRO, when applied to received packets with head_frag. Also note that this change only affects the data path that performs the partial segmentation at frag_list pointer, and not any of the other more common data paths. Signed-off-by: Ilan Tayari --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9f781092fda9..5d9a11eafbf5 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3093,7 +3093,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, * containing the same amount of data. */ skb_walk_frags(head_skb, iter) { - if (skb_headlen(iter)) + if (skb_headlen(iter) && !iter->head_frag) goto normal; len -= iter->len; -- 2.11.0
[PATCH net-next 4/8] rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event
Changing the master device for a link generates many messages; the one generated for POST_TYPE_CHANGE is redundant: [LINK]11: dummy1: mtu 1500 qdisc noqueue master br1 state UNKNOWN group default event POST_TYPE_CHANGE link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff [LINK]11: dummy1: mtu 1500 qdisc noqueue master br1 state UNKNOWN group default link/ether 02:02:02:02:02:03 brd ff:ff:ff:ff:ff:ff Remove POST_TYPE_CHANGE from the list of notifiers that generate notifications. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index df5ade1bc684..80f3574a4630 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -906,7 +906,6 @@ enum { IFLA_EVENT_CHANGE_NAME, IFLA_EVENT_FEAT_CHANGE, IFLA_EVENT_BONDING_FAILOVER, - IFLA_EVENT_POST_TYPE_CHANGE, IFLA_EVENT_NOTIFY_PEERS, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 1503138ebfe1..739b06ac3e7f 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1294,9 +1294,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_BONDING_FAILOVER: rtnl_event = IFLA_EVENT_BONDING_FAILOVER; break; - case NETDEV_POST_TYPE_CHANGE: - rtnl_event = IFLA_EVENT_POST_TYPE_CHANGE; - break; case NETDEV_NOTIFY_PEERS: rtnl_event = IFLA_EVENT_NOTIFY_PEERS; break; @@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_CHANGENAME: case NETDEV_FEAT_CHANGE: case NETDEV_BONDING_FAILOVER: - case NETDEV_POST_TYPE_CHANGE: case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: -- 2.1.4
[PATCH net-next 8/8] rtnetlink: Do not generate notifications for CHANGELOWERSTATE event
CHANGELOWERSTATE is an internal event; do not generate userspace notifications. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8f23e9dde667..37e19966fbf3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -908,7 +908,6 @@ enum { IFLA_EVENT_BONDING_FAILOVER, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, - IFLA_EVENT_CHANGE_LOWER_STATE, IFLA_EVENT_CHANGE_TX_QUEUE_LEN, }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index d2587aa339c4..21c03861aaf3 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1300,9 +1300,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_RESEND_IGMP: rtnl_event = IFLA_EVENT_RESEND_IGMP; break; - case NETDEV_CHANGELOWERSTATE: - rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE; - break; case NETDEV_CHANGE_TX_QUEUE_LEN: rtnl_event = IFLA_EVENT_CHANGE_TX_QUEUE_LEN; break; @@ -4172,7 +4169,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_BONDING_FAILOVER: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: - case NETDEV_CHANGELOWERSTATE: case NETDEV_CHANGE_TX_QUEUE_LEN: rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL); break; -- 2.1.4
[PATCH net-next 5/8] rtnetlink: Remove NETDEV_CHANGEINFODATA netdev event
Nobody cares about the event, so don't send it. Signed-off-by: David Ahern --- drivers/net/bonding/bond_options.c | 2 -- include/linux/netdevice.h | 1 - include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 4 files changed, 8 deletions(-) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 1bcbb8913e17..533518a64496 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -673,8 +673,6 @@ int __bond_opt_set(struct bonding *bond, out: if (ret) bond_opt_error_interpret(bond, opt, ret, val); - else if (bond->dev->reg_state == NETREG_REGISTERED) - call_netdevice_notifiers(NETDEV_CHANGEINFODATA, bond->dev); return ret; } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cc07c3be2705..bd0d26fbec2b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2279,7 +2279,6 @@ struct netdev_lag_lower_state_info { #define NETDEV_CHANGEUPPER 0x0015 #define NETDEV_RESEND_IGMP 0x0016 #define NETDEV_PRECHANGEMTU0x0017 /* notify before mtu change happened */ -#define NETDEV_CHANGEINFODATA 0x0018 #define NETDEV_BONDING_INFO0x0019 #define NETDEV_PRECHANGEUPPER 0x001A #define NETDEV_CHANGELOWERSTATE0x001B diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 80f3574a4630..a500a12d0131 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -909,7 +909,6 @@ enum { IFLA_EVENT_NOTIFY_PEERS, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, - IFLA_EVENT_CHANGE_INFO_DATA, IFLA_EVENT_PRE_CHANGE_UPPER, IFLA_EVENT_CHANGE_LOWER_STATE, IFLA_EVENT_CHANGE_TX_QUEUE_LEN, diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 739b06ac3e7f..702723c5db58 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1303,9 +1303,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_RESEND_IGMP: rtnl_event = IFLA_EVENT_RESEND_IGMP; break; - case NETDEV_CHANGEINFODATA: - rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA; - break; case NETDEV_PRECHANGEUPPER: rtnl_event = IFLA_EVENT_PRE_CHANGE_UPPER; break; @@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: - case NETDEV_CHANGEINFODATA: case NETDEV_PRECHANGEUPPER: case NETDEV_CHANGELOWERSTATE: case NETDEV_CHANGE_TX_QUEUE_LEN: -- 2.1.4
[PATCH net-next 7/8] rtnetlink: Do not generate notifications for NOTIFY_PEERS event
NOTIFY_PEERS is an internal event; do not generate userspace notifications. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 4fa3bf3eb21d..8f23e9dde667 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -906,7 +906,6 @@ enum { IFLA_EVENT_CHANGE_NAME, IFLA_EVENT_FEAT_CHANGE, IFLA_EVENT_BONDING_FAILOVER, - IFLA_EVENT_NOTIFY_PEERS, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, IFLA_EVENT_CHANGE_LOWER_STATE, diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index e2b0ec5174e7..d2587aa339c4 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1294,9 +1294,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_BONDING_FAILOVER: rtnl_event = IFLA_EVENT_BONDING_FAILOVER; break; - case NETDEV_NOTIFY_PEERS: - rtnl_event = IFLA_EVENT_NOTIFY_PEERS; - break; case NETDEV_CHANGEUPPER: rtnl_event = IFLA_EVENT_CHANGE_UPPER; break; @@ -4173,7 +4170,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_CHANGENAME: case NETDEV_FEAT_CHANGE: case NETDEV_BONDING_FAILOVER: - case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: case NETDEV_CHANGELOWERSTATE: -- 2.1.4
[PATCH net-next 6/8] rtnetlink: Do not generate notifications for PRECHANGEUPPER event
PRECHANGEUPPER is an internal event; do not generate userspace notifications. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index a500a12d0131..4fa3bf3eb21d 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -909,7 +909,6 @@ enum { IFLA_EVENT_NOTIFY_PEERS, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, - IFLA_EVENT_PRE_CHANGE_UPPER, IFLA_EVENT_CHANGE_LOWER_STATE, IFLA_EVENT_CHANGE_TX_QUEUE_LEN, }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 702723c5db58..e2b0ec5174e7 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1303,9 +1303,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_RESEND_IGMP: rtnl_event = IFLA_EVENT_RESEND_IGMP; break; - case NETDEV_PRECHANGEUPPER: - rtnl_event = IFLA_EVENT_PRE_CHANGE_UPPER; - break; case NETDEV_CHANGELOWERSTATE: rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE; break; @@ -4179,7 +4176,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: - case NETDEV_PRECHANGEUPPER: case NETDEV_CHANGELOWERSTATE: case NETDEV_CHANGE_TX_QUEUE_LEN: rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL); -- 2.1.4
[PATCH net-next 2/8] rtnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO
NETDEV_UDP_TUNNEL_PUSH_INFO is an internal notifier; nothing userspace can do so don't generate a netlink notification. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index e8b7e9342cc0..064218a840b7 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -914,7 +914,6 @@ enum { IFLA_EVENT_CHANGE_INFO_DATA, IFLA_EVENT_PRE_CHANGE_UPPER, IFLA_EVENT_CHANGE_LOWER_STATE, - IFLA_EVENT_UDP_TUNNEL_PUSH_INFO, IFLA_EVENT_CHANGE_TX_QUEUE_LEN, }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 72d365ae14b3..1a46be074fd6 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1318,9 +1318,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_CHANGELOWERSTATE: rtnl_event = IFLA_EVENT_CHANGE_LOWER_STATE; break; - case NETDEV_UDP_TUNNEL_PUSH_INFO: - rtnl_event = IFLA_EVENT_UDP_TUNNEL_PUSH_INFO; - break; case NETDEV_CHANGE_TX_QUEUE_LEN: rtnl_event = IFLA_EVENT_CHANGE_TX_QUEUE_LEN; break; @@ -4196,7 +4193,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_CHANGEINFODATA: case NETDEV_PRECHANGEUPPER: case NETDEV_CHANGELOWERSTATE: - case NETDEV_UDP_TUNNEL_PUSH_INFO: case NETDEV_CHANGE_TX_QUEUE_LEN: rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL); break; -- 2.1.4
[PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events
Changing MTU on a link currently causes 3 messages to be sent to userspace: [LINK]11: dummy1: mtu 1490 qdisc noqueue state UNKNOWN group default event PRE_CHANGE_MTU link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff [LINK]11: dummy1: mtu 1500 qdisc noqueue state UNKNOWN group default event CHANGE_MTU link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff [LINK]11: dummy1: mtu 1500 qdisc noqueue state UNKNOWN group default link/ether f2:52:5c:6d:21:f3 brd ff:ff:ff:ff:ff:ff Remove the PRE_CHANGE_MTU and CHANGE_MTU messages. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 2 -- net/core/rtnetlink.c | 8 2 files changed, 10 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 97f6d302f627..e8b7e9342cc0 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -903,7 +903,6 @@ enum { enum { IFLA_EVENT_UNSPEC, IFLA_EVENT_REBOOT, - IFLA_EVENT_CHANGE_MTU, IFLA_EVENT_CHANGE_ADDR, IFLA_EVENT_CHANGE_NAME, IFLA_EVENT_FEAT_CHANGE, @@ -912,7 +911,6 @@ enum { IFLA_EVENT_NOTIFY_PEERS, IFLA_EVENT_CHANGE_UPPER, IFLA_EVENT_RESEND_IGMP, - IFLA_EVENT_PRE_CHANGE_MTU, IFLA_EVENT_CHANGE_INFO_DATA, IFLA_EVENT_PRE_CHANGE_UPPER, IFLA_EVENT_CHANGE_LOWER_STATE, diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index b2bd4c9ee860..72d365ae14b3 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1285,9 +1285,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_REBOOT: rtnl_event = IFLA_EVENT_REBOOT; break; - case NETDEV_CHANGEMTU: - rtnl_event = IFLA_EVENT_CHANGE_MTU; - break; case NETDEV_CHANGEADDR: rtnl_event = IFLA_EVENT_CHANGE_ADDR; break; @@ -1312,9 +1309,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_RESEND_IGMP: rtnl_event = IFLA_EVENT_RESEND_IGMP; break; - case NETDEV_PRECHANGEMTU: - rtnl_event = IFLA_EVENT_PRE_CHANGE_MTU; - break; case NETDEV_CHANGEINFODATA: rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA; break; @@ -4191,7 +4185,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi switch (event) { case NETDEV_REBOOT: - case NETDEV_CHANGEMTU: case NETDEV_CHANGEADDR: case NETDEV_CHANGENAME: case NETDEV_FEAT_CHANGE: @@ -4200,7 +4193,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEUPPER: case NETDEV_RESEND_IGMP: - case NETDEV_PRECHANGEMTU: case NETDEV_CHANGEINFODATA: case NETDEV_PRECHANGEUPPER: case NETDEV_CHANGELOWERSTATE: -- 2.1.4
[PATCH net-next 3/8] rtnetlink: Do not generate notifications for CHANGEADDR event
Changing hardware address generates redundant messages: [LINK]11: dummy1: mtu 1500 qdisc noqueue state UNKNOWN group default event CHANGE_ADDR link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff [LINK]11: dummy1: mtu 1500 qdisc noqueue state UNKNOWN group default link/ether 02:02:02:02:02:02 brd ff:ff:ff:ff:ff:ff Do not send a notification for the CHANGEADDR notifier. Signed-off-by: David Ahern --- include/uapi/linux/if_link.h | 1 - net/core/rtnetlink.c | 4 2 files changed, 5 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 064218a840b7..df5ade1bc684 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -903,7 +903,6 @@ enum { enum { IFLA_EVENT_UNSPEC, IFLA_EVENT_REBOOT, - IFLA_EVENT_CHANGE_ADDR, IFLA_EVENT_CHANGE_NAME, IFLA_EVENT_FEAT_CHANGE, IFLA_EVENT_BONDING_FAILOVER, diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 1a46be074fd6..1503138ebfe1 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1285,9 +1285,6 @@ static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event) case NETDEV_REBOOT: rtnl_event = IFLA_EVENT_REBOOT; break; - case NETDEV_CHANGEADDR: - rtnl_event = IFLA_EVENT_CHANGE_ADDR; - break; case NETDEV_CHANGENAME: rtnl_event = IFLA_EVENT_CHANGE_NAME; break; @@ -4182,7 +4179,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi switch (event) { case NETDEV_REBOOT: - case NETDEV_CHANGEADDR: case NETDEV_CHANGENAME: case NETDEV_FEAT_CHANGE: case NETDEV_BONDING_FAILOVER: -- 2.1.4
[PATCH net-next 0/8] rtnetlink: Cleanup user notifications for netdev events
Vlad's recent patch to add the event type to rtnetlink notifications points out a number of redundant or unnecessary notifications sent to userspace for events that are essentially internal to the kernel. Trim the list before the new IFLA attributes become part of the UAPI. David Ahern (8): rtnetlink: Do not generate notifications for MTU events rtnetlink: Do not generate notification for UDP_TUNNEL_PUSH_INFO rtnetlink: Do not generate notifications for CHANGEADDR event rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event rtnetlink: Remove NETDEV_CHANGEINFODATA netdev event rtnetlink: Do not generate notifications for PRECHANGEUPPER event rtnetlink: Do not generate notifications for NOTIFY_PEERS event rtnetlink: Do not generate notifications for CHANGELOWERSTATE event drivers/net/bonding/bond_options.c | 2 -- include/linux/netdevice.h | 1 - include/uapi/linux/if_link.h | 9 - net/core/rtnetlink.c | 36 4 files changed, 48 deletions(-) -- 2.1.4
Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications
From: Florian Fainelli Date: Fri, 7 Apr 2017 13:41:51 -0700 > This patch series does factor the common code found in all tag implementations > into dsa_switch_rcv(). The original motivation was to add GRO support, but > this > may be a lot of work with unclear benefits at this point. Let's get the other new DSA driver merged and then you can respin this and make sure all the new tag drivers get handled properly.
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Fri, Apr 7, 2017 at 5:10 PM, Michael S. Tsirkin wrote: > On Fri, Apr 07, 2017 at 04:59:58PM -0400, Willem de Bruijn wrote: >> On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin wrote: >> > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote: >> >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin >> >> wrote: >> >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: >> >> >> From: Willem de Bruijn >> >> >> >> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work >> >> >> on reception of a receive interrupt if tx napi is enabled. With >> >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion >> >> >> interrupts for bidirectional workloads. >> >> >> >> >> >> Signed-off-by: Willem de Bruijn >> > >> > This is a popular approach, but I think this will only work well if tx >> > and rx interrupts are processed on the same CPU and if tx queue is per >> > cpu. If they target different CPUs or if tx queue is used from multiple >> > CPUs they will conflict on the shared locks. >> >> Yes. As a result of this discussion I started running a few vcpu affinity >> tests. >> >> > This can even change dynamically as CPUs/queues are reconfigured. >> > How about adding a flag and skipping the tx poll if there's no match? >> >> I suspect that even with the cache invalidations this optimization >> will be an improvement over handling all tx interrupts in the tx napi >> handler. I will get the datapoint for that. >> >> That said, we can make this conditional. What flag exactly do you >> propose? Compare raw_smp_processor_id() in the rx softint with one >> previously stored in the napi tx callback? > > I'm not sure. Another idea is to check vi->affinity_hint_set. > If set we know rq and sq are on the same CPU. I was not aware of that flag, thanks. Yes, that looks like it should work.
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Fri, Apr 07, 2017 at 04:59:58PM -0400, Willem de Bruijn wrote: > On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin wrote: > > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote: > >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin > >> wrote: > >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: > >> >> From: Willem de Bruijn > >> >> > >> >> Amortize the cost of virtual interrupts by doing both rx and tx work > >> >> on reception of a receive interrupt if tx napi is enabled. With > >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion > >> >> interrupts for bidirectional workloads. > >> >> > >> >> Signed-off-by: Willem de Bruijn > > > > This is a popular approach, but I think this will only work well if tx > > and rx interrupts are processed on the same CPU and if tx queue is per > > cpu. If they target different CPUs or if tx queue is used from multiple > > CPUs they will conflict on the shared locks. > > Yes. As a result of this discussion I started running a few vcpu affinity > tests. > > The data is not complete. In particular, I don't have the data yet to > compare having tx and rx irq on the same cpu (0,0) vs on different > (0,2) for this patchset. Which is the relevant data to your point. > > Initial results for unmodified upstream driver at {1, 10, 100}x > TCP_STREAM, for irq cpu affinity (rx,tx). Process is always pinned to cpu > 1. This is a 4 vcpu system pinned by the host to 4 cores on the same > socket. The previously reported results were obtained with txq, rtx > and process on different vcpus (0,2). Running all on the same vcpu > lower cycle count considerably: > > irq 0,0 > 1throughput_Mbps=29767.14 391,488,924,526 cycles > 10 throughput_Mbps=40808.64 424,530,251,896 cycles > 100 throughput_Mbps=33475.13 414,622,071,167 cycles > > irq 0,2 > 1 throughput_Mbps=30176.05 395,673,200,747 cycles > 10 throughput_Mbps=40729.26 433,948,374,991 cycles > 100 throughput_Mbps=33758.68 436,291,949,393 cycles > > irq 1,1 > 1throughput_Mbps=26635.20 269,071,002,844 cycles > 10 throughput_Mbps=42385.05 299,945,944,516 cycles > 100 throughput_Mbps=33580.98 283,272,895,507 cycles > > With this patch set applied, cpu (1,1) > > 1 throughput_Mbps=34980.76 276,504,805,414 cycles > 10 throughput_Mbps=42519.92 298,105,889,785 cycles > 100 throughput_Mbps=35268.86 296,670,598,712 cycles > > I will need to get data for (0,2) vs (0,0). > > > This can even change dynamically as CPUs/queues are reconfigured. > > How about adding a flag and skipping the tx poll if there's no match? > > I suspect that even with the cache invalidations this optimization > will be an improvement over handling all tx interrupts in the tx napi > handler. I will get the datapoint for that. > > That said, we can make this conditional. What flag exactly do you > propose? Compare raw_smp_processor_id() in the rx softint with one > previously stored in the napi tx callback? I'm not sure. Another idea is to check vi->affinity_hint_set. If set we know rq and sq are on the same CPU. -- MST
Re: [PATCH] net: thunderx: Enable TSO and checksum offloads for ipv6
From: dev.srinivas...@gmail.com Date: Thu, 6 Apr 2017 16:12:26 +0530 > From: Thanneeru Srinivasulu > > Adding support for TSO and checksum hardware offloads for ipv6. > > Signed-off-by: Thanneeru Srinivasulu > Signed-off-by: Sunil Goutham Applied to net-next, thanks.
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin wrote: > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote: >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin wrote: >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: >> >> From: Willem de Bruijn >> >> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work >> >> on reception of a receive interrupt if tx napi is enabled. With >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion >> >> interrupts for bidirectional workloads. >> >> >> >> Signed-off-by: Willem de Bruijn > > This is a popular approach, but I think this will only work well if tx > and rx interrupts are processed on the same CPU and if tx queue is per > cpu. If they target different CPUs or if tx queue is used from multiple > CPUs they will conflict on the shared locks. Yes. As a result of this discussion I started running a few vcpu affinity tests. The data is not complete. In particular, I don't have the data yet to compare having tx and rx irq on the same cpu (0,0) vs on different (0,2) for this patchset. Which is the relevant data to your point. Initial results for unmodified upstream driver at {1, 10, 100}x TCP_STREAM, for irq cpu affinity (rx,tx). Process is always pinned to cpu 1. This is a 4 vcpu system pinned by the host to 4 cores on the same socket. The previously reported results were obtained with txq, rtx and process on different vcpus (0,2). Running all on the same vcpu lower cycle count considerably: irq 0,0 1throughput_Mbps=29767.14 391,488,924,526 cycles 10 throughput_Mbps=40808.64 424,530,251,896 cycles 100 throughput_Mbps=33475.13 414,622,071,167 cycles irq 0,2 1 throughput_Mbps=30176.05 395,673,200,747 cycles 10 throughput_Mbps=40729.26 433,948,374,991 cycles 100 throughput_Mbps=33758.68 436,291,949,393 cycles irq 1,1 1throughput_Mbps=26635.20 269,071,002,844 cycles 10 throughput_Mbps=42385.05 299,945,944,516 cycles 100 throughput_Mbps=33580.98 283,272,895,507 cycles With this patch set applied, cpu (1,1) 1 throughput_Mbps=34980.76 276,504,805,414 cycles 10 throughput_Mbps=42519.92 298,105,889,785 cycles 100 throughput_Mbps=35268.86 296,670,598,712 cycles I will need to get data for (0,2) vs (0,0). > This can even change dynamically as CPUs/queues are reconfigured. > How about adding a flag and skipping the tx poll if there's no match? I suspect that even with the cache invalidations this optimization will be an improvement over handling all tx interrupts in the tx napi handler. I will get the datapoint for that. That said, we can make this conditional. What flag exactly do you propose? Compare raw_smp_processor_id() in the rx softint with one previously stored in the napi tx callback?
Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications
> Yes, that's a good point, this just became slightly obsolete since the > mediatek tag code got included. Yes, rebase, add the received tags, and Dave can merge it before the next talk ends :-) Andrew
Re: [PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()
On Fri, Apr 07, 2017 at 01:41:53PM -0700, Florian Fainelli wrote: > All DSA tag receive functions need to unshare the skb before mangling it, move > this to the generic dsa_switch_rcv() function which will allow us to make the > tag receive function return their mangled skb without caring about freeing a > NULL skb. > > Signed-off-by: Florian Fainelli Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()
Florian Fainelli writes: > All DSA tag receive functions need to unshare the skb before mangling it, move > this to the generic dsa_switch_rcv() function which will allow us to make the > tag receive function return their mangled skb without caring about freeing a > NULL skb. > > Signed-off-by: Florian Fainelli Reviewed-by: Vivien Didelot
Re: [PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions
Florian Fainelli writes: > All DSA tag receive functions do strictly the same thing after they have > located > the originating source port from their tag specific protocol: > > - push ETH_HLEN bytes > - set pkt_type to PACKET_HOST > - call eth_type_trans() > - bump up counters > - call netif_receive_skb() > > Factor all of that into dsa_switch_rcv(). This also makes us return a pointer > to > a sk_buff, which makes us symetric with the xmit function. > > Signed-off-by: Florian Fainelli Reviewed-by: Vivien Didelot
Re: [PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers
Florian Fainelli writes: > dsa_switch_rcv() already tests for dst == NULL, so there is no need to > duplicate > the same check within the tag receive functions. > > Signed-off-by: Florian Fainelli Reviewed-by: Vivien Didelot
Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications
On 04/07/2017 01:53 PM, Andrew Lunn wrote: > On Fri, Apr 07, 2017 at 01:41:51PM -0700, Florian Fainelli wrote: >> Hi all, >> >> This patch series does factor the common code found in all tag >> implementations >> into dsa_switch_rcv(). The original motivation was to add GRO support, but >> this >> may be a lot of work with unclear benefits at this point. >> >> Florian Fainelli (3): >> net: dsa: Do not check for NULL dst in tag parsers >> net: dsa: Move skb_unshare() to dsa_switch_rcv() >> net: dsa: Factor bottom tag receive functions > > Hey, are you looking in my git repo and stealing my patches? I have a > pretty much identical set :-) Actually, I did not at all, good thing we are at the same conference and not talking about that :) > > I was holding off for the microchip and mediatek drivers to land, so > to not cause conflicts. Yes, that's a good point, this just became slightly obsolete since the mediatek tag code got included. -- Florian
Re: [PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers
On Fri, Apr 07, 2017 at 01:41:52PM -0700, Florian Fainelli wrote: > dsa_switch_rcv() already tests for dst == NULL, so there is no need to > duplicate > the same check within the tag receive functions. > > Signed-off-by: Florian Fainelli Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next 0/3] net: dsa: Receive path simplifications
On Fri, Apr 07, 2017 at 01:41:51PM -0700, Florian Fainelli wrote: > Hi all, > > This patch series does factor the common code found in all tag implementations > into dsa_switch_rcv(). The original motivation was to add GRO support, but > this > may be a lot of work with unclear benefits at this point. > > Florian Fainelli (3): > net: dsa: Do not check for NULL dst in tag parsers > net: dsa: Move skb_unshare() to dsa_switch_rcv() > net: dsa: Factor bottom tag receive functions Hey, are you looking in my git repo and stealing my patches? I have a pretty much identical set :-) I was holding off for the microchip and mediatek drivers to land, so to not cause conflicts. Andrew
Re: [PATCH net-next v4 0/5] net-next: dsa: add Mediatek MT7530 support
From: Date: Fri, 7 Apr 2017 16:45:04 +0800 > MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on > Mediatek router platforms such as MT7623A or MT7623N which includes 7-port > Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. Among these ports, > The port from 0 to 4 are the user ports connecting with the remote devices > while the port 5 and 6 are the CPU ports connecting into Mediatek Ethernet > GMAC. > > The patch series integrated Mediatek MT7530 into DSA support which > includes the most of the essential callbacks such as tag insertion for > port distinguishing, port control, bridge offloading, STP setup and > ethtool operations to allow DSA to model each user port into independently > standalone netdevice as the other DSA driver had done. Series applied, thanks Sean.
Re: [PATCH RFC v2 0/2] phylib non-autoneg speed setting
On 04/06/2017 03:00 AM, Russell King - ARM Linux wrote: > This set of two patches fixes and cleans up the phylib code associated > with selecting the fixed mode. > > phylib currently assumes that all PHYs will support the 10baseT/Half mode > of operation irrespective of the supported bitmask, because of the way > phy_find_valid() and phy_find_setting() operate. > > This series resolves that by reimplementing the table lookup, filtering > out all unsupported modes. This results in an unsupported speed setting > always choosing one of the speeds that are supported by the PHY. Full > details are in the patch commit log. > > The second patch cleans up the loop in phy_supported_speeds(). This looks totally fine to me, do you mind resubmitting as non RFC? > > drivers/net/phy/phy.c | 119 > -- > 1 file changed, 68 insertions(+), 51 deletions(-) > > v1->v2: fixed 0-day build errors > -- Florian
[PATCH net-next 2/3] net: dsa: Move skb_unshare() to dsa_switch_rcv()
All DSA tag receive functions need to unshare the skb before mangling it, move this to the generic dsa_switch_rcv() function which will allow us to make the tag receive function return their mangled skb without caring about freeing a NULL skb. Signed-off-by: Florian Fainelli --- net/dsa/dsa.c | 4 net/dsa/tag_brcm.c| 5 - net/dsa/tag_dsa.c | 5 - net/dsa/tag_edsa.c| 5 - net/dsa/tag_qca.c | 5 - net/dsa/tag_trailer.c | 5 - 6 files changed, 4 insertions(+), 25 deletions(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 95d1a756202c..c20da7bfc771 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -903,6 +903,10 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, return 0; } + skb = skb_unshare(skb, GFP_ATOMIC); + if (!skb) + return 0; + return dst->rcv(skb, dev, pt, orig_dev); } diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index 68d4feef96d4..263941769c88 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -102,10 +102,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, ds = dst->cpu_switch; - skb = skb_unshare(skb, GFP_ATOMIC); - if (skb == NULL) - goto out; - if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN))) goto out_drop; @@ -151,7 +147,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, out_drop: kfree_skb(skb); -out: return 0; } diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c index 377569c0e4f7..b7032699eaad 100644 --- a/net/dsa/tag_dsa.c +++ b/net/dsa/tag_dsa.c @@ -77,10 +77,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device *dev, int source_device; int source_port; - skb = skb_unshare(skb, GFP_ATOMIC); - if (skb == NULL) - goto out; - if (unlikely(!pskb_may_pull(skb, DSA_HLEN))) goto out_drop; @@ -175,7 +171,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device *dev, out_drop: kfree_skb(skb); -out: return 0; } diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c index 30520ff9c9a2..b87009672b40 100644 --- a/net/dsa/tag_edsa.c +++ b/net/dsa/tag_edsa.c @@ -90,10 +90,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device *dev, int source_device; int source_port; - skb = skb_unshare(skb, GFP_ATOMIC); - if (skb == NULL) - goto out; - if (unlikely(!pskb_may_pull(skb, EDSA_HLEN))) goto out_drop; @@ -194,7 +190,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device *dev, out_drop: kfree_skb(skb); -out: return 0; } diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c index 6579d6db1bc6..d1324649808c 100644 --- a/net/dsa/tag_qca.c +++ b/net/dsa/tag_qca.c @@ -75,10 +75,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct net_device *dev, int port; __be16 *phdr, hdr; - skb = skb_unshare(skb, GFP_ATOMIC); - if (!skb) - goto out; - if (unlikely(!pskb_may_pull(skb, QCA_HDR_LEN))) goto out_drop; @@ -126,7 +122,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct net_device *dev, out_drop: kfree_skb(skb); -out: return 0; } diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c index f5c764ee2968..1fc0b221a70f 100644 --- a/net/dsa/tag_trailer.c +++ b/net/dsa/tag_trailer.c @@ -68,10 +68,6 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device *dev, ds = dst->cpu_switch; - skb = skb_unshare(skb, GFP_ATOMIC); - if (skb == NULL) - goto out; - if (skb_linearize(skb)) goto out_drop; @@ -100,7 +96,6 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device *dev, out_drop: kfree_skb(skb); -out: return 0; } -- 2.9.3
[PATCH net-next 1/3] net: dsa: Do not check for NULL dst in tag parsers
dsa_switch_rcv() already tests for dst == NULL, so there is no need to duplicate the same check within the tag receive functions. Signed-off-by: Florian Fainelli --- net/dsa/tag_brcm.c| 3 --- net/dsa/tag_dsa.c | 3 --- net/dsa/tag_edsa.c| 3 --- net/dsa/tag_qca.c | 3 --- net/dsa/tag_trailer.c | 2 -- 5 files changed, 14 deletions(-) diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index e2ed6cf68261..68d4feef96d4 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -100,9 +100,6 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, int source_port; u8 *brcm_tag; - if (unlikely(dst == NULL)) - goto out_drop; - ds = dst->cpu_switch; skb = skb_unshare(skb, GFP_ATOMIC); diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c index e42ba906100c..377569c0e4f7 100644 --- a/net/dsa/tag_dsa.c +++ b/net/dsa/tag_dsa.c @@ -77,9 +77,6 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device *dev, int source_device; int source_port; - if (unlikely(dst == NULL)) - goto out_drop; - skb = skb_unshare(skb, GFP_ATOMIC); if (skb == NULL) goto out; diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c index 6a9b7a9e4e15..30520ff9c9a2 100644 --- a/net/dsa/tag_edsa.c +++ b/net/dsa/tag_edsa.c @@ -90,9 +90,6 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device *dev, int source_device; int source_port; - if (unlikely(dst == NULL)) - goto out_drop; - skb = skb_unshare(skb, GFP_ATOMIC); if (skb == NULL) goto out; diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c index 4e0dad759d04..6579d6db1bc6 100644 --- a/net/dsa/tag_qca.c +++ b/net/dsa/tag_qca.c @@ -75,9 +75,6 @@ static int qca_tag_rcv(struct sk_buff *skb, struct net_device *dev, int port; __be16 *phdr, hdr; - if (unlikely(!dst)) - goto out_drop; - skb = skb_unshare(skb, GFP_ATOMIC); if (!skb) goto out; diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c index 74c948512550..f5c764ee2968 100644 --- a/net/dsa/tag_trailer.c +++ b/net/dsa/tag_trailer.c @@ -66,8 +66,6 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device *dev, u8 *trailer; int source_port; - if (unlikely(dst == NULL)) - goto out_drop; ds = dst->cpu_switch; skb = skb_unshare(skb, GFP_ATOMIC); -- 2.9.3
[PATCH net-next 3/3] net: dsa: Factor bottom tag receive functions
All DSA tag receive functions do strictly the same thing after they have located the originating source port from their tag specific protocol: - push ETH_HLEN bytes - set pkt_type to PACKET_HOST - call eth_type_trans() - bump up counters - call netif_receive_skb() Factor all of that into dsa_switch_rcv(). This also makes us return a pointer to a sk_buff, which makes us symetric with the xmit function. Signed-off-by: Florian Fainelli --- include/net/dsa.h | 2 +- net/dsa/dsa.c | 20 +++- net/dsa/dsa_priv.h| 5 +++-- net/dsa/tag_brcm.c| 18 +- net/dsa/tag_dsa.c | 18 +- net/dsa/tag_edsa.c| 18 +- net/dsa/tag_qca.c | 18 +- net/dsa/tag_trailer.c | 18 +- 8 files changed, 48 insertions(+), 69 deletions(-) diff --git a/include/net/dsa.h b/include/net/dsa.h index ffe56cc338fe..671a00aa35de 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -123,7 +123,7 @@ struct dsa_switch_tree { * protocol to use. */ struct net_device *master_netdev; - int (*rcv)(struct sk_buff *skb, + struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev); diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index c20da7bfc771..21f960ef8138 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include "dsa_priv.h" @@ -897,6 +898,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { struct dsa_switch_tree *dst = dev->dsa_ptr; + struct sk_buff *nskb = NULL; if (unlikely(dst == NULL)) { kfree_skb(skb); @@ -907,7 +909,23 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, if (!skb) return 0; - return dst->rcv(skb, dev, pt, orig_dev); + nskb = dst->rcv(skb, dev, pt, orig_dev); + if (!nskb) { + kfree_skb(skb); + return 0; + } + + skb = nskb; + skb_push(skb, ETH_HLEN); + skb->pkt_type = PACKET_HOST; + skb->protocol = eth_type_trans(skb, skb->dev); + + skb->dev->stats.rx_packets++; + skb->dev->stats.rx_bytes += skb->len; + + netif_receive_skb(skb); + + return 0; } static struct packet_type dsa_pack_type __read_mostly = { diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 0706a511244e..f4ac8d5fb8d5 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -17,8 +17,9 @@ struct dsa_device_ops { struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev); - int (*rcv)(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt, struct net_device *orig_dev); + struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, + struct net_device *orig_dev); }; struct dsa_slave_priv { diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index 263941769c88..2a9b52c5af86 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -92,8 +92,9 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev return NULL; } -static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt, struct net_device *orig_dev) +static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, + struct net_device *orig_dev) { struct dsa_switch_tree *dst = dev->dsa_ptr; struct dsa_switch *ds; @@ -133,21 +134,12 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, skb->data - ETH_HLEN - BRCM_TAG_LEN, 2 * ETH_ALEN); - skb_push(skb, ETH_HLEN); - skb->pkt_type = PACKET_HOST; skb->dev = ds->ports[source_port].netdev; - skb->protocol = eth_type_trans(skb, skb->dev); - skb->dev->stats.rx_packets++; - skb->dev->stats.rx_bytes += skb->len; - - netif_receive_skb(skb); - - return 0; + return skb; out_drop: - kfree_skb(skb); - return 0; + return NULL; } const struct dsa_device_ops brcm_netdev_ops = { diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c index b7032699eaad..1c6633f0de01 100644 --- a/net/dsa/tag_dsa.c +++ b/net/dsa/tag_dsa.c @@ -68,8 +68,9 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev) return NULL; } -static int dsa_rcv(struct sk_buff *skb, struct net_device *dev, - struct pa
[PATCH net-next 0/3] net: dsa: Receive path simplifications
Hi all, This patch series does factor the common code found in all tag implementations into dsa_switch_rcv(). The original motivation was to add GRO support, but this may be a lot of work with unclear benefits at this point. Florian Fainelli (3): net: dsa: Do not check for NULL dst in tag parsers net: dsa: Move skb_unshare() to dsa_switch_rcv() net: dsa: Factor bottom tag receive functions include/net/dsa.h | 2 +- net/dsa/dsa.c | 24 +++- net/dsa/dsa_priv.h| 5 +++-- net/dsa/tag_brcm.c| 26 +- net/dsa/tag_dsa.c | 26 +- net/dsa/tag_edsa.c| 26 +- net/dsa/tag_qca.c | 26 +- net/dsa/tag_trailer.c | 25 + 8 files changed, 52 insertions(+), 108 deletions(-) -- 2.9.3
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:55 -0700, David Miller wrote: > > One idea. We use the macro thing to generate a "netlink.pot" file > and then some userland tree can contain the latest netlink.pot and > the translations. Right. For the record - since we just talked about it - I was thinking of putting it into a special section so that we could easily write a script to generate the pot file from the kernel binary after doing "allyesconfig" or collecting it from modules or so. Thinking of modules though - do we need to adjust the module loading code to load a new section? I'll have to look into that. johannes
[vhost:linux-next 13/26] drivers//virtio/virtio_pci_common.c:186:7: error: too few arguments to function 'vp_dev->setup_vq'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next head: 4293ed1476ec42e45e54f812341058d812d820a5 commit: d5edad95c2f89cced19a23713f752442b620f0e1 [13/26] virtio: add context flag to find vqs config: x86_64-randconfig-x012-201714 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: git checkout d5edad95c2f89cced19a23713f752442b620f0e1 # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers//virtio/virtio_pci_common.c: In function 'vp_setup_vq': >> drivers//virtio/virtio_pci_common.c:186:7: error: too few arguments to >> function 'vp_dev->setup_vq' vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, ^~ drivers//virtio/virtio_pci_common.c: In function 'vp_find_vqs_msix': >> drivers//virtio/virtio_pci_common.c:318:12: error: too many arguments to >> function 'vp_setup_vq' vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], ^~~ drivers//virtio/virtio_pci_common.c:172:26: note: declared here static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned index, ^~~ drivers//virtio/virtio_pci_common.c: In function 'vp_find_vqs_intx': drivers//virtio/virtio_pci_common.c:371:12: error: too many arguments to function 'vp_setup_vq' vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i], ^~~ drivers//virtio/virtio_pci_common.c:172:26: note: declared here static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned index, ^~~ vim +186 drivers//virtio/virtio_pci_common.c f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 180 unsigned long flags; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 181 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 182 /* fill out our structure that represents an active queue */ f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 183 if (!info) f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 184 return ERR_PTR(-ENOMEM); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 185 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 @186 vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 187 msix_vec); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 188 if (IS_ERR(vq)) f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 189 goto out_info; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 190 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 191 info->vq = vq; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 192 if (callback) { f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 193 spin_lock_irqsave(&vp_dev->lock, flags); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 194 list_add(&info->node, &vp_dev->virtqueues); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 195 spin_unlock_irqrestore(&vp_dev->lock, flags); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 196 } else { f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 197 INIT_LIST_HEAD(&info->node); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 198 } f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 199 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 200 vp_dev->vqs[index] = info; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 201 return vq; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 202 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 203 out_info: f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 204 kfree(info); f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 205 return vq; f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 206 } f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 207 f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 208 static void vp_del_vq(struct virtqueue *vq) f6813487d drivers/virtio/virtio_pci_common.c Michael S. Tsirkin 2017-04-04 209 { f6813487d drivers/virtio/virtio_pci_common.c
Re: [PATCH v2 3/4] bluetooth: hci_uart: add LL protocol serdev driver support
On Fri, 2017-04-07 at 13:48 -0500, Rob Herring wrote: > On Fri, Apr 7, 2017 at 12:09 PM, Dan Williams > wrote: > > On Fri, 2017-04-07 at 09:35 -0500, Rob Herring wrote: > > > Turns out that the LL protocol and the TI-ST are the same thing > > > AFAICT. > > > The TI-ST adds firmware loading, GPIO control, and shared access > > > for > > > NFC, FM radio, etc. For now, we're only implementing what is > > > needed > > > for > > > BT. This mirrors other drivers like BCM and Intel, but uses the > > > new > > > serdev bus. > > > > > > The firmware loading is greatly simplified by using existing > > > infrastructure to send commands. It may be a bit slower than the > > > original code using synchronous functions, but the real > > > bottleneck is > > > likely doing firmware load at 115.2kbps. > > > > Is there no way to put the TI-specific stuff into a TI UART module > > rather than building it into the generic one? > > In case it's not clear, all of HCI_LL is the TI specific part, not > just what I'm adding. So you are talking about putting each UART BT > protocol into a separate module. I'd assume that is doable, but seems > orthogonal to this patch set. I'd also assume there was some reason > that was not done already. Ok, thanks for the explanation. Wasn't clear at all from the file paths in the source tree; it looked generic. Dan
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes Berg Date: Fri, 07 Apr 2017 21:46:46 +0200 > On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote: >> I suspect that someone will eventually give us a hard time about the >> strings wrt. internationalization. :-) It's solvable at least >> partially in userspace I suppose. > > I tend to think of the strings more of a debugging aid, but that's a > good point. > > Perhaps we should have a macro that has the strings - similar to the > inline function I put into netlink - but if we make that a macro we can > put the strings into a separate section to be able to find them more > easily for later translation, and optionally even omit them entirely > (to satisfy the kernel tinification folks)? One idea. We use the macro thing to generate a "netlink.pot" file and then some userland tree can contain the latest netlink.pot and the translations. I'm suggesting it this way because whilst putting the netlink.pot file in the kernel tree is probably OK, putting all the translation there might not be.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:45 +0200, Pablo Neira Ayuso wrote: > > We only need to store the pointer to the attribute in the error > container structure. We can calculate the offset from nl_err() by > pasing the skbuff as parameter there, right? Ah, that's a good point, we could store the nlattr pointer and calculate the offset later. I'll need to try this, but yeah, good idea! johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote: > > I have no strong opinions about string length. > > In my opinion, I would like to believe that if someone tried to get a > networking patch applied that emitted a rediculously long string then > we would give them feedback about how that is not acceptable. Right? Agree with this, I don't really see much point in adding an extra warning. There's an implicit (multi-KB though) limit as well in the message size ;-) > I suspect that someone will eventually give us a hard time about the > strings wrt. internationalization. :-) It's solvable at least > partially in userspace I suppose. I tend to think of the strings more of a debugging aid, but that's a good point. Perhaps we should have a macro that has the strings - similar to the inline function I put into netlink - but if we make that a macro we can put the strings into a separate section to be able to find them more easily for later translation, and optionally even omit them entirely (to satisfy the kernel tinification folks)? johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 09:29:17PM +0200, Johannes Berg wrote: > On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote: > > I think the most flexible way is to pass the container error > > structure to nla_parse() so it sets this for you. This would also > > save tons of "malformed attribute" strings. > > Yes, for sure. The thing is we'll probalby have to pass down the > request skb *and* the error struct so that we can get the offset, and > this seems like the generic thing that we really should try to get the > most information generated. We only need to store the pointer to the attribute in the error container structure. We can calculate the offset from nl_err() by pasing the skbuff as parameter there, right?
Re: [PATCH net-next v4 1/5] dt-bindings: net: dsa: add Mediatek MT7530 binding
On 04/07/2017 01:45 AM, sean.w...@mediatek.com wrote: > From: Sean Wang > > Add device-tree binding for Mediatek MT7530 switch. > > Cc: devicet...@vger.kernel.org > Signed-off-by: Sean Wang > Acked-by: Rob Herring Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH net-next v4 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch
On 04/07/2017 01:45 AM, sean.w...@mediatek.com wrote: > From: Sean Wang > > MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on > Mediatek router platforms such as MT7623A or MT7623N platform which > includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. > Among these ports, The port from 0 to 4 are the user ports connecting > with the remote devices while the port 5 and 6 are the CPU ports > connecting into Mediatek Ethernet GMAC. > > For port 6, it can communicate with the CPU via Mediatek Ethernet GMAC > through either the TRGMII or RGMII which could be controlled by phy-mode > in the dt-bindings to specify which mode is preferred to use. And for > port 5, only RGMII can be specified. However, currently, only port 6 is > being supported in this DSA driver. > > The driver is made with the reference to qca8k and other existing DSA > driver. The most of the essential callbacks of the DSA are already > support in the driver, including tag insert for user port distinguishing, > port control, bridge offloading, STP setup and ethtool operation to allow > DSA to model each user port into a standalone netdevice as the other DSA > driver had done. > > Signed-off-by: Sean Wang > Signed-off-by: Landen Chao Reviewed-by: Florian Fainelli -- Florian
[RFC] netlink: send exterr cookie on success
From: Johannes Berg This is for Jamal, it allows the subsystem to return an arbitrary cookie to identify a new object/operation, perhaps to delete or cancel it later. This is actually something we use a lot in wifi too, though I'm not sure how we could do the backport necessary for that (since we sadly need to ship to all kinds of old kernels) Also contains the missing (err) check I pointed out - I'll squash this patch into the first of the exterr after I test it. Signed-off-by: Johannes Berg --- include/linux/netlink.h | 2 ++ include/uapi/linux/netlink.h | 4 net/netlink/af_netlink.c | 28 +++- 3 files changed, 25 insertions(+), 9 deletions(-) diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 662f343dc68b..76a0b0a22349 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -67,6 +67,8 @@ struct netlink_ext_err { u32 ext_code; u32 msg_offset; u16 attr; + const u8 *cookie; + u8 cookie_len; }; extern void netlink_kernel_release(struct sock *sk); diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index 0ef970e6f9f5..72a398e0f332 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -119,6 +119,9 @@ struct nlmsgerr { * @NLMSGERR_ATTR_CODE: extended per-subsystem error code (u32) * @NLMSGERR_ATTR_ATTR: top-level attribute that caused the error * (or is missing, u16) + * @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to + * be used - in the success case - to identify a created + * object or operation or similar (binary) */ enum nlmsgerr_attrs { NLMSGERR_ATTR_UNUSED, @@ -126,6 +129,7 @@ enum nlmsgerr_attrs { NLMSGERR_ATTR_OFFS, NLMSGERR_ATTR_CODE, NLMSGERR_ATTR_ATTR, + NLMSGERR_ATTR_COOKIE, }; #define NETLINK_ADD_MEMBERSHIP 1 diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 532d906af3be..6b256e540bc1 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2277,6 +2277,9 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err, if (exterr && exterr->attr) payload += nla_total_size(sizeof(u16)); } + } else if (nlk->flags & NETLINK_F_EXT_ACK) { + if (exterr && exterr->cookie_len) + payload += nla_total_size(exterr->cookie_len); } skb = nlmsg_new(payload, GFP_KERNEL); @@ -2301,15 +2304,22 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err, memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : sizeof(*nlh)); if (nlk->flags & NETLINK_F_EXT_ACK) { - if (exterr && exterr->msg) - WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG, - exterr->msg)); - if (exterr && exterr->msg_offset) - WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS, - exterr->msg_offset)); - if (exterr && exterr->attr) - WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR, - exterr->attr)); + if (err) { + if (exterr && exterr->msg) + WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG, + exterr->msg)); + if (exterr && exterr->msg_offset) + WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS, + exterr->msg_offset)); + if (exterr && exterr->attr) + WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR, + exterr->attr)); + } else { + if (exterr && exterr->attr) + WARN_ON(nla_put(skb, NLMSGERR_ATTR_COOKIE, + exterr->cookie_len, + exterr->cookie)); + } } netlink_unicast(in_skb->sk, skb, NETLINK_CB(in_skb).portid, MSG_DONTWAIT); -- 2.11.0
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira Ayuso Date: Fri, 7 Apr 2017 21:35:50 +0200 > On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote: > [...] >> Let's just discuss the UAPI, since people complain we don't talk >> about that enough :-) For those playing at home it is three new >> attributes returned in a netlink ACK when the application asks >> for the extended response: >> >> NLMSGERR_ATTR_MSG string Extended error string >> NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing >> error >> NLMSGERR_ATTR_CODE u32 Subsystem specific error code >> NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or >> missing > > I think it would be good if we get a definition to cap the maximum > string length to something reasonable? This can be added in a follow > up patch BTW. Thus, we get people coming back to us and request larger > strings with a reason why they need more room for this. > > In general, my main concern with strings is that they can be used in a > more freely way than these u32 offsets and error codes, and > specifically how inconsistent these string will look like across > different netlink subsystems. > > Anyway, as long as this is optional (not every subsystem if forced to > use strings) I'm fine with it :). I have no strong opinions about string length. In my opinion, I would like to believe that if someone tried to get a networking patch applied that emitted a rediculously long string then we would give them feedback about how that is not acceptable. Right? I suspect that someone will eventually give us a hard time about the strings wrt. internationalization. :-) It's solvable at least partially in userspace I suppose.
Re: [RFC 1/3] netlink: extended error reporting
On Fri, 2017-04-07 at 20:26 +0200, Johannes Berg wrote: > > + if (nlk->flags & NETLINK_F_EXT_ACK) { > + if (exterr && exterr->msg) > + WARN_ON(nla_put_string(skb, > NLMSGERR_ATTR_MSG, > + exterr->msg)); > + if (exterr && exterr->msg_offset) > + WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS, > + exterr->msg_offset)); > + if (exterr && exterr->attr) > + WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR, > + exterr->attr)); > + } I forgot to check (err != 0) here, which can cause inconsistencies - noticed that while just adding the error case Jamal wanted. I'll send out that as a separate patch, and squash it later when I resubmit. johannes
Re: [PATCH net-next v4 1/5] dt-bindings: net: dsa: add Mediatek MT7530 binding
On Fri, Apr 07, 2017 at 04:45:05PM +0800, sean.w...@mediatek.com wrote: > From: Sean Wang > > Add device-tree binding for Mediatek MT7530 switch. > > Cc: devicet...@vger.kernel.org > Signed-off-by: Sean Wang > Acked-by: Rob Herring Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next v4 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch
On Fri, Apr 07, 2017 at 04:45:09PM +0800, sean.w...@mediatek.com wrote: > From: Sean Wang > > MT7530 is a 7-ports Gigabit Ethernet Switch that could be found on > Mediatek router platforms such as MT7623A or MT7623N platform which > includes 7-port Gigabit Ethernet MAC and 5-port Gigabit Ethernet PHY. > Among these ports, The port from 0 to 4 are the user ports connecting > with the remote devices while the port 5 and 6 are the CPU ports > connecting into Mediatek Ethernet GMAC. > > For port 6, it can communicate with the CPU via Mediatek Ethernet GMAC > through either the TRGMII or RGMII which could be controlled by phy-mode > in the dt-bindings to specify which mode is preferred to use. And for > port 5, only RGMII can be specified. However, currently, only port 6 is > being supported in this DSA driver. > > The driver is made with the reference to qca8k and other existing DSA > driver. The most of the essential callbacks of the DSA are already > support in the driver, including tag insert for user port distinguishing, > port control, bridge offloading, STP setup and ethtool operation to allow > DSA to model each user port into a standalone netdevice as the other DSA > driver had done. > > Signed-off-by: Sean Wang > Signed-off-by: Landen Chao Hi Sean This is looking good now. Thanks Reviewed-by: Andrew Lunn Andrew
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote: [...] > Let's just discuss the UAPI, since people complain we don't talk > about that enough :-) For those playing at home it is three new > attributes returned in a netlink ACK when the application asks > for the extended response: > > NLMSGERR_ATTR_MSG string Extended error string > NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing > error > NLMSGERR_ATTR_CODE u32 Subsystem specific error code > NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or > missing I think it would be good if we get a definition to cap the maximum string length to something reasonable? This can be added in a follow up patch BTW. Thus, we get people coming back to us and request larger strings with a reason why they need more room for this. In general, my main concern with strings is that they can be used in a more freely way than these u32 offsets and error codes, and specifically how inconsistent these string will look like across different netlink subsystems. Anyway, as long as this is optional (not every subsystem if forced to use strings) I'm fine with it :).
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira Ayuso Date: Fri, 7 Apr 2017 21:27:14 +0200 > On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote: >> From: Johannes Berg >> Date: Fri, 07 Apr 2017 21:09:45 +0200 >> >> > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: >> >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: >> >> [...] >> >> > Heh. I think I really want to solve - at least partially - >> >> > nla_parse() >> >> > to see that it can be done this way. It'd be nice to even transform >> >> > all >> >> > the callers (I generated half of these patches with spatch anyway) >> >> > to >> >> > have at least that. >> >> >> >> We can just have a modified version of nla_parse that deals with >> >> this. >> > >> > Yes, but we need to figure out a good way to have the offset. >> > >> > We also need to see if we want to *force* having the offset. In some >> > sense that'd be useful, in another it might be very complicated to fill >> > it in at all times, if for example errors come from lower layers like >> > drivers. >> >> It has to be optional, some kinds of errors don't have an exact >> context per-se. >> >> Also another way to look at this is that we're providing a lot of >> new power and expressability. So even if only one aspect of the >> new error reporting is used it's a positive step forward. >> >> So allow offset "0" meaning "unspecified". > > Instead, we can just not send the offset attribute to userspace if > it's not specified. So missing attribute means "unspecified". Agreed, not providing the attribute to indicate this is fine.
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira Ayuso Date: Fri, 7 Apr 2017 21:21:34 +0200 > For my usecases in netfilter, the attributes and an specific error > code should be enough to figure out what is wrong. Will not need > strings. > > BTW, we may not have an offset, eg. EINVAL because of missing > attribute. Given we have different requirements, I would leave it to > each subsystem to decide what netlink error attributes are specified. Yep, completely agreed. The use cases for offset and missing attribute I see as follows: 1) Top-level attribute is missing. Here, offset is set to zero and the missing attribute number is given as well. 2) Nested attribute is missing. Offset is set to the location of the beginning of nesting, inside of which the missing attribute was needed.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:27 +0200, Pablo Neira Ayuso wrote: > > > Also another way to look at this is that we're providing a lot of > > new power and expressability. So even if only one aspect of the > > new error reporting is used it's a positive step forward. True. > > So allow offset "0" meaning "unspecified". > > Instead, we can just not send the offset attribute to userspace if > it's not specified. So missing attribute means "unspecified". > > I'm always a bit worried this "0 means something" semantics :) Yeah, I have that. If it's 0 internally the attribute will be omitted. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote: > > For my usecases in netfilter, the attributes and an specific error > code should be enough to figure out what is wrong. Will not need > strings. Fair enough. > BTW, we may not have an offset, eg. EINVAL because of missing > attribute. Given we have different requirements, I would leave it to > each subsystem to decide what netlink error attributes are specified. > > > (It's ultimately always going to be optional since we'll continue > > returning errors without *any* extended error information - likely > > indefinitely - but if we have a wrong attribute, should we always > > have > > an offset? Would be nice, but could be difficult in practice) > > > > > We can probably use struct nla_policy to place the extended error > > > code or the string (as we discussed I don't need string errors, > > > but > > > I'm fine if other people find it useful). > > > > I don't think for the error strings really would be useful for > > nla_parse() or a policy - we can return something generic like > > "malformed attribute" there as a string, and hopefully point to the > > attribute/offset from there generically. I just really want to see > > how > > to actually do that. > > I think the most flexible way is to pass the container error > structure to nla_parse() so it sets this for you. This would also > save tons of "malformed attribute" strings. Yes, for sure. The thing is we'll probalby have to pass down the request skb *and* the error struct so that we can get the offset, and this seems like the generic thing that we really should try to get the most information generated. johannes
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote: > On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin wrote: > > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: > >> From: Willem de Bruijn > >> > >> Amortize the cost of virtual interrupts by doing both rx and tx work > >> on reception of a receive interrupt if tx napi is enabled. With > >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion > >> interrupts for bidirectional workloads. > >> > >> Signed-off-by: Willem de Bruijn This is a popular approach, but I think this will only work well if tx and rx interrupts are processed on the same CPU and if tx queue is per cpu. If they target different CPUs or if tx queue is used from multiple CPUs they will conflict on the shared locks. This can even change dynamically as CPUs/queues are reconfigured. How about adding a flag and skipping the tx poll if there's no match? -- MST
Re: [RFC 2/3] genetlink: pass extended error report down
On 04/07/2017 12:12 PM, Johannes Berg wrote: On Fri, 2017-04-07 at 11:37 -0700, Ben Greear wrote: I guess the error string must be constant and always available in memory in this implementation? Yes. I think it would be nice to dynamically create strings (malloc, snprintf, etc) and have the err_str logic free it when done? We can think about that later, but I don't actually think it makes a lot of sense - if we point to the attribute and/or offset you really ought to have enough info to figure out what's up. We can think about it later, but lots of things in the wifi stack could use a descriptive message specific to the failure. Often these messages are much more useful if you explain why the failure conflicts with regulatory, channel, virtual-dev combination, etc info, so that needs to be dynamic. The code that is failing knows, so I'd like to pass it back to user-space. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote: > From: Johannes Berg > Date: Fri, 07 Apr 2017 21:09:45 +0200 > > > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > >> [...] > >> > Heh. I think I really want to solve - at least partially - > >> > nla_parse() > >> > to see that it can be done this way. It'd be nice to even transform > >> > all > >> > the callers (I generated half of these patches with spatch anyway) > >> > to > >> > have at least that. > >> > >> We can just have a modified version of nla_parse that deals with > >> this. > > > > Yes, but we need to figure out a good way to have the offset. > > > > We also need to see if we want to *force* having the offset. In some > > sense that'd be useful, in another it might be very complicated to fill > > it in at all times, if for example errors come from lower layers like > > drivers. > > It has to be optional, some kinds of errors don't have an exact > context per-se. > > Also another way to look at this is that we're providing a lot of > new power and expressability. So even if only one aspect of the > new error reporting is used it's a positive step forward. > > So allow offset "0" meaning "unspecified". Instead, we can just not send the offset attribute to userspace if it's not specified. So missing attribute means "unspecified". I'm always a bit worried this "0 means something" semantics :)
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:20 -0700, David Miller wrote: > But what it lacks fundamentally is context. Therefore it can't be > used to provide the offset or the bad attribute number. So it can't > meet our requirements. Yes, it doesn't address the requirements here, and in a sense I suspect this will be true anywhere else too - perhaps that's the reason why it was never applied even over in perf. I've not gotten a good answer for that question yet :) > We've passed down new arguments into core method call chains for much > less useful facilitites than this. So that doesn't bother me. And > as that is a kernel internal matter, we can refine it later. Yes, both are true. > Wrt. nla_parse(), we can solve that problem as follow-on patches too. Yes. I just want to be sure we can solve it, but OTOH we're not locking down anything but the UAPI so whatever way we find to solve it, even if it requires a complete rearchitecture internally, it still doesn't matter. > So I consider this change ready as far as the implementation is > concerned. Fair enough. I still do need to test it though :) > Ok, Jiri, start reading. I will try to make you happy here. > > Let's just discuss the UAPI, since people complain we don't talk > about that enough :-) For those playing at home it is three new > attributes returned in a netlink ACK when the application asks > for the extended response: > > NLMSGERR_ATTR_MSG string Extended error string > NLMSGERR_ATTR_OFFS u32 Byte offset to netlink > element causing error > NLMSGERR_ATTR_CODE u32 Subsystem specific error code > NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering > error or missing That's four ;-) Since you're carefully listing it here, you should also say that * the new behaviour is entirely optional, enabled by a new netlink socket option * the ACK message format ends up being [nlmsg header] [ack header including errno] [request message nlmsg hdr] [request message payload - optional, controlled by socket option, aligned to 4 byte length if present] [new TLVs - optional, controlled by socket option] johannes
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes Berg Date: Fri, 07 Apr 2017 21:09:45 +0200 > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: >> [...] >> > Heh. I think I really want to solve - at least partially - >> > nla_parse() >> > to see that it can be done this way. It'd be nice to even transform >> > all >> > the callers (I generated half of these patches with spatch anyway) >> > to >> > have at least that. >> >> We can just have a modified version of nla_parse that deals with >> this. > > Yes, but we need to figure out a good way to have the offset. > > We also need to see if we want to *force* having the offset. In some > sense that'd be useful, in another it might be very complicated to fill > it in at all times, if for example errors come from lower layers like > drivers. It has to be optional, some kinds of errors don't have an exact context per-se. Also another way to look at this is that we're providing a lot of new power and expressability. So even if only one aspect of the new error reporting is used it's a positive step forward. So allow offset "0" meaning "unspecified".
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes Berg Date: Fri, 07 Apr 2017 20:59:12 +0200 > Alexander Shishkin's patch > https://patchwork.kernel.org/patch/7162421/ > > was nice in a way because you could get away without passing the error > structure down all the time, but like I said it doesn't deal with > dynamic errors (even the offset/attr from nla_parse) and if it's not > done *everywhere* it risks leaking those exterr numbers (-1024..-4095) > to userspace through syscalls if it's used in a non-netlink path for > some reason (e.g. a single driver call being called through both > netlink and ioctl, and trying to return an extended error) Alexander's patch is cute in that it annotates errors using the special section blobs. But what it lacks fundamentally is context. Therefore it can't be used to provide the offset or the bad attribute number. So it can't meet our requirements. We've passed down new arguments into core method call chains for much less useful facilitites than this. So that doesn't bother me. And as that is a kernel internal matter, we can refine it later. Wrt. nla_parse(), we can solve that problem as follow-on patches too. So I consider this change ready as far as the implementation is concerned. Ok, Jiri, start reading. I will try to make you happy here. Let's just discuss the UAPI, since people complain we don't talk about that enough :-) For those playing at home it is three new attributes returned in a netlink ACK when the application asks for the extended response: NLMSGERR_ATTR_MSG string Extended error string NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing error NLMSGERR_ATTR_CODE u32 Subsystem specific error code NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or missing So please consider this interface carefully.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 09:09:45PM +0200, Johannes Berg wrote: > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > > On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > > [...] > > > Heh. I think I really want to solve - at least partially - > > > nla_parse() > > > to see that it can be done this way. It'd be nice to even transform > > > all > > > the callers (I generated half of these patches with spatch anyway) > > > to > > > have at least that. > > > > We can just have a modified version of nla_parse that deals with > > this. > > Yes, but we need to figure out a good way to have the offset. > > We also need to see if we want to *force* having the offset. In some > sense that'd be useful, in another it might be very complicated to fill > it in at all times, if for example errors come from lower layers like > drivers. For my usecases in netfilter, the attributes and an specific error code should be enough to figure out what is wrong. Will not need strings. BTW, we may not have an offset, eg. EINVAL because of missing attribute. Given we have different requirements, I would leave it to each subsystem to decide what netlink error attributes are specified. > (It's ultimately always going to be optional since we'll continue > returning errors without *any* extended error information - likely > indefinitely - but if we have a wrong attribute, should we always have > an offset? Would be nice, but could be difficult in practice) > > > We can probably use struct nla_policy to place the extended error > > code or the string (as we discussed I don't need string errors, but > > I'm fine if other people find it useful). > > I don't think for the error strings really would be useful for > nla_parse() or a policy - we can return something generic like > "malformed attribute" there as a string, and hopefully point to the > attribute/offset from there generically. I just really want to see how > to actually do that. I think the most flexible way is to pass the container error structure to nla_parse() so it sets this for you. This would also save tons of "malformed attribute" strings.
Re: [RFC 2/3] genetlink: pass extended error report down
On Fri, 2017-04-07 at 11:37 -0700, Ben Greear wrote: > > I guess the error string must be constant and always available in > memory in this implementation? Yes. > I think it would be nice to dynamically create strings (malloc, > snprintf, etc) and have the err_str logic free it when done? We can think about that later, but I don't actually think it makes a lot of sense - if we point to the attribute and/or offset you really ought to have enough info to figure out what's up. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > [...] > > Heh. I think I really want to solve - at least partially - > > nla_parse() > > to see that it can be done this way. It'd be nice to even transform > > all > > the callers (I generated half of these patches with spatch anyway) > > to > > have at least that. > > We can just have a modified version of nla_parse that deals with > this. Yes, but we need to figure out a good way to have the offset. We also need to see if we want to *force* having the offset. In some sense that'd be useful, in another it might be very complicated to fill it in at all times, if for example errors come from lower layers like drivers. (It's ultimately always going to be optional since we'll continue returning errors without *any* extended error information - likely indefinitely - but if we have a wrong attribute, should we always have an offset? Would be nice, but could be difficult in practice) > We can probably use struct nla_policy to place the extended error > code or the string (as we discussed I don't need string errors, but > I'm fine if other people find it useful). I don't think for the error strings really would be useful for nla_parse() or a policy - we can return something generic like "malformed attribute" there as a string, and hopefully point to the attribute/offset from there generically. I just really want to see how to actually do that. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:02 +0200, Pablo Neira Ayuso wrote: > > > I'm tempted to apply this as-is just to show that person that > > things do in fact happen eventually :-) > > We can just send follow up patches to refine, I think it's a good > start, Johannes? I guess we can. Like I just said though - I do have a plan to make nla_parse() work but the week's been exhausting enough that I don't feel entirely comfortable saying that will definitely work. OTOH, arguably this is something that seems to work - as evidenced by the nl80211 changes - for at least some error reporting, so I guess we can continue to refine it, and the only thing we're really locking in to is the ABI, which is TLVs now, so that should be safe. That said, the nl80211 patch really should be bigger, and I've not even tested this at all yet, so I really need to do that first. Also, I'd like to point to the genl_info change, does that seem like a reasonable way to pass it around in genetlink? I'm much less familiar with the "regular" netlink to say how to pass it around there beyond what I already did. johannes
Re: [Intel-wired-lan] [PATCH v2] igb: Allow to remove administratively set MAC on VFs
On Wed, 2017-04-05 at 15:46 +0200, Corinna Vinschen wrote: > Before libvirt modifies the MAC address and vlan tag for an SRIOV > VF > for use by a virtual machine (either using vfio device assignment > or > macvtap passthru mode), it saves the current MAC address and vlan > tag > so that it can reset them to their original value when the guest is > done. Libvirt can't leave the VF MAC set to the value used by the > now-defunct guest since it may be started again later using a > different VF, but it certainly shouldn't just pick any random > value, > either. So it saves the state of everything prior to using the VF, > and > resets it to that. > > The igb driver initializes the MAC addresses of all VFs to > 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK > netlink message, also visible in the list of VFs in the output of > "ip > link show"). But when libvirt attempts to restore the MAC address > back > to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the > kernel > responds with "Invalid argument". > > Forbidding a reset back to the original value leaves the VF MAC at > the > value set for the now-defunct virtual machine. Especially on a > system > with NetworkManager enabled, this has very bad consequences, since > NetworkManager forces all interfacess to be IFF_UP all the time - > if > the same virtual machine is restarted using a different VF (or even > on > a different host), there will be multiple interfaces watching for > traffic with the same MAC address. > > To allow libvirt to revert to the original state, we need a way to > remove the administrative set MAC on a VF, to allow normal host > operation again, and to reset/overwrite the VF MAC via VF netdev. > > This patch implements the outlined scenario by allowing to set the > VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. > igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, > so it's possible to reset the VF MAC back to the original value via > the VF netdev. > > Note: Recent patches to libvirt allow for a workaround if the NIC > isn't capable of resetting the administrative MAC back to all 0, > but > in theory the NIC should allow resetting the MAC in the first > place. > > Signed-off-by: Corinna Vinschen > --- > drivers/net/ethernet/intel/igb/igb_main.c | 42 > +++ > 1 file changed, 31 insertions(+), 11 deletions(-) This patch does not apply (not even close). Please make sure to base you patch off my dev-queue branch of my next-queue tree on kernel.org. signature.asc Description: This is a digitally signed message part
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: [...] > Heh. I think I really want to solve - at least partially - nla_parse() > to see that it can be done this way. It'd be nice to even transform all > the callers (I generated half of these patches with spatch anyway) to > have at least that. We can just have a modified version of nla_parse that deals with this. We can probably use struct nla_policy to place the extended error code or the string (as we discussed I don't need string errors, but I'm fine if other people find it useful). Thanks!
Re: [PATCH v2 1/2] bpf: remove struct bpf_prog_type_list
So two things about this, and they apply to the other patch as well: > +extern const struct bpf_verifier_ops sk_filter_prog_ops; > +extern const struct bpf_verifier_ops tc_cls_act_prog_ops; > +extern const struct bpf_verifier_ops xdp_prog_ops; > +extern const struct bpf_verifier_ops cg_skb_prog_ops; > +extern const struct bpf_verifier_ops cg_sock_prog_ops; > +extern const struct bpf_verifier_ops lwt_inout_prog_ops; > +extern const struct bpf_verifier_ops lwt_xmit_prog_ops; > +extern const struct bpf_verifier_ops kprobe_prog_ops; > +extern const struct bpf_verifier_ops tracepoint_prog_ops; > +extern const struct bpf_verifier_ops perf_event_prog_ops; I'm not super happy with having to list it here - and maybe I should add ifdefs here as well. +static const struct bpf_verifier_ops * const bpf_prog_types[] = { > +#ifdef CONFIG_NET > + [BPF_PROG_TYPE_SOCKET_FILTER] = &sk_filter_prog_ops, > + [BPF_PROG_TYPE_SCHED_CLS] = &tc_cls_act_prog_ops, > + [BPF_PROG_TYPE_SCHED_ACT] = &tc_cls_act_prog_ops, > + [BPF_PROG_TYPE_XDP] = &xdp_prog_ops, > + [BPF_PROG_TYPE_CGROUP_SKB] = &cg_skb_prog_ops, > + [BPF_PROG_TYPE_CGROUP_SOCK] = &cg_sock_prog_ops, > + [BPF_PROG_TYPE_LWT_IN] = &lwt_inout_prog_ops, > + [BPF_PROG_TYPE_LWT_OUT] = &lwt_inout_prog_ops, > + [BPF_PROG_TYPE_LWT_XMIT] = &lwt_xmit_prog_ops, > +#endif > +#ifdef CONFIG_BPF_EVENTS > + [BPF_PROG_TYPE_KPROBE] = &kprobe_prog_ops, > + [BPF_PROG_TYPE_TRACEPOINT] = &tracepoint_prog_ops, > + [BPF_PROG_TYPE_PERF_EVENT] = &perf_event_prog_ops, > +#endif And here I list it again. Something I considered was to add the prog_type to the ops struct, and use the linker (putting this into a special section) to assemble an array. But that * makes the lwt_inout_prog_ops that are shared not work without duplicating * requires more complicated search code This seems optimal for the resulting binary, but it's a bunch more typing. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 11:53:15AM -0700, David Miller wrote: > From: Johannes Berg > Date: Fri, 7 Apr 2017 20:26:17 +0200 > > > So this is my first draft of what we'd talked about at netconf. > > I'm not super happy with the way we have to pass the extended > > error struct, but I don't see a way to implement reporting any > > dynamic information (like error offsets) in any other way. > > > > Alexander Shishkin had a nice way of reporting static extended > > error data, but that isn't really suitable for reporting the > > offset or even reporting the broken attribute from nla_parse(). > > > > Speaking of nla_parse(), that'll be somewhat complicated to do > > since we'll have to track the offsets of where we're parsing, > > but it might be possible since the nlattrs are just pointers > > into the message, so (optionally?) passing the skb as well can > > allow us to fill the offset information. > > I like it, nice work. > > I know people want dynamically generated strings and stuff, and we can > get there, but I prefer that the first thing we commit is super simple. > > Someone gave me a hard time about the fact that we've been talking > about this idea for years but nothing ever happens. > > I'm tempted to apply this as-is just to show that person that things > do in fact happen eventually :-) We can just send follow up patches to refine, I think it's a good start, Johannes? BTW, for this co-authored effort in designing this: Signed-off-by: Pablo Neira Ayuso Thanks!
[PATCH v2 2/2] bpf: remove struct bpf_map_type_list
From: Johannes Berg There's no need to have struct bpf_map_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. Signed-off-by: Johannes Berg --- include/linux/bpf.h | 18 +-- kernel/bpf/arraymap.c | 64 --- kernel/bpf/hashtab.c | 38 -- kernel/bpf/lpm_trie.c | 14 +-- kernel/bpf/stackmap.c | 14 +-- kernel/bpf/syscall.c | 42 ++--- 6 files changed, 46 insertions(+), 144 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 75cea2890751..4a95a25186e5 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -51,12 +51,6 @@ struct bpf_map { atomic_t usercnt; }; -struct bpf_map_type_list { - struct list_head list_node; - const struct bpf_map_ops *ops; - enum bpf_map_type type; -}; - /* function argument constraints */ enum bpf_arg_type { ARG_DONTCARE = 0, /* unused argument in helper function */ @@ -239,7 +233,17 @@ extern const struct bpf_verifier_ops kprobe_prog_ops; extern const struct bpf_verifier_ops tracepoint_prog_ops; extern const struct bpf_verifier_ops perf_event_prog_ops; -void bpf_register_map_type(struct bpf_map_type_list *tl); +extern const struct bpf_map_ops array_map_ops; +extern const struct bpf_map_ops percpu_array_map_ops; +extern const struct bpf_map_ops prog_array_map_ops; +extern const struct bpf_map_ops perf_event_array_map_ops; +extern const struct bpf_map_ops cgroup_array_map_ops; +extern const struct bpf_map_ops htab_map_ops; +extern const struct bpf_map_ops htab_lru_map_ops; +extern const struct bpf_map_ops htab_percpu_map_ops; +extern const struct bpf_map_ops htab_lru_percpu_map_ops; +extern const struct bpf_map_ops trie_map_ops; +extern const struct bpf_map_ops stack_map_ops; struct bpf_prog *bpf_prog_get(u32 ufd); struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type); diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 6b6f41f0b211..376afd14cf32 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -260,7 +260,7 @@ static void array_map_free(struct bpf_map *map) bpf_map_area_free(array); } -static const struct bpf_map_ops array_ops = { +const struct bpf_map_ops array_map_ops = { .map_alloc = array_map_alloc, .map_free = array_map_free, .map_get_next_key = array_map_get_next_key, @@ -269,12 +269,7 @@ static const struct bpf_map_ops array_ops = { .map_delete_elem = array_map_delete_elem, }; -static struct bpf_map_type_list array_type __ro_after_init = { - .ops = &array_ops, - .type = BPF_MAP_TYPE_ARRAY, -}; - -static const struct bpf_map_ops percpu_array_ops = { +const struct bpf_map_ops percpu_array_map_ops = { .map_alloc = array_map_alloc, .map_free = array_map_free, .map_get_next_key = array_map_get_next_key, @@ -283,19 +278,6 @@ static const struct bpf_map_ops percpu_array_ops = { .map_delete_elem = array_map_delete_elem, }; -static struct bpf_map_type_list percpu_array_type __ro_after_init = { - .ops = &percpu_array_ops, - .type = BPF_MAP_TYPE_PERCPU_ARRAY, -}; - -static int __init register_array_map(void) -{ - bpf_register_map_type(&array_type); - bpf_register_map_type(&percpu_array_type); - return 0; -} -late_initcall(register_array_map); - static struct bpf_map *fd_array_map_alloc(union bpf_attr *attr) { /* only file descriptors can be stored in this type of map */ @@ -399,7 +381,7 @@ void bpf_fd_array_map_clear(struct bpf_map *map) fd_array_map_delete_elem(map, &i); } -static const struct bpf_map_ops prog_array_ops = { +const struct bpf_map_ops prog_array_map_ops = { .map_alloc = fd_array_map_alloc, .map_free = fd_array_map_free, .map_get_next_key = array_map_get_next_key, @@ -409,18 +391,6 @@ static const struct bpf_map_ops prog_array_ops = { .map_fd_put_ptr = prog_fd_array_put_ptr, }; -static struct bpf_map_type_list prog_array_type __ro_after_init = { - .ops = &prog_array_ops, - .type = BPF_MAP_TYPE_PROG_ARRAY, -}; - -static int __init register_prog_array_map(void) -{ - bpf_register_map_type(&prog_array_type); - return 0; -} -late_initcall(register_prog_array_map); - static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file, struct file *map_file) { @@ -511,7 +481,7 @@ static void perf_event_fd_array_release(struct bpf_map *map, rcu_read_unlock(); } -static const struct bpf_map_ops perf_event_array_ops = { +const struct bpf_map_ops perf_event_array_map_ops = { .map_alloc = fd_array
[PATCH v2 1/2] bpf: remove struct bpf_prog_type_list
From: Johannes Berg There's no need to have struct bpf_prog_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. Signed-off-by: Johannes Berg --- include/linux/bpf.h | 22 +++--- kernel/bpf/syscall.c | 39 ++--- kernel/trace/bpf_trace.c | 30 ++- net/core/filter.c| 75 +--- 4 files changed, 44 insertions(+), 122 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 909fc033173a..75cea2890751 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -169,12 +169,6 @@ struct bpf_verifier_ops { struct bpf_prog *prog); }; -struct bpf_prog_type_list { - struct list_head list_node; - const struct bpf_verifier_ops *ops; - enum bpf_prog_type type; -}; - struct bpf_prog_aux { atomic_t refcnt; u32 used_map_cnt; @@ -234,7 +228,17 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, #ifdef CONFIG_BPF_SYSCALL DECLARE_PER_CPU(int, bpf_prog_active); -void bpf_register_prog_type(struct bpf_prog_type_list *tl); +extern const struct bpf_verifier_ops sk_filter_prog_ops; +extern const struct bpf_verifier_ops tc_cls_act_prog_ops; +extern const struct bpf_verifier_ops xdp_prog_ops; +extern const struct bpf_verifier_ops cg_skb_prog_ops; +extern const struct bpf_verifier_ops cg_sock_prog_ops; +extern const struct bpf_verifier_ops lwt_inout_prog_ops; +extern const struct bpf_verifier_ops lwt_xmit_prog_ops; +extern const struct bpf_verifier_ops kprobe_prog_ops; +extern const struct bpf_verifier_ops tracepoint_prog_ops; +extern const struct bpf_verifier_ops perf_event_prog_ops; + void bpf_register_map_type(struct bpf_map_type_list *tl); struct bpf_prog *bpf_prog_get(u32 ufd); @@ -295,10 +299,6 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) /* verify correctness of eBPF program */ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); #else -static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl) -{ -} - static inline struct bpf_prog *bpf_prog_get(u32 ufd) { return ERR_PTR(-EOPNOTSUPP); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7af0dcc5d755..421aa81ef5d4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -564,26 +564,33 @@ static int map_get_next_key(union bpf_attr *attr) return err; } -static LIST_HEAD(bpf_prog_types); +static const struct bpf_verifier_ops * const bpf_prog_types[] = { +#ifdef CONFIG_NET + [BPF_PROG_TYPE_SOCKET_FILTER] = &sk_filter_prog_ops, + [BPF_PROG_TYPE_SCHED_CLS] = &tc_cls_act_prog_ops, + [BPF_PROG_TYPE_SCHED_ACT] = &tc_cls_act_prog_ops, + [BPF_PROG_TYPE_XDP] = &xdp_prog_ops, + [BPF_PROG_TYPE_CGROUP_SKB] = &cg_skb_prog_ops, + [BPF_PROG_TYPE_CGROUP_SOCK] = &cg_sock_prog_ops, + [BPF_PROG_TYPE_LWT_IN] = &lwt_inout_prog_ops, + [BPF_PROG_TYPE_LWT_OUT] = &lwt_inout_prog_ops, + [BPF_PROG_TYPE_LWT_XMIT] = &lwt_xmit_prog_ops, +#endif +#ifdef CONFIG_BPF_EVENTS + [BPF_PROG_TYPE_KPROBE] = &kprobe_prog_ops, + [BPF_PROG_TYPE_TRACEPOINT] = &tracepoint_prog_ops, + [BPF_PROG_TYPE_PERF_EVENT] = &perf_event_prog_ops, +#endif +}; static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog) { - struct bpf_prog_type_list *tl; - - list_for_each_entry(tl, &bpf_prog_types, list_node) { - if (tl->type == type) { - prog->aux->ops = tl->ops; - prog->type = type; - return 0; - } - } - - return -EINVAL; -} + if (type >= ARRAY_SIZE(bpf_prog_types) || !bpf_prog_types[type]) + return -EINVAL; -void bpf_register_prog_type(struct bpf_prog_type_list *tl) -{ - list_add(&tl->list_node, &bpf_prog_types); + prog->aux->ops = bpf_prog_types[type]; + prog->type = type; + return 0; } /* fixup insn->imm field of bpf_call instructions: diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index cee9802cf3e0..8a4efac28710 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -501,16 +501,11 @@ static bool kprobe_prog_is_valid_access(int off, int size, enum bpf_access_type return true; } -static const struct bpf_verifier_ops kprobe_prog_ops = { +const struct bpf_verifier_ops kprobe_prog_ops = { .get_func_proto = kprobe_prog_func_proto, .is_valid_access = kprobe_prog_is_valid_access, }; -static struct bpf_prog_type_list kprobe_tl __ro_after_init = { - .ops= &kprobe_prog_ops, - .type = BPF_PROG_TYPE_KPROBE, -}; - BPF_CALL_5(bpf_perf_event_o
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 11:53 -0700, David Miller wrote: > > Alexander Shishkin had a nice way of reporting static extended > > error data, but that isn't really suitable for reporting the > > offset or even reporting the broken attribute from nla_parse(). > > > > Speaking of nla_parse(), that'll be somewhat complicated to do > > since we'll have to track the offsets of where we're parsing, > > but it might be possible since the nlattrs are just pointers > > into the message, so (optionally?) passing the skb as well can > > allow us to fill the offset information. > > I like it, nice work. I don't like it all that much ;-) Alexander Shishkin's patch https://patchwork.kernel.org/patch/7162421/ was nice in a way because you could get away without passing the error structure down all the time, but like I said it doesn't deal with dynamic errors (even the offset/attr from nla_parse) and if it's not done *everywhere* it risks leaking those exterr numbers (-1024..-4095) to userspace through syscalls if it's used in a non-netlink path for some reason (e.g. a single driver call being called through both netlink and ioctl, and trying to return an extended error) > I know people want dynamically generated strings and stuff, and we > can get there, but I prefer that the first thing we commit is super > simple. > > Someone gave me a hard time about the fact that we've been talking > about this idea for years but nothing ever happens. > > I'm tempted to apply this as-is just to show that person that things > do in fact happen eventually :-) Heh. I think I really want to solve - at least partially - nla_parse() to see that it can be done this way. It'd be nice to even transform all the callers (I generated half of these patches with spatch anyway) to have at least that. johannes
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes Berg Date: Fri, 7 Apr 2017 20:26:17 +0200 > So this is my first draft of what we'd talked about at netconf. > I'm not super happy with the way we have to pass the extended > error struct, but I don't see a way to implement reporting any > dynamic information (like error offsets) in any other way. > > Alexander Shishkin had a nice way of reporting static extended > error data, but that isn't really suitable for reporting the > offset or even reporting the broken attribute from nla_parse(). > > Speaking of nla_parse(), that'll be somewhat complicated to do > since we'll have to track the offsets of where we're parsing, > but it might be possible since the nlattrs are just pointers > into the message, so (optionally?) passing the skb as well can > allow us to fill the offset information. I like it, nice work. I know people want dynamically generated strings and stuff, and we can get there, but I prefer that the first thing we commit is super simple. Someone gave me a hard time about the fact that we've been talking about this idea for years but nothing ever happens. I'm tempted to apply this as-is just to show that person that things do in fact happen eventually :-)
Re: [PATCH v2 3/4] bluetooth: hci_uart: add LL protocol serdev driver support
On Fri, Apr 7, 2017 at 12:09 PM, Dan Williams wrote: > On Fri, 2017-04-07 at 09:35 -0500, Rob Herring wrote: >> Turns out that the LL protocol and the TI-ST are the same thing >> AFAICT. >> The TI-ST adds firmware loading, GPIO control, and shared access for >> NFC, FM radio, etc. For now, we're only implementing what is needed >> for >> BT. This mirrors other drivers like BCM and Intel, but uses the new >> serdev bus. >> >> The firmware loading is greatly simplified by using existing >> infrastructure to send commands. It may be a bit slower than the >> original code using synchronous functions, but the real bottleneck is >> likely doing firmware load at 115.2kbps. > > Is there no way to put the TI-specific stuff into a TI UART module > rather than building it into the generic one? In case it's not clear, all of HCI_LL is the TI specific part, not just what I'm adding. So you are talking about putting each UART BT protocol into a separate module. I'd assume that is doable, but seems orthogonal to this patch set. I'd also assume there was some reason that was not done already. Rob
Re: [PATCH net] bridge: netlink: register netdevice before executing changelink
On Fri, 7 Apr 2017 21:16:49 +0300 Ido Schimmel wrote: > On Fri, Apr 07, 2017 at 01:37:53PM -0400, Stephen Hemminger wrote: > > Fix it right now. The patch you propose is too half baked. > > Given this patch needs to be back ported to three different stable > kernels I tried to keep the amount of changes to a minimum while fixing > the bug. > > I'll work on the ndo_uninit() patch and submit it. Then people can > decide which course of action they're more comfortable with. Other option for stable is to revert b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
Re: [PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes
From: Yuchung Cheng Date: Fri, 7 Apr 2017 11:42:05 -0700 > The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO > to catch more spurious timeouts") interacts badly with certain > broken middle-boxes. These broken boxes modify and falsely raise > the receive window on the ACKs. During a timeout induced recovery, > F-RTO would send new data packets to probe if the timeout is false > or not. Since the receive window is falsely raised, the receiver > would silently drop these F-RTO packets. The recovery would take N > (exponentially backoff) timeouts to repair N packet losses. A TCP > performance killer. > > Due to this unfortunate situation, this patch removes this extension > to revert F-RTO back to the RFC specification. > > Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") > Signed-off-by: Yuchung Cheng > Signed-off-by: Neal Cardwell > Signed-off-by: Soheil Hassas Yeganeh > Signed-off-by: Eric Dumazet Applied, thank you.
Re: [PATCH net] bridge: netlink: register netdevice before executing changelink
On Fri, 7 Apr 2017 21:16:49 +0300 Ido Schimmel wrote: > On Fri, Apr 07, 2017 at 01:37:53PM -0400, Stephen Hemminger wrote: > > Fix it right now. The patch you propose is too half baked. > > Given this patch needs to be back ported to three different stable > kernels I tried to keep the amount of changes to a minimum while fixing > the bug. > > I'll work on the ndo_uninit() patch and submit it. Then people can > decide which course of action they're more comfortable with. I don't want enterprise distros to ship without error reporting. At a minimum just return correct errno and leave bridge in stuck/broken state.
[PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes
The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO to catch more spurious timeouts") interacts badly with certain broken middle-boxes. These broken boxes modify and falsely raise the receive window on the ACKs. During a timeout induced recovery, F-RTO would send new data packets to probe if the timeout is false or not. Since the receive window is falsely raised, the receiver would silently drop these F-RTO packets. The recovery would take N (exponentially backoff) timeouts to repair N packet losses. A TCP performance killer. Due to this unfortunate situation, this patch removes this extension to revert F-RTO back to the RFC specification. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Eric Dumazet --- net/ipv4/tcp_input.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2c1f59386a7b..659d1baefb2b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1935,6 +1935,7 @@ void tcp_enter_loss(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); struct sk_buff *skb; + bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; bool is_reneg; /* is receiver reneging on SACKs? */ bool mark_lost; @@ -1994,15 +1995,18 @@ void tcp_enter_loss(struct sock *sk) tp->high_seq = tp->snd_nxt; tcp_ecn_queue_cwr(tp); - /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO -* if a previous recovery is underway, otherwise it may incorrectly -* call a timeout spurious if some previously retransmitted packets -* are s/acked (sec 3.2). We do not apply that retriction since -* retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS -* so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO -* on PTMU discovery to avoid sending new data. + /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous +* loss recovery is underway except recurring timeout(s) on +* the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing +* +* In theory F-RTO can be used repeatedly during loss recovery. +* In practice this interacts badly with broken middle-boxes that +* falsely raise the receive window, which results in repeated +* timeouts and stop-and-go behavior. */ - tp->frto = sysctl_tcp_frto && !inet_csk(sk)->icsk_mtup.probe_size; + tp->frto = sysctl_tcp_frto && + (new_recovery || icsk->icsk_retransmits) && + !inet_csk(sk)->icsk_mtup.probe_size; } /* If ACK arrived pointing to a remembered SACK, it means that our -- 2.12.2.715.g7642488e1d-goog
Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")
From: Eric Dumazet Date: Fri, 07 Apr 2017 11:38:49 -0700 > I do not really see what could be wrong, the code should run just fine > on UP. One theory is that the interrupt masking isn't working properly and interrupts are still arriving and hitting the NAPI state even when we are actively polling NAPI. And this problem was masked by the locking done here.
[PATCH net-next] netvsc: use napi_consume_skb
This allows using deferred skb freeing and with NAPI. And get buffer recycling. Signed-off-by: Stephen Hemminger --- drivers/net/hyperv/netvsc.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index e998e2f7a619..0b4df3f59ecf 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -601,7 +601,8 @@ static inline void netvsc_free_send_slot(struct netvsc_device *net_device, static void netvsc_send_tx_complete(struct netvsc_device *net_device, struct vmbus_channel *incoming_channel, struct hv_device *device, - const struct vmpacket_descriptor *desc) + const struct vmpacket_descriptor *desc, + int budget) { struct sk_buff *skb = (struct sk_buff *)(unsigned long)desc->trans_id; struct net_device *ndev = hv_get_drvdata(device); @@ -628,7 +629,7 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device, tx_stats->bytes += packet->total_bytes; u64_stats_update_end(&tx_stats->syncp); - dev_consume_skb_any(skb); + napi_consume_skb(skb, budget); } queue_sends = @@ -646,7 +647,8 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device, static void netvsc_send_completion(struct netvsc_device *net_device, struct vmbus_channel *incoming_channel, struct hv_device *device, - const struct vmpacket_descriptor *desc) + const struct vmpacket_descriptor *desc, + int budget) { struct nvsp_message *nvsp_packet = hv_pkt_data(desc); struct net_device *ndev = hv_get_drvdata(device); @@ -664,7 +666,7 @@ static void netvsc_send_completion(struct netvsc_device *net_device, case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE: netvsc_send_tx_complete(net_device, incoming_channel, - device, desc); + device, desc, budget); break; default: @@ -1174,14 +1176,16 @@ static int netvsc_process_raw_pkt(struct hv_device *device, struct vmbus_channel *channel, struct netvsc_device *net_device, struct net_device *ndev, - const struct vmpacket_descriptor *desc) + const struct vmpacket_descriptor *desc, + int budget) { struct net_device_context *net_device_ctx = netdev_priv(ndev); struct nvsp_message *nvmsg = hv_pkt_data(desc); switch (desc->type) { case VM_PKT_COMP: - netvsc_send_completion(net_device, channel, device, desc); + netvsc_send_completion(net_device, channel, device, + desc, budget); break; case VM_PKT_DATA_USING_XFER_PAGES: @@ -1230,7 +1234,7 @@ int netvsc_poll(struct napi_struct *napi, int budget) while (nvchan->desc && work_done < budget) { work_done += netvsc_process_raw_pkt(device, channel, net_device, - ndev, nvchan->desc); + ndev, nvchan->desc, budget); nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc); } -- 2.11.0
Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")
On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > Hi, > > My old P3 laptop started to die on me in the middle of larger compile > jobs (using distcc) after v4.11-rc. I bisected the problem > to 617f01211baf ("8139too: use napi_complete_done()"). > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > have serial and ramoops failed me. I did get one partial oops on vgacon > which showed rtl8139_poll() being involved (EIP was around > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > bisect result. > > So maybe some kind of nasty thing going between the hard irq and > softirq? Perhaps UP related? I tried to stare at the locking around > rtl8139_poll() for a while but it looked mostly sane to me. > Thanks a lot for the detective work, I am so sorry for this ! Could you try the following patch ? I do not really see what could be wrong, the code should run just fine on UP. Thanks. diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 --- a/drivers/net/ethernet/realtek/8139too.c +++ b/drivers/net/ethernet/realtek/8139too.c @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) if (likely(RTL_R16(IntrStatus) & RxAckBits)) work_done += rtl8139_rx(dev, tp, budget); - if (work_done < budget && napi_complete_done(napi, work_done)) { + if (work_done < budget) { unsigned long flags; spin_lock_irqsave(&tp->lock, flags); - RTL_W16_F(IntrMask, rtl8139_intr_mask); + if (napi_complete_done(napi, work_done)) + RTL_W16_F(IntrMask, rtl8139_intr_mask); spin_unlock_irqrestore(&tp->lock, flags); } spin_unlock(&tp->rx_lock);
Re: [RFC 2/3] genetlink: pass extended error report down
On 04/07/2017 11:26 AM, Johannes Berg wrote: From: Johannes Berg Signed-off-by: Johannes Berg --- include/net/genetlink.h | 27 +++ net/netlink/genetlink.c | 6 -- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/include/net/genetlink.h b/include/net/genetlink.h index a34275be3600..67ad2326cfa6 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -84,6 +84,7 @@ struct nlattr **genl_family_attrbuf(const struct genl_family *family); * @attrs: netlink attributes * @_net: network namespace * @user_ptr: user pointers + * @exterr: extended error report struct */ struct genl_info { u32 snd_seq; @@ -94,6 +95,7 @@ struct genl_info { struct nlattr **attrs; possible_net_t _net; void * user_ptr[2]; + struct netlink_ext_err *exterr; }; static inline struct net *genl_info_net(struct genl_info *info) @@ -106,6 +108,31 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net) write_pnet(&info->_net, net); } +static inline int genl_err_str(struct genl_info *info, int err, + const char *msg) +{ + info->exterr->msg = msg; + + return err; +} I guess the error string must be constant and always available in memory in this implementation? I think it would be nice to dynamically create strings (malloc, snprintf, etc) and have the err_str logic free it when done? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
[net-next 7/7] net/mlx5e: Set default RX moderation parameters on driver load
RX moderation default parameters shouldn't be set in mlx5e_build_rx_cq_param since it would reset the values every time on netdev open/close. Instead, it should be set in mlx5e_set_rx_cq_mode_params which is called on driver load only. Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params") Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 57844ffca37f..8b7b7e604ea0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1944,10 +1944,6 @@ static void mlx5e_build_rx_cq_param(struct mlx5e_priv *priv, } mlx5e_build_common_cq_param(priv, param); - - if (params->rx_am_enabled) - params->rx_cq_moderation = - mlx5e_am_get_def_profile(params->rx_cq_period_mode); } static void mlx5e_build_tx_cq_param(struct mlx5e_priv *priv, @@ -3787,6 +3783,10 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode) params->rx_cq_moderation.usec = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE; + if (params->rx_am_enabled) + params->rx_cq_moderation = + mlx5e_am_get_def_profile(params->rx_cq_period_mode); + MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_BASED_MODER, params->rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE); } -- 2.11.0
[net-next 4/7] net/mlx5e: Change FW sub_minor display to 4 zeros padding
From: Eran Ben Elisha FW version should be reported as X.Y., add leading zeroes to sub minor in order to fix it. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index af039b6c0799..167a8379156e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -42,7 +42,7 @@ static void mlx5e_get_drvinfo(struct net_device *dev, strlcpy(drvinfo->version, DRIVER_VERSION " (" DRIVER_RELDATE ")", sizeof(drvinfo->version)); snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), -"%d.%d.%d", +"%d.%d.%04d", fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev)); strlcpy(drvinfo->bus_info, pci_name(mdev->pdev), sizeof(drvinfo->bus_info)); -- 2.11.0
[net-next 3/7] net/mlx5e: Make mlx5e_modify_rqs_vsd a static function
From: Guy Ergas Make mlx5e_modify_rqs_vsd a static function and remove from en.h in order to reduce redundant exposure of functions. Signed-off-by: Guy Ergas Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 -- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index a58031da7fad..b7feecfbb5a5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -843,8 +843,6 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto, void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv); void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv); -int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd); - struct mlx5e_redirect_rqt_param { bool is_rss; union { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 83796ce17fc3..b57a6e72cc86 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2950,7 +2950,7 @@ static int mlx5e_modify_channels_scatter_fcs(struct mlx5e_channels *chs, bool en return 0; } -int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd) +static int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd) { int err = 0; int i; -- 2.11.0