Re: [PATCH v4] selftest: size: Add size test for Linux kernel
On Wed, Nov 26, 2014 at 08:27:23PM -0800, Tim Bird wrote: --- /dev/null +++ b/tools/testing/selftests/size/Makefile [...] +LIBGCC=$(shell $(CC) -print-libgcc-file-name) + +get_size: get_size.c + $(CC) --static -ffreestanding -nostartfiles \ + -Wl,--entry=_start get_size.c $(LIBGCC) \ + -o get_size You don't need -Wl,--entry=_start; that's the default. You shouldn't need to manually find libgcc, either; the compiler should do that for you. What goes wrong if you don't include that? If you're trying to link libgcc statically, try -static-libgcc. Also, static is normally spelled -static, not --static. --- /dev/null +++ b/tools/testing/selftests/size/get_size.c [...] +int print(const char *s) This function, and all the others apart from _start, should be declared static. +void num_to_str(unsigned long num, char *s) Likewise, static. +{ + unsigned long long temp, div; + int started; + + temp = num; + div = 100LL; + started = 0; + while (div) { + if (temp/div || started) { + *s++ = (unsigned char)(temp/div + '0'); + started = 1; + } + temp -= (temp/div)*div; + div /= 10; + } + *s = 0; +} You'd probably end up with significantly smaller code (and no divisions, and thus no corner cases on architectures that need a special function to do unsigned long long division) if you print in hex. You could also drop the no leading zeros logic, and just *always* print a 64-bit value as 16 hex digits. +int print_num(unsigned long num) Likewise, static. +{ + char num_buf[30]; + + num_to_str(num, num_buf); + return print(num_buf); +} + +int print_k_value(const char *s, unsigned long num, unsigned long units) +{ + unsigned long long temp; + int ccode; + + print(s); + + temp = num; + temp = (temp * units)/1024; + num = temp; + ccode = print_num(num); + print(\n); + return ccode; +} I'd suggest dropping this entirely, and just always printing the exact values returned by sysinfo. Drop the multiply, too, and just print info.mem_unit as well. It's easy to post-process the value in a more capable environment. +/* this program has no main(), as startup libraries are not used */ +void _start(void) +{ + int ccode; + struct sysinfo info; + unsigned long used; + + print(Testing system size.\n); + print(1..1\n); + + ccode = sysinfo(info); + if (ccode 0) { + print(not ok 1 get size runtime size\n); Shouldn't the not ok here and the ok below have the same test description? + print(# could not get sysinfo\n); + _exit(ccode); + } + /* ignore cache complexities for now */ + used = info.totalram - info.freeram - info.bufferram; + print_k_value(ok 1 get runtime memory use # size = , used, + info.mem_unit); + + print(# System runtime memory report (units in Kilobytes):\n); + print_k_value(# Total: , info.totalram, info.mem_unit); + print_k_value(# Free: , info.freeram, info.mem_unit); + print_k_value(# Buffer: , info.bufferram, info.mem_unit); + print_k_value(# In use: , used, info.mem_unit); + + _exit(0); +} -- 1.8.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the tip tree
On Mon, Nov 24, 2014 at 10:30:10PM -0800, John Stultz wrote: > On Mon, Nov 24, 2014 at 10:16 PM, Ingo Molnar wrote: > > > > * Stephen Rothwell wrote: > > > >> Hi Josh, > >> > >> Today's linux-next merge of the tiny tree got a conflict in > >> kernel/time/Makefile between commit fd866e2b116b ("time: Rename > >> udelay_test.c to test_udelay.c") from the tip tree and commit > >> d1f6d68d03ea ("kernel: time: Compile out NTP support") from the tiny > >> tree. > > > > So I think a timer subsystem commit d1f6d68d03ea with this > > magnitude of linecount increase: > > > > Signed-off-by: Catalina Mocanu > > [josh: Handle CONFIG_COMPAT=y.] > > Reviewed-by: Josh Triplett > > Signed-off-by: Josh Triplett > > --- > > drivers/pps/Kconfig| 2 +- > > include/linux/timex.h | 15 +-- > > init/Kconfig | 10 ++ > > kernel/compat.c| 8 ++-- > > kernel/sys_ni.c| 4 > > kernel/time/Makefile | 3 ++- > > kernel/time/ntp_internal.h | 28 > > kernel/time/posix-timers.c | 2 ++ > > kernel/time/time.c | 2 ++ > > kernel/time/timekeeping.c | 2 ++ > > 10 files changed, 70 insertions(+), 6 deletions(-) > > > > at minimum needs the ack of timer folks, before it can be > > committed to Git. Or is the tiny tree plan to submit all > > patches to the appropriate subsystem or gather acks, before > > sending it upstream? > > Yea. From first glance d1f6d68d03ea looks fairly broken. > > Returning 0 for ntp_tick_length() (which should be the current tick > length in NTP_SCALE_SHIFT shifted ns), seems like it would cause major > timekeeping problems. Ouch, yeah; I'm impressed the kernel successfully booted that way (which I did test). Computing the tick_length to return seems to require a div_u64; is it safe to initialize a static const with the result of calling div_u64, or does the intializer need manual constant-folding to make the expression compile-time computable? Going by the logic in ntp_update_frequency, it looks like the stub ntp_tick_length needs to return: tick_length_base = 0; tick_usec = TICK_USEC; second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT; new_base = div_u64(second_length, NTP_INTERVAL_FREQ); tick_length += new_base - tick_length_base; (tick_length starts out 0, gets new_base - 0 added initially, and every subsequent time gets 0 added since tick_length_base won't change.) Substituting and simplifying: tick_length = new_base = div_u64(second_length, NTP_INTERVAL_FREQ) = div_u64((TICK_USEC * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ) The numerator there could potentially be simplified, but I don't see an obvious way around the division by NTP_INTERVAL_FREQ (defined as HZ). - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the tip tree
On Tue, Nov 25, 2014 at 07:16:45AM +0100, Ingo Molnar wrote: > * Stephen Rothwell wrote: > > Hi Josh, > > > > Today's linux-next merge of the tiny tree got a conflict in > > kernel/time/Makefile between commit fd866e2b116b ("time: Rename > > udelay_test.c to test_udelay.c") from the tip tree and commit > > d1f6d68d03ea ("kernel: time: Compile out NTP support") from the tiny > > tree. > > So I think a timer subsystem commit d1f6d68d03ea with this > magnitude of linecount increase: > > Signed-off-by: Catalina Mocanu > [josh: Handle CONFIG_COMPAT=y.] > Reviewed-by: Josh Triplett > Signed-off-by: Josh Triplett > --- > drivers/pps/Kconfig| 2 +- > include/linux/timex.h | 15 +-- > init/Kconfig | 10 ++ > kernel/compat.c| 8 ++-- > kernel/sys_ni.c| 4 > kernel/time/Makefile | 3 ++- > kernel/time/ntp_internal.h | 28 > kernel/time/posix-timers.c | 2 ++ > kernel/time/time.c | 2 ++ > kernel/time/timekeeping.c | 2 ++ > 10 files changed, 70 insertions(+), 6 deletions(-) > > at minimum needs the ack of timer folks, before it can be > committed to Git. Or is the tiny tree plan to submit all > patches to the appropriate subsystem or gather acks, before > sending it upstream? Yes, absolutely. I planned to send out a tinification patch review series later this week with all 10 current patches (both those reviewed on LKML and those only reviewed elsewhere). - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Tue, Nov 25, 2014 at 12:00:59AM +0100, Pieter Smith wrote: > REPO: https://github.com/smipi1/linux-tinification.git > > BRANCH: tiny/config-syscall-splice > > BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort > ( > https://tiny.wiki.kernel.org/). > > GOAL: Support compiling out the splice family of syscalls (splice, vmsplice, > tee and sendfile) along with all supporting infrastructure if not needed. > Many embedded systems will not need the splice-family syscalls. Omitting > them > saves space. > > HISTORY: > PATCH v4: > - Drops __splice_p() > - Let nfsd fall back to non-splice support when splice is compiled out > - Style fixes [...] > RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set: > > add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399) I replied to one patch with a minor nit in the commit message. Other than that, I don't see any obvious issues with this. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 3/7] fs/splice: support compiling out splice-family syscalls
On Tue, Nov 25, 2014 at 12:01:02AM +0100, Pieter Smith wrote: > Many embedded systems will not need the splice-family syscalls (splice, > vmsplice, tee and sendfile). Omitting them saves space. This adds a new > EXPERT > config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out. > > The goal is to completely compile out fs/splice along with the syscalls. To > achieve this, the remaining patch-set will deal with fs/splice exports. As far > as possible, the impact on other device drivers will be minimized so as to > reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE. > > The use of exported functions will be solved by transparently mocking them out > with static inlines. Uses of the exported pipe_buf_operations struct however > require direct modification in fs/fuse and net/core. The next two patches will > deal with this. A macro is defined that will assist with NULL'ing out > callbacks > when CONFIG_SYSCALL_SPLICE is undefined: __splice_p(). This message needs updating, since the patch series doesn't introduce or use __splice_p anymore. > Once all exports are solved, fs/splice can be compiled out. > > The bloat benefit of this patch given a tinyconfig is: > > add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579) > function old new delta > splice_direct_to_actor 348 416 +68 > splice_to_pipe 371 417 +46 > splice_from_pipe_next107 106 -1 > fdput 11 - -11 > signal_pending39 26 -13 > fdget 56 42 -14 > user_page_pipe_buf_ops20 - -20 > user_page_pipe_buf_steal 25 - -25 > file_end_write58 29 -29 > file_start_write 68 34 -34 > pipe_to_user 43 - -43 > wakeup_pipe_readers 54 - -54 > do_splice_to 87 - -87 > ipipe_prep.part 92 - -92 > opipe_prep.part 119 --119 > sys_sendfile 122 --122 > sys_sendfile64 126 --126 > sys_vmsplice 137 --137 > vmsplice_to_user 205 --205 > sys_tee 491 --491 > do_sendfile 492 --492 > vmsplice_to_pipe 558 --558 > sys_splice 1020 - -1020 > > Signed-off-by: Pieter Smith > --- > fs/splice.c | 2 ++ > init/Kconfig| 10 ++ > kernel/sys_ni.c | 8 > 3 files changed, 20 insertions(+) > > diff --git a/fs/splice.c b/fs/splice.c > index 44b201b..7c4c695 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, > struct file *out, > return ret; > } > > +#ifdef CONFIG_SYSCALL_SPLICE > static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, > struct pipe_inode_info *opipe, > size_t len, unsigned int flags); > @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, > in_fd, > return do_sendfile(out_fd, in_fd, NULL, count, 0); > } > #endif > +#endif > > diff --git a/init/Kconfig b/init/Kconfig > index d811d5f..dec9819 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1571,6 +1571,16 @@ config NTP > system clock to an NTP server, you can disable this option to save > space. > > +config SYSCALL_SPLICE > + bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT > + default y > + help > + This option enables the splice, vmsplice, tee and sendfile syscalls. > These > + are used by applications to: move data between buffers and arbitrary > file > + descriptors; "copy" data between buffers; or copy data from userspace > into > + buffers. If building an embedded system where no applications use > these > + syscalls, you can disable this option to save space. > + > config PCI_QUIRKS > default y > bool "Enable PCI quirk workarounds" if EXPERT > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index d2f5b00..25d5551 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -170,6 +170,14 @@ cond_syscall(sys_fstat); > cond_syscall(sys_stat); > cond_syscall(sys_uname); > cond_syscall(sys_olduname); > +cond_syscall(sys_vmsplice); > +cond_syscall(sys_splice); > +cond_syscall(sys_tee); > +cond_syscall(sys_sendfile); > +cond_syscall(sys_sendfile64);
Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice
On Mon, Nov 24, 2014 at 10:49:31AM +0100, Pieter Smith wrote: > On Sun, Nov 23, 2014 at 03:23:02PM -0800, Josh Triplett wrote: > > On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote: > > > On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith wrote: > > > > To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. > > > > This > > > > struct is exported by fs/splice. The goal of the larger patch set is to > > > > completely compile out fs/splice, so uses of the exported struct need > > > > to be > > > > compiled out along with fs/splice. > > > > > > > > This patch therefore compiles out splice support in fs/fuse when > > > > CONFIG_SYSCALL_SPLICE is undefined. > > > > > > > > Signed-off-by: Pieter Smith > > > > --- > > > > fs/fuse/dev.c | 4 ++-- > > > > include/linux/fs.h | 6 ++ > > > > 2 files changed, 8 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > > > index ca88731..f8f92a4 100644 > > > > --- a/fs/fuse/dev.c > > > > +++ b/fs/fuse/dev.c > > > > @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, > > > > const struct iovec *iov, > > > > return fuse_dev_do_read(fc, file, , iov_length(iov, > > > > nr_segs)); > > > > } > > > > > > > > -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, > > > > +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, > > > > loff_t *ppos, > > > > struct pipe_inode_info *pipe, > > > > size_t len, unsigned int flags) > > > > { > > > > @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations > > > > = { > > > > .llseek = no_llseek, > > > > .read = do_sync_read, > > > > .aio_read = fuse_dev_read, > > > > - .splice_read= fuse_dev_splice_read, > > > > + .splice_read= __splice_p(fuse_dev_splice_read), > > > > .write = do_sync_write, > > > > .aio_write = fuse_dev_write, > > > > .splice_write = fuse_dev_splice_write, > > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > > > index a957d43..04c0975 100644 > > > > --- a/include/linux/fs.h > > > > +++ b/include/linux/fs.h > > > > @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, > > > > loff_t start, loff_t end, > > > > int datasync); > > > > extern void block_sync_page(struct page *page); > > > > > > > > +#ifdef CONFIG_SYSCALL_SPLICE > > > > +#define __splice_p(x) x > > > > +#else > > > > +#define __splice_p(x) NULL > > > > +#endif > > > > + > > > > > > This needs to go into a different patch. > > > One logical change per patch please. :-) > > > > Easy enough to merge this one into the patch introducing > > CONFIG_SYSCALL_SPLICE, then. > > > > - Josh Triplett > > The patch introducing CONFIG_SYSCALL_SPLICE (PATCH 3) only compiles out the > syscalls. PATCH 6 on the other hand, compiles out fs/splice.c. This patch > allows fs/fuse to be compiled when fs/splice.c is compiled out. If I am to > squash it, it would be logical to include it in PATCH 6, not 3. The suggestion wasn't to move the fs/fuse/dev.c bits; those should definitely stay in this patch. The suggestion was just to move the bit of the patch defining __splice_p from this patch to patch 3. (Note that you need to define it before you use it, so it can't go in patch 6.) - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Mon, Nov 24, 2014 at 11:01:38AM +0100, Pieter Smith wrote: > On Sun, Nov 23, 2014 at 04:32:51PM -0800, Josh Triplett wrote: > > On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote: > > > On Sun, 23 Nov 2014 15:36:37 -0800 > > > Josh Triplett wrote: > > > > > > > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: > > > > > On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: > > > > > > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: > > > > > > > Truly removing sendfile/sendpage means that you can't even > > > > > > > compile NFS > > > > > > > into the tree. > > > > > > > > > > > > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a > > > > > > large > > > > > > stack of "select" and "depends on", both directly and indirectly; > > > > > > adding > > > > > > a "select SPLICE_SYSCALL" to it seems fine. (That select does need > > > > > > adding, though. Pieter, you need to test-compile more than just > > > > > > tinyconfig and defconfig. Try an allyesconfig with *just* splice > > > > > > turned > > > > > > off, and make sure that compiles.) > > > > > > > > > > Did exacly that. Took forever on my hardware, but no problems. > > > > > > > > Ah, I see. Looking more closely at nfsd, it looks like it already has a > > > > code path for filesystems that don't do splice. I think, rather than > > > > making nfsd select SPLICE_SYSCALL, that it would suffice to change the > > > > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c) > > > > to: > > > > > > > > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); > > > > > > > > Then nfsd should simply *always* fall back to its non-splice support. > > > > > > > > > > I'd probably prefer the above, actually. We have to keep supporting > > > non-splice enabled fs' for the forseeable future, so we may as well > > > allow people to run nfsd in such configurations. It could even be > > > useful for testing the non-splice-enabled codepaths. > > > > Good point! > > > > - Josh Triplett > > I'll add this to svc_process_common. I can squash this into PATCH 3, which is > where the syscalls can be compiled out. The log entry may however get a little > crowded and multi-functional. > > Should I keep this as a separate patch? I'd keep it as a separate patch, yes. It doesn't become necessary until patch 6, so you can add it as a new patch between patches 3 and 6. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Mon, Nov 24, 2014 at 09:38:20AM +0100, Geert Uytterhoeven wrote: > On Mon, Nov 24, 2014 at 12:36 AM, Josh Triplett wrote: > > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: > >> On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: > >> > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: > >> > > Truly removing sendfile/sendpage means that you can't even compile NFS > >> > > into the tree. > >> > > >> > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large > >> > stack of "select" and "depends on", both directly and indirectly; adding > >> > a "select SPLICE_SYSCALL" to it seems fine. (That select does need > >> > adding, though. Pieter, you need to test-compile more than just > >> > tinyconfig and defconfig. Try an allyesconfig with *just* splice turned > >> > off, and make sure that compiles.) > >> > >> Did exacly that. Took forever on my hardware, but no problems. > > > > Ah, I see. Looking more closely at nfsd, it looks like it already has a > > code path for filesystems that don't do splice. I think, rather than > > making nfsd select SPLICE_SYSCALL, that it would suffice to change the > > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c) > > to: > > > > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); > > > > Then nfsd should simply *always* fall back to its non-splice support. > > Hence I suggest adding to the nfsd help text: > > While nfsd works without SPLICE_SYSCALL, you may want to enable > SPLICE_SYSCALL for <...> (performance?) reasons. It already seems sufficiently unlikely to enable NFSD while disabling SPLICE_SYSCALL (in the latter case, turning on EXPERT to do so). It doesn't seem worth adding such a note to NFSD. At most, I'd say that NFSD might want a note somewhere in its documentation saying that it takes advantage of filesystems with splice support if serving files from one, and if running on a kernel that has splice. > (Hmm, does Kconfig need a "suggests", cfr. Debian package dependencies?) Perhaps, though that seems much lower priority than, for instance, transitive "select". - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Mon, Nov 24, 2014 at 09:38:20AM +0100, Geert Uytterhoeven wrote: On Mon, Nov 24, 2014 at 12:36 AM, Josh Triplett j...@joshtriplett.org wrote: On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: Truly removing sendfile/sendpage means that you can't even compile NFS into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of select and depends on, both directly and indirectly; adding a select SPLICE_SYSCALL to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Did exacly that. Took forever on my hardware, but no problems. Ah, I see. Looking more closely at nfsd, it looks like it already has a code path for filesystems that don't do splice. I think, rather than making nfsd select SPLICE_SYSCALL, that it would suffice to change the rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c) to: rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); Then nfsd should simply *always* fall back to its non-splice support. Hence I suggest adding to the nfsd help text: While nfsd works without SPLICE_SYSCALL, you may want to enable SPLICE_SYSCALL for ... (performance?) reasons. It already seems sufficiently unlikely to enable NFSD while disabling SPLICE_SYSCALL (in the latter case, turning on EXPERT to do so). It doesn't seem worth adding such a note to NFSD. At most, I'd say that NFSD might want a note somewhere in its documentation saying that it takes advantage of filesystems with splice support if serving files from one, and if running on a kernel that has splice. (Hmm, does Kconfig need a suggests, cfr. Debian package dependencies?) Perhaps, though that seems much lower priority than, for instance, transitive select. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Mon, Nov 24, 2014 at 11:01:38AM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 04:32:51PM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote: On Sun, 23 Nov 2014 15:36:37 -0800 Josh Triplett j...@joshtriplett.org wrote: On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: Truly removing sendfile/sendpage means that you can't even compile NFS into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of select and depends on, both directly and indirectly; adding a select SPLICE_SYSCALL to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Did exacly that. Took forever on my hardware, but no problems. Ah, I see. Looking more closely at nfsd, it looks like it already has a code path for filesystems that don't do splice. I think, rather than making nfsd select SPLICE_SYSCALL, that it would suffice to change the rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c) to: rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); Then nfsd should simply *always* fall back to its non-splice support. I'd probably prefer the above, actually. We have to keep supporting non-splice enabled fs' for the forseeable future, so we may as well allow people to run nfsd in such configurations. It could even be useful for testing the non-splice-enabled codepaths. Good point! - Josh Triplett I'll add this to svc_process_common. I can squash this into PATCH 3, which is where the syscalls can be compiled out. The log entry may however get a little crowded and multi-functional. Should I keep this as a separate patch? I'd keep it as a separate patch, yes. It doesn't become necessary until patch 6, so you can add it as a new patch between patches 3 and 6. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice
On Mon, Nov 24, 2014 at 10:49:31AM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 03:23:02PM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote: On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith pie...@boesman.nl wrote: To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This struct is exported by fs/splice. The goal of the larger patch set is to completely compile out fs/splice, so uses of the exported struct need to be compiled out along with fs/splice. This patch therefore compiles out splice support in fs/fuse when CONFIG_SYSCALL_SPLICE is undefined. Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/fuse/dev.c | 4 ++-- include/linux/fs.h | 6 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ca88731..f8f92a4 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov, return fuse_dev_do_read(fc, file, cs, iov_length(iov, nr_segs)); } -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = { .llseek = no_llseek, .read = do_sync_read, .aio_read = fuse_dev_read, - .splice_read= fuse_dev_splice_read, + .splice_read= __splice_p(fuse_dev_splice_read), .write = do_sync_write, .aio_write = fuse_dev_write, .splice_write = fuse_dev_splice_write, diff --git a/include/linux/fs.h b/include/linux/fs.h index a957d43..04c0975 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync); extern void block_sync_page(struct page *page); +#ifdef CONFIG_SYSCALL_SPLICE +#define __splice_p(x) x +#else +#define __splice_p(x) NULL +#endif + This needs to go into a different patch. One logical change per patch please. :-) Easy enough to merge this one into the patch introducing CONFIG_SYSCALL_SPLICE, then. - Josh Triplett The patch introducing CONFIG_SYSCALL_SPLICE (PATCH 3) only compiles out the syscalls. PATCH 6 on the other hand, compiles out fs/splice.c. This patch allows fs/fuse to be compiled when fs/splice.c is compiled out. If I am to squash it, it would be logical to include it in PATCH 6, not 3. The suggestion wasn't to move the fs/fuse/dev.c bits; those should definitely stay in this patch. The suggestion was just to move the bit of the patch defining __splice_p from this patch to patch 3. (Note that you need to define it before you use it, so it can't go in patch 6.) - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 3/7] fs/splice: support compiling out splice-family syscalls
On Tue, Nov 25, 2014 at 12:01:02AM +0100, Pieter Smith wrote: Many embedded systems will not need the splice-family syscalls (splice, vmsplice, tee and sendfile). Omitting them saves space. This adds a new EXPERT config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out. The goal is to completely compile out fs/splice along with the syscalls. To achieve this, the remaining patch-set will deal with fs/splice exports. As far as possible, the impact on other device drivers will be minimized so as to reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE. The use of exported functions will be solved by transparently mocking them out with static inlines. Uses of the exported pipe_buf_operations struct however require direct modification in fs/fuse and net/core. The next two patches will deal with this. A macro is defined that will assist with NULL'ing out callbacks when CONFIG_SYSCALL_SPLICE is undefined: __splice_p(). This message needs updating, since the patch series doesn't introduce or use __splice_p anymore. Once all exports are solved, fs/splice can be compiled out. The bloat benefit of this patch given a tinyconfig is: add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579) function old new delta splice_direct_to_actor 348 416 +68 splice_to_pipe 371 417 +46 splice_from_pipe_next107 106 -1 fdput 11 - -11 signal_pending39 26 -13 fdget 56 42 -14 user_page_pipe_buf_ops20 - -20 user_page_pipe_buf_steal 25 - -25 file_end_write58 29 -29 file_start_write 68 34 -34 pipe_to_user 43 - -43 wakeup_pipe_readers 54 - -54 do_splice_to 87 - -87 ipipe_prep.part 92 - -92 opipe_prep.part 119 --119 sys_sendfile 122 --122 sys_sendfile64 126 --126 sys_vmsplice 137 --137 vmsplice_to_user 205 --205 sys_tee 491 --491 do_sendfile 492 --492 vmsplice_to_pipe 558 --558 sys_splice 1020 - -1020 Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/splice.c | 2 ++ init/Kconfig| 10 ++ kernel/sys_ni.c | 8 3 files changed, 20 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 44b201b..7c4c695 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, return ret; } +#ifdef CONFIG_SYSCALL_SPLICE static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, struct pipe_inode_info *opipe, size_t len, unsigned int flags); @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, return do_sendfile(out_fd, in_fd, NULL, count, 0); } #endif +#endif diff --git a/init/Kconfig b/init/Kconfig index d811d5f..dec9819 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1571,6 +1571,16 @@ config NTP system clock to an NTP server, you can disable this option to save space. +config SYSCALL_SPLICE + bool Enable splice/vmsplice/tee/sendfile syscalls if EXPERT + default y + help + This option enables the splice, vmsplice, tee and sendfile syscalls. These + are used by applications to: move data between buffers and arbitrary file + descriptors; copy data between buffers; or copy data from userspace into + buffers. If building an embedded system where no applications use these + syscalls, you can disable this option to save space. + config PCI_QUIRKS default y bool Enable PCI quirk workarounds if EXPERT diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index d2f5b00..25d5551 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -170,6 +170,14 @@ cond_syscall(sys_fstat); cond_syscall(sys_stat); cond_syscall(sys_uname); cond_syscall(sys_olduname); +cond_syscall(sys_vmsplice); +cond_syscall(sys_splice); +cond_syscall(sys_tee); +cond_syscall(sys_sendfile); +cond_syscall(sys_sendfile64); +cond_syscall(compat_sys_vmsplice); +cond_syscall(compat_sys_sendfile);
Re: [PATCH v4 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Tue, Nov 25, 2014 at 12:00:59AM +0100, Pieter Smith wrote: REPO: https://github.com/smipi1/linux-tinification.git BRANCH: tiny/config-syscall-splice BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort ( https://tiny.wiki.kernel.org/). GOAL: Support compiling out the splice family of syscalls (splice, vmsplice, tee and sendfile) along with all supporting infrastructure if not needed. Many embedded systems will not need the splice-family syscalls. Omitting them saves space. HISTORY: PATCH v4: - Drops __splice_p() - Let nfsd fall back to non-splice support when splice is compiled out - Style fixes [...] RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set: add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399) I replied to one patch with a minor nit in the commit message. Other than that, I don't see any obvious issues with this. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the tip tree
On Tue, Nov 25, 2014 at 07:16:45AM +0100, Ingo Molnar wrote: * Stephen Rothwell s...@canb.auug.org.au wrote: Hi Josh, Today's linux-next merge of the tiny tree got a conflict in kernel/time/Makefile between commit fd866e2b116b (time: Rename udelay_test.c to test_udelay.c) from the tip tree and commit d1f6d68d03ea (kernel: time: Compile out NTP support) from the tiny tree. So I think a timer subsystem commit d1f6d68d03ea with this magnitude of linecount increase: Signed-off-by: Catalina Mocanu catalina.moc...@gmail.com [josh: Handle CONFIG_COMPAT=y.] Reviewed-by: Josh Triplett j...@joshtriplett.org Signed-off-by: Josh Triplett j...@joshtriplett.org --- drivers/pps/Kconfig| 2 +- include/linux/timex.h | 15 +-- init/Kconfig | 10 ++ kernel/compat.c| 8 ++-- kernel/sys_ni.c| 4 kernel/time/Makefile | 3 ++- kernel/time/ntp_internal.h | 28 kernel/time/posix-timers.c | 2 ++ kernel/time/time.c | 2 ++ kernel/time/timekeeping.c | 2 ++ 10 files changed, 70 insertions(+), 6 deletions(-) at minimum needs the ack of timer folks, before it can be committed to Git. Or is the tiny tree plan to submit all patches to the appropriate subsystem or gather acks, before sending it upstream? Yes, absolutely. I planned to send out a tinification patch review series later this week with all 10 current patches (both those reviewed on LKML and those only reviewed elsewhere). - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the tip tree
On Mon, Nov 24, 2014 at 10:30:10PM -0800, John Stultz wrote: On Mon, Nov 24, 2014 at 10:16 PM, Ingo Molnar mi...@kernel.org wrote: * Stephen Rothwell s...@canb.auug.org.au wrote: Hi Josh, Today's linux-next merge of the tiny tree got a conflict in kernel/time/Makefile between commit fd866e2b116b (time: Rename udelay_test.c to test_udelay.c) from the tip tree and commit d1f6d68d03ea (kernel: time: Compile out NTP support) from the tiny tree. So I think a timer subsystem commit d1f6d68d03ea with this magnitude of linecount increase: Signed-off-by: Catalina Mocanu catalina.moc...@gmail.com [josh: Handle CONFIG_COMPAT=y.] Reviewed-by: Josh Triplett j...@joshtriplett.org Signed-off-by: Josh Triplett j...@joshtriplett.org --- drivers/pps/Kconfig| 2 +- include/linux/timex.h | 15 +-- init/Kconfig | 10 ++ kernel/compat.c| 8 ++-- kernel/sys_ni.c| 4 kernel/time/Makefile | 3 ++- kernel/time/ntp_internal.h | 28 kernel/time/posix-timers.c | 2 ++ kernel/time/time.c | 2 ++ kernel/time/timekeeping.c | 2 ++ 10 files changed, 70 insertions(+), 6 deletions(-) at minimum needs the ack of timer folks, before it can be committed to Git. Or is the tiny tree plan to submit all patches to the appropriate subsystem or gather acks, before sending it upstream? Yea. From first glance d1f6d68d03ea looks fairly broken. Returning 0 for ntp_tick_length() (which should be the current tick length in NTP_SCALE_SHIFT shifted ns), seems like it would cause major timekeeping problems. Ouch, yeah; I'm impressed the kernel successfully booted that way (which I did test). Computing the tick_length to return seems to require a div_u64; is it safe to initialize a static const with the result of calling div_u64, or does the intializer need manual constant-folding to make the expression compile-time computable? Going by the logic in ntp_update_frequency, it looks like the stub ntp_tick_length needs to return: tick_length_base = 0; tick_usec = TICK_USEC; second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) NTP_SCALE_SHIFT; new_base = div_u64(second_length, NTP_INTERVAL_FREQ); tick_length += new_base - tick_length_base; (tick_length starts out 0, gets new_base - 0 added initially, and every subsequent time gets 0 added since tick_length_base won't change.) Substituting and simplifying: tick_length = new_base = div_u64(second_length, NTP_INTERVAL_FREQ) = div_u64((TICK_USEC * NSEC_PER_USEC * USER_HZ) NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ) The numerator there could potentially be simplified, but I don't see an obvious way around the division by NTP_INTERVAL_FREQ (defined as HZ). - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote: > On Sun, 23 Nov 2014 15:36:37 -0800 > Josh Triplett wrote: > > > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: > > > On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: > > > > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: > > > > > Truly removing sendfile/sendpage means that you can't even compile NFS > > > > > into the tree. > > > > > > > > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large > > > > stack of "select" and "depends on", both directly and indirectly; adding > > > > a "select SPLICE_SYSCALL" to it seems fine. (That select does need > > > > adding, though. Pieter, you need to test-compile more than just > > > > tinyconfig and defconfig. Try an allyesconfig with *just* splice turned > > > > off, and make sure that compiles.) > > > > > > Did exacly that. Took forever on my hardware, but no problems. > > > > Ah, I see. Looking more closely at nfsd, it looks like it already has a > > code path for filesystems that don't do splice. I think, rather than > > making nfsd select SPLICE_SYSCALL, that it would suffice to change the > > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c) > > to: > > > > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); > > > > Then nfsd should simply *always* fall back to its non-splice support. > > > > I'd probably prefer the above, actually. We have to keep supporting > non-splice enabled fs' for the forseeable future, so we may as well > allow people to run nfsd in such configurations. It could even be > useful for testing the non-splice-enabled codepaths. Good point! - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: > On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: > > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: > > > Truly removing sendfile/sendpage means that you can't even compile NFS > > > into the tree. > > > > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large > > stack of "select" and "depends on", both directly and indirectly; adding > > a "select SPLICE_SYSCALL" to it seems fine. (That select does need > > adding, though. Pieter, you need to test-compile more than just > > tinyconfig and defconfig. Try an allyesconfig with *just* splice turned > > off, and make sure that compiles.) > > Did exacly that. Took forever on my hardware, but no problems. Ah, I see. Looking more closely at nfsd, it looks like it already has a code path for filesystems that don't do splice. I think, rather than making nfsd select SPLICE_SYSCALL, that it would suffice to change the "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c) to: rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); Then nfsd should simply *always* fall back to its non-splice support. That said, given that it seems exceedingly unlikely that anyone would use the in-kernel nfsd on a system trying to minimize kernel size, it still seems cleaner to just "select SPLICE_SYSCALL" from NFSD in Kconfig. That avoids making any changes at all to the nfsd source in this patch series. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice
On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote: > On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith wrote: > > To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This > > struct is exported by fs/splice. The goal of the larger patch set is to > > completely compile out fs/splice, so uses of the exported struct need to be > > compiled out along with fs/splice. > > > > This patch therefore compiles out splice support in fs/fuse when > > CONFIG_SYSCALL_SPLICE is undefined. > > > > Signed-off-by: Pieter Smith > > --- > > fs/fuse/dev.c | 4 ++-- > > include/linux/fs.h | 6 ++ > > 2 files changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index ca88731..f8f92a4 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, > > const struct iovec *iov, > > return fuse_dev_do_read(fc, file, , iov_length(iov, nr_segs)); > > } > > > > -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, > > +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t > > *ppos, > > struct pipe_inode_info *pipe, > > size_t len, unsigned int flags) > > { > > @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = { > > .llseek = no_llseek, > > .read = do_sync_read, > > .aio_read = fuse_dev_read, > > - .splice_read= fuse_dev_splice_read, > > + .splice_read= __splice_p(fuse_dev_splice_read), > > .write = do_sync_write, > > .aio_write = fuse_dev_write, > > .splice_write = fuse_dev_splice_write, > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index a957d43..04c0975 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t > > start, loff_t end, > > int datasync); > > extern void block_sync_page(struct page *page); > > > > +#ifdef CONFIG_SYSCALL_SPLICE > > +#define __splice_p(x) x > > +#else > > +#define __splice_p(x) NULL > > +#endif > > + > > This needs to go into a different patch. > One logical change per patch please. :-) Easy enough to merge this one into the patch introducing CONFIG_SYSCALL_SPLICE, then. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: > Truly removing sendfile/sendpage means that you can't even compile NFS > into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of "select" and "depends on", both directly and indirectly; adding a "select SPLICE_SYSCALL" to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Given the requirements of running a file server in the kernel, I'd expect CONFIG_NFSD to end up with several more selects of optional functionality in the future. It seems rather likely that the average embedded system will be compiling out NFS. :) Also, this patch series compiles out splice and sendfile, including several *users* of sendpage; it doesn't compile out the sendpage support/infrastructure itself. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: Truly removing sendfile/sendpage means that you can't even compile NFS into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of select and depends on, both directly and indirectly; adding a select SPLICE_SYSCALL to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Given the requirements of running a file server in the kernel, I'd expect CONFIG_NFSD to end up with several more selects of optional functionality in the future. It seems rather likely that the average embedded system will be compiling out NFS. :) Also, this patch series compiles out splice and sendfile, including several *users* of sendpage; it doesn't compile out the sendpage support/infrastructure itself. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice
On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote: On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith pie...@boesman.nl wrote: To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This struct is exported by fs/splice. The goal of the larger patch set is to completely compile out fs/splice, so uses of the exported struct need to be compiled out along with fs/splice. This patch therefore compiles out splice support in fs/fuse when CONFIG_SYSCALL_SPLICE is undefined. Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/fuse/dev.c | 4 ++-- include/linux/fs.h | 6 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ca88731..f8f92a4 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov, return fuse_dev_do_read(fc, file, cs, iov_length(iov, nr_segs)); } -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = { .llseek = no_llseek, .read = do_sync_read, .aio_read = fuse_dev_read, - .splice_read= fuse_dev_splice_read, + .splice_read= __splice_p(fuse_dev_splice_read), .write = do_sync_write, .aio_write = fuse_dev_write, .splice_write = fuse_dev_splice_write, diff --git a/include/linux/fs.h b/include/linux/fs.h index a957d43..04c0975 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync); extern void block_sync_page(struct page *page); +#ifdef CONFIG_SYSCALL_SPLICE +#define __splice_p(x) x +#else +#define __splice_p(x) NULL +#endif + This needs to go into a different patch. One logical change per patch please. :-) Easy enough to merge this one into the patch introducing CONFIG_SYSCALL_SPLICE, then. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: Truly removing sendfile/sendpage means that you can't even compile NFS into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of select and depends on, both directly and indirectly; adding a select SPLICE_SYSCALL to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Did exacly that. Took forever on my hardware, but no problems. Ah, I see. Looking more closely at nfsd, it looks like it already has a code path for filesystems that don't do splice. I think, rather than making nfsd select SPLICE_SYSCALL, that it would suffice to change the rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c) to: rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); Then nfsd should simply *always* fall back to its non-splice support. That said, given that it seems exceedingly unlikely that anyone would use the in-kernel nfsd on a system trying to minimize kernel size, it still seems cleaner to just select SPLICE_SYSCALL from NFSD in Kconfig. That avoids making any changes at all to the nfsd source in this patch series. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote: On Sun, 23 Nov 2014 15:36:37 -0800 Josh Triplett j...@joshtriplett.org wrote: On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote: On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote: On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote: Truly removing sendfile/sendpage means that you can't even compile NFS into the tree. If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large stack of select and depends on, both directly and indirectly; adding a select SPLICE_SYSCALL to it seems fine. (That select does need adding, though. Pieter, you need to test-compile more than just tinyconfig and defconfig. Try an allyesconfig with *just* splice turned off, and make sure that compiles.) Did exacly that. Took forever on my hardware, but no problems. Ah, I see. Looking more closely at nfsd, it looks like it already has a code path for filesystems that don't do splice. I think, rather than making nfsd select SPLICE_SYSCALL, that it would suffice to change the rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c) to: rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL); Then nfsd should simply *always* fall back to its non-splice support. I'd probably prefer the above, actually. We have to keep supporting non-splice enabled fs' for the forseeable future, so we may as well allow people to run nfsd in such configurations. It could even be useful for testing the non-splice-enabled codepaths. Good point! - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] net/core: support compiling out splice
[Please don't top-post.] On Sat, Nov 22, 2014 at 11:50:51PM +0100, Pieter Smith wrote: > splice exports a structure that is used by skbuf. Mocking out a function is > straightforward. To my knowledge there is no elegant way of mocking out a > splice_operations struct. I directly modified the code to prevent linking > against the struct. Do you know of a better technique to get the same > result? No, I don't. The approach you took seems fine; I'm just saying that you need to explain the need for it in the commit message. - JosH Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/6] fs/splice: full support for compiling out splice
On Sat, Nov 22, 2014 at 10:00:01PM +0100, Pieter Smith wrote: > Entirely compile out splice translation unit when the system is configured > without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined). > > add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845) Very nice! - Josh Triplett > function old new delta > pipe_to_null 4 - -4 > generic_pipe_buf_nosteal 6 - -6 > spd_release_page 10 - -10 > PageUptodate 22 11 -11 > lock_page 36 24 -12 > page_cache_pipe_buf_release 16 - -16 > splice_write_null 24 4 -20 > page_cache_pipe_buf_ops 20 - -20 > nosteal_pipe_buf_ops 20 - -20 > default_pipe_buf_ops 20 - -20 > generic_splice_sendpage 24 - -24 > splice_shrink_spd 27 - -27 > direct_splice_actor 47 - -47 > default_file_splice_write 49 - -49 > wakeup_pipe_writers 54 - -54 > write_pipe_buf71 - -71 > page_cache_pipe_buf_confirm 80 - -80 > splice_grow_spd 87 - -87 > splice_from_pipe 93 - -93 > splice_from_pipe_next106 --106 > pipe_to_sendpage 109 --109 > page_cache_pipe_buf_steal114 --114 > generic_file_splice_read 131 8-123 > do_splice_direct 148 --148 > __splice_from_pipe 246 --246 > splice_direct_to_actor 416 --416 > splice_to_pipe 417 --417 > default_file_splice_read 688 --688 > iter_file_splice_write 702 4-698 > __generic_file_splice_read 1109 - -1109 > > Signed-off-by: Pieter Smith > --- > fs/Makefile| 3 ++- > fs/splice.c| 2 -- > include/linux/fs.h | 26 ++ > include/linux/splice.h | 43 +++ > 4 files changed, 71 insertions(+), 3 deletions(-) > > diff --git a/fs/Makefile b/fs/Makefile > index fb7646e..9395622 100644 > --- a/fs/Makefile > +++ b/fs/Makefile > @@ -10,7 +10,7 @@ obj-y :=open.o read_write.o file_table.o super.o \ > ioctl.o readdir.o select.o dcache.o inode.o \ > attr.o bad_inode.o file.o filesystems.o namespace.o \ > seq_file.o xattr.o libfs.o fs-writeback.o \ > - pnode.o splice.o sync.o utimes.o \ > + pnode.o sync.o utimes.o \ > stack.o fs_struct.o statfs.o fs_pin.o > > ifeq ($(CONFIG_BLOCK),y) > @@ -22,6 +22,7 @@ endif > obj-$(CONFIG_PROC_FS) += proc_namespace.o > > obj-$(CONFIG_FSNOTIFY) += notify/ > +obj-$(CONFIG_SYSCALL_SPLICE) += splice.o > obj-$(CONFIG_EPOLL) += eventpoll.o > obj-$(CONFIG_ANON_INODES)+= anon_inodes.o > obj-$(CONFIG_SIGNALFD) += signalfd.o > diff --git a/fs/splice.c b/fs/splice.c > index 7c4c695..44b201b 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, > struct file *out, > return ret; > } > > -#ifdef CONFIG_SYSCALL_SPLICE > static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, > struct pipe_inode_info *opipe, > size_t len, unsigned int flags); > @@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, > in_fd, > return do_sendfile(out_fd, in_fd, NULL, count, 0); > } > #endif > -#endif > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 04c0975..9b3054e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2449,6 +2449,7 @@ extern void block_sync_page(struct page *page); > #define __splice_p(x) NULL > #endif > > +#ifdef CONFIG_SYSCALL_SPLICE > /* fs/splice.c */ > extern ssize_t generic_file_splice_read(struct file *, loff_t *, > struct pipe_inode_info *, size_t, unsigned int); > @@ -2458,6 +2459,31 @@ extern ssize_t iter
Re: [PATCH 3/6] fs/splice: support compiling out splice-family syscalls
On Sat, Nov 22, 2014 at 09:59:58PM +0100, Pieter Smith wrote: > Many embedded systems will not need the splice-family syscalls (splice, > vmsplice, tee and sendfile). Omitting them saves space. This adds a new > EXPERT > config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out. > > This patch removes almost all callers of .splice_read() and .splice_write() > in the file_operations struct. This paves the way to eventually compile out > the > .splice_read and .splice_write members of the file_operations struct as well > as > the remaining splice-related infrastructure. This commit message doesn't reflect the new approach of leaving those members in the structure. The patch looks good otherwise. - Josh Triplett > add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579) > function old new delta > splice_direct_to_actor 348 416 +68 > splice_to_pipe 371 417 +46 > splice_from_pipe_next107 106 -1 > fdput 11 - -11 > signal_pending39 26 -13 > fdget 56 42 -14 > user_page_pipe_buf_ops20 - -20 > user_page_pipe_buf_steal 25 - -25 > file_end_write58 29 -29 > file_start_write 68 34 -34 > pipe_to_user 43 - -43 > wakeup_pipe_readers 54 - -54 > do_splice_to 87 - -87 > ipipe_prep.part 92 - -92 > opipe_prep.part 119 --119 > sys_sendfile 122 --122 > sys_sendfile64 126 --126 > sys_vmsplice 137 --137 > vmsplice_to_user 205 --205 > sys_tee 491 --491 > do_sendfile 492 --492 > vmsplice_to_pipe 558 --558 > sys_splice 1020 - -1020 > > Signed-off-by: Pieter Smith > --- > fs/splice.c | 2 ++ > init/Kconfig| 10 ++ > kernel/sys_ni.c | 8 > 3 files changed, 20 insertions(+) > > diff --git a/fs/splice.c b/fs/splice.c > index 44b201b..7c4c695 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, > struct file *out, > return ret; > } > > +#ifdef CONFIG_SYSCALL_SPLICE > static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, > struct pipe_inode_info *opipe, > size_t len, unsigned int flags); > @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, > in_fd, > return do_sendfile(out_fd, in_fd, NULL, count, 0); > } > #endif > +#endif > > diff --git a/init/Kconfig b/init/Kconfig > index d811d5f..dec9819 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1571,6 +1571,16 @@ config NTP > system clock to an NTP server, you can disable this option to save > space. > > +config SYSCALL_SPLICE > + bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT > + default y > + help > + This option enables the splice, vmsplice, tee and sendfile syscalls. > These > + are used by applications to: move data between buffers and arbitrary > file > + descriptors; "copy" data between buffers; or copy data from userspace > into > + buffers. If building an embedded system where no applications use > these > + syscalls, you can disable this option to save space. > + > config PCI_QUIRKS > default y > bool "Enable PCI quirk workarounds" if EXPERT > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index d2f5b00..25d5551 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -170,6 +170,14 @@ cond_syscall(sys_fstat); > cond_syscall(sys_stat); > cond_syscall(sys_uname); > cond_syscall(sys_olduname); > +cond_syscall(sys_vmsplice); > +cond_syscall(sys_splice); > +cond_syscall(sys_tee); > +cond_syscall(sys_sendfile); > +cond_syscall(sys_sendfile64); > +cond_syscall(compat_sys_vmsplice); > +cond_syscall(compat_sys_sendfile); > +cond_syscall(compat_sys_sendfile64); > > /* arch-specific weak syscall entries */ > cond_syscall(sys_pciconfig_read); > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] net/core: support compiling out splice
On Sat, Nov 22, 2014 at 10:00:00PM +0100, Pieter Smith wrote: > Compile out splice support from networking core when the splice-family of > syscalls is not supported by the system (i.e. CONFIG_SYSCALL_SPLICE is > undefined). Please explain in the commit message why this particular bit of splice support needs compiling out when most other bits do not. Also, a couple of style comments below. > Signed-off-by: Pieter Smith > --- > include/linux/skbuff.h | 9 + > net/core/skbuff.c | 9 ++--- > 2 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index a59d934..8c28524 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -2640,9 +2640,18 @@ int skb_copy_bits(const struct sk_buff *skb, int > offset, void *to, int len); > int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int > len); > __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to, > int len, __wsum csum); > +#ifdef CONFIG_SYSCALL_SPLICE > int skb_splice_bits(struct sk_buff *skb, unsigned int offset, > struct pipe_inode_info *pipe, unsigned int len, > unsigned int flags); > +#else /* #ifdef CONFIG_SYSCALL_SPLICE */ > +static inline int skb_splice_bits(struct sk_buff *skb, unsigned int offset, > + struct pipe_inode_info *pipe, unsigned int len, > + unsigned int flags) > +{ > + return -EPERM; > +} > +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */ These comments, and the one added below in the corresponding .c file, don't match the prevailing style. These are so short (fitting on one screen) that they hardly seem worth the comments at all, but if you want to include them, s/#ifdef// in the comment. > void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); > unsigned int skb_zerocopy_headlen(const struct sk_buff *from); > int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 61059a0..74fad8a 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1781,9 +1781,9 @@ static bool __splice_segment(struct page *page, > unsigned int poff, > * Map linear and fragment data from the skb to spd. It reports true if the > * pipe is full or if we already spliced the requested length. > */ > -static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info > *pipe, > - unsigned int *offset, unsigned int *len, > - struct splice_pipe_desc *spd, struct sock *sk) > +static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct > pipe_inode_info *pipe, > + unsigned int *offset, unsigned int > *len, > + struct splice_pipe_desc *spd, > struct sock *sk) > { > int seg; > > @@ -1821,6 +1821,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, > struct pipe_inode_info *pipe, > * the frag list, if such a thing exists. We'd probably need to recurse to > * handle that cleanly. > */ > +#ifdef CONFIG_SYSCALL_SPLICE > int skb_splice_bits(struct sk_buff *skb, unsigned int offset, > struct pipe_inode_info *pipe, unsigned int tlen, > unsigned int flags) > @@ -1876,6 +1877,8 @@ done: > > return ret; > } > +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */ > + This extra blank line shouldn't be here. > /** > * skb_store_bits - store bits from kernel buffer to skb > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/6] fs/fuse: support compiling out splice
On Sat, Nov 22, 2014 at 09:59:59PM +0100, Pieter Smith wrote: > Compile out splice support from fuse when the splice-family of syscalls is not > supported by the system (i.e. CONFIG_SYSCALL_SPLICE is undefined). This patch needs some additional explanation. Could you explain in the commit message why this particular splice_read function (and only the splice_read function, not the splice_write function) needs compiling out, when others do not? - Josh Triplett > Signed-off-by: Pieter Smith > --- > fs/fuse/dev.c | 4 ++-- > include/linux/fs.h | 6 ++ > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index ca88731..f8f92a4 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const > struct iovec *iov, > return fuse_dev_do_read(fc, file, , iov_length(iov, nr_segs)); > } > > -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, > +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t > *ppos, > struct pipe_inode_info *pipe, > size_t len, unsigned int flags) > { > @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = { > .llseek = no_llseek, > .read = do_sync_read, > .aio_read = fuse_dev_read, > - .splice_read= fuse_dev_splice_read, > + .splice_read= __splice_p(fuse_dev_splice_read), > .write = do_sync_write, > .aio_write = fuse_dev_write, > .splice_write = fuse_dev_splice_write, > diff --git a/include/linux/fs.h b/include/linux/fs.h > index a957d43..04c0975 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t > start, loff_t end, > int datasync); > extern void block_sync_page(struct page *page); > > +#ifdef CONFIG_SYSCALL_SPLICE > +#define __splice_p(x) x > +#else > +#define __splice_p(x) NULL > +#endif > + > /* fs/splice.c */ > extern ssize_t generic_file_splice_read(struct file *, loff_t *, > struct pipe_inode_info *, size_t, unsigned int); > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/6] fs/fuse: support compiling out splice
On Sat, Nov 22, 2014 at 09:59:59PM +0100, Pieter Smith wrote: Compile out splice support from fuse when the splice-family of syscalls is not supported by the system (i.e. CONFIG_SYSCALL_SPLICE is undefined). This patch needs some additional explanation. Could you explain in the commit message why this particular splice_read function (and only the splice_read function, not the splice_write function) needs compiling out, when others do not? - Josh Triplett Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/fuse/dev.c | 4 ++-- include/linux/fs.h | 6 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ca88731..f8f92a4 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov, return fuse_dev_do_read(fc, file, cs, iov_length(iov, nr_segs)); } -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = { .llseek = no_llseek, .read = do_sync_read, .aio_read = fuse_dev_read, - .splice_read= fuse_dev_splice_read, + .splice_read= __splice_p(fuse_dev_splice_read), .write = do_sync_write, .aio_write = fuse_dev_write, .splice_write = fuse_dev_splice_write, diff --git a/include/linux/fs.h b/include/linux/fs.h index a957d43..04c0975 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync); extern void block_sync_page(struct page *page); +#ifdef CONFIG_SYSCALL_SPLICE +#define __splice_p(x) x +#else +#define __splice_p(x) NULL +#endif + /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] net/core: support compiling out splice
On Sat, Nov 22, 2014 at 10:00:00PM +0100, Pieter Smith wrote: Compile out splice support from networking core when the splice-family of syscalls is not supported by the system (i.e. CONFIG_SYSCALL_SPLICE is undefined). Please explain in the commit message why this particular bit of splice support needs compiling out when most other bits do not. Also, a couple of style comments below. Signed-off-by: Pieter Smith pie...@boesman.nl --- include/linux/skbuff.h | 9 + net/core/skbuff.c | 9 ++--- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a59d934..8c28524 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2640,9 +2640,18 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len); int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len); __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to, int len, __wsum csum); +#ifdef CONFIG_SYSCALL_SPLICE int skb_splice_bits(struct sk_buff *skb, unsigned int offset, struct pipe_inode_info *pipe, unsigned int len, unsigned int flags); +#else /* #ifdef CONFIG_SYSCALL_SPLICE */ +static inline int skb_splice_bits(struct sk_buff *skb, unsigned int offset, + struct pipe_inode_info *pipe, unsigned int len, + unsigned int flags) +{ + return -EPERM; +} +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */ These comments, and the one added below in the corresponding .c file, don't match the prevailing style. These are so short (fitting on one screen) that they hardly seem worth the comments at all, but if you want to include them, s/#ifdef// in the comment. void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); unsigned int skb_zerocopy_headlen(const struct sk_buff *from); int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 61059a0..74fad8a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1781,9 +1781,9 @@ static bool __splice_segment(struct page *page, unsigned int poff, * Map linear and fragment data from the skb to spd. It reports true if the * pipe is full or if we already spliced the requested length. */ -static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe, - unsigned int *offset, unsigned int *len, - struct splice_pipe_desc *spd, struct sock *sk) +static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe, + unsigned int *offset, unsigned int *len, + struct splice_pipe_desc *spd, struct sock *sk) { int seg; @@ -1821,6 +1821,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe, * the frag list, if such a thing exists. We'd probably need to recurse to * handle that cleanly. */ +#ifdef CONFIG_SYSCALL_SPLICE int skb_splice_bits(struct sk_buff *skb, unsigned int offset, struct pipe_inode_info *pipe, unsigned int tlen, unsigned int flags) @@ -1876,6 +1877,8 @@ done: return ret; } +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */ + This extra blank line shouldn't be here. /** * skb_store_bits - store bits from kernel buffer to skb -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/6] fs/splice: support compiling out splice-family syscalls
On Sat, Nov 22, 2014 at 09:59:58PM +0100, Pieter Smith wrote: Many embedded systems will not need the splice-family syscalls (splice, vmsplice, tee and sendfile). Omitting them saves space. This adds a new EXPERT config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out. This patch removes almost all callers of .splice_read() and .splice_write() in the file_operations struct. This paves the way to eventually compile out the .splice_read and .splice_write members of the file_operations struct as well as the remaining splice-related infrastructure. This commit message doesn't reflect the new approach of leaving those members in the structure. The patch looks good otherwise. - Josh Triplett add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579) function old new delta splice_direct_to_actor 348 416 +68 splice_to_pipe 371 417 +46 splice_from_pipe_next107 106 -1 fdput 11 - -11 signal_pending39 26 -13 fdget 56 42 -14 user_page_pipe_buf_ops20 - -20 user_page_pipe_buf_steal 25 - -25 file_end_write58 29 -29 file_start_write 68 34 -34 pipe_to_user 43 - -43 wakeup_pipe_readers 54 - -54 do_splice_to 87 - -87 ipipe_prep.part 92 - -92 opipe_prep.part 119 --119 sys_sendfile 122 --122 sys_sendfile64 126 --126 sys_vmsplice 137 --137 vmsplice_to_user 205 --205 sys_tee 491 --491 do_sendfile 492 --492 vmsplice_to_pipe 558 --558 sys_splice 1020 - -1020 Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/splice.c | 2 ++ init/Kconfig| 10 ++ kernel/sys_ni.c | 8 3 files changed, 20 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 44b201b..7c4c695 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, return ret; } +#ifdef CONFIG_SYSCALL_SPLICE static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, struct pipe_inode_info *opipe, size_t len, unsigned int flags); @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, return do_sendfile(out_fd, in_fd, NULL, count, 0); } #endif +#endif diff --git a/init/Kconfig b/init/Kconfig index d811d5f..dec9819 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1571,6 +1571,16 @@ config NTP system clock to an NTP server, you can disable this option to save space. +config SYSCALL_SPLICE + bool Enable splice/vmsplice/tee/sendfile syscalls if EXPERT + default y + help + This option enables the splice, vmsplice, tee and sendfile syscalls. These + are used by applications to: move data between buffers and arbitrary file + descriptors; copy data between buffers; or copy data from userspace into + buffers. If building an embedded system where no applications use these + syscalls, you can disable this option to save space. + config PCI_QUIRKS default y bool Enable PCI quirk workarounds if EXPERT diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index d2f5b00..25d5551 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -170,6 +170,14 @@ cond_syscall(sys_fstat); cond_syscall(sys_stat); cond_syscall(sys_uname); cond_syscall(sys_olduname); +cond_syscall(sys_vmsplice); +cond_syscall(sys_splice); +cond_syscall(sys_tee); +cond_syscall(sys_sendfile); +cond_syscall(sys_sendfile64); +cond_syscall(compat_sys_vmsplice); +cond_syscall(compat_sys_sendfile); +cond_syscall(compat_sys_sendfile64); /* arch-specific weak syscall entries */ cond_syscall(sys_pciconfig_read); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/6] fs/splice: full support for compiling out splice
On Sat, Nov 22, 2014 at 10:00:01PM +0100, Pieter Smith wrote: Entirely compile out splice translation unit when the system is configured without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined). add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845) Very nice! - Josh Triplett function old new delta pipe_to_null 4 - -4 generic_pipe_buf_nosteal 6 - -6 spd_release_page 10 - -10 PageUptodate 22 11 -11 lock_page 36 24 -12 page_cache_pipe_buf_release 16 - -16 splice_write_null 24 4 -20 page_cache_pipe_buf_ops 20 - -20 nosteal_pipe_buf_ops 20 - -20 default_pipe_buf_ops 20 - -20 generic_splice_sendpage 24 - -24 splice_shrink_spd 27 - -27 direct_splice_actor 47 - -47 default_file_splice_write 49 - -49 wakeup_pipe_writers 54 - -54 write_pipe_buf71 - -71 page_cache_pipe_buf_confirm 80 - -80 splice_grow_spd 87 - -87 splice_from_pipe 93 - -93 splice_from_pipe_next106 --106 pipe_to_sendpage 109 --109 page_cache_pipe_buf_steal114 --114 generic_file_splice_read 131 8-123 do_splice_direct 148 --148 __splice_from_pipe 246 --246 splice_direct_to_actor 416 --416 splice_to_pipe 417 --417 default_file_splice_read 688 --688 iter_file_splice_write 702 4-698 __generic_file_splice_read 1109 - -1109 Signed-off-by: Pieter Smith pie...@boesman.nl --- fs/Makefile| 3 ++- fs/splice.c| 2 -- include/linux/fs.h | 26 ++ include/linux/splice.h | 43 +++ 4 files changed, 71 insertions(+), 3 deletions(-) diff --git a/fs/Makefile b/fs/Makefile index fb7646e..9395622 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -10,7 +10,7 @@ obj-y :=open.o read_write.o file_table.o super.o \ ioctl.o readdir.o select.o dcache.o inode.o \ attr.o bad_inode.o file.o filesystems.o namespace.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ - pnode.o splice.o sync.o utimes.o \ + pnode.o sync.o utimes.o \ stack.o fs_struct.o statfs.o fs_pin.o ifeq ($(CONFIG_BLOCK),y) @@ -22,6 +22,7 @@ endif obj-$(CONFIG_PROC_FS) += proc_namespace.o obj-$(CONFIG_FSNOTIFY) += notify/ +obj-$(CONFIG_SYSCALL_SPLICE) += splice.o obj-$(CONFIG_EPOLL) += eventpoll.o obj-$(CONFIG_ANON_INODES)+= anon_inodes.o obj-$(CONFIG_SIGNALFD) += signalfd.o diff --git a/fs/splice.c b/fs/splice.c index 7c4c695..44b201b 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, return ret; } -#ifdef CONFIG_SYSCALL_SPLICE static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, struct pipe_inode_info *opipe, size_t len, unsigned int flags); @@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, return do_sendfile(out_fd, in_fd, NULL, count, 0); } #endif -#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index 04c0975..9b3054e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2449,6 +2449,7 @@ extern void block_sync_page(struct page *page); #define __splice_p(x) NULL #endif +#ifdef CONFIG_SYSCALL_SPLICE /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); @@ -2458,6 +2459,31 @@ extern ssize_t iter_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out, loff_t *, size_t len, unsigned int flags); +#else +static inline ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe
Re: [PATCH 5/6] net/core: support compiling out splice
[Please don't top-post.] On Sat, Nov 22, 2014 at 11:50:51PM +0100, Pieter Smith wrote: splice exports a structure that is used by skbuf. Mocking out a function is straightforward. To my knowledge there is no elegant way of mocking out a splice_operations struct. I directly modified the code to prevent linking against the struct. Do you know of a better technique to get the same result? No, I don't. The approach you took seems fine; I'm just saying that you need to explain the need for it in the commit message. - JosH Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: [PATCH 1/1] init:add boot option "initramfs_packnum"
On Fri, Nov 21, 2014 at 11:46:42AM +0800, leizhen wrote: > During the development phase, we usually enlarge initrd-end to reserve more > memory than the initfs zip exactly occupied. Then we can easily add or delete > files in zip package, without generate fdt again and again. But unfortunately, > if too many zeros followed initfs zip, it will take a long time to break the > loop. > > while (!message && len) { > ... ... > if (!*buf) { > buf++; > ... ... > continue; > } > > So, use the boot option "initramfs_packnum" to specify how many zip packages > in each initrd area. When the specified number of packages decompressed in one > area, immediately terminate the loop. Have no impact on current use by > default. This seems like the wrong approach to me. Rather than changing Linux to ignore excess data in the initramfs, could you change your bootloader to pass the true length of the initramfs to Linux? Even if you reserve extra space, if you know how much data you've actually included, you could have your bootloader pass that information to Linux. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: [PATCH 1/1] init:add boot option initramfs_packnum
On Fri, Nov 21, 2014 at 11:46:42AM +0800, leizhen wrote: During the development phase, we usually enlarge initrd-end to reserve more memory than the initfs zip exactly occupied. Then we can easily add or delete files in zip package, without generate fdt again and again. But unfortunately, if too many zeros followed initfs zip, it will take a long time to break the loop. while (!message len) { ... ... if (!*buf) { buf++; ... ... continue; } So, use the boot option initramfs_packnum to specify how many zip packages in each initrd area. When the specified number of packages decompressed in one area, immediately terminate the loop. Have no impact on current use by default. This seems like the wrong approach to me. Rather than changing Linux to ignore excess data in the initramfs, could you change your bootloader to pass the true length of the initramfs to Linux? Even if you reserve extra space, if you know how much data you've actually included, you could have your bootloader pass that information to Linux. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the net-next tree
On Mon, Nov 17, 2014 at 04:35:03PM +1100, Stephen Rothwell wrote: > Hi Josh, > > Today's linux-next merge of the tiny tree got a conflict in > net/openvswitch/Kconfig between commit 8cd4313aa775 ("openvswitch: Fix > build failure") from the net-next tree and commit b043d487e255 ("lib: > Conditionally compile flex_array") from the tiny tree. > > I fixed it up (see below) and can carry the fix as necessary (no action > is required). Looks good to me. > diff --cc net/openvswitch/Kconfig > index b7d818c59423,1d979cecd66e.. > --- a/net/openvswitch/Kconfig > +++ b/net/openvswitch/Kconfig > @@@ -4,9 -4,8 +4,10 @@@ > > config OPENVSWITCH > tristate "Open vSwitch" > +depends on INET > select LIBCRC32C > +select NET_MPLS_GSO > + select FLEX_ARRAY > ---help--- > Open vSwitch is a multilayer Ethernet switch targeted at virtualized > environments. In addition to supporting a variety of features -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the net-next tree
On Mon, Nov 17, 2014 at 04:35:03PM +1100, Stephen Rothwell wrote: Hi Josh, Today's linux-next merge of the tiny tree got a conflict in net/openvswitch/Kconfig between commit 8cd4313aa775 (openvswitch: Fix build failure) from the net-next tree and commit b043d487e255 (lib: Conditionally compile flex_array) from the tiny tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). Looks good to me. diff --cc net/openvswitch/Kconfig index b7d818c59423,1d979cecd66e.. --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@@ -4,9 -4,8 +4,10 @@@ config OPENVSWITCH tristate Open vSwitch +depends on INET select LIBCRC32C +select NET_MPLS_GSO + select FLEX_ARRAY ---help--- Open vSwitch is a multilayer Ethernet switch targeted at virtualized environments. In addition to supporting a variety of features -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sun, Nov 16, 2014 at 07:42:30AM -0800, Andy Lutomirski wrote: > On Sun, Nov 16, 2014 at 5:32 AM, Theodore Ts'o wrote: > > On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: > >> > >> That may be a bug with the user namespace permission check. Perhaps we > >> shouldn't allow dropping groups that aren't mapped in the user > >> namespace. > > > > I'm not saying that we can't change the behavior of whether or not a > > user can drop a group permission. I'm just saying that we need to do > > so consciously. The setgroups()/getgroups() ABI isn't part of > > POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those > > people who care about that. > > It may make sense to reach out to some place like oss-security. > > FWIW, I think we should ask, at the same time, about: > > - Dropping supplementary groups. > - Switching gid/egid/sgid to a supplementary group. > - Denying ptrace of a process with supplementary groups that the > tracer doesn't have. I wonder how crazy it would be to just require either CAP_SYS_PTRACE or cred1 == cred2 (as in, you have *exactly* the same credentials as the target)? > Also, I much prefer a sysctl to a boot option. Boot options are nasty > to configure in many distributions. Agreed. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sun, Nov 16, 2014 at 08:32:30AM -0500, Theodore Ts'o wrote: > On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: > > That may be a bug with the user namespace permission check. Perhaps we > > shouldn't allow dropping groups that aren't mapped in the user > > namespace. > > I'm not saying that we can't change the behavior of whether or not a > user can drop a group permission. I'm just saying that we need to do > so consciously. Agreed. > The setgroups()/getgroups() ABI isn't part of > POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those > people who care about that. POSIX.1-2001 actually specifies getgroups, but not setgroups. In any case, yes, POSIX doesn't say anything about this behavior. > The bigger deal is that it's very different from how BSD 4.x has > handled things, which means there is two decades of history that we're > looking at here. And there are times when taking away permissions in > an expected fashion can cause security problems. (As a silly example; > some architect at Digital wrote a spec that said that setuid must > return EINVAL for values greater than 32k --- back in the days when > uid's were a signed short. The junior programmer who implemented this > for Ultrix made the check for 32,000 decimal. Guess what happened > when /bin/login got a failure with setuid when it wasn't expecting one > --- since root could never get an error with that system call, right? Ignored it and kept going, starting the user's shell as root? I'd guess that a similar story motivated the note in the Linux manpages for setuid, setresuid, and similar, saying "Note: there are cases where setuid() can fail even when the caller is UID 0; it is a grave security error to omit checking for a failure return from setuid().". (Also, these days, glibc marks setuid and similar with the warn_unused_result attribute.) > And MIT Project Athena started ran out of lower numbered uid's and > froshlings started getting assigned uid's > 32,000) > > In this particular case, the change is probably a little less likely > to cause serious problems, although the fact that sudo does allow > negative group assignments is an example of another potential > breakage. > > OTOH, I'm aware of how this could cause major problems to the concept > of allowing an untrusted user to set up their own containers to > constrain what program with a possibly untrusted provinance might be > able to do. I can see times when I might want to run in a container > where the user didn't have access to groups that I have access to by > default --- including groups such as disk, sudo, lpadmin, etc. > > If we do want to make such a change, my suggestion is to keep things > *very* simple. Let it be a boot-time option whether or not users are > allowed to drop group permissions, and let it affect all possible ways > that users can drop groups. And we can create a shell script that > will search for the obvious ways that a user could get screwed by > enabling this, which we can encourage distributions to package up for > their end users. And then we document the heck out of the fact that > this option exists, and when/if we want to make it the default, so > it's perfectly clear and transparent to all what is happening. An option sounds sensible to me. I think a sysctl makes more sense, though. I'll add one in v4. What did you have in mind about the shell script? Something like: grep -r !% /etc/sudoers /etc/sudoers.d ? - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 51/56] drivers/char/mem: support compiling out splice
[Please don't top-post.] On Sun, Nov 16, 2014 at 01:18:27PM +0100, Pieter Smith wrote: > I had a look at the numbers (see below): For a tinyconfig kernel, savings > outside fs/splice.c are negligible. With the addition of networking > however, net/core and net/ipv4 can be squeezed for usable savings. > > Should I leave networking (and the rest) until we can come up with a > non-icky approach outside fs/splice.c? > > *THE NUMBERS* > > Given a tinyconfig, fs/splice.c can free up 8.2K (3.5K + 4.7K) with > CONFIG_SYSCALL_SPLICE. The rest are not worth the effort: > fs/splice: > syscalls only: add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 > (-3579) > the remainder: add/remove: 0/24 grow/shrink: 0/4 up/down: 0/-4824 > (-4824) Nice! Go ahead and submit the patches for that portion, and the rest can wait until we get compiler support for omitting structure fields. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 51/56] drivers/char/mem: support compiling out splice
[Please don't top-post.] On Sun, Nov 16, 2014 at 01:18:27PM +0100, Pieter Smith wrote: I had a look at the numbers (see below): For a tinyconfig kernel, savings outside fs/splice.c are negligible. With the addition of networking however, net/core and net/ipv4 can be squeezed for usable savings. Should I leave networking (and the rest) until we can come up with a non-icky approach outside fs/splice.c? *THE NUMBERS* Given a tinyconfig, fs/splice.c can free up 8.2K (3.5K + 4.7K) with CONFIG_SYSCALL_SPLICE. The rest are not worth the effort: fs/splice: syscalls only: add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579) the remainder: add/remove: 0/24 grow/shrink: 0/4 up/down: 0/-4824 (-4824) Nice! Go ahead and submit the patches for that portion, and the rest can wait until we get compiler support for omitting structure fields. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sun, Nov 16, 2014 at 08:32:30AM -0500, Theodore Ts'o wrote: On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: That may be a bug with the user namespace permission check. Perhaps we shouldn't allow dropping groups that aren't mapped in the user namespace. I'm not saying that we can't change the behavior of whether or not a user can drop a group permission. I'm just saying that we need to do so consciously. Agreed. The setgroups()/getgroups() ABI isn't part of POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those people who care about that. POSIX.1-2001 actually specifies getgroups, but not setgroups. In any case, yes, POSIX doesn't say anything about this behavior. The bigger deal is that it's very different from how BSD 4.x has handled things, which means there is two decades of history that we're looking at here. And there are times when taking away permissions in an expected fashion can cause security problems. (As a silly example; some architect at Digital wrote a spec that said that setuid must return EINVAL for values greater than 32k --- back in the days when uid's were a signed short. The junior programmer who implemented this for Ultrix made the check for 32,000 decimal. Guess what happened when /bin/login got a failure with setuid when it wasn't expecting one --- since root could never get an error with that system call, right? Ignored it and kept going, starting the user's shell as root? I'd guess that a similar story motivated the note in the Linux manpages for setuid, setresuid, and similar, saying Note: there are cases where setuid() can fail even when the caller is UID 0; it is a grave security error to omit checking for a failure return from setuid().. (Also, these days, glibc marks setuid and similar with the warn_unused_result attribute.) And MIT Project Athena started ran out of lower numbered uid's and froshlings started getting assigned uid's 32,000) In this particular case, the change is probably a little less likely to cause serious problems, although the fact that sudo does allow negative group assignments is an example of another potential breakage. OTOH, I'm aware of how this could cause major problems to the concept of allowing an untrusted user to set up their own containers to constrain what program with a possibly untrusted provinance might be able to do. I can see times when I might want to run in a container where the user didn't have access to groups that I have access to by default --- including groups such as disk, sudo, lpadmin, etc. If we do want to make such a change, my suggestion is to keep things *very* simple. Let it be a boot-time option whether or not users are allowed to drop group permissions, and let it affect all possible ways that users can drop groups. And we can create a shell script that will search for the obvious ways that a user could get screwed by enabling this, which we can encourage distributions to package up for their end users. And then we document the heck out of the fact that this option exists, and when/if we want to make it the default, so it's perfectly clear and transparent to all what is happening. An option sounds sensible to me. I think a sysctl makes more sense, though. I'll add one in v4. What did you have in mind about the shell script? Something like: grep -r !% /etc/sudoers /etc/sudoers.d ? - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sun, Nov 16, 2014 at 07:42:30AM -0800, Andy Lutomirski wrote: On Sun, Nov 16, 2014 at 5:32 AM, Theodore Ts'o ty...@mit.edu wrote: On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: That may be a bug with the user namespace permission check. Perhaps we shouldn't allow dropping groups that aren't mapped in the user namespace. I'm not saying that we can't change the behavior of whether or not a user can drop a group permission. I'm just saying that we need to do so consciously. The setgroups()/getgroups() ABI isn't part of POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those people who care about that. It may make sense to reach out to some place like oss-security. FWIW, I think we should ask, at the same time, about: - Dropping supplementary groups. - Switching gid/egid/sgid to a supplementary group. - Denying ptrace of a process with supplementary groups that the tracer doesn't have. I wonder how crazy it would be to just require either CAP_SYS_PTRACE or cred1 == cred2 (as in, you have *exactly* the same credentials as the target)? Also, I much prefer a sysctl to a boot option. Boot options are nasty to configure in many distributions. Agreed. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add supplementary UIDs, and getusers/setusers system calls
On Sat, Nov 15, 2014 at 11:08:31PM -0800, Josh Triplett wrote: > asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist); > +asmlinkage long sys_setusers(int uidsetsize, uid_t __user *grouplist); Obvious typo here: s/grouplist/userlist/. Will fix in a v2, but I'll wait for other feedback on v1 first. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH manpages] Document supplementary user IDs
Add new manpages for getusers(2) and setusers(2). Discuss supplementary UIDs in credentials(7). Update manpages for seteuid(2), setfsuid(2), setresuid(2), setreuid(2), and setuid(2). Signed-off-by: Josh Triplett --- man2/getusers.2| 177 + man2/seteuid.2 | 10 ++- man2/setfsuid.2| 4 +- man2/setresuid.2 | 8 ++- man2/setreuid.2| 12 +++- man2/setuid.2 | 7 ++- man2/setusers.2| 1 + man7/credentials.7 | 21 ++- 8 files changed, 229 insertions(+), 11 deletions(-) create mode 100644 man2/getusers.2 create mode 100644 man2/setusers.2 diff --git a/man2/getusers.2 b/man2/getusers.2 new file mode 100644 index 000..e5dd371 --- /dev/null +++ b/man2/getusers.2 @@ -0,0 +1,177 @@ +.\" Copyright 2014 Josh Triplett +.\" Based on getgroups.2: +.\" Copyright 1993 Rickard E. Faith (fa...@cs.unc.edu) +.\" +.\" %%%LICENSE_START(VERBATIM) +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of this +.\" manual under the conditions for verbatim copying, provided that the +.\" entire resulting derived work is distributed under the terms of a +.\" permission notice identical to this one. +.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume no +.\" responsibility for errors or omissions, or for damages resulting from +.\" the use of the information contained herein. The author(s) may not +.\" have taken the same level of care in the production of this manual, +.\" which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. +.\" %%%LICENSE_END +.\" +.\" Modified Thu Oct 31 12:04:29 1996 by Eric S. Raymond +.\" Modified, 27 May 2004, Michael Kerrisk +.\" Added notes on capability requirements +.\" 2008-05-03, mtk, expanded and rewrote parts of DESCRIPTION and RETURN +.\" VALUE, made style of page more consistent with man-pages style. +.\" +.TH GETUSERS 2 2014-10-20 "Linux" "Linux Programmer's Manual" +.SH NAME +getusers, setusers \- get/set list of supplementary user IDs +.SH SYNOPSIS +.B #include +.br +.B #include +.sp +.BI "int getusers(int " size ", uid_t *" list ); +.sp +.BI "int setusers(int " size ", const uid_t *" list ); +.sp +.in -4n +Feature Test Macro Requirements for glibc (see +.BR feature_test_macros (7)): +.in +.sp +.BR getusers (), +.BR setusers (): +_GNU_SOURCE +.SH DESCRIPTION +.PP +.BR getusers () +returns the supplementary user IDs of the calling process in +.IR list . +The argument +.I size +should be set to the maximum number of items that can be stored in the +buffer pointed to by +.IR list . +If the calling process has more than +.I size +supplementary user IDs, +.BR getusers () +returns an error. +The returned list may or may not include the real, effective, saved, or filesystem user ID of the calling process. + +If +.I size +is zero, +.I list +is not modified, but the total number of supplementary user IDs for the +process is returned. +This allows the caller to determine the size of a dynamically allocated +.I list +to be used in a further call to +.BR getusers (). +.PP +.BR setusers () +sets the supplementary user IDs for the calling process. +The +.I size +argument specifies the number of supplementary user IDs +in the buffer pointed to by +.IR list . + +Any process may drop user IDs from its supplementary list; however, adding user +IDs not already present in the supplementary list or in the real, effective, or +saved user ID requires the +.B CAP_SETUID +capability. +.SH RETURN VALUE +On success, +.BR getusers () +returns the number of supplementary user IDs. +On error, \-1 is returned, and +.I errno +is set appropriately. + +On success, +.BR setusers () +returns 0. +On error, \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +.TP +.B EFAULT +.I list +has an invalid address. +.PP +.BR getusers () +can additionally fail with the following error: +.TP +.B EINVAL +.I size +is less than the number of supplementary user IDs, but is not zero. +.PP +.BR setusers () +can additionally fail with the following errors: +.TP +.B EINVAL +.I size +is greater than +.B NUSERS_MAX +(65536). +.TP +.B ENOMEM +Out of memory. +.TP +.B EPERM +.I list +contains user IDs not present in the current supplemental user IDs list, +and not equal to the real, effective, or saved user ID, and the calling +process does not ha
[PATCH] Add supplementary UIDs, and getusers/setusers system calls
Analogous to the supplementary GID list, the supplementary UID list provides a set of additional user credentials that a process can act as. A process with CAP_SETUID can set its UID list arbitrarily; a process without CAP_SETUID can only reduce its UID list. This allows each user to have a set of UIDs that they can then use to further sandbox individual child processes without first escalating to root to change UIDs. For instance, a PAM module could give each user a block of UIDs to work with. Tested via the following test program: #include #include #include #include static int getusers(int count, uid_t *uids) { return syscall(322, count, uids); } static int setusers(int count, const uid_t *uids) { return syscall(323, count, uids); } static void show_users(void) { uid_t uids[65536]; int i, count = getusers(65536, uids); if (count < 0) err(1, "getusers"); printf("UIDs:"); for (i = 0; i < count; i++) printf(" %u", (unsigned)uids[i]); printf("\n"); } int main(void) { uid_t list1[] = { 1, 2, 3, 4, 5 }; uid_t list2[] = { 1, 2, 3, 4 }; uid_t list3[] = { 2, 3, 4 }; show_users(); if (setusers(5, list1) < 0) err(1, "setusers 1"); show_users(); if (setresgid(1, 1, 1) < 0) err(1, "setresgid"); if (setresuid(1, 1, 1) < 0) err(1, "setresuid"); if (setusers(4, list2) < 0) err(1, "setusers 2"); show_users(); if (setusers(3, list3) < 0) err(1, "setusers 3"); show_users(); if (setusers(4, list2) < 0) err(1, "setusers 4"); show_users(); if (setresuid(2, 2, 2) < 0) err(1, "setresuid 2"); if (setusers(5, list1) < 0) err(1, "setusers 5"); show_users(); return 0; } In this test, all but the last call to setusers succeeds; the last call fails with EPERM because the unprivileged process attempts to add UID 5 to the supplementary UID list, which it does not currently have. Signed-off-by: Josh Triplett --- arch/x86/syscalls/syscall_32.tbl | 2 + arch/x86/syscalls/syscall_64.tbl | 2 + include/linux/cred.h | 66 +++ include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 6 +- include/uapi/linux/limits.h | 1 + init/Kconfig | 9 ++ kernel/cred.c | 4 + kernel/groups.c | 173 ++ kernel/sys.c | 21 +++-- kernel/sys_ni.c | 2 + 11 files changed, 280 insertions(+), 8 deletions(-) diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 9fe1b5d..55717d7 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -364,3 +364,5 @@ 355i386getrandom sys_getrandom 356i386memfd_createsys_memfd_create 357i386bpf sys_bpf +358i386getuserssys_getusers +359i386setuserssys_setusers diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 281150b..5572e67 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -328,6 +328,8 @@ 319common memfd_createsys_memfd_create 320common kexec_file_load sys_kexec_file_load 321common bpf sys_bpf +322common getuserssys_getusers +323common setuserssys_setusers # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/cred.h b/include/linux/cred.h index b2d0820..31169fe 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -76,6 +76,8 @@ extern int groups_search(const struct group_info *, kgid_t); extern int in_group_p(kgid_t); extern int in_egroup_p(kgid_t); +struct user_info; + /* * The security context of a task * @@ -135,6 +137,12 @@ struct cred { struct user_struct *user; /* real user ID subscription */ struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */ struct group_info *group_info; /* supplementary groups for euid/fsgid */ +#ifdef CONFIG_SUPPLEMENTARY_UIDS + struct user_info *user_info;/* supplementary users */ +#define INIT_USER_INFO .user_info = _users, +#else +#define INIT_USER_INFO +#endif struct rcu_head rcu;/* RCU deletion hook */ }; @@ -381,4 +389,62 @@ do { \ *(_fsgid) = __cred->fsgid; \ } while(0) +#ifdef CONFIG_SUPPLEMENTARY_UIDS +struct user_info { + atomic_tusage; + int nusers; + int nblocks; + kuid_t small_block[NGROU
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: > Josh Triplett writes: > > On November 15, 2014 6:05:11 PM PST, Theodore Ts'o wrote: > >>On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote: > >>> > However, sudoers seems to allow negative group matches. So maybe > >>> > allowing this only with no_new_privs already set would make sense. > >>> > >>> Sigh, bad sudo. Sure, restricting this to no_new_privs only seems > >>fine. > >>> I'll do that in v2, and document that in the manpage. > >> > >>I've also seen use cases (generally back in the bad old days of big > >>timesharing VAX 750's :-) where the system admin might assign someone > >>to the "games-abusers" group, and then set /usr/games to mode 705 > >>root:games-abusers --- presumably because it's easier to add a few > >>people to the deny list rather than having to add all of the EECS > >>department to the games group minus the abusers. > >> > >>So arbitrarily anyone to drop groups from their supplemental group > >>list will result in a change from both existing practice and legacy > >>Unix systems, and it could potentially lead to a security exposure. > > > > As Andy pointed out, you can already do that with a user namespace, > > That may be a bug with the user namespace permission check. Perhaps we > shouldn't allow dropping groups that aren't mapped in the user > namespace. Changing that would break containers for users on many common Linux distributions, which put users in various supplementary groups by default. I do actually have another patch planned, to allow users to set up gid maps for groups they currently have permission for, not just for their primary group. But even with that change, breaking the ability for container-root to use setgroups to drop all its supplementary groups seems very wrong, and seems highly likely to break real-world software running in containers. > To add to the discussion there are also sg and newgroup, though I don't > think they touch supplementary groups. Both sg and newgrp do prove a different point, though: an unprivileged user should be able to change their gid to one of their supplementary groups. That's another patch I planned to submit after this one: allow setgid/setresgid/etc to a supplementary group ID. > > for any case not involving a setuid or setgid (or otherwise > > privilege-gaining) program. And requiring no_new_privs handles that. > > Frankly adding a no_new_privs check to allow something/anything scares > me. That converts no_new_privs from something simple and easy to > understand to something much more complicated. no_new_privs already exists in large part to support such use cases, such as setting seccomp filters, which would be dangerous for privilege-escalating programs to inherit. > > Given the combination of those two things, do you still see any > > problematic cases? > > Josh I think it is time to stop and make certain you understand how > unix groups work in the large. > > It seems to me that either supplementary groups can be dropped or > supplementary groups can not be dropped. If they can be dropped we > shouldn't need complicated checks to allow them to be dropped in some > circumstances but not others. > > Unix groups are an old interface and they have been used in a lot of > ways over the years. We need to be careful any change we make is good > and truly backwards compatible and that we do our darndest not to > introduce security holes. Seeking the decades-old precedents and corner cases that might contradict the most obvious semantics is a large part of what I'm looking for out of this thread. However, I don't necessarily think that we must permit unprivileged setgroups in *all* cases if we permit it in *any*; it may well make sense to reduce the surface area of the change by limiting the circumstances in which you can do this. Whether it does end up making sense to do so depends on the outcome of this discussion. I added the no_new_privs check at Andy's suggestion, since the use case I had in mind will work just fine with that restriction. I'm also open to just allowing setgroups in all cases, if that semantic seems preferable, or to adding some separate check specific to this case if that really seems necessary. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 10:40:06PM -0500, Theodore Ts'o wrote: > On Sat, Nov 15, 2014 at 06:35:05PM -0800, Josh Triplett wrote: > > >So arbitrarily anyone to drop groups from their supplemental group > > >list will result in a change from both existing practice and legacy > > >Unix systems, and it could potentially lead to a security exposure. > > > > As Andy pointed out, you can already do that with a user namespace, > > for any case not involving a setuid or setgid (or otherwise > > privilege-gaining) program. And requiring no_new_privs handles > > that. > > Well, it's no worse than what we can do already with the user > namespace, yes. I'm still worried it's going to come as a surprise > for some configurations because it's a change from what was allowed > historically. Then again, pretty much all of the tripwire and rootkit > scanners won't notice a "setuid" program that uses capabilities > instead of the traditional setuid bit, and most sysadmins won't think > to check for an executable with a forced capability mask, so this > isn't exactly a new problem We certainly have introduced orthogonal APIs in various areas before, such that applications written prior to those APIs may interact interestingly with them; we don't allow *breaking* those applications, or introducing security holes, but the existence of applications designed to block one particular way to do something doesn't *automatically* rule out the possibility of adding another way to do it. It does require some careful thought, though. When we introduced seccomp filters for syscalls, we tied them to no_new_privs to prevent any possible security holes caused by selective syscall denial/filtration. In this case, I'm indifferent about whether unprivileged setgroups works without no_new_privs; if people are comfortable with that, fine, and if people would prefer no_new_privs (or for that matter a sysctl, a compile-time option, or any other means of making the behavior optional), I can do that too. The security model of "having a group gives you less privilege than not having it" seems crazy, but nonetheless I can see a couple of easy ways that we can avoid breaking that pattern, no_new_privs being one of them. I'd like to make sure that nobody sees any other real-world corner case that unprivileged setgroups would break. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 15, 2014 6:05:11 PM PST, Theodore Ts'o wrote: >On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote: >> > However, sudoers seems to allow negative group matches. So maybe >> > allowing this only with no_new_privs already set would make sense. >> >> Sigh, bad sudo. Sure, restricting this to no_new_privs only seems >fine. >> I'll do that in v2, and document that in the manpage. > >I've also seen use cases (generally back in the bad old days of big >timesharing VAX 750's :-) where the system admin might assign someone >to the "games-abusers" group, and then set /usr/games to mode 705 >root:games-abusers --- presumably because it's easier to add a few >people to the deny list rather than having to add all of the EECS >department to the games group minus the abusers. > >So arbitrarily anyone to drop groups from their supplemental group >list will result in a change from both existing practice and legacy >Unix systems, and it could potentially lead to a security exposure. As Andy pointed out, you can already do that with a user namespace, for any case not involving a setuid or setgid (or otherwise privilege-gaining) program. And requiring no_new_privs handles that. Given the combination of those two things, do you still see any problematic cases? - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv3 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett --- v2, v3: No changes to patch 1/2. kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new->group_info); + get_group_info(group_info); + new->group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new->group_info); groups_sort(group_info); - get_group_info(group_info); - new->group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett --- v3: Document use of gid/egid/sgid. v2: Document requirement for no_new_privs. (If this doesn't end up going into 3.18, the version number in the patch will need updating.) man2/getgroups.2 | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..e2b834e 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,16 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +A process with the .B CAP_SETGID -capability) are required. +capability may change its supplementary group IDs arbitrarily. +As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS (see +.BR prctl (2)) +may drop supplementary groups, or add any of the current real UID, the current +effective UID, or the current saved set-user-ID; adding any other group ID +requires the +.B CAP_SETGID +capability. The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv3 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have (either as a supplementary group, or as its gid, egid, or sgid). Since some privileged programs (such as sudo) allow tests for non-membership in a group, require no_new_privs to drop group privileges. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, "fork"); case 0: execl("/usr/bin/id", "id", NULL); err(1, "exec"); default: if (waitpid(p, NULL, 0) < 0) err(1, "waitpid"); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 1"); run_id(); if (setresgid(1, 1, 1) < 0) err(1, "setresgid"); if (setresuid(1, 1, 1) < 0) err(1, "setresuid"); run_id(); if (setgroups(3, list2) < 0) err(1, "setgroups 2"); run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 3"); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett --- v3: Allow gid, egid, or sgid. v2: Require no_new_privs. kernel/groups.c | 42 +++--- kernel/uid16.c | 2 -- 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..5114155 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,37 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Return true if the group_info is a subset of the group_info of the specified + * credentials. Also allow the first group_info to contain the gid, egid, or + * sgid of the credentials. + */ +static bool group_subset(const struct group_info *g1, +const struct cred *cred2) +{ + const struct group_info *g2 = cred2->group_info; + unsigned int i, j; + + for (i = 0, j = 0; i < g1->ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j < g2->ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j >= g2->ngroups || !gid_eq(gid1, gid2)) { + if (!gid_eq(gid1, cred2->gid) + && !gid_eq(gid1, cred2->egid) + && !gid_eq(gid1, cred2->sgid)) + return false; + } else { + j++; + } + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +220,18 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!(ns_capable(current_user_ns(), CAP_SETGID) + || (task_no_new_privs(current) + && group_subset(group_info, new { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +271,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, group
[PATCHv2 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. Since some privileged programs (such as sudo) allow tests for non-membership in a group, require no_new_privs to drop group privileges. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, "fork"); case 0: execl("/usr/bin/id", "id", NULL); err(1, "exec"); default: if (waitpid(p, NULL, 0) < 0) err(1, "waitpid"); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 1"); run_id(); if (setresgid(1, 1, 1) < 0) err(1, "setresgid"); if (setresuid(1, 1, 1) < 0) err(1, "setresuid"); run_id(); if (setgroups(3, list2) < 0) err(1, "setgroups 2"); run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 3"); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett --- v2: Require no_new_privs. kernel/groups.c | 34 +++--- kernel/uid16.c | 2 -- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..f7fb8dd 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Compare two sorted group lists; return true if the first is a subset of the + * second. + */ +static bool is_subset(const struct group_info *g1, const struct group_info *g2) +{ + unsigned int i, j; + + for (i = 0, j = 0; i < g1->ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j < g2->ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j >= g2->ngroups || !gid_eq(gid1, gid2)) + return false; + j++; + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +212,18 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!(ns_capable(current_user_ns(), CAP_SETGID) + || (task_no_new_privs(current) + && is_subset(group_info, new->group_info { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +263,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 manpages] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett --- v2: Document requirement for no_new_privs. (If this doesn't end up going into 3.18, the version number in the patch will need updating.) man2/getgroups.2 | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..3f3d330 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,11 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS may drop +supplementary groups, but may not add new groups. Adding groups, or making any +change at all without no_new_privs enabled, requires the .B CAP_SETGID -capability) are required. +capability. The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett --- kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new->group_info); + get_group_info(group_info); + new->group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new->group_info); groups_sort(group_info); - get_group_info(group_info); - new->group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 12:06:20PM -0800, Andy Lutomirski wrote: > On Sat, Nov 15, 2014 at 11:29 AM, Josh Triplett wrote: > > On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote: > >> Josh Triplett writes: > >> > >> > Currently, unprivileged processes (without CAP_SETGID) cannot call > >> > setgroups at all. In particular, processes with a set of supplementary > >> > groups cannot further drop permissions without obtaining elevated > >> > permissions first. > >> > > >> > Allow unprivileged processes to call setgroups with a subset of their > >> > current groups; only require CAP_SETGID to add a group the process does > >> > not currently have. > >> > >> A couple of questions. > >> - Is there precedence in other unix flavors for this? > > > > I found a few references to now-nonexistent pages at MIT about a system > > with this property, but other than that no. > > > > I've also found more than a few references to people wanting this > > functionality. > > > >> - What motiviates this change? > > > > I have a series of patches planned to add more ways to drop elevated > > privileges without requiring a transition through root to do so. That > > would improve the ability for unprivileged users to run programs > > sandboxed with even *less* privileges. (Among other things, that would > > also allow programs running with no_new_privs to further *reduce* their > > privileges, which they can't currently do in this case.) > > > >> - Have you looked to see if anything might for bug compatibilty > >> require applications not to be able to drop supplementary groups? > > > > I haven't found any such case; that doesn't mean such a case does not > > exist. Feedback welcome. > > > > The only case I can think of (and I don't know of any examples of such a > > system): some kind of quota system that limits the members of a group to > > a certain amount of storage, but places no such limit on non-members. > > > > However, the idea of *holding* a credential (a supplementary group ID) > > giving *less* privileges, and *dropping* a credential giving *more* > > privileges, would completely invert normal security models. (The sane > > way to design such a system would be to have a privileged group for > > "users who can exceed the quota".) > > Agreed. And, if you want to bypass quotas by dropping a supplementary > group, you already can by unsharing your user namespace. Good point! Given that a process can run with a new user namespace and no other namespaces, and then drop all its other privileges that way, the ability to drop privileges without using a user namespace seems completely harmless, with one exception that you noted: > However, sudoers seems to allow negative group matches. So maybe > allowing this only with no_new_privs already set would make sense. Sigh, bad sudo. Sure, restricting this to no_new_privs only seems fine. I'll do that in v2, and document that in the manpage. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote: > Josh Triplett writes: > > > Currently, unprivileged processes (without CAP_SETGID) cannot call > > setgroups at all. In particular, processes with a set of supplementary > > groups cannot further drop permissions without obtaining elevated > > permissions first. > > > > Allow unprivileged processes to call setgroups with a subset of their > > current groups; only require CAP_SETGID to add a group the process does > > not currently have. > > A couple of questions. > - Is there precedence in other unix flavors for this? I found a few references to now-nonexistent pages at MIT about a system with this property, but other than that no. I've also found more than a few references to people wanting this functionality. > - What motiviates this change? I have a series of patches planned to add more ways to drop elevated privileges without requiring a transition through root to do so. That would improve the ability for unprivileged users to run programs sandboxed with even *less* privileges. (Among other things, that would also allow programs running with no_new_privs to further *reduce* their privileges, which they can't currently do in this case.) > - Have you looked to see if anything might for bug compatibilty > require applications not to be able to drop supplementary groups? I haven't found any such case; that doesn't mean such a case does not exist. Feedback welcome. The only case I can think of (and I don't know of any examples of such a system): some kind of quota system that limits the members of a group to a certain amount of storage, but places no such limit on non-members. However, the idea of *holding* a credential (a supplementary group ID) giving *less* privileges, and *dropping* a credential giving *more* privileges, would completely invert normal security models. (The sane way to design such a system would be to have a privileged group for "users who can exceed the quota".) If it turns out that a real case exists that people care about, I could easily make this configurable, either at compile time or via a sysctl. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH manpages] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett --- This should probably also include appropriate documentation for what kernel introduces this behavior. man2/getgroups.2 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..edca37c 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,10 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +Any process may drop groups from its list, but adding groups requires +appropriate privileges (Linux: the .B CAP_SETGID -capability) are required. +capability). The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett --- kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new->group_info); + get_group_info(group_info); + new->group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new->group_info); groups_sort(group_info); - get_group_info(group_info); - new->group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: #include #include #include #include #include void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, "fork"); case 0: execl("/usr/bin/id", "id", NULL); err(1, "exec"); default: if (waitpid(p, NULL, 0) < 0) err(1, "waitpid"); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 1"); run_id(); if (setresgid(1, 1, 1) < 0) err(1, "setresgid"); if (setresuid(1, 1, 1) < 0) err(1, "setresuid"); run_id(); if (setgroups(3, list2) < 0) err(1, "setgroups 2"); run_id(); if (setgroups(5, list1) < 0) err(1, "setgroups 3"); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett --- kernel/groups.c | 33 ++--- kernel/uid16.c | 2 -- 2 files changed, 30 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..fe7367d 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Compare two sorted group lists; return true if the first is a subset of the + * second. + */ +static bool is_subset(const struct group_info *g1, const struct group_info *g2) +{ + unsigned int i, j; + + for (i = 0, j = 0; i < g1->ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j < g2->ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j >= g2->ngroups || !gid_eq(gid1, gid2)) + return false; + j++; + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +212,17 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!ns_capable(current_user_ns(), CAP_SETGID) + && !is_subset(group_info, new->group_info)) { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +262,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; -- 2.1.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett j...@joshtriplett.org --- kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new-group_info); + get_group_info(group_info); + new-group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new-group_info); groups_sort(group_info); - get_group_info(group_info); - new-group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: #include err.h #include grp.h #include stdio.h #include sys/types.h #include unistd.h void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, fork); case 0: execl(/usr/bin/id, id, NULL); err(1, exec); default: if (waitpid(p, NULL, 0) 0) err(1, waitpid); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) 0) err(1, setgroups 1); run_id(); if (setresgid(1, 1, 1) 0) err(1, setresgid); if (setresuid(1, 1, 1) 0) err(1, setresuid); run_id(); if (setgroups(3, list2) 0) err(1, setgroups 2); run_id(); if (setgroups(5, list1) 0) err(1, setgroups 3); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett j...@joshtriplett.org --- kernel/groups.c | 33 ++--- kernel/uid16.c | 2 -- 2 files changed, 30 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..fe7367d 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Compare two sorted group lists; return true if the first is a subset of the + * second. + */ +static bool is_subset(const struct group_info *g1, const struct group_info *g2) +{ + unsigned int i, j; + + for (i = 0, j = 0; i g1-ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j g2-ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j = g2-ngroups || !gid_eq(gid1, gid2)) + return false; + j++; + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +212,17 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!ns_capable(current_user_ns(), CAP_SETGID) +!is_subset(group_info, new-group_info)) { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +262,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX) return -EINVAL; -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH manpages] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett j...@joshtriplett.org --- This should probably also include appropriate documentation for what kernel introduces this behavior. man2/getgroups.2 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..edca37c 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,10 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +Any process may drop groups from its list, but adding groups requires +appropriate privileges (Linux: the .B CAP_SETGID -capability) are required. +capability). The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote: Josh Triplett j...@joshtriplett.org writes: Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. A couple of questions. - Is there precedence in other unix flavors for this? I found a few references to now-nonexistent pages at MIT about a system with this property, but other than that no. I've also found more than a few references to people wanting this functionality. - What motiviates this change? I have a series of patches planned to add more ways to drop elevated privileges without requiring a transition through root to do so. That would improve the ability for unprivileged users to run programs sandboxed with even *less* privileges. (Among other things, that would also allow programs running with no_new_privs to further *reduce* their privileges, which they can't currently do in this case.) - Have you looked to see if anything might for bug compatibilty require applications not to be able to drop supplementary groups? I haven't found any such case; that doesn't mean such a case does not exist. Feedback welcome. The only case I can think of (and I don't know of any examples of such a system): some kind of quota system that limits the members of a group to a certain amount of storage, but places no such limit on non-members. However, the idea of *holding* a credential (a supplementary group ID) giving *less* privileges, and *dropping* a credential giving *more* privileges, would completely invert normal security models. (The sane way to design such a system would be to have a privileged group for users who can exceed the quota.) If it turns out that a real case exists that people care about, I could easily make this configurable, either at compile time or via a sysctl. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 12:06:20PM -0800, Andy Lutomirski wrote: On Sat, Nov 15, 2014 at 11:29 AM, Josh Triplett j...@joshtriplett.org wrote: On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote: Josh Triplett j...@joshtriplett.org writes: Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. A couple of questions. - Is there precedence in other unix flavors for this? I found a few references to now-nonexistent pages at MIT about a system with this property, but other than that no. I've also found more than a few references to people wanting this functionality. - What motiviates this change? I have a series of patches planned to add more ways to drop elevated privileges without requiring a transition through root to do so. That would improve the ability for unprivileged users to run programs sandboxed with even *less* privileges. (Among other things, that would also allow programs running with no_new_privs to further *reduce* their privileges, which they can't currently do in this case.) - Have you looked to see if anything might for bug compatibilty require applications not to be able to drop supplementary groups? I haven't found any such case; that doesn't mean such a case does not exist. Feedback welcome. The only case I can think of (and I don't know of any examples of such a system): some kind of quota system that limits the members of a group to a certain amount of storage, but places no such limit on non-members. However, the idea of *holding* a credential (a supplementary group ID) giving *less* privileges, and *dropping* a credential giving *more* privileges, would completely invert normal security models. (The sane way to design such a system would be to have a privileged group for users who can exceed the quota.) Agreed. And, if you want to bypass quotas by dropping a supplementary group, you already can by unsharing your user namespace. Good point! Given that a process can run with a new user namespace and no other namespaces, and then drop all its other privileges that way, the ability to drop privileges without using a user namespace seems completely harmless, with one exception that you noted: However, sudoers seems to allow negative group matches. So maybe allowing this only with no_new_privs already set would make sense. Sigh, bad sudo. Sure, restricting this to no_new_privs only seems fine. I'll do that in v2, and document that in the manpage. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett j...@joshtriplett.org --- kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new-group_info); + get_group_info(group_info); + new-group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new-group_info); groups_sort(group_info); - get_group_info(group_info); - new-group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have. Since some privileged programs (such as sudo) allow tests for non-membership in a group, require no_new_privs to drop group privileges. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, fork); case 0: execl(/usr/bin/id, id, NULL); err(1, exec); default: if (waitpid(p, NULL, 0) 0) err(1, waitpid); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) 0) err(1, setgroups 1); run_id(); if (setresgid(1, 1, 1) 0) err(1, setresgid); if (setresuid(1, 1, 1) 0) err(1, setresuid); run_id(); if (setgroups(3, list2) 0) err(1, setgroups 2); run_id(); if (setgroups(5, list1) 0) err(1, setgroups 3); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett j...@joshtriplett.org --- v2: Require no_new_privs. kernel/groups.c | 34 +++--- kernel/uid16.c | 2 -- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..f7fb8dd 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Compare two sorted group lists; return true if the first is a subset of the + * second. + */ +static bool is_subset(const struct group_info *g1, const struct group_info *g2) +{ + unsigned int i, j; + + for (i = 0, j = 0; i g1-ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j g2-ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j = g2-ngroups || !gid_eq(gid1, gid2)) + return false; + j++; + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +212,18 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!(ns_capable(current_user_ns(), CAP_SETGID) + || (task_no_new_privs(current) + is_subset(group_info, new-group_info { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +263,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX) return -EINVAL; -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 manpages] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett j...@joshtriplett.org --- v2: Document requirement for no_new_privs. (If this doesn't end up going into 3.18, the version number in the patch will need updating.) man2/getgroups.2 | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..3f3d330 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,11 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS may drop +supplementary groups, but may not add new groups. Adding groups, or making any +change at all without no_new_privs enabled, requires the .B CAP_SETGID -capability) are required. +capability. The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv3 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
Currently, unprivileged processes (without CAP_SETGID) cannot call setgroups at all. In particular, processes with a set of supplementary groups cannot further drop permissions without obtaining elevated permissions first. Allow unprivileged processes to call setgroups with a subset of their current groups; only require CAP_SETGID to add a group the process does not currently have (either as a supplementary group, or as its gid, egid, or sgid). Since some privileged programs (such as sudo) allow tests for non-membership in a group, require no_new_privs to drop group privileges. The kernel already maintains the list of supplementary group IDs in sorted order, and setgroups already needs to sort the new list, so this just requires a linear comparison of the two sorted lists. This moves the CAP_SETGID test from setgroups into set_current_groups. Tested via the following test program: void run_id(void) { pid_t p = fork(); switch (p) { case -1: err(1, fork); case 0: execl(/usr/bin/id, id, NULL); err(1, exec); default: if (waitpid(p, NULL, 0) 0) err(1, waitpid); } } int main(void) { gid_t list1[] = { 1, 2, 3, 4, 5 }; gid_t list2[] = { 2, 3, 4 }; run_id(); if (setgroups(5, list1) 0) err(1, setgroups 1); run_id(); if (setresgid(1, 1, 1) 0) err(1, setresgid); if (setresuid(1, 1, 1) 0) err(1, setresuid); run_id(); if (setgroups(3, list2) 0) err(1, setgroups 2); run_id(); if (setgroups(5, list1) 0) err(1, setgroups 3); run_id(); return 0; } Without this patch, the test program gets EPERM from the second setgroups call, after dropping root privileges. With this patch, the test program successfully drops groups 1 and 5, but then gets EPERM from the third setgroups call, since that call attempts to add groups the process does not currently have. Signed-off-by: Josh Triplett j...@joshtriplett.org --- v3: Allow gid, egid, or sgid. v2: Require no_new_privs. kernel/groups.c | 42 +++--- kernel/uid16.c | 2 -- 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index f0667e7..5114155 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -153,6 +153,37 @@ int groups_search(const struct group_info *group_info, kgid_t grp) return 0; } +/* Return true if the group_info is a subset of the group_info of the specified + * credentials. Also allow the first group_info to contain the gid, egid, or + * sgid of the credentials. + */ +static bool group_subset(const struct group_info *g1, +const struct cred *cred2) +{ + const struct group_info *g2 = cred2-group_info; + unsigned int i, j; + + for (i = 0, j = 0; i g1-ngroups; i++) { + kgid_t gid1 = GROUP_AT(g1, i); + kgid_t gid2; + for (; j g2-ngroups; j++) { + gid2 = GROUP_AT(g2, j); + if (gid_lte(gid1, gid2)) + break; + } + if (j = g2-ngroups || !gid_eq(gid1, gid2)) { + if (!gid_eq(gid1, cred2-gid) +!gid_eq(gid1, cred2-egid) +!gid_eq(gid1, cred2-sgid)) + return false; + } else { + j++; + } + } + + return true; +} + /** * set_groups_sorted - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter @@ -189,11 +220,18 @@ int set_current_groups(struct group_info *group_info) { struct cred *new; + groups_sort(group_info); new = prepare_creds(); if (!new) return -ENOMEM; + if (!(ns_capable(current_user_ns(), CAP_SETGID) + || (task_no_new_privs(current) + group_subset(group_info, new { + abort_creds(new); + return -EPERM; + } - set_groups(new, group_info); + set_groups_sorted(new, group_info); return commit_creds(new); } @@ -233,8 +271,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bb..b27e167 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) - return -EPERM; if ((unsigned)gidsetsize NGROUPS_MAX
[PATCH] getgroups.2: Document unprivileged setgroups calls
Signed-off-by: Josh Triplett j...@joshtriplett.org --- v3: Document use of gid/egid/sgid. v2: Document requirement for no_new_privs. (If this doesn't end up going into 3.18, the version number in the patch will need updating.) man2/getgroups.2 | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/man2/getgroups.2 b/man2/getgroups.2 index 373c204..e2b834e 100644 --- a/man2/getgroups.2 +++ b/man2/getgroups.2 @@ -81,9 +81,16 @@ to be used in a further call to .PP .BR setgroups () sets the supplementary group IDs for the calling process. -Appropriate privileges (Linux: the +A process with the .B CAP_SETGID -capability) are required. +capability may change its supplementary group IDs arbitrarily. +As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS (see +.BR prctl (2)) +may drop supplementary groups, or add any of the current real UID, the current +effective UID, or the current saved set-user-ID; adding any other group ID +requires the +.B CAP_SETGID +capability. The .I size argument specifies the number of supplementary group IDs -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv3 1/2] groups: Factor out a function to set a pre-sorted group list
This way, functions that already need to sort the group list need not do so twice. The new set_groups_sorted is intentionally not exported. Signed-off-by: Josh Triplett j...@joshtriplett.org --- v2, v3: No changes to patch 1/2. kernel/groups.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 451698f..f0667e7 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, kgid_t grp) } /** + * set_groups_sorted - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install; must be sorted + */ +static void set_groups_sorted(struct cred *new, struct group_info *group_info) +{ + put_group_info(new-group_info); + get_group_info(group_info); + new-group_info = group_info; +} + +/** * set_groups - Change a group subscription in a set of credentials * @new: The newly prepared set of credentials to alter * @group_info: The group list to install */ void set_groups(struct cred *new, struct group_info *group_info) { - put_group_info(new-group_info); groups_sort(group_info); - get_group_info(group_info); - new-group_info = group_info; + set_groups_sorted(new, group_info); } EXPORT_SYMBOL(set_groups); -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On November 15, 2014 6:05:11 PM PST, Theodore Ts'o ty...@mit.edu wrote: On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote: However, sudoers seems to allow negative group matches. So maybe allowing this only with no_new_privs already set would make sense. Sigh, bad sudo. Sure, restricting this to no_new_privs only seems fine. I'll do that in v2, and document that in the manpage. I've also seen use cases (generally back in the bad old days of big timesharing VAX 750's :-) where the system admin might assign someone to the games-abusers group, and then set /usr/games to mode 705 root:games-abusers --- presumably because it's easier to add a few people to the deny list rather than having to add all of the EECS department to the games group minus the abusers. So arbitrarily anyone to drop groups from their supplemental group list will result in a change from both existing practice and legacy Unix systems, and it could potentially lead to a security exposure. As Andy pointed out, you can already do that with a user namespace, for any case not involving a setuid or setgid (or otherwise privilege-gaining) program. And requiring no_new_privs handles that. Given the combination of those two things, do you still see any problematic cases? - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 10:40:06PM -0500, Theodore Ts'o wrote: On Sat, Nov 15, 2014 at 06:35:05PM -0800, Josh Triplett wrote: So arbitrarily anyone to drop groups from their supplemental group list will result in a change from both existing practice and legacy Unix systems, and it could potentially lead to a security exposure. As Andy pointed out, you can already do that with a user namespace, for any case not involving a setuid or setgid (or otherwise privilege-gaining) program. And requiring no_new_privs handles that. Well, it's no worse than what we can do already with the user namespace, yes. I'm still worried it's going to come as a surprise for some configurations because it's a change from what was allowed historically. Then again, pretty much all of the tripwire and rootkit scanners won't notice a setuid program that uses capabilities instead of the traditional setuid bit, and most sysadmins won't think to check for an executable with a forced capability mask, so this isn't exactly a new problem We certainly have introduced orthogonal APIs in various areas before, such that applications written prior to those APIs may interact interestingly with them; we don't allow *breaking* those applications, or introducing security holes, but the existence of applications designed to block one particular way to do something doesn't *automatically* rule out the possibility of adding another way to do it. It does require some careful thought, though. When we introduced seccomp filters for syscalls, we tied them to no_new_privs to prevent any possible security holes caused by selective syscall denial/filtration. In this case, I'm indifferent about whether unprivileged setgroups works without no_new_privs; if people are comfortable with that, fine, and if people would prefer no_new_privs (or for that matter a sysctl, a compile-time option, or any other means of making the behavior optional), I can do that too. The security model of having a group gives you less privilege than not having it seems crazy, but nonetheless I can see a couple of easy ways that we can avoid breaking that pattern, no_new_privs being one of them. I'd like to make sure that nobody sees any other real-world corner case that unprivileged setgroups would break. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups
On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote: Josh Triplett j...@joshtriplett.org writes: On November 15, 2014 6:05:11 PM PST, Theodore Ts'o ty...@mit.edu wrote: On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote: However, sudoers seems to allow negative group matches. So maybe allowing this only with no_new_privs already set would make sense. Sigh, bad sudo. Sure, restricting this to no_new_privs only seems fine. I'll do that in v2, and document that in the manpage. I've also seen use cases (generally back in the bad old days of big timesharing VAX 750's :-) where the system admin might assign someone to the games-abusers group, and then set /usr/games to mode 705 root:games-abusers --- presumably because it's easier to add a few people to the deny list rather than having to add all of the EECS department to the games group minus the abusers. So arbitrarily anyone to drop groups from their supplemental group list will result in a change from both existing practice and legacy Unix systems, and it could potentially lead to a security exposure. As Andy pointed out, you can already do that with a user namespace, That may be a bug with the user namespace permission check. Perhaps we shouldn't allow dropping groups that aren't mapped in the user namespace. Changing that would break containers for users on many common Linux distributions, which put users in various supplementary groups by default. I do actually have another patch planned, to allow users to set up gid maps for groups they currently have permission for, not just for their primary group. But even with that change, breaking the ability for container-root to use setgroups to drop all its supplementary groups seems very wrong, and seems highly likely to break real-world software running in containers. To add to the discussion there are also sg and newgroup, though I don't think they touch supplementary groups. Both sg and newgrp do prove a different point, though: an unprivileged user should be able to change their gid to one of their supplementary groups. That's another patch I planned to submit after this one: allow setgid/setresgid/etc to a supplementary group ID. for any case not involving a setuid or setgid (or otherwise privilege-gaining) program. And requiring no_new_privs handles that. Frankly adding a no_new_privs check to allow something/anything scares me. That converts no_new_privs from something simple and easy to understand to something much more complicated. no_new_privs already exists in large part to support such use cases, such as setting seccomp filters, which would be dangerous for privilege-escalating programs to inherit. Given the combination of those two things, do you still see any problematic cases? Josh I think it is time to stop and make certain you understand how unix groups work in the large. It seems to me that either supplementary groups can be dropped or supplementary groups can not be dropped. If they can be dropped we shouldn't need complicated checks to allow them to be dropped in some circumstances but not others. Unix groups are an old interface and they have been used in a lot of ways over the years. We need to be careful any change we make is good and truly backwards compatible and that we do our darndest not to introduce security holes. Seeking the decades-old precedents and corner cases that might contradict the most obvious semantics is a large part of what I'm looking for out of this thread. However, I don't necessarily think that we must permit unprivileged setgroups in *all* cases if we permit it in *any*; it may well make sense to reduce the surface area of the change by limiting the circumstances in which you can do this. Whether it does end up making sense to do so depends on the outcome of this discussion. I added the no_new_privs check at Andy's suggestion, since the use case I had in mind will work just fine with that restriction. I'm also open to just allowing setgroups in all cases, if that semantic seems preferable, or to adding some separate check specific to this case if that really seems necessary. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH manpages] Document supplementary user IDs
Add new manpages for getusers(2) and setusers(2). Discuss supplementary UIDs in credentials(7). Update manpages for seteuid(2), setfsuid(2), setresuid(2), setreuid(2), and setuid(2). Signed-off-by: Josh Triplett j...@joshtriplett.org --- man2/getusers.2| 177 + man2/seteuid.2 | 10 ++- man2/setfsuid.2| 4 +- man2/setresuid.2 | 8 ++- man2/setreuid.2| 12 +++- man2/setuid.2 | 7 ++- man2/setusers.2| 1 + man7/credentials.7 | 21 ++- 8 files changed, 229 insertions(+), 11 deletions(-) create mode 100644 man2/getusers.2 create mode 100644 man2/setusers.2 diff --git a/man2/getusers.2 b/man2/getusers.2 new file mode 100644 index 000..e5dd371 --- /dev/null +++ b/man2/getusers.2 @@ -0,0 +1,177 @@ +.\ Copyright 2014 Josh Triplett j...@joshtriplett.org +.\ Based on getgroups.2: +.\ Copyright 1993 Rickard E. Faith (fa...@cs.unc.edu) +.\ +.\ %%%LICENSE_START(VERBATIM) +.\ Permission is granted to make and distribute verbatim copies of this +.\ manual provided the copyright notice and this permission notice are +.\ preserved on all copies. +.\ +.\ Permission is granted to copy and distribute modified versions of this +.\ manual under the conditions for verbatim copying, provided that the +.\ entire resulting derived work is distributed under the terms of a +.\ permission notice identical to this one. +.\ +.\ Since the Linux kernel and libraries are constantly changing, this +.\ manual page may be incorrect or out-of-date. The author(s) assume no +.\ responsibility for errors or omissions, or for damages resulting from +.\ the use of the information contained herein. The author(s) may not +.\ have taken the same level of care in the production of this manual, +.\ which is licensed free of charge, as they might when working +.\ professionally. +.\ +.\ Formatted or processed versions of this manual, if unaccompanied by +.\ the source, must acknowledge the copyright and authors of this work. +.\ %%%LICENSE_END +.\ +.\ Modified Thu Oct 31 12:04:29 1996 by Eric S. Raymond e...@thyrsus.com +.\ Modified, 27 May 2004, Michael Kerrisk mtk.manpa...@gmail.com +.\ Added notes on capability requirements +.\ 2008-05-03, mtk, expanded and rewrote parts of DESCRIPTION and RETURN +.\ VALUE, made style of page more consistent with man-pages style. +.\ +.TH GETUSERS 2 2014-10-20 Linux Linux Programmer's Manual +.SH NAME +getusers, setusers \- get/set list of supplementary user IDs +.SH SYNOPSIS +.B #include sys/types.h +.br +.B #include unistd.h +.sp +.BI int getusers(int size , uid_t * list ); +.sp +.BI int setusers(int size , const uid_t * list ); +.sp +.in -4n +Feature Test Macro Requirements for glibc (see +.BR feature_test_macros (7)): +.in +.sp +.BR getusers (), +.BR setusers (): +_GNU_SOURCE +.SH DESCRIPTION +.PP +.BR getusers () +returns the supplementary user IDs of the calling process in +.IR list . +The argument +.I size +should be set to the maximum number of items that can be stored in the +buffer pointed to by +.IR list . +If the calling process has more than +.I size +supplementary user IDs, +.BR getusers () +returns an error. +The returned list may or may not include the real, effective, saved, or filesystem user ID of the calling process. + +If +.I size +is zero, +.I list +is not modified, but the total number of supplementary user IDs for the +process is returned. +This allows the caller to determine the size of a dynamically allocated +.I list +to be used in a further call to +.BR getusers (). +.PP +.BR setusers () +sets the supplementary user IDs for the calling process. +The +.I size +argument specifies the number of supplementary user IDs +in the buffer pointed to by +.IR list . + +Any process may drop user IDs from its supplementary list; however, adding user +IDs not already present in the supplementary list or in the real, effective, or +saved user ID requires the +.B CAP_SETUID +capability. +.SH RETURN VALUE +On success, +.BR getusers () +returns the number of supplementary user IDs. +On error, \-1 is returned, and +.I errno +is set appropriately. + +On success, +.BR setusers () +returns 0. +On error, \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +.TP +.B EFAULT +.I list +has an invalid address. +.PP +.BR getusers () +can additionally fail with the following error: +.TP +.B EINVAL +.I size +is less than the number of supplementary user IDs, but is not zero. +.PP +.BR setusers () +can additionally fail with the following errors: +.TP +.B EINVAL +.I size +is greater than +.B NUSERS_MAX +(65536). +.TP +.B ENOMEM +Out of memory. +.TP +.B EPERM +.I list +contains user IDs not present in the current supplemental user IDs list, +and not equal to the real, effective, or saved user ID, and the calling +process does not have +.BR CAP_SETUID . +.SH CONFORMING TO +.BR getusers (), +.BR setusers (), +and supplementary user IDs in general are Linux-specific and should not +be used
[PATCH] Add supplementary UIDs, and getusers/setusers system calls
Analogous to the supplementary GID list, the supplementary UID list provides a set of additional user credentials that a process can act as. A process with CAP_SETUID can set its UID list arbitrarily; a process without CAP_SETUID can only reduce its UID list. This allows each user to have a set of UIDs that they can then use to further sandbox individual child processes without first escalating to root to change UIDs. For instance, a PAM module could give each user a block of UIDs to work with. Tested via the following test program: #include err.h #include stdio.h #include sys/syscall.h #include unistd.h static int getusers(int count, uid_t *uids) { return syscall(322, count, uids); } static int setusers(int count, const uid_t *uids) { return syscall(323, count, uids); } static void show_users(void) { uid_t uids[65536]; int i, count = getusers(65536, uids); if (count 0) err(1, getusers); printf(UIDs:); for (i = 0; i count; i++) printf( %u, (unsigned)uids[i]); printf(\n); } int main(void) { uid_t list1[] = { 1, 2, 3, 4, 5 }; uid_t list2[] = { 1, 2, 3, 4 }; uid_t list3[] = { 2, 3, 4 }; show_users(); if (setusers(5, list1) 0) err(1, setusers 1); show_users(); if (setresgid(1, 1, 1) 0) err(1, setresgid); if (setresuid(1, 1, 1) 0) err(1, setresuid); if (setusers(4, list2) 0) err(1, setusers 2); show_users(); if (setusers(3, list3) 0) err(1, setusers 3); show_users(); if (setusers(4, list2) 0) err(1, setusers 4); show_users(); if (setresuid(2, 2, 2) 0) err(1, setresuid 2); if (setusers(5, list1) 0) err(1, setusers 5); show_users(); return 0; } In this test, all but the last call to setusers succeeds; the last call fails with EPERM because the unprivileged process attempts to add UID 5 to the supplementary UID list, which it does not currently have. Signed-off-by: Josh Triplett j...@joshtriplett.org --- arch/x86/syscalls/syscall_32.tbl | 2 + arch/x86/syscalls/syscall_64.tbl | 2 + include/linux/cred.h | 66 +++ include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 6 +- include/uapi/linux/limits.h | 1 + init/Kconfig | 9 ++ kernel/cred.c | 4 + kernel/groups.c | 173 ++ kernel/sys.c | 21 +++-- kernel/sys_ni.c | 2 + 11 files changed, 280 insertions(+), 8 deletions(-) diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 9fe1b5d..55717d7 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -364,3 +364,5 @@ 355i386getrandom sys_getrandom 356i386memfd_createsys_memfd_create 357i386bpf sys_bpf +358i386getuserssys_getusers +359i386setuserssys_setusers diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 281150b..5572e67 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -328,6 +328,8 @@ 319common memfd_createsys_memfd_create 320common kexec_file_load sys_kexec_file_load 321common bpf sys_bpf +322common getuserssys_getusers +323common setuserssys_setusers # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/cred.h b/include/linux/cred.h index b2d0820..31169fe 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -76,6 +76,8 @@ extern int groups_search(const struct group_info *, kgid_t); extern int in_group_p(kgid_t); extern int in_egroup_p(kgid_t); +struct user_info; + /* * The security context of a task * @@ -135,6 +137,12 @@ struct cred { struct user_struct *user; /* real user ID subscription */ struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */ struct group_info *group_info; /* supplementary groups for euid/fsgid */ +#ifdef CONFIG_SUPPLEMENTARY_UIDS + struct user_info *user_info;/* supplementary users */ +#define INIT_USER_INFO .user_info = init_users, +#else +#define INIT_USER_INFO +#endif struct rcu_head rcu;/* RCU deletion hook */ }; @@ -381,4 +389,62 @@ do { \ *(_fsgid) = __cred-fsgid; \ } while(0) +#ifdef CONFIG_SUPPLEMENTARY_UIDS +struct user_info { + atomic_tusage; + int nusers; + int nblocks; + kuid_t small_block[NGROUPS_SMALL]; + kuid_t *blocks[0]; +}; + +#define USER_AT(ui, i) GROUP_AT(ui, i) +extern struct user_info init_users
Re: [PATCH] Add supplementary UIDs, and getusers/setusers system calls
On Sat, Nov 15, 2014 at 11:08:31PM -0800, Josh Triplett wrote: asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist); +asmlinkage long sys_setusers(int uidsetsize, uid_t __user *grouplist); Obvious typo here: s/grouplist/userlist/. Will fix in a v2, but I'll wait for other feedback on v1 first. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build warnings after merge of the tiny tree
On Tue, Nov 11, 2014 at 05:43:04PM +1100, Stephen Rothwell wrote: > Hi Josh, > > After merging the tiny tree, today's linux-next build (powerpc64 > allnoconfig) produced these warnings: > > kernel/compat.c:33:12: warning: 'compat_get_timex' defined but not used > [-Wunused-function] > static int compat_get_timex(struct timex *txc, struct compat_timex __user > *utp) > ^ > kernel/compat.c:63:12: warning: 'compat_put_timex' defined but not used > [-Wunused-function] > static int compat_put_timex(struct compat_timex __user *utp, struct timex > *txc) > ^ > > Introduced by commit 7beb114f18e0 ("kernel: time: Compile out NTP > support"). Neither of these functions is needed if CONFIG_NTP is not > set. Fixed. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build warnings after merge of the tiny tree
On Tue, Nov 11, 2014 at 05:43:04PM +1100, Stephen Rothwell wrote: Hi Josh, After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig) produced these warnings: kernel/compat.c:33:12: warning: 'compat_get_timex' defined but not used [-Wunused-function] static int compat_get_timex(struct timex *txc, struct compat_timex __user *utp) ^ kernel/compat.c:63:12: warning: 'compat_put_timex' defined but not used [-Wunused-function] static int compat_put_timex(struct compat_timex __user *utp, struct timex *txc) ^ Introduced by commit 7beb114f18e0 (kernel: time: Compile out NTP support). Neither of these functions is needed if CONFIG_NTP is not set. Fixed. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with Linus' tree
On Mon, Nov 10, 2014 at 02:28:03PM +1100, Stephen Rothwell wrote: > Hi Josh, > > Today's linux-next merge of the tiny tree got a conflict in net/Kconfig > between commit f89b7755f517 ("bpf: split eBPF out of NET") from Linus' > tree and commit 4ecea0db79ef ("lib: rhashtable: Make rhashtable.c > optional") from the tiny tree. > > I fixed it up (see below) and can carry the fix as necessary (no action > is required). This resolution looks correct to me. - Josh Triplett > diff --cc net/Kconfig > index 99815b5454bf,02badd46823f.. > --- a/net/Kconfig > +++ b/net/Kconfig > @@@ -6,7 -6,8 +6,8 @@@ menuconfig NE > bool "Networking support" > select NLATTR > select GENERIC_NET_UTILS > -select ANON_INODES > +select BPF > + select RHASHTABLE > ---help--- > Unless you really know what you are doing, you should say Y here. > The reason is that some programs need kernel networking support even -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the akpm-current tree with the tiny tree
On Mon, Nov 10, 2014 at 07:59:12PM +1100, Stephen Rothwell wrote: > Hi Andrew, > > Today's linux-next merge of the akpm-current tree got a conflict in > lib/Kconfig between commit 3a8aeefbd4e9 ("lib: halfmd4: Make halfmd4.c > optional") from the tiny tree and commit c7f379b518a3 ("lib: Add CRC64 > ECMA module") from the akpm-current tree. > > I fixed it up (see below) and can carry the fix as necessary (no action > is required). Looks good. - Josh Triplett > diff --cc lib/Kconfig > index b8c97f05140d,2faf7b2de5b3.. > --- a/lib/Kconfig > +++ b/lib/Kconfig > @@@ -186,9 -180,13 +186,16 @@@ config CRC > when they need to do cyclic redundancy check according CRC8 > algorithm. Module will be called crc8. > > + config CRC64_ECMA > + tristate "CRC64 ECMA function" > + help > + This option provides CRC64 ECMA function. Drivers may select this > + when they need to do cyclic redundancy check according to the CRC64 > + ECMA algorithm. > + > +config HALFMD4 > +bool > + > config AUDIT_GENERIC > bool > depends on AUDIT && !AUDIT_ARCH -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the net-next tree
On Mon, Nov 10, 2014 at 02:25:11PM +1100, Stephen Rothwell wrote: > Hi Josh, > > Today's linux-next merge of the tiny tree got a conflict in > lib/Makefile between commit e5a2c8999576 ("fast_hash: avoid indirect > function calls") from the net-next tree and commit 4ecea0db79ef ("lib: > rhashtable: Make rhashtable.c optional") from the tiny tree. > > I fixed it up (see below) and can carry the fix as necessary (no action > is required). This resolution looks correct to me. - Josh Triplett > diff --cc lib/Makefile > index 04e53dd16070,47b8305288e2.. > --- a/lib/Makefile > +++ b/lib/Makefile > @@@ -22,11 -22,14 +22,14 @@@ lib-$(CONFIG_SMP) += cpumask. > lib-y += kobject.o klist.o > obj-y += lockref.o > > - obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \ > + obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \ >bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \ > - gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \ > + gcd.o lcm.o list_sort.o uuid.o iovec.o clz_ctz.o \ >bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \ > - percpu-refcount.o percpu_ida.o rhashtable.o > - percpu-refcount.o percpu_ida.o hash.o > ++ percpu-refcount.o percpu_ida.o > + obj-$(CONFIG_FLEX_ARRAY) += flex_array.o > + obj-$(CONFIG_HALFMD4) += halfmd4.o > + obj-$(CONFIG_RHASHTABLE) += rhashtable.o > obj-y += string_helpers.o > obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o > obj-y += kstrtox.o -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the tiny tree
On Sun, Nov 09, 2014 at 11:56:04PM -0800, Josh Triplett wrote: > On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote: > > Hi Josh, > > > > After merging the tiny tree, today's linux-next build (powerpc64 > > allnoconfig) > > failed like this: > > > > arch/powerpc/kernel/built-in.o: In function `sys_call_table': > > (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex' > > arch/powerpc/kernel/built-in.o: In function `sys_call_table': > > (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime' > > > > Caused by commit be1e48a8e48f ("kernel: time: Compile out NTP support"). > > Good catch, thanks. Not clear why that build failure didn't show up on > x86. > > Looks like the two compat syscalls need cond_syscall lines as well. > I'll add those to the patch. Done in current tiny/next. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the tiny tree
On Sun, Nov 09, 2014 at 11:56:04PM -0800, Josh Triplett wrote: On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote: Hi Josh, After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig) failed like this: arch/powerpc/kernel/built-in.o: In function `sys_call_table': (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex' arch/powerpc/kernel/built-in.o: In function `sys_call_table': (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime' Caused by commit be1e48a8e48f (kernel: time: Compile out NTP support). Good catch, thanks. Not clear why that build failure didn't show up on x86. Looks like the two compat syscalls need cond_syscall lines as well. I'll add those to the patch. Done in current tiny/next. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with the net-next tree
On Mon, Nov 10, 2014 at 02:25:11PM +1100, Stephen Rothwell wrote: Hi Josh, Today's linux-next merge of the tiny tree got a conflict in lib/Makefile between commit e5a2c8999576 (fast_hash: avoid indirect function calls) from the net-next tree and commit 4ecea0db79ef (lib: rhashtable: Make rhashtable.c optional) from the tiny tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). This resolution looks correct to me. - Josh Triplett diff --cc lib/Makefile index 04e53dd16070,47b8305288e2.. --- a/lib/Makefile +++ b/lib/Makefile @@@ -22,11 -22,14 +22,14 @@@ lib-$(CONFIG_SMP) += cpumask. lib-y += kobject.o klist.o obj-y += lockref.o - obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \ + obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \ bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \ - gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \ + gcd.o lcm.o list_sort.o uuid.o iovec.o clz_ctz.o \ bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \ - percpu-refcount.o percpu_ida.o rhashtable.o - percpu-refcount.o percpu_ida.o hash.o ++ percpu-refcount.o percpu_ida.o + obj-$(CONFIG_FLEX_ARRAY) += flex_array.o + obj-$(CONFIG_HALFMD4) += halfmd4.o + obj-$(CONFIG_RHASHTABLE) += rhashtable.o obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o obj-y += kstrtox.o -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the akpm-current tree with the tiny tree
On Mon, Nov 10, 2014 at 07:59:12PM +1100, Stephen Rothwell wrote: Hi Andrew, Today's linux-next merge of the akpm-current tree got a conflict in lib/Kconfig between commit 3a8aeefbd4e9 (lib: halfmd4: Make halfmd4.c optional) from the tiny tree and commit c7f379b518a3 (lib: Add CRC64 ECMA module) from the akpm-current tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). Looks good. - Josh Triplett diff --cc lib/Kconfig index b8c97f05140d,2faf7b2de5b3.. --- a/lib/Kconfig +++ b/lib/Kconfig @@@ -186,9 -180,13 +186,16 @@@ config CRC when they need to do cyclic redundancy check according CRC8 algorithm. Module will be called crc8. + config CRC64_ECMA + tristate CRC64 ECMA function + help + This option provides CRC64 ECMA function. Drivers may select this + when they need to do cyclic redundancy check according to the CRC64 + ECMA algorithm. + +config HALFMD4 +bool + config AUDIT_GENERIC bool depends on AUDIT !AUDIT_ARCH -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the tiny tree with Linus' tree
On Mon, Nov 10, 2014 at 02:28:03PM +1100, Stephen Rothwell wrote: Hi Josh, Today's linux-next merge of the tiny tree got a conflict in net/Kconfig between commit f89b7755f517 (bpf: split eBPF out of NET) from Linus' tree and commit 4ecea0db79ef (lib: rhashtable: Make rhashtable.c optional) from the tiny tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). This resolution looks correct to me. - Josh Triplett diff --cc net/Kconfig index 99815b5454bf,02badd46823f.. --- a/net/Kconfig +++ b/net/Kconfig @@@ -6,7 -6,8 +6,8 @@@ menuconfig NE bool Networking support select NLATTR select GENERIC_NET_UTILS -select ANON_INODES +select BPF + select RHASHTABLE ---help--- Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the tiny tree
On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote: > Hi Josh, > > After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig) > failed like this: > > arch/powerpc/kernel/built-in.o: In function `sys_call_table': > (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex' > arch/powerpc/kernel/built-in.o: In function `sys_call_table': > (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime' > > Caused by commit be1e48a8e48f ("kernel: time: Compile out NTP support"). Good catch, thanks. Not clear why that build failure didn't show up on x86. Looks like the two compat syscalls need cond_syscall lines as well. I'll add those to the patch. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the tiny tree
On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote: Hi Josh, After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig) failed like this: arch/powerpc/kernel/built-in.o: In function `sys_call_table': (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex' arch/powerpc/kernel/built-in.o: In function `sys_call_table': (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime' Caused by commit be1e48a8e48f (kernel: time: Compile out NTP support). Good catch, thanks. Not clear why that build failure didn't show up on x86. Looks like the two compat syscalls need cond_syscall lines as well. I'll add those to the patch. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info
On Sat, Nov 08, 2014 at 12:49:29PM -0800, Joe Perches wrote: > On Sat, 2014-11-08 at 11:01 -0800, Josh Triplett wrote: > > On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote: > > > CREDITS | 7 +++ > > > MAINTAINERS | 10 ++ > > > 2 files changed, 17 insertions(+) > > > > Given the wide variety of folks who patch CREDITS and MAINTAINERS, might > > I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS > > file (perhaps in the same entry as get_maintainer.pl), so that > > get_maintainer.pl does not fall back to git history for them? > > Thanks but: > > There is no MAINTAINERS/CREDITS maintainer. > > I'm not volunteering. Oh, I thought from all your work on file patterns and other such improvements that you were the de-facto MAINTAINERS maintainer. > Maybe you? No thank you. > But maybe something like this would help: > > It has to have both an M: and an S: entry > with maintained or supported to stop get_maintainers > from doing a git history lookup. > > --- > > diff --git a/MAINTAINERS b/MAINTAINERS > index 0662378..584fd69 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -5881,6 +5881,12 @@ F: drivers/mailbox/ > F: include/linux/mailbox_client.h > F: include/linux/mailbox_controller.h > > +MAINTAINERS/CREDITS > +M: linux-kernel@vger.kernel.org > +S: Maintained > +F: MAINTAINERS > +F: CREDITS Sending something only to LKML seems likely to go un-noticed. I think we need a better solution than this. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scripts/ksize: Add kernel build size report
On Sat, Nov 08, 2014 at 03:18:18PM -0800, Darren Hart wrote: > ksize generates hierarchical build size reports from vmlinux, *.o, and > built-in.o files. > > ksize is useful in preparing minimal configurations and comparing > similar configurations across kernel versions. > > Signed-off-by: Darren Hart > Cc: Josh Triplett One comment below; with that addressed: Reviewed-by: Josh Triplett > +def main(argv): > +try: > +opts, args = getopt.getopt(argv[1:], "dh", ["help"]) > +except getopt.GetoptError, err: > +print '%s' % str(err) > +usage() > +return 2 > + > +driver_detail = False > +for o, a in opts: > +if o == '-d': > +driver_detail = True > +elif o in ('-h', '--help'): > +usage() > +return 0 > +else: > +assert False, "unhandled option" > + > +cols = term_width() > + > +# Determine kernel version > +p = Popen("strings vmlinux | grep 'Linux version' | cut -d ' ' -f 3", > + shell=True, stdout=PIPE, stderr=PIPE) > +version = p.communicate()[0].strip() This seems like a very fragile, Perl-y way to obtain the kernel version. I'd suggest either not including the version (just as bloat-o-meter doesn't), or parsing it out using objdump -h -t and file offsetting (looking for the offset and size of linux_banner). Personally I'd go with the former. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info
On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote: > CREDITS | 7 +++ > MAINTAINERS | 10 ++ > 2 files changed, 17 insertions(+) Given the wide variety of folks who patch CREDITS and MAINTAINERS, might I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS file (perhaps in the same entry as get_maintainer.pl), so that get_maintainer.pl does not fall back to git history for them? - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info
On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote: CREDITS | 7 +++ MAINTAINERS | 10 ++ 2 files changed, 17 insertions(+) Given the wide variety of folks who patch CREDITS and MAINTAINERS, might I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS file (perhaps in the same entry as get_maintainer.pl), so that get_maintainer.pl does not fall back to git history for them? - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scripts/ksize: Add kernel build size report
On Sat, Nov 08, 2014 at 03:18:18PM -0800, Darren Hart wrote: ksize generates hierarchical build size reports from vmlinux, *.o, and built-in.o files. ksize is useful in preparing minimal configurations and comparing similar configurations across kernel versions. Signed-off-by: Darren Hart dvh...@linux.intel.com Cc: Josh Triplett j...@joshtriplett.org One comment below; with that addressed: Reviewed-by: Josh Triplett j...@joshtriplett.org +def main(argv): +try: +opts, args = getopt.getopt(argv[1:], dh, [help]) +except getopt.GetoptError, err: +print '%s' % str(err) +usage() +return 2 + +driver_detail = False +for o, a in opts: +if o == '-d': +driver_detail = True +elif o in ('-h', '--help'): +usage() +return 0 +else: +assert False, unhandled option + +cols = term_width() + +# Determine kernel version +p = Popen(strings vmlinux | grep 'Linux version' | cut -d ' ' -f 3, + shell=True, stdout=PIPE, stderr=PIPE) +version = p.communicate()[0].strip() This seems like a very fragile, Perl-y way to obtain the kernel version. I'd suggest either not including the version (just as bloat-o-meter doesn't), or parsing it out using objdump -h -t and file offsetting (looking for the offset and size of linux_banner). Personally I'd go with the former. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info
On Sat, Nov 08, 2014 at 12:49:29PM -0800, Joe Perches wrote: On Sat, 2014-11-08 at 11:01 -0800, Josh Triplett wrote: On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote: CREDITS | 7 +++ MAINTAINERS | 10 ++ 2 files changed, 17 insertions(+) Given the wide variety of folks who patch CREDITS and MAINTAINERS, might I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS file (perhaps in the same entry as get_maintainer.pl), so that get_maintainer.pl does not fall back to git history for them? Thanks but: There is no MAINTAINERS/CREDITS maintainer. I'm not volunteering. Oh, I thought from all your work on file patterns and other such improvements that you were the de-facto MAINTAINERS maintainer. Maybe you? No thank you. But maybe something like this would help: It has to have both an M: and an S: entry with maintained or supported to stop get_maintainers from doing a git history lookup. --- diff --git a/MAINTAINERS b/MAINTAINERS index 0662378..584fd69 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5881,6 +5881,12 @@ F: drivers/mailbox/ F: include/linux/mailbox_client.h F: include/linux/mailbox_controller.h +MAINTAINERS/CREDITS +M: linux-kernel@vger.kernel.org +S: Maintained +F: MAINTAINERS +F: CREDITS Sending something only to LKML seems likely to go un-noticed. I think we need a better solution than this. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] lib/flex_array: make build optional
On Thu, Nov 06, 2014 at 04:50:56PM -0500, Aristeu Rozanski wrote: > On Thu, Nov 06, 2014 at 01:44:54PM -0800, Dave Hansen wrote: > > On 11/06/2014 01:33 PM, Dave Hansen wrote: > > > On 11/06/2014 01:27 PM, Aristeu Rozanski wrote: > > >> +config FLEX_ARRAY > > >> +bool "Flexible array" > > >> +default n > > >> +help > > >> + This option enables an implementation of flexible arrays which > > >> + allows creating arrays of fixed size elements with an > > >> arbritrary > > >> + size without requiring the single allocation of a contiguous > > >> area. > > >> + > > >> + See Documentation/flexible-arrays.txt > > > > > > Is there any reason to expose this to the user via Kconfig? > > > > > > No sane person would even turn it on if they don't need it. > > > > IOW, I think you should just make it: > > > > config FLEX_ARRAY > > def_bool n > > Joe Pershes complained on a similar patch on making it default to 'n'. > Will rework the patches this way. https://git.kernel.org/cgit/linux/kernel/git/josh/linux.git/commit/?h=tiny/unflex-array=6631d5fb4cf395ebd2dc0f2da05525b9d3436a3f Already done. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] lib/flex_array: make build optional
On Thu, Nov 06, 2014 at 04:50:56PM -0500, Aristeu Rozanski wrote: On Thu, Nov 06, 2014 at 01:44:54PM -0800, Dave Hansen wrote: On 11/06/2014 01:33 PM, Dave Hansen wrote: On 11/06/2014 01:27 PM, Aristeu Rozanski wrote: +config FLEX_ARRAY +bool Flexible array +default n +help + This option enables an implementation of flexible arrays which + allows creating arrays of fixed size elements with an arbritrary + size without requiring the single allocation of a contiguous area. + + See Documentation/flexible-arrays.txt Is there any reason to expose this to the user via Kconfig? No sane person would even turn it on if they don't need it. IOW, I think you should just make it: config FLEX_ARRAY def_bool n Joe Pershes complained on a similar patch on making it default to 'n'. Will rework the patches this way. https://git.kernel.org/cgit/linux/kernel/git/josh/linux.git/commit/?h=tiny/unflex-arrayid=6631d5fb4cf395ebd2dc0f2da05525b9d3436a3f Already done. - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] CodingStyle: Add a chapter on conditional compilation
On Mon, Nov 03, 2014 at 09:47:40AM -0800, Joe Perches wrote: > On Mon, 2014-11-03 at 11:46 -0500, Jonathan Corbet wrote: > > On Wed, 29 Oct 2014 11:15:17 -0700 > > Josh Triplett wrote: > > > > > Document several common practices and conventions regarding conditional > > > compilation, most notably the preference for ifdefs in headers rather > > > than .c files. > > > > OK, I've picked this one up for my 3.19 docs pull. > > I think that Al Viro's suggestion from awhile ago: > > https://lkml.org/lkml/2013/3/20/388 > > could still be in CodingStyle somewhere or in > another document like CodingStyleSuggestions. I think that text needs some cleanup to better fit CodingStyle, but the intent and recommendations definitely ought to go in. A few of those seem too far down the road of "don't stuff beans up your nose", and some of them need shortening (just "don't put an else after an if condition ending with break or return; remember to handle errors via break, return, or continue, and outdent the subsequent code"). > Another thing that could go is the suggestion to > use Lindent. > > https://lkml.org/lkml/2013/2/11/390 Agreed completely. We might consider coming up with settings for clang-format, which seems like a far more capable replacement that actually understands C. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] compiler: Correct macro parameter expansion problem
On Mon, Nov 03, 2014 at 06:34:13AM -0800, Jeff Kirsher wrote: > From: Mark Rustad > > The macro __compiletime_error_fallback has an error in that it > lacks parens around the expansion of an expression. It also > lacks a conversion to a boolean value. The first problem can > result in a mis-evaluation. The second problem can also result > in an error if the value comes out negative. That would thwart > the intended error generation. That is, the error being asserted > would not in fact generate an error. Fix both problems by adding > !!() to the expansion. > > Signed-off-by: Mark Rustad > Tested-by: Aaron Brown > Signed-off-by: Jeff Kirsher Good catch. Reviewed-by: Josh Triplett > include/linux/compiler.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index d5ad7b1..59dc611 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -331,7 +331,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, > int val, int expect); > */ > # ifndef __CHECKER__ > # define __compiletime_error_fallback(condition) \ > - do { ((void)sizeof(char[1 - 2 * condition])); } while (0) > + do { ((void)sizeof(char[1 - 2 * !!(condition)])); } while (0) > # endif > #endif > #ifndef __compiletime_error_fallback > -- > 1.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sparse" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 10/10] x86: Support compiling out userspace IO (iopl and ioperm)
On Mon, Nov 03, 2014 at 12:10:49PM +, One Thousand Gnomes wrote: > On Sun, 2 Nov 2014 09:33:01 -0800 > Josh Triplett wrote: > > > On the vast majority of modern systems, no processes will use the > > userspsace IO syscalls, iopl and ioperm. Add a new config option, > > CONFIG_X86_IOPORT, to support configuring them out of the kernel > > entirely. Most current systems do not run programs using these > > syscalls, so X86_IOPORT does not depend on EXPERT, though it does still > > default to y. > > This isn't unreasonable but there are drivers with userspace helpers that > use iopl/ioperm type functionality where you should be doing a SELECT of > X86_IOPORT. The one that comes to mind is the uvesa driver. From a quick > scan it may these days be the only mainstream one that needs the select > adding. Should kernel drivers really express dependencies that only their (current instances of) corresponding userspace components need? Something seems wrong about that. > Some X servers for legacy cards still use io port access. Sure, X servers using UMS rather than KMS seem like a common reason to need this. > There are also > a couple of other highly non-obvious userspace users that hang on for > some systems - eg some older servers DMI and error records can only by > read via a real mode BIOS call so management tools have no choice but to > go the lrmi/io path. As with any userspace interface, some callers may potentially still exist. And this still has "default y", too, to avoid user surprises. > Still makes sense IMHO. > > From a code perspective however you could define IO_BITMAP_LONGS to 0, > add an IO_BITMAP_SIZE (defined as LONGS + 1 or 0) and as far as I can see > gcc would then optimise out a lot of the code you are ifdeffing IO_BITMAP_LONGS already gets defined to (0/sizeof(long)). And as far as I can tell, that would only work for init_tss_io, not anything else. Even then, that would only work with a zero-size array left around in tss_struct, which doesn't seem appropriate. The remaining ifdefs wrap code that GCC could not constant-fold away, and making that code constant-foldable seems significantly more invasive than the ifdefs. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/