from:"Josh Triplett"

Re: [PATCH v4] selftest: size: Add size test for Linux kernel

2014-11-26 Thread Josh Triplett

On Wed, Nov 26, 2014 at 08:27:23PM -0800, Tim Bird wrote:
 --- /dev/null
 +++ b/tools/testing/selftests/size/Makefile
[...]
 +LIBGCC=$(shell $(CC) -print-libgcc-file-name)
 +
 +get_size: get_size.c
 + $(CC) --static -ffreestanding -nostartfiles \
 + -Wl,--entry=_start get_size.c $(LIBGCC) \
 + -o get_size

You don't need -Wl,--entry=_start; that's the default.

You shouldn't need to manually find libgcc, either; the compiler should
do that for you.  What goes wrong if you don't include that?  If you're
trying to link libgcc statically, try -static-libgcc.

Also, static is normally spelled -static, not --static.

 --- /dev/null
 +++ b/tools/testing/selftests/size/get_size.c
[...]
 +int print(const char *s)

This function, and all the others apart from _start, should be declared
static.

 +void num_to_str(unsigned long num, char *s)

Likewise, static.

 +{
 + unsigned long long temp, div;
 + int started;
 +
 + temp = num;
 + div = 100LL;
 + started = 0;
 + while (div) {
 + if (temp/div || started) {
 + *s++ = (unsigned char)(temp/div + '0');
 + started = 1;
 + }
 + temp -= (temp/div)*div;
 + div /= 10;
 + }
 + *s = 0;
 +}

You'd probably end up with significantly smaller code (and no divisions,
and thus no corner cases on architectures that need a special function
to do unsigned long long division) if you print in hex.  You could also
drop the no leading zeros logic, and just *always* print a 64-bit
value as 16 hex digits.

 +int print_num(unsigned long num)

Likewise, static.

 +{
 + char num_buf[30];
 +
 + num_to_str(num, num_buf);
 + return print(num_buf);
 +}
 +
 +int print_k_value(const char *s, unsigned long num, unsigned long units)
 +{
 + unsigned long long temp;
 + int ccode;
 +
 + print(s);
 +
 + temp = num;
 + temp = (temp * units)/1024;
 + num = temp;
 + ccode = print_num(num);
 + print(\n);
 + return ccode;
 +}

I'd suggest dropping this entirely, and just always printing the exact
values returned by sysinfo.  Drop the multiply, too, and just print
info.mem_unit as well.  It's easy to post-process the value in a more
capable environment.

 +/* this program has no main(), as startup libraries are not used */
 +void _start(void)
 +{
 + int ccode;
 + struct sysinfo info;
 + unsigned long used;
 +
 + print(Testing system size.\n);
 + print(1..1\n);
 +
 + ccode = sysinfo(info);
 + if (ccode  0) {
 + print(not ok 1 get size runtime size\n);

Shouldn't the not ok here and the ok below have the same test
description?

 + print(# could not get sysinfo\n);
 + _exit(ccode);
 + }
 + /* ignore cache complexities for now */
 + used = info.totalram - info.freeram - info.bufferram;
 + print_k_value(ok 1 get runtime memory use # size = , used,
 + info.mem_unit);
 +
 + print(# System runtime memory report (units in Kilobytes):\n);
 + print_k_value(#   Total:  , info.totalram, info.mem_unit);
 + print_k_value(#   Free:   , info.freeram, info.mem_unit);
 + print_k_value(#   Buffer: , info.bufferram, info.mem_unit);
 + print_k_value(#   In use: , used, info.mem_unit);
 +
 + _exit(0);
 +}
 -- 
 1.8.2.2
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the tip tree

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 10:30:10PM -0800, John Stultz wrote:
> On Mon, Nov 24, 2014 at 10:16 PM, Ingo Molnar  wrote:
> >
> > * Stephen Rothwell  wrote:
> >
> >> Hi Josh,
> >>
> >> Today's linux-next merge of the tiny tree got a conflict in
> >> kernel/time/Makefile between commit fd866e2b116b ("time: Rename
> >> udelay_test.c to test_udelay.c") from the tip tree and commit
> >> d1f6d68d03ea ("kernel: time: Compile out NTP support") from the tiny
> >> tree.
> >
> > So I think a timer subsystem commit d1f6d68d03ea with this
> > magnitude of linecount increase:
> >
> >  Signed-off-by: Catalina Mocanu 
> >  [josh: Handle CONFIG_COMPAT=y.]
> >  Reviewed-by: Josh Triplett 
> >  Signed-off-by: Josh Triplett 
> >  ---
> >   drivers/pps/Kconfig|  2 +-
> >   include/linux/timex.h  | 15 +--
> >   init/Kconfig   | 10 ++
> >   kernel/compat.c|  8 ++--
> >   kernel/sys_ni.c|  4 
> >   kernel/time/Makefile   |  3 ++-
> >   kernel/time/ntp_internal.h | 28 
> >   kernel/time/posix-timers.c |  2 ++
> >   kernel/time/time.c |  2 ++
> >   kernel/time/timekeeping.c  |  2 ++
> >   10 files changed, 70 insertions(+), 6 deletions(-)
> >
> > at minimum needs the ack of timer folks, before it can be
> > committed to Git. Or is the tiny tree plan to submit all
> > patches to the appropriate subsystem or gather acks, before
> > sending it upstream?
> 
> Yea.  From first glance d1f6d68d03ea looks fairly broken.
> 
> Returning 0 for ntp_tick_length() (which should be the current tick
> length in NTP_SCALE_SHIFT shifted ns), seems like it would cause major
> timekeeping problems.

Ouch, yeah; I'm impressed the kernel successfully booted that way (which
I did test).

Computing the tick_length to return seems to require a div_u64; is it
safe to initialize a static const with the result of calling div_u64, or
does the intializer need manual constant-folding to make the expression
compile-time computable?

Going by the logic in ntp_update_frequency, it looks like the stub
ntp_tick_length needs to return:

tick_length_base = 0;
tick_usec = TICK_USEC;
second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT;
new_base = div_u64(second_length, NTP_INTERVAL_FREQ);
tick_length += new_base - tick_length_base;

(tick_length starts out 0, gets new_base - 0 added initially, and every
subsequent time gets 0 added since tick_length_base won't change.)

Substituting and simplifying:

tick_length = new_base
= div_u64(second_length, NTP_INTERVAL_FREQ)
= div_u64((TICK_USEC * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT, 
NTP_INTERVAL_FREQ)

The numerator there could potentially be simplified, but I don't see an
obvious way around the division by NTP_INTERVAL_FREQ (defined as HZ).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the tip tree

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 07:16:45AM +0100, Ingo Molnar wrote:
> * Stephen Rothwell  wrote:
> > Hi Josh,
> > 
> > Today's linux-next merge of the tiny tree got a conflict in
> > kernel/time/Makefile between commit fd866e2b116b ("time: Rename
> > udelay_test.c to test_udelay.c") from the tip tree and commit
> > d1f6d68d03ea ("kernel: time: Compile out NTP support") from the tiny
> > tree.
> 
> So I think a timer subsystem commit d1f6d68d03ea with this 
> magnitude of linecount increase:
> 
>  Signed-off-by: Catalina Mocanu 
>  [josh: Handle CONFIG_COMPAT=y.]
>  Reviewed-by: Josh Triplett 
>  Signed-off-by: Josh Triplett 
>  ---
>   drivers/pps/Kconfig|  2 +-
>   include/linux/timex.h  | 15 +--
>   init/Kconfig   | 10 ++
>   kernel/compat.c|  8 ++--
>   kernel/sys_ni.c|  4 
>   kernel/time/Makefile   |  3 ++-
>   kernel/time/ntp_internal.h | 28 
>   kernel/time/posix-timers.c |  2 ++
>   kernel/time/time.c |  2 ++
>   kernel/time/timekeeping.c  |  2 ++
>   10 files changed, 70 insertions(+), 6 deletions(-)
> 
> at minimum needs the ack of timer folks, before it can be 
> committed to Git. Or is the tiny tree plan to submit all
> patches to the appropriate subsystem or gather acks, before 
> sending it upstream?

Yes, absolutely.  I planned to send out a tinification patch review
series later this week with all 10 current patches (both those reviewed
on LKML and those only reviewed elsewhere).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 12:00:59AM +0100, Pieter Smith wrote:
> REPO: https://github.com/smipi1/linux-tinification.git
> 
> BRANCH: tiny/config-syscall-splice
> 
> BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort 
> (
>   https://tiny.wiki.kernel.org/).
> 
> GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
>   tee and sendfile) along with all supporting infrastructure if not needed.
>   Many embedded systems will not need the splice-family syscalls. Omitting 
> them
>   saves space.
> 
> HISTORY:
>   PATCH v4:
> - Drops __splice_p()
> - Let nfsd fall back to non-splice support when splice is compiled out
> - Style fixes
[...]
> RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:
> 
> add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)

I replied to one patch with a minor nit in the commit message.  Other
than that, I don't see any obvious issues with this.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 3/7] fs/splice: support compiling out splice-family syscalls

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 12:01:02AM +0100, Pieter Smith wrote:
> Many embedded systems will not need the splice-family syscalls (splice,
> vmsplice, tee and sendfile). Omitting them saves space.  This adds a new 
> EXPERT
> config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
> 
> The goal is to completely compile out fs/splice along with the syscalls. To
> achieve this, the remaining patch-set will deal with fs/splice exports. As far
> as possible, the impact on other device drivers will be minimized so as to
> reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE.
> 
> The use of exported functions will be solved by transparently mocking them out
> with static inlines. Uses of the exported pipe_buf_operations struct however
> require direct modification in fs/fuse and net/core. The next two patches will
> deal with this. A macro is defined that will assist with NULL'ing out 
> callbacks
> when CONFIG_SYSCALL_SPLICE is undefined: __splice_p().

This message needs updating, since the patch series doesn't introduce or
use __splice_p anymore.

> Once all exports are solved, fs/splice can be compiled out.
> 
> The bloat benefit of this patch given a tinyconfig is:
> 
> add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
> function old new   delta
> splice_direct_to_actor   348 416 +68
> splice_to_pipe   371 417 +46
> splice_from_pipe_next107 106  -1
> fdput 11   - -11
> signal_pending39  26 -13
> fdget 56  42 -14
> user_page_pipe_buf_ops20   - -20
> user_page_pipe_buf_steal  25   - -25
> file_end_write58  29 -29
> file_start_write  68  34 -34
> pipe_to_user  43   - -43
> wakeup_pipe_readers   54   - -54
> do_splice_to  87   - -87
> ipipe_prep.part   92   - -92
> opipe_prep.part  119   --119
> sys_sendfile 122   --122
> sys_sendfile64   126   --126
> sys_vmsplice 137   --137
> vmsplice_to_user 205   --205
> sys_tee  491   --491
> do_sendfile  492   --492
> vmsplice_to_pipe 558   --558
> sys_splice  1020   -   -1020
> 
> Signed-off-by: Pieter Smith 
> ---
>  fs/splice.c |  2 ++
>  init/Kconfig| 10 ++
>  kernel/sys_ni.c |  8 
>  3 files changed, 20 insertions(+)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 44b201b..7c4c695 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
> struct file *out,
>   return ret;
>  }
>  
> +#ifdef CONFIG_SYSCALL_SPLICE
>  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
>  struct pipe_inode_info *opipe,
>  size_t len, unsigned int flags);
> @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
> in_fd,
>   return do_sendfile(out_fd, in_fd, NULL, count, 0);
>  }
>  #endif
> +#endif
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index d811d5f..dec9819 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1571,6 +1571,16 @@ config NTP
> system clock to an NTP server, you can disable this option to save
> space.
>  
> +config SYSCALL_SPLICE
> + bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT
> + default y
> + help
> +   This option enables the splice, vmsplice, tee and sendfile syscalls. 
> These
> +   are used by applications to: move data between buffers and arbitrary 
> file
> +   descriptors; "copy" data between buffers; or copy data from userspace 
> into
> +   buffers. If building an embedded system where no applications use 
> these
> +   syscalls, you can disable this option to save space.
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index d2f5b00..25d5551 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
>  cond_syscall(sys_stat);
>  cond_syscall(sys_uname);
>  cond_syscall(sys_olduname);
> +cond_syscall(sys_vmsplice);
> +cond_syscall(sys_splice);
> +cond_syscall(sys_tee);
> +cond_syscall(sys_sendfile);
> +cond_syscall(sys_sendfile64);

Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 10:49:31AM +0100, Pieter Smith wrote:
> On Sun, Nov 23, 2014 at 03:23:02PM -0800, Josh Triplett wrote:
> > On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote:
> > > On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith  wrote:
> > > > To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. 
> > > > This
> > > > struct is exported by fs/splice. The goal of the larger patch set is to
> > > > completely compile out fs/splice, so uses of the exported struct need 
> > > > to be
> > > > compiled out along with fs/splice.
> > > >
> > > > This patch therefore compiles out splice support in fs/fuse when
> > > > CONFIG_SYSCALL_SPLICE is undefined.
> > > >
> > > > Signed-off-by: Pieter Smith 
> > > > ---
> > > >  fs/fuse/dev.c  | 4 ++--
> > > >  include/linux/fs.h | 6 ++
> > > >  2 files changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > > index ca88731..f8f92a4 100644
> > > > --- a/fs/fuse/dev.c
> > > > +++ b/fs/fuse/dev.c
> > > > @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, 
> > > > const struct iovec *iov,
> > > > return fuse_dev_do_read(fc, file, , iov_length(iov, 
> > > > nr_segs));
> > > >  }
> > > >
> > > > -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
> > > > +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, 
> > > > loff_t *ppos,
> > > > struct pipe_inode_info *pipe,
> > > > size_t len, unsigned int flags)
> > > >  {
> > > > @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations 
> > > > = {
> > > > .llseek = no_llseek,
> > > > .read   = do_sync_read,
> > > > .aio_read   = fuse_dev_read,
> > > > -   .splice_read= fuse_dev_splice_read,
> > > > +   .splice_read= __splice_p(fuse_dev_splice_read),
> > > > .write  = do_sync_write,
> > > > .aio_write  = fuse_dev_write,
> > > > .splice_write   = fuse_dev_splice_write,
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index a957d43..04c0975 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, 
> > > > loff_t start, loff_t end,
> > > > int datasync);
> > > >  extern void block_sync_page(struct page *page);
> > > >
> > > > +#ifdef CONFIG_SYSCALL_SPLICE
> > > > +#define __splice_p(x) x
> > > > +#else
> > > > +#define __splice_p(x) NULL
> > > > +#endif
> > > > +
> > > 
> > > This needs to go into a different patch.
> > > One logical change per patch please. :-)
> > 
> > Easy enough to merge this one into the patch introducing
> > CONFIG_SYSCALL_SPLICE, then.
> > 
> > - Josh Triplett
> 
> The patch introducing CONFIG_SYSCALL_SPLICE (PATCH 3) only compiles out the
> syscalls. PATCH 6 on the other hand, compiles out fs/splice.c. This patch
> allows fs/fuse to be compiled when fs/splice.c is compiled out. If I am to
> squash it, it would be logical to include it in PATCH 6, not 3.

The suggestion wasn't to move the fs/fuse/dev.c bits; those should
definitely stay in this patch.  The suggestion was just to move the bit
of the patch defining __splice_p from this patch to patch 3.  (Note that
you need to define it before you use it, so it can't go in patch 6.)

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 11:01:38AM +0100, Pieter Smith wrote:
> On Sun, Nov 23, 2014 at 04:32:51PM -0800, Josh Triplett wrote:
> > On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote:
> > > On Sun, 23 Nov 2014 15:36:37 -0800
> > > Josh Triplett  wrote:
> > > 
> > > > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
> > > > > On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
> > > > > > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
> > > > > > > Truly removing sendfile/sendpage means that you can't even 
> > > > > > > compile NFS
> > > > > > > into the tree.
> > > > > > 
> > > > > > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a 
> > > > > > large
> > > > > > stack of "select" and "depends on", both directly and indirectly; 
> > > > > > adding
> > > > > > a "select SPLICE_SYSCALL" to it seems fine.  (That select does need
> > > > > > adding, though.  Pieter, you need to test-compile more than just
> > > > > > tinyconfig and defconfig.  Try an allyesconfig with *just* splice 
> > > > > > turned
> > > > > > off, and make sure that compiles.)
> > > > > 
> > > > > Did exacly that. Took forever on my hardware, but no problems.
> > > > 
> > > > Ah, I see.  Looking more closely at nfsd, it looks like it already has a
> > > > code path for filesystems that don't do splice.  I think, rather than
> > > > making nfsd select SPLICE_SYSCALL, that it would suffice to change the
> > > > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c)
> > > > to:
> > > > 
> > > > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
> > > > 
> > > > Then nfsd should simply *always* fall back to its non-splice support.
> > > > 
> > > 
> > > I'd probably prefer the above, actually. We have to keep supporting
> > > non-splice enabled fs' for the forseeable future, so we may as well
> > > allow people to run nfsd in such configurations. It could even be
> > > useful for testing the non-splice-enabled codepaths.
> > 
> > Good point!
> > 
> > - Josh Triplett
> 
> I'll add this to svc_process_common. I can squash this into PATCH 3, which is
> where the syscalls can be compiled out. The log entry may however get a little
> crowded and multi-functional.
> 
> Should I keep this as a separate patch?

I'd keep it as a separate patch, yes.  It doesn't become necessary until
patch 6, so you can add it as a new patch between patches 3 and 6.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 09:38:20AM +0100, Geert Uytterhoeven wrote:
> On Mon, Nov 24, 2014 at 12:36 AM, Josh Triplett  wrote:
> > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
> >> On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
> >> > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
> >> > > Truly removing sendfile/sendpage means that you can't even compile NFS
> >> > > into the tree.
> >> >
> >> > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
> >> > stack of "select" and "depends on", both directly and indirectly; adding
> >> > a "select SPLICE_SYSCALL" to it seems fine.  (That select does need
> >> > adding, though.  Pieter, you need to test-compile more than just
> >> > tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
> >> > off, and make sure that compiles.)
> >>
> >> Did exacly that. Took forever on my hardware, but no problems.
> >
> > Ah, I see.  Looking more closely at nfsd, it looks like it already has a
> > code path for filesystems that don't do splice.  I think, rather than
> > making nfsd select SPLICE_SYSCALL, that it would suffice to change the
> > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c)
> > to:
> >
> > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
> >
> > Then nfsd should simply *always* fall back to its non-splice support.
> 
> Hence I suggest adding to the nfsd help text:
> 
> While nfsd works without SPLICE_SYSCALL, you may want to enable
> SPLICE_SYSCALL for <...> (performance?) reasons.

It already seems sufficiently unlikely to enable NFSD while disabling
SPLICE_SYSCALL (in the latter case, turning on EXPERT to do so).  It
doesn't seem worth adding such a note to NFSD.  At most, I'd say that
NFSD might want a note somewhere in its documentation saying that it
takes advantage of filesystems with splice support if serving files from
one, and if running on a kernel that has splice.

> (Hmm, does Kconfig need a "suggests", cfr. Debian package dependencies?)

Perhaps, though that seems much lower priority than, for instance,
transitive "select".

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 09:38:20AM +0100, Geert Uytterhoeven wrote:
 On Mon, Nov 24, 2014 at 12:36 AM, Josh Triplett j...@joshtriplett.org wrote:
  On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
  On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
   On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
Truly removing sendfile/sendpage means that you can't even compile NFS
into the tree.
  
   If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
   stack of select and depends on, both directly and indirectly; adding
   a select SPLICE_SYSCALL to it seems fine.  (That select does need
   adding, though.  Pieter, you need to test-compile more than just
   tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
   off, and make sure that compiles.)
 
  Did exacly that. Took forever on my hardware, but no problems.
 
  Ah, I see.  Looking more closely at nfsd, it looks like it already has a
  code path for filesystems that don't do splice.  I think, rather than
  making nfsd select SPLICE_SYSCALL, that it would suffice to change the
  rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c)
  to:
 
  rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
 
  Then nfsd should simply *always* fall back to its non-splice support.
 
 Hence I suggest adding to the nfsd help text:
 
 While nfsd works without SPLICE_SYSCALL, you may want to enable
 SPLICE_SYSCALL for ... (performance?) reasons.

It already seems sufficiently unlikely to enable NFSD while disabling
SPLICE_SYSCALL (in the latter case, turning on EXPERT to do so).  It
doesn't seem worth adding such a note to NFSD.  At most, I'd say that
NFSD might want a note somewhere in its documentation saying that it
takes advantage of filesystems with splice support if serving files from
one, and if running on a kernel that has splice.

 (Hmm, does Kconfig need a suggests, cfr. Debian package dependencies?)

Perhaps, though that seems much lower priority than, for instance,
transitive select.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 11:01:38AM +0100, Pieter Smith wrote:
 On Sun, Nov 23, 2014 at 04:32:51PM -0800, Josh Triplett wrote:
  On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote:
   On Sun, 23 Nov 2014 15:36:37 -0800
   Josh Triplett j...@joshtriplett.org wrote:
   
On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
 On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
  On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
   Truly removing sendfile/sendpage means that you can't even 
   compile NFS
   into the tree.
  
  If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a 
  large
  stack of select and depends on, both directly and indirectly; 
  adding
  a select SPLICE_SYSCALL to it seems fine.  (That select does need
  adding, though.  Pieter, you need to test-compile more than just
  tinyconfig and defconfig.  Try an allyesconfig with *just* splice 
  turned
  off, and make sure that compiles.)
 
 Did exacly that. Took forever on my hardware, but no problems.

Ah, I see.  Looking more closely at nfsd, it looks like it already has a
code path for filesystems that don't do splice.  I think, rather than
making nfsd select SPLICE_SYSCALL, that it would suffice to change the
rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c)
to:

rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);

Then nfsd should simply *always* fall back to its non-splice support.

   
   I'd probably prefer the above, actually. We have to keep supporting
   non-splice enabled fs' for the forseeable future, so we may as well
   allow people to run nfsd in such configurations. It could even be
   useful for testing the non-splice-enabled codepaths.
  
  Good point!
  
  - Josh Triplett
 
 I'll add this to svc_process_common. I can squash this into PATCH 3, which is
 where the syscalls can be compiled out. The log entry may however get a little
 crowded and multi-functional.
 
 Should I keep this as a separate patch?

I'd keep it as a separate patch, yes.  It doesn't become necessary until
patch 6, so you can add it as a new patch between patches 3 and 6.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 10:49:31AM +0100, Pieter Smith wrote:
 On Sun, Nov 23, 2014 at 03:23:02PM -0800, Josh Triplett wrote:
  On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote:
   On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith pie...@boesman.nl wrote:
To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. 
This
struct is exported by fs/splice. The goal of the larger patch set is to
completely compile out fs/splice, so uses of the exported struct need 
to be
compiled out along with fs/splice.
   
This patch therefore compiles out splice support in fs/fuse when
CONFIG_SYSCALL_SPLICE is undefined.
   
Signed-off-by: Pieter Smith pie...@boesman.nl
---
 fs/fuse/dev.c  | 4 ++--
 include/linux/fs.h | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)
   
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ca88731..f8f92a4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, 
const struct iovec *iov,
return fuse_dev_do_read(fc, file, cs, iov_length(iov, 
nr_segs));
 }
   
-static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
+static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, 
loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags)
 {
@@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations 
= {
.llseek = no_llseek,
.read   = do_sync_read,
.aio_read   = fuse_dev_read,
-   .splice_read= fuse_dev_splice_read,
+   .splice_read= __splice_p(fuse_dev_splice_read),
.write  = do_sync_write,
.aio_write  = fuse_dev_write,
.splice_write   = fuse_dev_splice_write,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a957d43..04c0975 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, 
loff_t start, loff_t end,
int datasync);
 extern void block_sync_page(struct page *page);
   
+#ifdef CONFIG_SYSCALL_SPLICE
+#define __splice_p(x) x
+#else
+#define __splice_p(x) NULL
+#endif
+
   
   This needs to go into a different patch.
   One logical change per patch please. :-)
  
  Easy enough to merge this one into the patch introducing
  CONFIG_SYSCALL_SPLICE, then.
  
  - Josh Triplett
 
 The patch introducing CONFIG_SYSCALL_SPLICE (PATCH 3) only compiles out the
 syscalls. PATCH 6 on the other hand, compiles out fs/splice.c. This patch
 allows fs/fuse to be compiled when fs/splice.c is compiled out. If I am to
 squash it, it would be logical to include it in PATCH 6, not 3.

The suggestion wasn't to move the fs/fuse/dev.c bits; those should
definitely stay in this patch.  The suggestion was just to move the bit
of the patch defining __splice_p from this patch to patch 3.  (Note that
you need to define it before you use it, so it can't go in patch 6.)

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 3/7] fs/splice: support compiling out splice-family syscalls

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 12:01:02AM +0100, Pieter Smith wrote:
 Many embedded systems will not need the splice-family syscalls (splice,
 vmsplice, tee and sendfile). Omitting them saves space.  This adds a new 
 EXPERT
 config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
 
 The goal is to completely compile out fs/splice along with the syscalls. To
 achieve this, the remaining patch-set will deal with fs/splice exports. As far
 as possible, the impact on other device drivers will be minimized so as to
 reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE.
 
 The use of exported functions will be solved by transparently mocking them out
 with static inlines. Uses of the exported pipe_buf_operations struct however
 require direct modification in fs/fuse and net/core. The next two patches will
 deal with this. A macro is defined that will assist with NULL'ing out 
 callbacks
 when CONFIG_SYSCALL_SPLICE is undefined: __splice_p().

This message needs updating, since the patch series doesn't introduce or
use __splice_p anymore.

 Once all exports are solved, fs/splice can be compiled out.
 
 The bloat benefit of this patch given a tinyconfig is:
 
 add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
 function old new   delta
 splice_direct_to_actor   348 416 +68
 splice_to_pipe   371 417 +46
 splice_from_pipe_next107 106  -1
 fdput 11   - -11
 signal_pending39  26 -13
 fdget 56  42 -14
 user_page_pipe_buf_ops20   - -20
 user_page_pipe_buf_steal  25   - -25
 file_end_write58  29 -29
 file_start_write  68  34 -34
 pipe_to_user  43   - -43
 wakeup_pipe_readers   54   - -54
 do_splice_to  87   - -87
 ipipe_prep.part   92   - -92
 opipe_prep.part  119   --119
 sys_sendfile 122   --122
 sys_sendfile64   126   --126
 sys_vmsplice 137   --137
 vmsplice_to_user 205   --205
 sys_tee  491   --491
 do_sendfile  492   --492
 vmsplice_to_pipe 558   --558
 sys_splice  1020   -   -1020
 
 Signed-off-by: Pieter Smith pie...@boesman.nl
 ---
  fs/splice.c |  2 ++
  init/Kconfig| 10 ++
  kernel/sys_ni.c |  8 
  3 files changed, 20 insertions(+)
 
 diff --git a/fs/splice.c b/fs/splice.c
 index 44b201b..7c4c695 100644
 --- a/fs/splice.c
 +++ b/fs/splice.c
 @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
 struct file *out,
   return ret;
  }
  
 +#ifdef CONFIG_SYSCALL_SPLICE
  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
  struct pipe_inode_info *opipe,
  size_t len, unsigned int flags);
 @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
 in_fd,
   return do_sendfile(out_fd, in_fd, NULL, count, 0);
  }
  #endif
 +#endif
  
 diff --git a/init/Kconfig b/init/Kconfig
 index d811d5f..dec9819 100644
 --- a/init/Kconfig
 +++ b/init/Kconfig
 @@ -1571,6 +1571,16 @@ config NTP
 system clock to an NTP server, you can disable this option to save
 space.
  
 +config SYSCALL_SPLICE
 + bool Enable splice/vmsplice/tee/sendfile syscalls if EXPERT
 + default y
 + help
 +   This option enables the splice, vmsplice, tee and sendfile syscalls. 
 These
 +   are used by applications to: move data between buffers and arbitrary 
 file
 +   descriptors; copy data between buffers; or copy data from userspace 
 into
 +   buffers. If building an embedded system where no applications use 
 these
 +   syscalls, you can disable this option to save space.
 +
  config PCI_QUIRKS
   default y
   bool Enable PCI quirk workarounds if EXPERT
 diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
 index d2f5b00..25d5551 100644
 --- a/kernel/sys_ni.c
 +++ b/kernel/sys_ni.c
 @@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
  cond_syscall(sys_stat);
  cond_syscall(sys_uname);
  cond_syscall(sys_olduname);
 +cond_syscall(sys_vmsplice);
 +cond_syscall(sys_splice);
 +cond_syscall(sys_tee);
 +cond_syscall(sys_sendfile);
 +cond_syscall(sys_sendfile64);
 +cond_syscall(compat_sys_vmsplice);
 +cond_syscall(compat_sys_sendfile);

Re: [PATCH v4 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 12:00:59AM +0100, Pieter Smith wrote:
 REPO: https://github.com/smipi1/linux-tinification.git
 
 BRANCH: tiny/config-syscall-splice
 
 BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort 
 (
   https://tiny.wiki.kernel.org/).
 
 GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
   tee and sendfile) along with all supporting infrastructure if not needed.
   Many embedded systems will not need the splice-family syscalls. Omitting 
 them
   saves space.
 
 HISTORY:
   PATCH v4:
 - Drops __splice_p()
 - Let nfsd fall back to non-splice support when splice is compiled out
 - Style fixes
[...]
 RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:
 
 add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)

I replied to one patch with a minor nit in the commit message.  Other
than that, I don't see any obvious issues with this.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the tip tree

2014-11-24 Thread Josh Triplett

On Tue, Nov 25, 2014 at 07:16:45AM +0100, Ingo Molnar wrote:
 * Stephen Rothwell s...@canb.auug.org.au wrote:
  Hi Josh,
  
  Today's linux-next merge of the tiny tree got a conflict in
  kernel/time/Makefile between commit fd866e2b116b (time: Rename
  udelay_test.c to test_udelay.c) from the tip tree and commit
  d1f6d68d03ea (kernel: time: Compile out NTP support) from the tiny
  tree.
 
 So I think a timer subsystem commit d1f6d68d03ea with this 
 magnitude of linecount increase:
 
  Signed-off-by: Catalina Mocanu catalina.moc...@gmail.com
  [josh: Handle CONFIG_COMPAT=y.]
  Reviewed-by: Josh Triplett j...@joshtriplett.org
  Signed-off-by: Josh Triplett j...@joshtriplett.org
  ---
   drivers/pps/Kconfig|  2 +-
   include/linux/timex.h  | 15 +--
   init/Kconfig   | 10 ++
   kernel/compat.c|  8 ++--
   kernel/sys_ni.c|  4 
   kernel/time/Makefile   |  3 ++-
   kernel/time/ntp_internal.h | 28 
   kernel/time/posix-timers.c |  2 ++
   kernel/time/time.c |  2 ++
   kernel/time/timekeeping.c  |  2 ++
   10 files changed, 70 insertions(+), 6 deletions(-)
 
 at minimum needs the ack of timer folks, before it can be 
 committed to Git. Or is the tiny tree plan to submit all
 patches to the appropriate subsystem or gather acks, before 
 sending it upstream?

Yes, absolutely.  I planned to send out a tinification patch review
series later this week with all 10 current patches (both those reviewed
on LKML and those only reviewed elsewhere).

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the tip tree

2014-11-24 Thread Josh Triplett

On Mon, Nov 24, 2014 at 10:30:10PM -0800, John Stultz wrote:
 On Mon, Nov 24, 2014 at 10:16 PM, Ingo Molnar mi...@kernel.org wrote:
 
  * Stephen Rothwell s...@canb.auug.org.au wrote:
 
  Hi Josh,
 
  Today's linux-next merge of the tiny tree got a conflict in
  kernel/time/Makefile between commit fd866e2b116b (time: Rename
  udelay_test.c to test_udelay.c) from the tip tree and commit
  d1f6d68d03ea (kernel: time: Compile out NTP support) from the tiny
  tree.
 
  So I think a timer subsystem commit d1f6d68d03ea with this
  magnitude of linecount increase:
 
   Signed-off-by: Catalina Mocanu catalina.moc...@gmail.com
   [josh: Handle CONFIG_COMPAT=y.]
   Reviewed-by: Josh Triplett j...@joshtriplett.org
   Signed-off-by: Josh Triplett j...@joshtriplett.org
   ---
drivers/pps/Kconfig|  2 +-
include/linux/timex.h  | 15 +--
init/Kconfig   | 10 ++
kernel/compat.c|  8 ++--
kernel/sys_ni.c|  4 
kernel/time/Makefile   |  3 ++-
kernel/time/ntp_internal.h | 28 
kernel/time/posix-timers.c |  2 ++
kernel/time/time.c |  2 ++
kernel/time/timekeeping.c  |  2 ++
10 files changed, 70 insertions(+), 6 deletions(-)
 
  at minimum needs the ack of timer folks, before it can be
  committed to Git. Or is the tiny tree plan to submit all
  patches to the appropriate subsystem or gather acks, before
  sending it upstream?
 
 Yea.  From first glance d1f6d68d03ea looks fairly broken.
 
 Returning 0 for ntp_tick_length() (which should be the current tick
 length in NTP_SCALE_SHIFT shifted ns), seems like it would cause major
 timekeeping problems.

Ouch, yeah; I'm impressed the kernel successfully booted that way (which
I did test).

Computing the tick_length to return seems to require a div_u64; is it
safe to initialize a static const with the result of calling div_u64, or
does the intializer need manual constant-folding to make the expression
compile-time computable?

Going by the logic in ntp_update_frequency, it looks like the stub
ntp_tick_length needs to return:

tick_length_base = 0;
tick_usec = TICK_USEC;
second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)  NTP_SCALE_SHIFT;
new_base = div_u64(second_length, NTP_INTERVAL_FREQ);
tick_length += new_base - tick_length_base;

(tick_length starts out 0, gets new_base - 0 added initially, and every
subsequent time gets 0 added since tick_length_base won't change.)

Substituting and simplifying:

tick_length = new_base
= div_u64(second_length, NTP_INTERVAL_FREQ)
= div_u64((TICK_USEC * NSEC_PER_USEC * USER_HZ)  NTP_SCALE_SHIFT, 
NTP_INTERVAL_FREQ)

The numerator there could potentially be simplified, but I don't see an
obvious way around the division by NTP_INTERVAL_FREQ (defined as HZ).

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote:
> On Sun, 23 Nov 2014 15:36:37 -0800
> Josh Triplett  wrote:
> 
> > On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
> > > On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
> > > > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
> > > > > Truly removing sendfile/sendpage means that you can't even compile NFS
> > > > > into the tree.
> > > > 
> > > > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
> > > > stack of "select" and "depends on", both directly and indirectly; adding
> > > > a "select SPLICE_SYSCALL" to it seems fine.  (That select does need
> > > > adding, though.  Pieter, you need to test-compile more than just
> > > > tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
> > > > off, and make sure that compiles.)
> > > 
> > > Did exacly that. Took forever on my hardware, but no problems.
> > 
> > Ah, I see.  Looking more closely at nfsd, it looks like it already has a
> > code path for filesystems that don't do splice.  I think, rather than
> > making nfsd select SPLICE_SYSCALL, that it would suffice to change the
> > "rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c)
> > to:
> > 
> > rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
> > 
> > Then nfsd should simply *always* fall back to its non-splice support.
> > 
> 
> I'd probably prefer the above, actually. We have to keep supporting
> non-splice enabled fs' for the forseeable future, so we may as well
> allow people to run nfsd in such configurations. It could even be
> useful for testing the non-splice-enabled codepaths.

Good point!

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
> On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
> > On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
> > > Truly removing sendfile/sendpage means that you can't even compile NFS
> > > into the tree.
> > 
> > If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
> > stack of "select" and "depends on", both directly and indirectly; adding
> > a "select SPLICE_SYSCALL" to it seems fine.  (That select does need
> > adding, though.  Pieter, you need to test-compile more than just
> > tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
> > off, and make sure that compiles.)
> 
> Did exacly that. Took forever on my hardware, but no problems.

Ah, I see.  Looking more closely at nfsd, it looks like it already has a
code path for filesystems that don't do splice.  I think, rather than
making nfsd select SPLICE_SYSCALL, that it would suffice to change the
"rqstp->rq_splice_ok = true;" in svc_process_common (net/sunrpc/svc.c)
to:

rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);

Then nfsd should simply *always* fall back to its non-splice support.

That said, given that it seems exceedingly unlikely that anyone would
use the in-kernel nfsd on a system trying to minimize kernel size, it
still seems cleaner to just "select SPLICE_SYSCALL" from NFSD in
Kconfig.  That avoids making any changes at all to the nfsd source in
this patch series.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote:
> On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith  wrote:
> > To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This
> > struct is exported by fs/splice. The goal of the larger patch set is to
> > completely compile out fs/splice, so uses of the exported struct need to be
> > compiled out along with fs/splice.
> >
> > This patch therefore compiles out splice support in fs/fuse when
> > CONFIG_SYSCALL_SPLICE is undefined.
> >
> > Signed-off-by: Pieter Smith 
> > ---
> >  fs/fuse/dev.c  | 4 ++--
> >  include/linux/fs.h | 6 ++
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index ca88731..f8f92a4 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, 
> > const struct iovec *iov,
> > return fuse_dev_do_read(fc, file, , iov_length(iov, nr_segs));
> >  }
> >
> > -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
> > +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t 
> > *ppos,
> > struct pipe_inode_info *pipe,
> > size_t len, unsigned int flags)
> >  {
> > @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = {
> > .llseek = no_llseek,
> > .read   = do_sync_read,
> > .aio_read   = fuse_dev_read,
> > -   .splice_read= fuse_dev_splice_read,
> > +   .splice_read= __splice_p(fuse_dev_splice_read),
> > .write  = do_sync_write,
> > .aio_write  = fuse_dev_write,
> > .splice_write   = fuse_dev_splice_write,
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a957d43..04c0975 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t 
> > start, loff_t end,
> > int datasync);
> >  extern void block_sync_page(struct page *page);
> >
> > +#ifdef CONFIG_SYSCALL_SPLICE
> > +#define __splice_p(x) x
> > +#else
> > +#define __splice_p(x) NULL
> > +#endif
> > +
> 
> This needs to go into a different patch.
> One logical change per patch please. :-)

Easy enough to merge this one into the patch introducing
CONFIG_SYSCALL_SPLICE, then.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
> Truly removing sendfile/sendpage means that you can't even compile NFS
> into the tree.

If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
stack of "select" and "depends on", both directly and indirectly; adding
a "select SPLICE_SYSCALL" to it seems fine.  (That select does need
adding, though.  Pieter, you need to test-compile more than just
tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
off, and make sure that compiles.)

Given the requirements of running a file server in the kernel, I'd
expect CONFIG_NFSD to end up with several more selects of optional
functionality in the future.  It seems rather likely that the average
embedded system will be compiling out NFS. :)

Also, this patch series compiles out splice and sendfile, including
several *users* of sendpage; it doesn't compile out the sendpage
support/infrastructure itself.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
 Truly removing sendfile/sendpage means that you can't even compile NFS
 into the tree.

If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
stack of select and depends on, both directly and indirectly; adding
a select SPLICE_SYSCALL to it seems fine.  (That select does need
adding, though.  Pieter, you need to test-compile more than just
tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
off, and make sure that compiles.)

Given the requirements of running a file server in the kernel, I'd
expect CONFIG_NFSD to end up with several more selects of optional
functionality in the future.  It seems rather likely that the average
embedded system will be compiling out NFS. :)

Also, this patch series compiles out splice and sendfile, including
several *users* of sendpage; it doesn't compile out the sendpage
support/infrastructure itself.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [fuse-devel] [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 11:29:08PM +0100, Richard Weinberger wrote:
 On Sun, Nov 23, 2014 at 3:20 PM, Pieter Smith pie...@boesman.nl wrote:
  To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This
  struct is exported by fs/splice. The goal of the larger patch set is to
  completely compile out fs/splice, so uses of the exported struct need to be
  compiled out along with fs/splice.
 
  This patch therefore compiles out splice support in fs/fuse when
  CONFIG_SYSCALL_SPLICE is undefined.
 
  Signed-off-by: Pieter Smith pie...@boesman.nl
  ---
   fs/fuse/dev.c  | 4 ++--
   include/linux/fs.h | 6 ++
   2 files changed, 8 insertions(+), 2 deletions(-)
 
  diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
  index ca88731..f8f92a4 100644
  --- a/fs/fuse/dev.c
  +++ b/fs/fuse/dev.c
  @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, 
  const struct iovec *iov,
  return fuse_dev_do_read(fc, file, cs, iov_length(iov, nr_segs));
   }
 
  -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
  +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t 
  *ppos,
  struct pipe_inode_info *pipe,
  size_t len, unsigned int flags)
   {
  @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = {
  .llseek = no_llseek,
  .read   = do_sync_read,
  .aio_read   = fuse_dev_read,
  -   .splice_read= fuse_dev_splice_read,
  +   .splice_read= __splice_p(fuse_dev_splice_read),
  .write  = do_sync_write,
  .aio_write  = fuse_dev_write,
  .splice_write   = fuse_dev_splice_write,
  diff --git a/include/linux/fs.h b/include/linux/fs.h
  index a957d43..04c0975 100644
  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t 
  start, loff_t end,
  int datasync);
   extern void block_sync_page(struct page *page);
 
  +#ifdef CONFIG_SYSCALL_SPLICE
  +#define __splice_p(x) x
  +#else
  +#define __splice_p(x) NULL
  +#endif
  +
 
 This needs to go into a different patch.
 One logical change per patch please. :-)

Easy enough to merge this one into the patch introducing
CONFIG_SYSCALL_SPLICE, then.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
 On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
  On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
   Truly removing sendfile/sendpage means that you can't even compile NFS
   into the tree.
  
  If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
  stack of select and depends on, both directly and indirectly; adding
  a select SPLICE_SYSCALL to it seems fine.  (That select does need
  adding, though.  Pieter, you need to test-compile more than just
  tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
  off, and make sure that compiles.)
 
 Did exacly that. Took forever on my hardware, but no problems.

Ah, I see.  Looking more closely at nfsd, it looks like it already has a
code path for filesystems that don't do splice.  I think, rather than
making nfsd select SPLICE_SYSCALL, that it would suffice to change the
rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c)
to:

rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);

Then nfsd should simply *always* fall back to its non-splice support.

That said, given that it seems exceedingly unlikely that anyone would
use the in-kernel nfsd on a system trying to minimize kernel size, it
still seems cleaner to just select SPLICE_SYSCALL from NFSD in
Kconfig.  That avoids making any changes at all to the nfsd source in
this patch series.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)

2014-11-23 Thread Josh Triplett

On Sun, Nov 23, 2014 at 07:28:10PM -0500, Jeff Layton wrote:
 On Sun, 23 Nov 2014 15:36:37 -0800
 Josh Triplett j...@joshtriplett.org wrote:
 
  On Sun, Nov 23, 2014 at 09:30:40PM +0100, Pieter Smith wrote:
   On Sun, Nov 23, 2014 at 11:43:26AM -0800, Josh Triplett wrote:
On Sun, Nov 23, 2014 at 01:46:23PM -0500, David Miller wrote:
 Truly removing sendfile/sendpage means that you can't even compile NFS
 into the tree.

If you mean the in-kernel nfsd (CONFIG_NFSD), that already has a large
stack of select and depends on, both directly and indirectly; adding
a select SPLICE_SYSCALL to it seems fine.  (That select does need
adding, though.  Pieter, you need to test-compile more than just
tinyconfig and defconfig.  Try an allyesconfig with *just* splice turned
off, and make sure that compiles.)
   
   Did exacly that. Took forever on my hardware, but no problems.
  
  Ah, I see.  Looking more closely at nfsd, it looks like it already has a
  code path for filesystems that don't do splice.  I think, rather than
  making nfsd select SPLICE_SYSCALL, that it would suffice to change the
  rqstp-rq_splice_ok = true; in svc_process_common (net/sunrpc/svc.c)
  to:
  
  rqstp-rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
  
  Then nfsd should simply *always* fall back to its non-splice support.
  
 
 I'd probably prefer the above, actually. We have to keep supporting
 non-splice enabled fs' for the forseeable future, so we may as well
 allow people to run nfsd in such configurations. It could even be
 useful for testing the non-splice-enabled codepaths.

Good point!

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] net/core: support compiling out splice

2014-11-22 Thread Josh Triplett

[Please don't top-post.]

On Sat, Nov 22, 2014 at 11:50:51PM +0100, Pieter Smith wrote:
> splice exports a structure that is used by skbuf. Mocking out a function is
> straightforward. To my knowledge there is no elegant way of mocking out a
> splice_operations struct. I directly modified the code to prevent linking
> against the struct. Do you know of a better technique to get the same
> result?

No, I don't.  The approach you took seems fine; I'm just saying that you
need to explain the need for it in the commit message.

- JosH Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/6] fs/splice: full support for compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 10:00:01PM +0100, Pieter Smith wrote:
> Entirely compile out splice translation unit when the system is configured
> without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined).
> 
> add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845)

Very nice!

- Josh Triplett

> function old new   delta
> pipe_to_null   4   -  -4
> generic_pipe_buf_nosteal   6   -  -6
> spd_release_page  10   - -10
> PageUptodate  22  11 -11
> lock_page 36  24 -12
> page_cache_pipe_buf_release   16   - -16
> splice_write_null 24   4 -20
> page_cache_pipe_buf_ops   20   - -20
> nosteal_pipe_buf_ops  20   - -20
> default_pipe_buf_ops  20   - -20
> generic_splice_sendpage   24   - -24
> splice_shrink_spd 27   - -27
> direct_splice_actor   47   - -47
> default_file_splice_write 49   - -49
> wakeup_pipe_writers   54   - -54
> write_pipe_buf71   - -71
> page_cache_pipe_buf_confirm   80   - -80
> splice_grow_spd   87   - -87
> splice_from_pipe  93   - -93
> splice_from_pipe_next106   --106
> pipe_to_sendpage 109   --109
> page_cache_pipe_buf_steal114   --114
> generic_file_splice_read 131   8-123
> do_splice_direct 148   --148
> __splice_from_pipe   246   --246
> splice_direct_to_actor   416   --416
> splice_to_pipe   417   --417
> default_file_splice_read 688   --688
> iter_file_splice_write   702   4-698
> __generic_file_splice_read  1109   -   -1109
> 
> Signed-off-by: Pieter Smith 
> ---
>  fs/Makefile|  3 ++-
>  fs/splice.c|  2 --
>  include/linux/fs.h | 26 ++
>  include/linux/splice.h | 43 +++
>  4 files changed, 71 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/Makefile b/fs/Makefile
> index fb7646e..9395622 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -10,7 +10,7 @@ obj-y :=open.o read_write.o file_table.o super.o \
>   ioctl.o readdir.o select.o dcache.o inode.o \
>   attr.o bad_inode.o file.o filesystems.o namespace.o \
>   seq_file.o xattr.o libfs.o fs-writeback.o \
> - pnode.o splice.o sync.o utimes.o \
> + pnode.o sync.o utimes.o \
>   stack.o fs_struct.o statfs.o fs_pin.o
>  
>  ifeq ($(CONFIG_BLOCK),y)
> @@ -22,6 +22,7 @@ endif
>  obj-$(CONFIG_PROC_FS) += proc_namespace.o
>  
>  obj-$(CONFIG_FSNOTIFY)   += notify/
> +obj-$(CONFIG_SYSCALL_SPLICE) += splice.o
>  obj-$(CONFIG_EPOLL)  += eventpoll.o
>  obj-$(CONFIG_ANON_INODES)+= anon_inodes.o
>  obj-$(CONFIG_SIGNALFD)   += signalfd.o
> diff --git a/fs/splice.c b/fs/splice.c
> index 7c4c695..44b201b 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
> struct file *out,
>   return ret;
>  }
>  
> -#ifdef CONFIG_SYSCALL_SPLICE
>  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
>  struct pipe_inode_info *opipe,
>  size_t len, unsigned int flags);
> @@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
> in_fd,
>   return do_sendfile(out_fd, in_fd, NULL, count, 0);
>  }
>  #endif
> -#endif
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 04c0975..9b3054e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2449,6 +2449,7 @@ extern void block_sync_page(struct page *page);
>  #define __splice_p(x) NULL
>  #endif
>  
> +#ifdef CONFIG_SYSCALL_SPLICE
>  /* fs/splice.c */
>  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
>   struct pipe_inode_info *, size_t, unsigned int);
> @@ -2458,6 +2459,31 @@ extern ssize_t iter

Re: [PATCH 3/6] fs/splice: support compiling out splice-family syscalls

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 09:59:58PM +0100, Pieter Smith wrote:
> Many embedded systems will not need the splice-family syscalls (splice,
> vmsplice, tee and sendfile). Omitting them saves space.  This adds a new 
> EXPERT
> config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
> 
> This patch removes almost all callers of .splice_read() and .splice_write()
> in the file_operations struct. This paves the way to eventually compile out 
> the
> .splice_read and .splice_write members of the file_operations struct as well 
> as
> the remaining splice-related infrastructure.

This commit message doesn't reflect the new approach of leaving those
members in the structure.

The patch looks good otherwise.

- Josh Triplett

> add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
> function old new   delta
> splice_direct_to_actor   348 416 +68
> splice_to_pipe   371 417 +46
> splice_from_pipe_next107 106  -1
> fdput 11   - -11
> signal_pending39  26 -13
> fdget 56  42 -14
> user_page_pipe_buf_ops20   - -20
> user_page_pipe_buf_steal  25   - -25
> file_end_write58  29 -29
> file_start_write  68  34 -34
> pipe_to_user  43   - -43
> wakeup_pipe_readers   54   - -54
> do_splice_to  87   - -87
> ipipe_prep.part   92   - -92
> opipe_prep.part  119   --119
> sys_sendfile 122   --122
> sys_sendfile64   126   --126
> sys_vmsplice 137   --137
> vmsplice_to_user 205   --205
> sys_tee  491   --491
> do_sendfile  492   --492
> vmsplice_to_pipe 558   --558
> sys_splice  1020   -   -1020
> 
> Signed-off-by: Pieter Smith 
> ---
>  fs/splice.c |  2 ++
>  init/Kconfig| 10 ++
>  kernel/sys_ni.c |  8 
>  3 files changed, 20 insertions(+)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 44b201b..7c4c695 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
> struct file *out,
>   return ret;
>  }
>  
> +#ifdef CONFIG_SYSCALL_SPLICE
>  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
>  struct pipe_inode_info *opipe,
>  size_t len, unsigned int flags);
> @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
> in_fd,
>   return do_sendfile(out_fd, in_fd, NULL, count, 0);
>  }
>  #endif
> +#endif
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index d811d5f..dec9819 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1571,6 +1571,16 @@ config NTP
> system clock to an NTP server, you can disable this option to save
> space.
>  
> +config SYSCALL_SPLICE
> + bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT
> + default y
> + help
> +   This option enables the splice, vmsplice, tee and sendfile syscalls. 
> These
> +   are used by applications to: move data between buffers and arbitrary 
> file
> +   descriptors; "copy" data between buffers; or copy data from userspace 
> into
> +   buffers. If building an embedded system where no applications use 
> these
> +   syscalls, you can disable this option to save space.
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index d2f5b00..25d5551 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
>  cond_syscall(sys_stat);
>  cond_syscall(sys_uname);
>  cond_syscall(sys_olduname);
> +cond_syscall(sys_vmsplice);
> +cond_syscall(sys_splice);
> +cond_syscall(sys_tee);
> +cond_syscall(sys_sendfile);
> +cond_syscall(sys_sendfile64);
> +cond_syscall(compat_sys_vmsplice);
> +cond_syscall(compat_sys_sendfile);
> +cond_syscall(compat_sys_sendfile64);
>  
>  /* arch-specific weak syscall entries */
>  cond_syscall(sys_pciconfig_read);
> -- 
> 1.9.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] net/core: support compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 10:00:00PM +0100, Pieter Smith wrote:
> Compile out splice support from networking core when the splice-family of
> syscalls is not supported by the system (i.e. CONFIG_SYSCALL_SPLICE is
> undefined).

Please explain in the commit message why this particular bit of splice
support needs compiling out when most other bits do not.

Also, a couple of style comments below.

> Signed-off-by: Pieter Smith 
> ---
>  include/linux/skbuff.h | 9 +
>  net/core/skbuff.c  | 9 ++---
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index a59d934..8c28524 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2640,9 +2640,18 @@ int skb_copy_bits(const struct sk_buff *skb, int 
> offset, void *to, int len);
>  int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int 
> len);
>  __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to,
> int len, __wsum csum);
> +#ifdef CONFIG_SYSCALL_SPLICE
>  int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
>   struct pipe_inode_info *pipe, unsigned int len,
>   unsigned int flags);
> +#else /* #ifdef CONFIG_SYSCALL_SPLICE */
> +static inline int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
> + struct pipe_inode_info *pipe, unsigned int len,
> + unsigned int flags)
> +{
> + return -EPERM;
> +}
> +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */

These comments, and the one added below in the corresponding .c file,
don't match the prevailing style.  These are so short (fitting on one
screen) that they hardly seem worth the comments at all, but if you want
to include them, s/#ifdef// in the comment.

>  void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
>  unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
>  int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 61059a0..74fad8a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1781,9 +1781,9 @@ static bool __splice_segment(struct page *page, 
> unsigned int poff,
>   * Map linear and fragment data from the skb to spd. It reports true if the
>   * pipe is full or if we already spliced the requested length.
>   */
> -static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info 
> *pipe,
> -   unsigned int *offset, unsigned int *len,
> -   struct splice_pipe_desc *spd, struct sock *sk)
> +static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct 
> pipe_inode_info *pipe,
> +  unsigned int *offset, unsigned int 
> *len,
> +  struct splice_pipe_desc *spd, 
> struct sock *sk)
>  {
>   int seg;
>  
> @@ -1821,6 +1821,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, 
> struct pipe_inode_info *pipe,
>   * the frag list, if such a thing exists. We'd probably need to recurse to
>   * handle that cleanly.
>   */
> +#ifdef CONFIG_SYSCALL_SPLICE
>  int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
>   struct pipe_inode_info *pipe, unsigned int tlen,
>   unsigned int flags)
> @@ -1876,6 +1877,8 @@ done:
>  
>   return ret;
>  }
> +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */
> +

This extra blank line shouldn't be here.

>  /**
>   *   skb_store_bits - store bits from kernel buffer to skb
> -- 
> 1.9.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 09:59:59PM +0100, Pieter Smith wrote:
> Compile out splice support from fuse when the splice-family of syscalls is not
> supported by the system (i.e. CONFIG_SYSCALL_SPLICE is undefined).

This patch needs some additional explanation.  Could you explain in the
commit message why this particular splice_read function (and only the
splice_read function, not the splice_write function) needs compiling
out, when others do not?

- Josh Triplett

> Signed-off-by: Pieter Smith 
> ---
>  fs/fuse/dev.c  | 4 ++--
>  include/linux/fs.h | 6 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index ca88731..f8f92a4 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const 
> struct iovec *iov,
>   return fuse_dev_do_read(fc, file, , iov_length(iov, nr_segs));
>  }
>  
> -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
> +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t 
> *ppos,
>   struct pipe_inode_info *pipe,
>   size_t len, unsigned int flags)
>  {
> @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = {
>   .llseek = no_llseek,
>   .read   = do_sync_read,
>   .aio_read   = fuse_dev_read,
> - .splice_read= fuse_dev_splice_read,
> + .splice_read= __splice_p(fuse_dev_splice_read),
>   .write  = do_sync_write,
>   .aio_write  = fuse_dev_write,
>   .splice_write   = fuse_dev_splice_write,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a957d43..04c0975 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t 
> start, loff_t end,
>   int datasync);
>  extern void block_sync_page(struct page *page);
>  
> +#ifdef CONFIG_SYSCALL_SPLICE
> +#define __splice_p(x) x
> +#else
> +#define __splice_p(x) NULL
> +#endif
> +
>  /* fs/splice.c */
>  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
>   struct pipe_inode_info *, size_t, unsigned int);
> -- 
> 1.9.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/6] fs/fuse: support compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 09:59:59PM +0100, Pieter Smith wrote:
 Compile out splice support from fuse when the splice-family of syscalls is not
 supported by the system (i.e. CONFIG_SYSCALL_SPLICE is undefined).

This patch needs some additional explanation.  Could you explain in the
commit message why this particular splice_read function (and only the
splice_read function, not the splice_write function) needs compiling
out, when others do not?

- Josh Triplett

 Signed-off-by: Pieter Smith pie...@boesman.nl
 ---
  fs/fuse/dev.c  | 4 ++--
  include/linux/fs.h | 6 ++
  2 files changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
 index ca88731..f8f92a4 100644
 --- a/fs/fuse/dev.c
 +++ b/fs/fuse/dev.c
 @@ -1291,7 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const 
 struct iovec *iov,
   return fuse_dev_do_read(fc, file, cs, iov_length(iov, nr_segs));
  }
  
 -static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 +static ssize_t __maybe_unused fuse_dev_splice_read(struct file *in, loff_t 
 *ppos,
   struct pipe_inode_info *pipe,
   size_t len, unsigned int flags)
  {
 @@ -2144,7 +2144,7 @@ const struct file_operations fuse_dev_operations = {
   .llseek = no_llseek,
   .read   = do_sync_read,
   .aio_read   = fuse_dev_read,
 - .splice_read= fuse_dev_splice_read,
 + .splice_read= __splice_p(fuse_dev_splice_read),
   .write  = do_sync_write,
   .aio_write  = fuse_dev_write,
   .splice_write   = fuse_dev_splice_write,
 diff --git a/include/linux/fs.h b/include/linux/fs.h
 index a957d43..04c0975 100644
 --- a/include/linux/fs.h
 +++ b/include/linux/fs.h
 @@ -2443,6 +2443,12 @@ extern int blkdev_fsync(struct file *filp, loff_t 
 start, loff_t end,
   int datasync);
  extern void block_sync_page(struct page *page);
  
 +#ifdef CONFIG_SYSCALL_SPLICE
 +#define __splice_p(x) x
 +#else
 +#define __splice_p(x) NULL
 +#endif
 +
  /* fs/splice.c */
  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
   struct pipe_inode_info *, size_t, unsigned int);
 -- 
 1.9.1
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] net/core: support compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 10:00:00PM +0100, Pieter Smith wrote:
 Compile out splice support from networking core when the splice-family of
 syscalls is not supported by the system (i.e. CONFIG_SYSCALL_SPLICE is
 undefined).

Please explain in the commit message why this particular bit of splice
support needs compiling out when most other bits do not.

Also, a couple of style comments below.

 Signed-off-by: Pieter Smith pie...@boesman.nl
 ---
  include/linux/skbuff.h | 9 +
  net/core/skbuff.c  | 9 ++---
  2 files changed, 15 insertions(+), 3 deletions(-)
 
 diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
 index a59d934..8c28524 100644
 --- a/include/linux/skbuff.h
 +++ b/include/linux/skbuff.h
 @@ -2640,9 +2640,18 @@ int skb_copy_bits(const struct sk_buff *skb, int 
 offset, void *to, int len);
  int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int 
 len);
  __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to,
 int len, __wsum csum);
 +#ifdef CONFIG_SYSCALL_SPLICE
  int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
   struct pipe_inode_info *pipe, unsigned int len,
   unsigned int flags);
 +#else /* #ifdef CONFIG_SYSCALL_SPLICE */
 +static inline int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 + struct pipe_inode_info *pipe, unsigned int len,
 + unsigned int flags)
 +{
 + return -EPERM;
 +}
 +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */

These comments, and the one added below in the corresponding .c file,
don't match the prevailing style.  These are so short (fitting on one
screen) that they hardly seem worth the comments at all, but if you want
to include them, s/#ifdef// in the comment.

  void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
  unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
  int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
 diff --git a/net/core/skbuff.c b/net/core/skbuff.c
 index 61059a0..74fad8a 100644
 --- a/net/core/skbuff.c
 +++ b/net/core/skbuff.c
 @@ -1781,9 +1781,9 @@ static bool __splice_segment(struct page *page, 
 unsigned int poff,
   * Map linear and fragment data from the skb to spd. It reports true if the
   * pipe is full or if we already spliced the requested length.
   */
 -static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info 
 *pipe,
 -   unsigned int *offset, unsigned int *len,
 -   struct splice_pipe_desc *spd, struct sock *sk)
 +static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct 
 pipe_inode_info *pipe,
 +  unsigned int *offset, unsigned int 
 *len,
 +  struct splice_pipe_desc *spd, 
 struct sock *sk)
  {
   int seg;
  
 @@ -1821,6 +1821,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, 
 struct pipe_inode_info *pipe,
   * the frag list, if such a thing exists. We'd probably need to recurse to
   * handle that cleanly.
   */
 +#ifdef CONFIG_SYSCALL_SPLICE
  int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
   struct pipe_inode_info *pipe, unsigned int tlen,
   unsigned int flags)
 @@ -1876,6 +1877,8 @@ done:
  
   return ret;
  }
 +#endif /* #ifdef CONFIG_SYSCALL_SPLICE */
 +

This extra blank line shouldn't be here.

  /**
   *   skb_store_bits - store bits from kernel buffer to skb
 -- 
 1.9.1
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/6] fs/splice: support compiling out splice-family syscalls

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 09:59:58PM +0100, Pieter Smith wrote:
 Many embedded systems will not need the splice-family syscalls (splice,
 vmsplice, tee and sendfile). Omitting them saves space.  This adds a new 
 EXPERT
 config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
 
 This patch removes almost all callers of .splice_read() and .splice_write()
 in the file_operations struct. This paves the way to eventually compile out 
 the
 .splice_read and .splice_write members of the file_operations struct as well 
 as
 the remaining splice-related infrastructure.

This commit message doesn't reflect the new approach of leaving those
members in the structure.

The patch looks good otherwise.

- Josh Triplett

 add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
 function old new   delta
 splice_direct_to_actor   348 416 +68
 splice_to_pipe   371 417 +46
 splice_from_pipe_next107 106  -1
 fdput 11   - -11
 signal_pending39  26 -13
 fdget 56  42 -14
 user_page_pipe_buf_ops20   - -20
 user_page_pipe_buf_steal  25   - -25
 file_end_write58  29 -29
 file_start_write  68  34 -34
 pipe_to_user  43   - -43
 wakeup_pipe_readers   54   - -54
 do_splice_to  87   - -87
 ipipe_prep.part   92   - -92
 opipe_prep.part  119   --119
 sys_sendfile 122   --122
 sys_sendfile64   126   --126
 sys_vmsplice 137   --137
 vmsplice_to_user 205   --205
 sys_tee  491   --491
 do_sendfile  492   --492
 vmsplice_to_pipe 558   --558
 sys_splice  1020   -   -1020
 
 Signed-off-by: Pieter Smith pie...@boesman.nl
 ---
  fs/splice.c |  2 ++
  init/Kconfig| 10 ++
  kernel/sys_ni.c |  8 
  3 files changed, 20 insertions(+)
 
 diff --git a/fs/splice.c b/fs/splice.c
 index 44b201b..7c4c695 100644
 --- a/fs/splice.c
 +++ b/fs/splice.c
 @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
 struct file *out,
   return ret;
  }
  
 +#ifdef CONFIG_SYSCALL_SPLICE
  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
  struct pipe_inode_info *opipe,
  size_t len, unsigned int flags);
 @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
 in_fd,
   return do_sendfile(out_fd, in_fd, NULL, count, 0);
  }
  #endif
 +#endif
  
 diff --git a/init/Kconfig b/init/Kconfig
 index d811d5f..dec9819 100644
 --- a/init/Kconfig
 +++ b/init/Kconfig
 @@ -1571,6 +1571,16 @@ config NTP
 system clock to an NTP server, you can disable this option to save
 space.
  
 +config SYSCALL_SPLICE
 + bool Enable splice/vmsplice/tee/sendfile syscalls if EXPERT
 + default y
 + help
 +   This option enables the splice, vmsplice, tee and sendfile syscalls. 
 These
 +   are used by applications to: move data between buffers and arbitrary 
 file
 +   descriptors; copy data between buffers; or copy data from userspace 
 into
 +   buffers. If building an embedded system where no applications use 
 these
 +   syscalls, you can disable this option to save space.
 +
  config PCI_QUIRKS
   default y
   bool Enable PCI quirk workarounds if EXPERT
 diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
 index d2f5b00..25d5551 100644
 --- a/kernel/sys_ni.c
 +++ b/kernel/sys_ni.c
 @@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
  cond_syscall(sys_stat);
  cond_syscall(sys_uname);
  cond_syscall(sys_olduname);
 +cond_syscall(sys_vmsplice);
 +cond_syscall(sys_splice);
 +cond_syscall(sys_tee);
 +cond_syscall(sys_sendfile);
 +cond_syscall(sys_sendfile64);
 +cond_syscall(compat_sys_vmsplice);
 +cond_syscall(compat_sys_sendfile);
 +cond_syscall(compat_sys_sendfile64);
  
  /* arch-specific weak syscall entries */
  cond_syscall(sys_pciconfig_read);
 -- 
 1.9.1
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/6] fs/splice: full support for compiling out splice

2014-11-22 Thread Josh Triplett

On Sat, Nov 22, 2014 at 10:00:01PM +0100, Pieter Smith wrote:
 Entirely compile out splice translation unit when the system is configured
 without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined).
 
 add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845)

Very nice!

- Josh Triplett

 function old new   delta
 pipe_to_null   4   -  -4
 generic_pipe_buf_nosteal   6   -  -6
 spd_release_page  10   - -10
 PageUptodate  22  11 -11
 lock_page 36  24 -12
 page_cache_pipe_buf_release   16   - -16
 splice_write_null 24   4 -20
 page_cache_pipe_buf_ops   20   - -20
 nosteal_pipe_buf_ops  20   - -20
 default_pipe_buf_ops  20   - -20
 generic_splice_sendpage   24   - -24
 splice_shrink_spd 27   - -27
 direct_splice_actor   47   - -47
 default_file_splice_write 49   - -49
 wakeup_pipe_writers   54   - -54
 write_pipe_buf71   - -71
 page_cache_pipe_buf_confirm   80   - -80
 splice_grow_spd   87   - -87
 splice_from_pipe  93   - -93
 splice_from_pipe_next106   --106
 pipe_to_sendpage 109   --109
 page_cache_pipe_buf_steal114   --114
 generic_file_splice_read 131   8-123
 do_splice_direct 148   --148
 __splice_from_pipe   246   --246
 splice_direct_to_actor   416   --416
 splice_to_pipe   417   --417
 default_file_splice_read 688   --688
 iter_file_splice_write   702   4-698
 __generic_file_splice_read  1109   -   -1109
 
 Signed-off-by: Pieter Smith pie...@boesman.nl
 ---
  fs/Makefile|  3 ++-
  fs/splice.c|  2 --
  include/linux/fs.h | 26 ++
  include/linux/splice.h | 43 +++
  4 files changed, 71 insertions(+), 3 deletions(-)
 
 diff --git a/fs/Makefile b/fs/Makefile
 index fb7646e..9395622 100644
 --- a/fs/Makefile
 +++ b/fs/Makefile
 @@ -10,7 +10,7 @@ obj-y :=open.o read_write.o file_table.o super.o \
   ioctl.o readdir.o select.o dcache.o inode.o \
   attr.o bad_inode.o file.o filesystems.o namespace.o \
   seq_file.o xattr.o libfs.o fs-writeback.o \
 - pnode.o splice.o sync.o utimes.o \
 + pnode.o sync.o utimes.o \
   stack.o fs_struct.o statfs.o fs_pin.o
  
  ifeq ($(CONFIG_BLOCK),y)
 @@ -22,6 +22,7 @@ endif
  obj-$(CONFIG_PROC_FS) += proc_namespace.o
  
  obj-$(CONFIG_FSNOTIFY)   += notify/
 +obj-$(CONFIG_SYSCALL_SPLICE) += splice.o
  obj-$(CONFIG_EPOLL)  += eventpoll.o
  obj-$(CONFIG_ANON_INODES)+= anon_inodes.o
  obj-$(CONFIG_SIGNALFD)   += signalfd.o
 diff --git a/fs/splice.c b/fs/splice.c
 index 7c4c695..44b201b 100644
 --- a/fs/splice.c
 +++ b/fs/splice.c
 @@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
 struct file *out,
   return ret;
  }
  
 -#ifdef CONFIG_SYSCALL_SPLICE
  static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
  struct pipe_inode_info *opipe,
  size_t len, unsigned int flags);
 @@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
 in_fd,
   return do_sendfile(out_fd, in_fd, NULL, count, 0);
  }
  #endif
 -#endif
  
 diff --git a/include/linux/fs.h b/include/linux/fs.h
 index 04c0975..9b3054e 100644
 --- a/include/linux/fs.h
 +++ b/include/linux/fs.h
 @@ -2449,6 +2449,7 @@ extern void block_sync_page(struct page *page);
  #define __splice_p(x) NULL
  #endif
  
 +#ifdef CONFIG_SYSCALL_SPLICE
  /* fs/splice.c */
  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
   struct pipe_inode_info *, size_t, unsigned int);
 @@ -2458,6 +2459,31 @@ extern ssize_t iter_file_splice_write(struct 
 pipe_inode_info *,
   struct file *, loff_t *, size_t, unsigned int);
  extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
   struct file *out, loff_t *, size_t len, unsigned int flags);
 +#else
 +static inline ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 + struct pipe_inode_info *pipe

Re: [PATCH 5/6] net/core: support compiling out splice

2014-11-22 Thread Josh Triplett

[Please don't top-post.]

On Sat, Nov 22, 2014 at 11:50:51PM +0100, Pieter Smith wrote:
 splice exports a structure that is used by skbuf. Mocking out a function is
 straightforward. To my knowledge there is no elegant way of mocking out a
 splice_operations struct. I directly modified the code to prevent linking
 against the struct. Do you know of a better technique to get the same
 result?

No, I don't.  The approach you took seems fine; I'm just saying that you
need to explain the need for it in the commit message.

- JosH Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Fwd: [PATCH 1/1] init:add boot option "initramfs_packnum"

2014-11-20 Thread Josh Triplett

On Fri, Nov 21, 2014 at 11:46:42AM +0800, leizhen wrote:
> During the development phase, we usually enlarge initrd-end to reserve more
> memory than the initfs zip exactly occupied. Then we can easily add or delete
> files in zip package, without generate fdt again and again. But unfortunately,
> if too many zeros followed initfs zip, it will take a long time to break the
> loop.
> 
> while (!message && len) {
> ... ...
> if (!*buf) {
>   buf++;
> ... ...
> continue;
> }
> 
> So, use the boot option "initramfs_packnum" to specify how many zip packages
> in each initrd area. When the specified number of packages decompressed in one
> area, immediately terminate the loop. Have no impact on current use by 
> default.

This seems like the wrong approach to me.  Rather than changing Linux to
ignore excess data in the initramfs, could you change your bootloader to
pass the true length of the initramfs to Linux?  Even if you reserve
extra space, if you know how much data you've actually included, you
could have your bootloader pass that information to Linux.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Fwd: [PATCH 1/1] init:add boot option initramfs_packnum

2014-11-20 Thread Josh Triplett

On Fri, Nov 21, 2014 at 11:46:42AM +0800, leizhen wrote:
 During the development phase, we usually enlarge initrd-end to reserve more
 memory than the initfs zip exactly occupied. Then we can easily add or delete
 files in zip package, without generate fdt again and again. But unfortunately,
 if too many zeros followed initfs zip, it will take a long time to break the
 loop.
 
 while (!message  len) {
 ... ...
 if (!*buf) {
   buf++;
 ... ...
 continue;
 }
 
 So, use the boot option initramfs_packnum to specify how many zip packages
 in each initrd area. When the specified number of packages decompressed in one
 area, immediately terminate the loop. Have no impact on current use by 
 default.

This seems like the wrong approach to me.  Rather than changing Linux to
ignore excess data in the initramfs, could you change your bootloader to
pass the true length of the initramfs to Linux?  Even if you reserve
extra space, if you know how much data you've actually included, you
could have your bootloader pass that information to Linux.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the net-next tree

2014-11-17 Thread Josh Triplett

On Mon, Nov 17, 2014 at 04:35:03PM +1100, Stephen Rothwell wrote:
> Hi Josh,
> 
> Today's linux-next merge of the tiny tree got a conflict in
> net/openvswitch/Kconfig between commit 8cd4313aa775 ("openvswitch: Fix
> build failure") from the net-next tree and commit b043d487e255 ("lib:
> Conditionally compile flex_array") from the tiny tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good to me.

> diff --cc net/openvswitch/Kconfig
> index b7d818c59423,1d979cecd66e..
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@@ -4,9 -4,8 +4,10 @@@
>   
>   config OPENVSWITCH
>   tristate "Open vSwitch"
>  +depends on INET
>   select LIBCRC32C
>  +select NET_MPLS_GSO
> + select FLEX_ARRAY
>   ---help---
> Open vSwitch is a multilayer Ethernet switch targeted at virtualized
> environments.  In addition to supporting a variety of features


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the net-next tree

2014-11-17 Thread Josh Triplett

On Mon, Nov 17, 2014 at 04:35:03PM +1100, Stephen Rothwell wrote:
 Hi Josh,
 
 Today's linux-next merge of the tiny tree got a conflict in
 net/openvswitch/Kconfig between commit 8cd4313aa775 (openvswitch: Fix
 build failure) from the net-next tree and commit b043d487e255 (lib:
 Conditionally compile flex_array) from the tiny tree.
 
 I fixed it up (see below) and can carry the fix as necessary (no action
 is required).

Looks good to me.

 diff --cc net/openvswitch/Kconfig
 index b7d818c59423,1d979cecd66e..
 --- a/net/openvswitch/Kconfig
 +++ b/net/openvswitch/Kconfig
 @@@ -4,9 -4,8 +4,10 @@@
   
   config OPENVSWITCH
   tristate Open vSwitch
  +depends on INET
   select LIBCRC32C
  +select NET_MPLS_GSO
 + select FLEX_ARRAY
   ---help---
 Open vSwitch is a multilayer Ethernet switch targeted at virtualized
 environments.  In addition to supporting a variety of features


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-16 Thread Josh Triplett

On Sun, Nov 16, 2014 at 07:42:30AM -0800, Andy Lutomirski wrote:
> On Sun, Nov 16, 2014 at 5:32 AM, Theodore Ts'o  wrote:
> > On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
> >>
> >> That may be a bug with the user namespace permission check.  Perhaps we
> >> shouldn't allow dropping groups that aren't mapped in the user
> >> namespace.
> >
> > I'm not saying that we can't change the behavior of whether or not a
> > user can drop a group permission.  I'm just saying that we need to do
> > so consciously.  The setgroups()/getgroups() ABI isn't part of
> > POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those
> > people who care about that.
> 
> It may make sense to reach out to some place like oss-security.
> 
> FWIW, I think we should ask, at the same time, about:
> 
>  - Dropping supplementary groups.
>  - Switching gid/egid/sgid to a supplementary group.
>  - Denying ptrace of a process with supplementary groups that the
> tracer doesn't have.

I wonder how crazy it would be to just require either CAP_SYS_PTRACE or
cred1 == cred2 (as in, you have *exactly* the same credentials as the
target)?

> Also, I much prefer a sysctl to a boot option.  Boot options are nasty
> to configure in many distributions.

Agreed.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-16 Thread Josh Triplett

On Sun, Nov 16, 2014 at 08:32:30AM -0500, Theodore Ts'o wrote:
> On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
> > That may be a bug with the user namespace permission check.  Perhaps we
> > shouldn't allow dropping groups that aren't mapped in the user
> > namespace.
> 
> I'm not saying that we can't change the behavior of whether or not a
> user can drop a group permission.  I'm just saying that we need to do
> so consciously.

Agreed.

> The setgroups()/getgroups() ABI isn't part of
> POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those
> people who care about that.

POSIX.1-2001 actually specifies getgroups, but not setgroups.  In any
case, yes, POSIX doesn't say anything about this behavior.

> The bigger deal is that it's very different from how BSD 4.x has
> handled things, which means there is two decades of history that we're
> looking at here.  And there are times when taking away permissions in
> an expected fashion can cause security problems.  (As a silly example;
> some architect at Digital wrote a spec that said that setuid must
> return EINVAL for values greater than 32k --- back in the days when
> uid's were a signed short.  The junior programmer who implemented this
> for Ultrix made the check for 32,000 decimal.  Guess what happened
> when /bin/login got a failure with setuid when it wasn't expecting one
> --- since root could never get an error with that system call, right?

Ignored it and kept going, starting the user's shell as root?

I'd guess that a similar story motivated the note in the Linux manpages
for setuid, setresuid, and similar, saying "Note: there are cases where
setuid() can fail even when the caller is UID 0; it is a grave security
error to omit checking for a failure return from setuid().".

(Also, these days, glibc marks setuid and similar with the
warn_unused_result attribute.)

> And MIT Project Athena started ran out of lower numbered uid's and
> froshlings started getting assigned uid's > 32,000)
> 
> In this particular case, the change is probably a little less likely
> to cause serious problems, although the fact that sudo does allow
> negative group assignments is an example of another potential
> breakage.
> 
> OTOH, I'm aware of how this could cause major problems to the concept
> of allowing an untrusted user to set up their own containers to
> constrain what program with a possibly untrusted provinance might be
> able to do.  I can see times when I might want to run in a container
> where the user didn't have access to groups that I have access to by
> default --- including groups such as disk, sudo, lpadmin, etc.
> 
> If we do want to make such a change, my suggestion is to keep things
> *very* simple.  Let it be a boot-time option whether or not users are
> allowed to drop group permissions, and let it affect all possible ways
> that users can drop groups.  And we can create a shell script that
> will search for the obvious ways that a user could get screwed by
> enabling this, which we can encourage distributions to package up for
> their end users.  And then we document the heck out of the fact that
> this option exists, and when/if we want to make it the default, so
> it's perfectly clear and transparent to all what is happening.

An option sounds sensible to me.  I think a sysctl makes more sense,
though.  I'll add one in v4.

What did you have in mind about the shell script? Something like:
grep -r !% /etc/sudoers /etc/sudoers.d
?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 51/56] drivers/char/mem: support compiling out splice

2014-11-16 Thread Josh Triplett

[Please don't top-post.]

On Sun, Nov 16, 2014 at 01:18:27PM +0100, Pieter Smith wrote:
> I had a look at the numbers (see below): For a tinyconfig kernel, savings
> outside fs/splice.c are negligible. With the addition of networking
> however, net/core and net/ipv4 can be squeezed for usable savings.
> 
> Should I leave networking (and the rest) until we can come up with a
> non-icky approach outside fs/splice.c?
>
> *THE NUMBERS*
> 
> Given a tinyconfig, fs/splice.c can free up 8.2K (3.5K + 4.7K) with
> CONFIG_SYSCALL_SPLICE. The rest are not worth the effort:
>   fs/splice:
> syscalls only: add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693
> (-3579)
> the remainder: add/remove: 0/24 grow/shrink: 0/4 up/down: 0/-4824
> (-4824)

Nice!

Go ahead and submit the patches for that portion, and the rest can wait
until we get compiler support for omitting structure fields.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 51/56] drivers/char/mem: support compiling out splice

2014-11-16 Thread Josh Triplett

[Please don't top-post.]

On Sun, Nov 16, 2014 at 01:18:27PM +0100, Pieter Smith wrote:
 I had a look at the numbers (see below): For a tinyconfig kernel, savings
 outside fs/splice.c are negligible. With the addition of networking
 however, net/core and net/ipv4 can be squeezed for usable savings.
 
 Should I leave networking (and the rest) until we can come up with a
 non-icky approach outside fs/splice.c?

 *THE NUMBERS*
 
 Given a tinyconfig, fs/splice.c can free up 8.2K (3.5K + 4.7K) with
 CONFIG_SYSCALL_SPLICE. The rest are not worth the effort:
   fs/splice:
 syscalls only: add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693
 (-3579)
 the remainder: add/remove: 0/24 grow/shrink: 0/4 up/down: 0/-4824
 (-4824)

Nice!

Go ahead and submit the patches for that portion, and the rest can wait
until we get compiler support for omitting structure fields.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-16 Thread Josh Triplett

On Sun, Nov 16, 2014 at 08:32:30AM -0500, Theodore Ts'o wrote:
 On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
  That may be a bug with the user namespace permission check.  Perhaps we
  shouldn't allow dropping groups that aren't mapped in the user
  namespace.
 
 I'm not saying that we can't change the behavior of whether or not a
 user can drop a group permission.  I'm just saying that we need to do
 so consciously.

Agreed.

 The setgroups()/getgroups() ABI isn't part of
 POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those
 people who care about that.

POSIX.1-2001 actually specifies getgroups, but not setgroups.  In any
case, yes, POSIX doesn't say anything about this behavior.

 The bigger deal is that it's very different from how BSD 4.x has
 handled things, which means there is two decades of history that we're
 looking at here.  And there are times when taking away permissions in
 an expected fashion can cause security problems.  (As a silly example;
 some architect at Digital wrote a spec that said that setuid must
 return EINVAL for values greater than 32k --- back in the days when
 uid's were a signed short.  The junior programmer who implemented this
 for Ultrix made the check for 32,000 decimal.  Guess what happened
 when /bin/login got a failure with setuid when it wasn't expecting one
 --- since root could never get an error with that system call, right?

Ignored it and kept going, starting the user's shell as root?

I'd guess that a similar story motivated the note in the Linux manpages
for setuid, setresuid, and similar, saying Note: there are cases where
setuid() can fail even when the caller is UID 0; it is a grave security
error to omit checking for a failure return from setuid()..

(Also, these days, glibc marks setuid and similar with the
warn_unused_result attribute.)

 And MIT Project Athena started ran out of lower numbered uid's and
 froshlings started getting assigned uid's  32,000)
 
 In this particular case, the change is probably a little less likely
 to cause serious problems, although the fact that sudo does allow
 negative group assignments is an example of another potential
 breakage.
 
 OTOH, I'm aware of how this could cause major problems to the concept
 of allowing an untrusted user to set up their own containers to
 constrain what program with a possibly untrusted provinance might be
 able to do.  I can see times when I might want to run in a container
 where the user didn't have access to groups that I have access to by
 default --- including groups such as disk, sudo, lpadmin, etc.
 
 If we do want to make such a change, my suggestion is to keep things
 *very* simple.  Let it be a boot-time option whether or not users are
 allowed to drop group permissions, and let it affect all possible ways
 that users can drop groups.  And we can create a shell script that
 will search for the obvious ways that a user could get screwed by
 enabling this, which we can encourage distributions to package up for
 their end users.  And then we document the heck out of the fact that
 this option exists, and when/if we want to make it the default, so
 it's perfectly clear and transparent to all what is happening.

An option sounds sensible to me.  I think a sysctl makes more sense,
though.  I'll add one in v4.

What did you have in mind about the shell script? Something like:
grep -r !% /etc/sudoers /etc/sudoers.d
?

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-16 Thread Josh Triplett

On Sun, Nov 16, 2014 at 07:42:30AM -0800, Andy Lutomirski wrote:
 On Sun, Nov 16, 2014 at 5:32 AM, Theodore Ts'o ty...@mit.edu wrote:
  On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
 
  That may be a bug with the user namespace permission check.  Perhaps we
  shouldn't allow dropping groups that aren't mapped in the user
  namespace.
 
  I'm not saying that we can't change the behavior of whether or not a
  user can drop a group permission.  I'm just saying that we need to do
  so consciously.  The setgroups()/getgroups() ABI isn't part of
  POSIX/SuSv3 so we wouldn't be breaking POSIX compatibility, for those
  people who care about that.
 
 It may make sense to reach out to some place like oss-security.
 
 FWIW, I think we should ask, at the same time, about:
 
  - Dropping supplementary groups.
  - Switching gid/egid/sgid to a supplementary group.
  - Denying ptrace of a process with supplementary groups that the
 tracer doesn't have.

I wonder how crazy it would be to just require either CAP_SYS_PTRACE or
cred1 == cred2 (as in, you have *exactly* the same credentials as the
target)?

 Also, I much prefer a sysctl to a boot option.  Boot options are nasty
 to configure in many distributions.

Agreed.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Add supplementary UIDs, and getusers/setusers system calls

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 11:08:31PM -0800, Josh Triplett wrote:
>  asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist);
> +asmlinkage long sys_setusers(int uidsetsize, uid_t __user *grouplist);

Obvious typo here: s/grouplist/userlist/.  Will fix in a v2, but I'll
wait for other feedback on v1 first.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH manpages] Document supplementary user IDs

2014-11-15 Thread Josh Triplett

Add new manpages for getusers(2) and setusers(2).  Discuss supplementary
UIDs in credentials(7).  Update manpages for seteuid(2), setfsuid(2),
setresuid(2), setreuid(2), and setuid(2).

Signed-off-by: Josh Triplett 
---
 man2/getusers.2| 177 +
 man2/seteuid.2 |  10 ++-
 man2/setfsuid.2|   4 +-
 man2/setresuid.2   |   8 ++-
 man2/setreuid.2|  12 +++-
 man2/setuid.2  |   7 ++-
 man2/setusers.2|   1 +
 man7/credentials.7 |  21 ++-
 8 files changed, 229 insertions(+), 11 deletions(-)
 create mode 100644 man2/getusers.2
 create mode 100644 man2/setusers.2

diff --git a/man2/getusers.2 b/man2/getusers.2
new file mode 100644
index 000..e5dd371
--- /dev/null
+++ b/man2/getusers.2
@@ -0,0 +1,177 @@
+.\" Copyright 2014 Josh Triplett 
+.\" Based on getgroups.2:
+.\" Copyright 1993 Rickard E. Faith (fa...@cs.unc.edu)
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.\" Modified Thu Oct 31 12:04:29 1996 by Eric S. Raymond 
+.\" Modified, 27 May 2004, Michael Kerrisk 
+.\" Added notes on capability requirements
+.\" 2008-05-03, mtk, expanded and rewrote parts of DESCRIPTION and RETURN
+.\" VALUE, made style of page more consistent with man-pages style.
+.\"
+.TH GETUSERS 2 2014-10-20 "Linux" "Linux Programmer's Manual"
+.SH NAME
+getusers, setusers \- get/set list of supplementary user IDs
+.SH SYNOPSIS
+.B #include 
+.br
+.B #include 
+.sp
+.BI "int getusers(int " size ", uid_t *" list );
+.sp
+.BI "int setusers(int " size ", const uid_t *" list );
+.sp
+.in -4n
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.in
+.sp
+.BR getusers (),
+.BR setusers ():
+_GNU_SOURCE
+.SH DESCRIPTION
+.PP
+.BR getusers ()
+returns the supplementary user IDs of the calling process in
+.IR list .
+The argument
+.I size
+should be set to the maximum number of items that can be stored in the
+buffer pointed to by
+.IR list .
+If the calling process has more than
+.I size
+supplementary user IDs,
+.BR getusers ()
+returns an error.
+The returned list may or may not include the real, effective, saved, or 
filesystem user ID of the calling process.
+
+If
+.I size
+is zero,
+.I list
+is not modified, but the total number of supplementary user IDs for the
+process is returned.
+This allows the caller to determine the size of a dynamically allocated
+.I list
+to be used in a further call to
+.BR getusers ().
+.PP
+.BR setusers ()
+sets the supplementary user IDs for the calling process.
+The
+.I size
+argument specifies the number of supplementary user IDs
+in the buffer pointed to by
+.IR list .
+
+Any process may drop user IDs from its supplementary list; however, adding user
+IDs not already present in the supplementary list or in the real, effective, or
+saved user ID requires the
+.B CAP_SETUID
+capability.
+.SH RETURN VALUE
+On success,
+.BR getusers ()
+returns the number of supplementary user IDs.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+
+On success,
+.BR setusers ()
+returns 0.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EFAULT
+.I list
+has an invalid address.
+.PP
+.BR getusers ()
+can additionally fail with the following error:
+.TP
+.B EINVAL
+.I size
+is less than the number of supplementary user IDs, but is not zero.
+.PP
+.BR setusers ()
+can additionally fail with the following errors:
+.TP
+.B EINVAL
+.I size
+is greater than
+.B NUSERS_MAX
+(65536).
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B EPERM
+.I list
+contains user IDs not present in the current supplemental user IDs list,
+and not equal to the real, effective, or saved user ID, and the calling
+process does not ha

[PATCH] Add supplementary UIDs, and getusers/setusers system calls

2014-11-15 Thread Josh Triplett

Analogous to the supplementary GID list, the supplementary UID list
provides a set of additional user credentials that a process can act as.
A process with CAP_SETUID can set its UID list arbitrarily; a process
without CAP_SETUID can only reduce its UID list.

This allows each user to have a set of UIDs that they can then use to
further sandbox individual child processes without first escalating to
root to change UIDs.  For instance, a PAM module could give each user a
block of UIDs to work with.

Tested via the following test program:

#include 
#include 
#include 
#include 

static int getusers(int count, uid_t *uids)
{
return syscall(322, count, uids);
}

static int setusers(int count, const uid_t *uids)
{
return syscall(323, count, uids);
}

static void show_users(void)
{
uid_t uids[65536];
int i, count = getusers(65536, uids);
if (count < 0)
err(1, "getusers");
printf("UIDs:");
for (i = 0; i < count; i++)
printf(" %u", (unsigned)uids[i]);
printf("\n");
}

int main(void)
{
uid_t list1[] = { 1, 2, 3, 4, 5 };
uid_t list2[] = { 1, 2, 3, 4 };
uid_t list3[] = { 2, 3, 4 };
show_users();
if (setusers(5, list1) < 0)
err(1, "setusers 1");
show_users();
if (setresgid(1, 1, 1) < 0)
err(1, "setresgid");
if (setresuid(1, 1, 1) < 0)
err(1, "setresuid");
if (setusers(4, list2) < 0)
err(1, "setusers 2");
show_users();
if (setusers(3, list3) < 0)
err(1, "setusers 3");
show_users();
if (setusers(4, list2) < 0)
err(1, "setusers 4");
show_users();
if (setresuid(2, 2, 2) < 0)
err(1, "setresuid 2");
if (setusers(5, list1) < 0)
err(1, "setusers 5");
show_users();

return 0;
}

In this test, all but the last call to setusers succeeds; the last call
fails with EPERM because the unprivileged process attempts to add UID 5
to the supplementary UID list, which it does not currently have.

Signed-off-by: Josh Triplett 
---
 arch/x86/syscalls/syscall_32.tbl  |   2 +
 arch/x86/syscalls/syscall_64.tbl  |   2 +
 include/linux/cred.h  |  66 +++
 include/linux/syscalls.h  |   2 +
 include/uapi/asm-generic/unistd.h |   6 +-
 include/uapi/linux/limits.h   |   1 +
 init/Kconfig  |   9 ++
 kernel/cred.c |   4 +
 kernel/groups.c   | 173 ++
 kernel/sys.c  |  21 +++--
 kernel/sys_ni.c   |   2 +
 11 files changed, 280 insertions(+), 8 deletions(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..55717d7 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,5 @@
 355i386getrandom   sys_getrandom
 356i386memfd_createsys_memfd_create
 357i386bpf sys_bpf
+358i386getuserssys_getusers
+359i386setuserssys_setusers
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..5572e67 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,8 @@
 319common  memfd_createsys_memfd_create
 320common  kexec_file_load sys_kexec_file_load
 321common  bpf sys_bpf
+322common  getuserssys_getusers
+323common  setuserssys_setusers
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/cred.h b/include/linux/cred.h
index b2d0820..31169fe 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -76,6 +76,8 @@ extern int groups_search(const struct group_info *, kgid_t);
 extern int in_group_p(kgid_t);
 extern int in_egroup_p(kgid_t);
 
+struct user_info;
+
 /*
  * The security context of a task
  *
@@ -135,6 +137,12 @@ struct cred {
struct user_struct *user;   /* real user ID subscription */
struct user_namespace *user_ns; /* user_ns the caps and keyrings are 
relative to. */
struct group_info *group_info;  /* supplementary groups for euid/fsgid 
*/
+#ifdef CONFIG_SUPPLEMENTARY_UIDS
+   struct user_info *user_info;/* supplementary users */
+#define INIT_USER_INFO .user_info = _users,
+#else
+#define INIT_USER_INFO
+#endif
struct rcu_head rcu;/* RCU deletion hook */
 };
 
@@ -381,4 +389,62 @@ do {   \
*(_fsgid) = __cred->fsgid;  \
 } while(0)
 
+#ifdef CONFIG_SUPPLEMENTARY_UIDS
+struct user_info {
+   atomic_tusage;
+   int nusers;
+   int nblocks;
+   kuid_t  small_block[NGROU

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
> Josh Triplett  writes:
> > On November 15, 2014 6:05:11 PM PST, Theodore Ts'o  wrote:
> >>On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote:
> >>> > However, sudoers seems to allow negative group matches.  So maybe
> >>> > allowing this only with no_new_privs already set would make sense.
> >>> 
> >>> Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems
> >>fine.
> >>> I'll do that in v2, and document that in the manpage.
> >>
> >>I've also seen use cases (generally back in the bad old days of big
> >>timesharing VAX 750's :-) where the system admin might assign someone
> >>to the "games-abusers" group, and then set /usr/games to mode 705
> >>root:games-abusers --- presumably because it's easier to add a few
> >>people to the deny list rather than having to add all of the EECS
> >>department to the games group minus the abusers.
> >>
> >>So arbitrarily anyone to drop groups from their supplemental group
> >>list will result in a change from both existing practice and legacy
> >>Unix systems, and it could potentially lead to a security exposure.
> >
> > As Andy pointed out, you can already do that with a user namespace,
> 
> That may be a bug with the user namespace permission check.  Perhaps we
> shouldn't allow dropping groups that aren't mapped in the user
> namespace.

Changing that would break containers for users on many common Linux
distributions, which put users in various supplementary groups by
default.

I do actually have another patch planned, to allow users to set up gid
maps for groups they currently have permission for, not just for their
primary group.  But even with that change, breaking the ability for
container-root to use setgroups to drop all its supplementary groups
seems very wrong, and seems highly likely to break real-world software
running in containers.

> To add to the discussion there are also sg and newgroup, though I don't
> think they touch supplementary groups.

Both sg and newgrp do prove a different point, though: an unprivileged
user should be able to change their gid to one of their supplementary
groups.  That's another patch I planned to submit after this one:
allow setgid/setresgid/etc to a supplementary group ID.

> > for any case not involving a setuid or setgid (or otherwise
> > privilege-gaining) program.  And requiring no_new_privs handles that.
> 
> Frankly adding a no_new_privs check to allow something/anything scares
> me.  That converts no_new_privs from something simple and easy to
> understand to something much more complicated.

no_new_privs already exists in large part to support such use cases,
such as setting seccomp filters, which would be dangerous for
privilege-escalating programs to inherit.

> > Given the combination of those two things, do you still see any
> > problematic cases?
> 
> Josh I think it is time to stop and make certain you understand how
> unix groups work in the large.
>
> It seems to me that either supplementary groups can be dropped or
> supplementary groups can not be dropped.  If they can be dropped we
> shouldn't need complicated checks to allow them to be dropped in some
> circumstances but not others.
> 
> Unix groups are an old interface and they have been used in a lot of
> ways over the years.  We need to be careful any change we make is good
> and truly backwards compatible and that we do our darndest not to
> introduce security holes.

Seeking the decades-old precedents and corner cases that might
contradict the most obvious semantics is a large part of what I'm
looking for out of this thread.

However, I don't necessarily think that we must permit unprivileged
setgroups in *all* cases if we permit it in *any*; it may well make
sense to reduce the surface area of the change by limiting the
circumstances in which you can do this.  Whether it does end up making
sense to do so depends on the outcome of this discussion.

I added the no_new_privs check at Andy's suggestion, since the use case
I had in mind will work just fine with that restriction.  I'm also open
to just allowing setgroups in all cases, if that semantic seems
preferable, or to adding some separate check specific to this case if
that really seems necessary.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 10:40:06PM -0500, Theodore Ts'o wrote:
> On Sat, Nov 15, 2014 at 06:35:05PM -0800, Josh Triplett wrote:
> > >So arbitrarily anyone to drop groups from their supplemental group
> > >list will result in a change from both existing practice and legacy
> > >Unix systems, and it could potentially lead to a security exposure.
> > 
> > As Andy pointed out, you can already do that with a user namespace,
> > for any case not involving a setuid or setgid (or otherwise
> > privilege-gaining) program.  And requiring no_new_privs handles
> > that.
> 
> Well, it's no worse than what we can do already with the user
> namespace, yes.  I'm still worried it's going to come as a surprise
> for some configurations because it's a change from what was allowed
> historically.  Then again, pretty much all of the tripwire and rootkit
> scanners won't notice a "setuid" program that uses capabilities
> instead of the traditional setuid bit, and most sysadmins won't think
> to check for an executable with a forced capability mask, so this
> isn't exactly a new problem

We certainly have introduced orthogonal APIs in various areas before,
such that applications written prior to those APIs may interact
interestingly with them; we don't allow *breaking* those applications,
or introducing security holes, but the existence of applications
designed to block one particular way to do something doesn't
*automatically* rule out the possibility of adding another way to do it.
It does require some careful thought, though.

When we introduced seccomp filters for syscalls, we tied them to
no_new_privs to prevent any possible security holes caused by selective
syscall denial/filtration.

In this case, I'm indifferent about whether unprivileged setgroups works
without no_new_privs; if people are comfortable with that, fine, and if
people would prefer no_new_privs (or for that matter a sysctl, a
compile-time option, or any other means of making the behavior
optional), I can do that too.  The security model of "having a group
gives you less privilege than not having it" seems crazy, but
nonetheless I can see a couple of easy ways that we can avoid breaking
that pattern, no_new_privs being one of them.  I'd like to make sure
that nobody sees any other real-world corner case that unprivileged
setgroups would break.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On November 15, 2014 6:05:11 PM PST, Theodore Ts'o  wrote:
>On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote:
>> > However, sudoers seems to allow negative group matches.  So maybe
>> > allowing this only with no_new_privs already set would make sense.
>> 
>> Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems
>fine.
>> I'll do that in v2, and document that in the manpage.
>
>I've also seen use cases (generally back in the bad old days of big
>timesharing VAX 750's :-) where the system admin might assign someone
>to the "games-abusers" group, and then set /usr/games to mode 705
>root:games-abusers --- presumably because it's easier to add a few
>people to the deny list rather than having to add all of the EECS
>department to the games group minus the abusers.
>
>So arbitrarily anyone to drop groups from their supplemental group
>list will result in a change from both existing practice and legacy
>Unix systems, and it could potentially lead to a security exposure.

As Andy pointed out, you can already do that with a user namespace, for any 
case not involving a setuid or setgid (or otherwise privilege-gaining) program. 
 And requiring no_new_privs handles that.

Given the combination of those two things, do you still see any problematic 
cases?

- Josh Triplett

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv3 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett 
---
v2, v3: No changes to patch 1/2.
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new->group_info);
+   get_group_info(group_info);
+   new->group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new->group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new->group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett 
---
v3: Document use of gid/egid/sgid.
v2: Document requirement for no_new_privs.

(If this doesn't end up going into 3.18, the version number in the patch will
need updating.)

 man2/getgroups.2 | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..e2b834e 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,16 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+A process with the
 .B CAP_SETGID
-capability) are required.
+capability may change its supplementary group IDs arbitrarily.
+As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS (see
+.BR prctl (2))
+may drop supplementary groups, or add any of the current real UID, the current
+effective UID, or the current saved set-user-ID; adding any other group ID
+requires the
+.B CAP_SETGID
+capability.
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv3 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have (either as a supplementary group, or as its gid,
egid, or sgid).

Since some privileged programs (such as sudo) allow tests for
non-membership in a group, require no_new_privs to drop group
privileges.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, "fork");
case 0:
execl("/usr/bin/id", "id", NULL);
err(1, "exec");
default:
if (waitpid(p, NULL, 0) < 0)
err(1, "waitpid");
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 1");
run_id();
if (setresgid(1, 1, 1) < 0)
err(1, "setresgid");
if (setresuid(1, 1, 1) < 0)
err(1, "setresuid");
run_id();
if (setgroups(3, list2) < 0)
err(1, "setgroups 2");
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 3");
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett 
---
v3: Allow gid, egid, or sgid.
v2: Require no_new_privs.

 kernel/groups.c | 42 +++---
 kernel/uid16.c  |  2 --
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..5114155 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,37 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Return true if the group_info is a subset of the group_info of the specified
+ * credentials.  Also allow the first group_info to contain the gid, egid, or
+ * sgid of the credentials.
+ */
+static bool group_subset(const struct group_info *g1,
+const struct cred *cred2)
+{
+   const struct group_info *g2 = cred2->group_info;
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i < g1->ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j < g2->ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j >= g2->ngroups || !gid_eq(gid1, gid2)) {
+   if (!gid_eq(gid1, cred2->gid)
+   && !gid_eq(gid1, cred2->egid)
+   && !gid_eq(gid1, cred2->sgid))
+   return false;
+   } else {
+   j++;
+   }
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +220,18 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!(ns_capable(current_user_ns(), CAP_SETGID)
+ || (task_no_new_privs(current)
+ && group_subset(group_info, new {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +271,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, group

[PATCHv2 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have.

Since some privileged programs (such as sudo) allow tests for
non-membership in a group, require no_new_privs to drop group
privileges.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, "fork");
case 0:
execl("/usr/bin/id", "id", NULL);
err(1, "exec");
default:
if (waitpid(p, NULL, 0) < 0)
err(1, "waitpid");
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 1");
run_id();
if (setresgid(1, 1, 1) < 0)
err(1, "setresgid");
if (setresuid(1, 1, 1) < 0)
err(1, "setresuid");
run_id();
if (setgroups(3, list2) < 0)
err(1, "setgroups 2");
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 3");
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett 
---
v2: Require no_new_privs.

 kernel/groups.c | 34 +++---
 kernel/uid16.c  |  2 --
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..f7fb8dd 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Compare two sorted group lists; return true if the first is a subset of the
+ * second.
+ */
+static bool is_subset(const struct group_info *g1, const struct group_info *g2)
+{
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i < g1->ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j < g2->ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j >= g2->ngroups || !gid_eq(gid1, gid2))
+   return false;
+   j++;
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +212,18 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!(ns_capable(current_user_ns(), CAP_SETGID)
+ || (task_no_new_privs(current)
+ && is_subset(group_info, new->group_info {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +263,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 manpages] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett 
---
v2: Document requirement for no_new_privs.

(If this doesn't end up going into 3.18, the version number in the patch will
need updating.)

 man2/getgroups.2 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..3f3d330 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,11 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS may drop
+supplementary groups, but may not add new groups. Adding groups, or making any
+change at all without no_new_privs enabled, requires the
 .B CAP_SETGID
-capability) are required.
+capability.
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett 
---
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new->group_info);
+   get_group_info(group_info);
+   new->group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new->group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new->group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 12:06:20PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 15, 2014 at 11:29 AM, Josh Triplett  wrote:
> > On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote:
> >> Josh Triplett  writes:
> >>
> >> > Currently, unprivileged processes (without CAP_SETGID) cannot call
> >> > setgroups at all.  In particular, processes with a set of supplementary
> >> > groups cannot further drop permissions without obtaining elevated
> >> > permissions first.
> >> >
> >> > Allow unprivileged processes to call setgroups with a subset of their
> >> > current groups; only require CAP_SETGID to add a group the process does
> >> > not currently have.
> >>
> >> A couple of questions.
> >> - Is there precedence in other unix flavors for this?
> >
> > I found a few references to now-nonexistent pages at MIT about a system
> > with this property, but other than that no.
> >
> > I've also found more than a few references to people wanting this
> > functionality.
> >
> >> - What motiviates this change?
> >
> > I have a series of patches planned to add more ways to drop elevated
> > privileges without requiring a transition through root to do so.  That
> > would improve the ability for unprivileged users to run programs
> > sandboxed with even *less* privileges.  (Among other things, that would
> > also allow programs running with no_new_privs to further *reduce* their
> > privileges, which they can't currently do in this case.)
> >
> >> - Have you looked to see if anything might for bug compatibilty
> >>   require applications not to be able to drop supplementary groups?
> >
> > I haven't found any such case; that doesn't mean such a case does not
> > exist.  Feedback welcome.
> >
> > The only case I can think of (and I don't know of any examples of such a
> > system): some kind of quota system that limits the members of a group to
> > a certain amount of storage, but places no such limit on non-members.
> >
> > However, the idea of *holding* a credential (a supplementary group ID)
> > giving *less* privileges, and *dropping* a credential giving *more*
> > privileges, would completely invert normal security models.  (The sane
> > way to design such a system would be to have a privileged group for
> > "users who can exceed the quota".)
> 
> Agreed.  And, if you want to bypass quotas by dropping a supplementary
> group, you already can by unsharing your user namespace.

Good point!  Given that a process can run with a new user namespace and
no other namespaces, and then drop all its other privileges that way,
the ability to drop privileges without using a user namespace seems
completely harmless, with one exception that you noted:

> However, sudoers seems to allow negative group matches.  So maybe
> allowing this only with no_new_privs already set would make sense.

Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems fine.
I'll do that in v2, and document that in the manpage.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote:
> Josh Triplett  writes:
> 
> > Currently, unprivileged processes (without CAP_SETGID) cannot call
> > setgroups at all.  In particular, processes with a set of supplementary
> > groups cannot further drop permissions without obtaining elevated
> > permissions first.
> >
> > Allow unprivileged processes to call setgroups with a subset of their
> > current groups; only require CAP_SETGID to add a group the process does
> > not currently have.
> 
> A couple of questions.
> - Is there precedence in other unix flavors for this?

I found a few references to now-nonexistent pages at MIT about a system
with this property, but other than that no.

I've also found more than a few references to people wanting this
functionality.

> - What motiviates this change?

I have a series of patches planned to add more ways to drop elevated
privileges without requiring a transition through root to do so.  That
would improve the ability for unprivileged users to run programs
sandboxed with even *less* privileges.  (Among other things, that would
also allow programs running with no_new_privs to further *reduce* their
privileges, which they can't currently do in this case.)

> - Have you looked to see if anything might for bug compatibilty
>   require applications not to be able to drop supplementary groups?

I haven't found any such case; that doesn't mean such a case does not
exist.  Feedback welcome.

The only case I can think of (and I don't know of any examples of such a
system): some kind of quota system that limits the members of a group to
a certain amount of storage, but places no such limit on non-members.

However, the idea of *holding* a credential (a supplementary group ID)
giving *less* privileges, and *dropping* a credential giving *more*
privileges, would completely invert normal security models.  (The sane
way to design such a system would be to have a privileged group for
"users who can exceed the quota".)

If it turns out that a real case exists that people care about, I could
easily make this configurable, either at compile time or via a sysctl.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH manpages] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett 
---
This should probably also include appropriate documentation for what
kernel introduces this behavior.

 man2/getgroups.2 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..edca37c 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,10 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+Any process may drop groups from its list, but adding groups requires
+appropriate privileges (Linux: the
 .B CAP_SETGID
-capability) are required.
+capability).
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett 
---
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new->group_info);
+   get_group_info(group_info);
+   new->group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new->group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new->group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

#include 
#include 
#include 
#include 
#include 

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, "fork");
case 0:
execl("/usr/bin/id", "id", NULL);
err(1, "exec");
default:
if (waitpid(p, NULL, 0) < 0)
err(1, "waitpid");
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 1");
run_id();
if (setresgid(1, 1, 1) < 0)
err(1, "setresgid");
if (setresuid(1, 1, 1) < 0)
err(1, "setresuid");
run_id();
if (setgroups(3, list2) < 0)
err(1, "setgroups 2");
run_id();
if (setgroups(5, list1) < 0)
err(1, "setgroups 3");
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett 
---
 kernel/groups.c | 33 ++---
 kernel/uid16.c  |  2 --
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..fe7367d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Compare two sorted group lists; return true if the first is a subset of the
+ * second.
+ */
+static bool is_subset(const struct group_info *g1, const struct group_info *g2)
+{
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i < g1->ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j < g2->ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j >= g2->ngroups || !gid_eq(gid1, gid2))
+   return false;
+   j++;
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +212,17 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!ns_capable(current_user_ns(), CAP_SETGID)
+   && !is_subset(group_info, new->group_info)) {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +262,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new-group_info);
+   get_group_info(group_info);
+   new-group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new-group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new-group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

#include err.h
#include grp.h
#include stdio.h
#include sys/types.h
#include unistd.h

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, fork);
case 0:
execl(/usr/bin/id, id, NULL);
err(1, exec);
default:
if (waitpid(p, NULL, 0)  0)
err(1, waitpid);
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 1);
run_id();
if (setresgid(1, 1, 1)  0)
err(1, setresgid);
if (setresuid(1, 1, 1)  0)
err(1, setresuid);
run_id();
if (setgroups(3, list2)  0)
err(1, setgroups 2);
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 3);
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
 kernel/groups.c | 33 ++---
 kernel/uid16.c  |  2 --
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..fe7367d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Compare two sorted group lists; return true if the first is a subset of the
+ * second.
+ */
+static bool is_subset(const struct group_info *g1, const struct group_info *g2)
+{
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i  g1-ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j  g2-ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j = g2-ngroups || !gid_eq(gid1, gid2))
+   return false;
+   j++;
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +212,17 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!ns_capable(current_user_ns(), CAP_SETGID)
+!is_subset(group_info, new-group_info)) {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +262,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX)
return -EINVAL;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH manpages] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
This should probably also include appropriate documentation for what
kernel introduces this behavior.

 man2/getgroups.2 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..edca37c 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,10 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+Any process may drop groups from its list, but adding groups requires
+appropriate privileges (Linux: the
 .B CAP_SETGID
-capability) are required.
+capability).
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote:
 Josh Triplett j...@joshtriplett.org writes:
 
  Currently, unprivileged processes (without CAP_SETGID) cannot call
  setgroups at all.  In particular, processes with a set of supplementary
  groups cannot further drop permissions without obtaining elevated
  permissions first.
 
  Allow unprivileged processes to call setgroups with a subset of their
  current groups; only require CAP_SETGID to add a group the process does
  not currently have.
 
 A couple of questions.
 - Is there precedence in other unix flavors for this?

I found a few references to now-nonexistent pages at MIT about a system
with this property, but other than that no.

I've also found more than a few references to people wanting this
functionality.

 - What motiviates this change?

I have a series of patches planned to add more ways to drop elevated
privileges without requiring a transition through root to do so.  That
would improve the ability for unprivileged users to run programs
sandboxed with even *less* privileges.  (Among other things, that would
also allow programs running with no_new_privs to further *reduce* their
privileges, which they can't currently do in this case.)

 - Have you looked to see if anything might for bug compatibilty
   require applications not to be able to drop supplementary groups?

I haven't found any such case; that doesn't mean such a case does not
exist.  Feedback welcome.

The only case I can think of (and I don't know of any examples of such a
system): some kind of quota system that limits the members of a group to
a certain amount of storage, but places no such limit on non-members.

However, the idea of *holding* a credential (a supplementary group ID)
giving *less* privileges, and *dropping* a credential giving *more*
privileges, would completely invert normal security models.  (The sane
way to design such a system would be to have a privileged group for
users who can exceed the quota.)

If it turns out that a real case exists that people care about, I could
easily make this configurable, either at compile time or via a sysctl.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 12:06:20PM -0800, Andy Lutomirski wrote:
 On Sat, Nov 15, 2014 at 11:29 AM, Josh Triplett j...@joshtriplett.org wrote:
  On Sat, Nov 15, 2014 at 09:37:27AM -0600, Eric W. Biederman wrote:
  Josh Triplett j...@joshtriplett.org writes:
 
   Currently, unprivileged processes (without CAP_SETGID) cannot call
   setgroups at all.  In particular, processes with a set of supplementary
   groups cannot further drop permissions without obtaining elevated
   permissions first.
  
   Allow unprivileged processes to call setgroups with a subset of their
   current groups; only require CAP_SETGID to add a group the process does
   not currently have.
 
  A couple of questions.
  - Is there precedence in other unix flavors for this?
 
  I found a few references to now-nonexistent pages at MIT about a system
  with this property, but other than that no.
 
  I've also found more than a few references to people wanting this
  functionality.
 
  - What motiviates this change?
 
  I have a series of patches planned to add more ways to drop elevated
  privileges without requiring a transition through root to do so.  That
  would improve the ability for unprivileged users to run programs
  sandboxed with even *less* privileges.  (Among other things, that would
  also allow programs running with no_new_privs to further *reduce* their
  privileges, which they can't currently do in this case.)
 
  - Have you looked to see if anything might for bug compatibilty
require applications not to be able to drop supplementary groups?
 
  I haven't found any such case; that doesn't mean such a case does not
  exist.  Feedback welcome.
 
  The only case I can think of (and I don't know of any examples of such a
  system): some kind of quota system that limits the members of a group to
  a certain amount of storage, but places no such limit on non-members.
 
  However, the idea of *holding* a credential (a supplementary group ID)
  giving *less* privileges, and *dropping* a credential giving *more*
  privileges, would completely invert normal security models.  (The sane
  way to design such a system would be to have a privileged group for
  users who can exceed the quota.)
 
 Agreed.  And, if you want to bypass quotas by dropping a supplementary
 group, you already can by unsharing your user namespace.

Good point!  Given that a process can run with a new user namespace and
no other namespaces, and then drop all its other privileges that way,
the ability to drop privileges without using a user namespace seems
completely harmless, with one exception that you noted:

 However, sudoers seems to allow negative group matches.  So maybe
 allowing this only with no_new_privs already set would make sense.

Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems fine.
I'll do that in v2, and document that in the manpage.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new-group_info);
+   get_group_info(group_info);
+   new-group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new-group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new-group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have.

Since some privileged programs (such as sudo) allow tests for
non-membership in a group, require no_new_privs to drop group
privileges.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, fork);
case 0:
execl(/usr/bin/id, id, NULL);
err(1, exec);
default:
if (waitpid(p, NULL, 0)  0)
err(1, waitpid);
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 1);
run_id();
if (setresgid(1, 1, 1)  0)
err(1, setresgid);
if (setresuid(1, 1, 1)  0)
err(1, setresuid);
run_id();
if (setgroups(3, list2)  0)
err(1, setgroups 2);
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 3);
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
v2: Require no_new_privs.

 kernel/groups.c | 34 +++---
 kernel/uid16.c  |  2 --
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..f7fb8dd 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,29 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Compare two sorted group lists; return true if the first is a subset of the
+ * second.
+ */
+static bool is_subset(const struct group_info *g1, const struct group_info *g2)
+{
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i  g1-ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j  g2-ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j = g2-ngroups || !gid_eq(gid1, gid2))
+   return false;
+   j++;
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +212,18 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!(ns_capable(current_user_ns(), CAP_SETGID)
+ || (task_no_new_privs(current)
+  is_subset(group_info, new-group_info {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +263,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX)
return -EINVAL;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv2 manpages] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
v2: Document requirement for no_new_privs.

(If this doesn't end up going into 3.18, the version number in the patch will
need updating.)

 man2/getgroups.2 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..3f3d330 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,11 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS may drop
+supplementary groups, but may not add new groups. Adding groups, or making any
+change at all without no_new_privs enabled, requires the
 .B CAP_SETGID
-capability) are required.
+capability.
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv3 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

Currently, unprivileged processes (without CAP_SETGID) cannot call
setgroups at all.  In particular, processes with a set of supplementary
groups cannot further drop permissions without obtaining elevated
permissions first.

Allow unprivileged processes to call setgroups with a subset of their
current groups; only require CAP_SETGID to add a group the process does
not currently have (either as a supplementary group, or as its gid,
egid, or sgid).

Since some privileged programs (such as sudo) allow tests for
non-membership in a group, require no_new_privs to drop group
privileges.

The kernel already maintains the list of supplementary group IDs in
sorted order, and setgroups already needs to sort the new list, so this
just requires a linear comparison of the two sorted lists.

This moves the CAP_SETGID test from setgroups into set_current_groups.

Tested via the following test program:

void run_id(void)
{
pid_t p = fork();
switch (p) {
case -1:
err(1, fork);
case 0:
execl(/usr/bin/id, id, NULL);
err(1, exec);
default:
if (waitpid(p, NULL, 0)  0)
err(1, waitpid);
}
}

int main(void)
{
gid_t list1[] = { 1, 2, 3, 4, 5 };
gid_t list2[] = { 2, 3, 4 };
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 1);
run_id();
if (setresgid(1, 1, 1)  0)
err(1, setresgid);
if (setresuid(1, 1, 1)  0)
err(1, setresuid);
run_id();
if (setgroups(3, list2)  0)
err(1, setgroups 2);
run_id();
if (setgroups(5, list1)  0)
err(1, setgroups 3);
run_id();

return 0;
}

Without this patch, the test program gets EPERM from the second
setgroups call, after dropping root privileges.  With this patch, the
test program successfully drops groups 1 and 5, but then gets EPERM from
the third setgroups call, since that call attempts to add groups the
process does not currently have.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
v3: Allow gid, egid, or sgid.
v2: Require no_new_privs.

 kernel/groups.c | 42 +++---
 kernel/uid16.c  |  2 --
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0667e7..5114155 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -153,6 +153,37 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
return 0;
 }
 
+/* Return true if the group_info is a subset of the group_info of the specified
+ * credentials.  Also allow the first group_info to contain the gid, egid, or
+ * sgid of the credentials.
+ */
+static bool group_subset(const struct group_info *g1,
+const struct cred *cred2)
+{
+   const struct group_info *g2 = cred2-group_info;
+   unsigned int i, j;
+
+   for (i = 0, j = 0; i  g1-ngroups; i++) {
+   kgid_t gid1 = GROUP_AT(g1, i);
+   kgid_t gid2;
+   for (; j  g2-ngroups; j++) {
+   gid2 = GROUP_AT(g2, j);
+   if (gid_lte(gid1, gid2))
+   break;
+   }
+   if (j = g2-ngroups || !gid_eq(gid1, gid2)) {
+   if (!gid_eq(gid1, cred2-gid)
+!gid_eq(gid1, cred2-egid)
+!gid_eq(gid1, cred2-sgid))
+   return false;
+   } else {
+   j++;
+   }
+   }
+
+   return true;
+}
+
 /**
  * set_groups_sorted - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
@@ -189,11 +220,18 @@ int set_current_groups(struct group_info *group_info)
 {
struct cred *new;
 
+   groups_sort(group_info);
new = prepare_creds();
if (!new)
return -ENOMEM;
+   if (!(ns_capable(current_user_ns(), CAP_SETGID)
+ || (task_no_new_privs(current)
+  group_subset(group_info, new {
+   abort_creds(new);
+   return -EPERM;
+   }
 
-   set_groups(new, group_info);
+   set_groups_sorted(new, group_info);
return commit_creds(new);
 }
 
@@ -233,8 +271,6 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, 
grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX)
return -EINVAL;
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bb..b27e167 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,8 +176,6 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t 
__user *, grouplist)
struct group_info *group_info;
int retval;
 
-   if (!ns_capable(current_user_ns(), CAP_SETGID))
-   return -EPERM;
if ((unsigned)gidsetsize  NGROUPS_MAX

[PATCH] getgroups.2: Document unprivileged setgroups calls

2014-11-15 Thread Josh Triplett

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
v3: Document use of gid/egid/sgid.
v2: Document requirement for no_new_privs.

(If this doesn't end up going into 3.18, the version number in the patch will
need updating.)

 man2/getgroups.2 | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/man2/getgroups.2 b/man2/getgroups.2
index 373c204..e2b834e 100644
--- a/man2/getgroups.2
+++ b/man2/getgroups.2
@@ -81,9 +81,16 @@ to be used in a further call to
 .PP
 .BR setgroups ()
 sets the supplementary group IDs for the calling process.
-Appropriate privileges (Linux: the
+A process with the
 .B CAP_SETGID
-capability) are required.
+capability may change its supplementary group IDs arbitrarily.
+As of Linux 3.18, any process that has enabled PR_SET_NO_NEW_PRIVS (see
+.BR prctl (2))
+may drop supplementary groups, or add any of the current real UID, the current
+effective UID, or the current saved set-user-ID; adding any other group ID
+requires the
+.B CAP_SETGID
+capability.
 The
 .I size
 argument specifies the number of supplementary group IDs
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv3 1/2] groups: Factor out a function to set a pre-sorted group list

2014-11-15 Thread Josh Triplett

This way, functions that already need to sort the group list need not do
so twice.

The new set_groups_sorted is intentionally not exported.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
v2, v3: No changes to patch 1/2.
 kernel/groups.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f..f0667e7 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -154,16 +154,26 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 }
 
 /**
+ * set_groups_sorted - Change a group subscription in a set of credentials
+ * @new: The newly prepared set of credentials to alter
+ * @group_info: The group list to install; must be sorted
+ */
+static void set_groups_sorted(struct cred *new, struct group_info *group_info)
+{
+   put_group_info(new-group_info);
+   get_group_info(group_info);
+   new-group_info = group_info;
+}
+
+/**
  * set_groups - Change a group subscription in a set of credentials
  * @new: The newly prepared set of credentials to alter
  * @group_info: The group list to install
  */
 void set_groups(struct cred *new, struct group_info *group_info)
 {
-   put_group_info(new-group_info);
groups_sort(group_info);
-   get_group_info(group_info);
-   new-group_info = group_info;
+   set_groups_sorted(new, group_info);
 }
 
 EXPORT_SYMBOL(set_groups);
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On November 15, 2014 6:05:11 PM PST, Theodore Ts'o ty...@mit.edu wrote:
On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote:
  However, sudoers seems to allow negative group matches.  So maybe
  allowing this only with no_new_privs already set would make sense.
 
 Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems
fine.
 I'll do that in v2, and document that in the manpage.

I've also seen use cases (generally back in the bad old days of big
timesharing VAX 750's :-) where the system admin might assign someone
to the games-abusers group, and then set /usr/games to mode 705
root:games-abusers --- presumably because it's easier to add a few
people to the deny list rather than having to add all of the EECS
department to the games group minus the abusers.

So arbitrarily anyone to drop groups from their supplemental group
list will result in a change from both existing practice and legacy
Unix systems, and it could potentially lead to a security exposure.

As Andy pointed out, you can already do that with a user namespace, for any 
case not involving a setuid or setgid (or otherwise privilege-gaining) program. 
 And requiring no_new_privs handles that.

Given the combination of those two things, do you still see any problematic 
cases?

- Josh Triplett

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 10:40:06PM -0500, Theodore Ts'o wrote:
 On Sat, Nov 15, 2014 at 06:35:05PM -0800, Josh Triplett wrote:
  So arbitrarily anyone to drop groups from their supplemental group
  list will result in a change from both existing practice and legacy
  Unix systems, and it could potentially lead to a security exposure.
  
  As Andy pointed out, you can already do that with a user namespace,
  for any case not involving a setuid or setgid (or otherwise
  privilege-gaining) program.  And requiring no_new_privs handles
  that.
 
 Well, it's no worse than what we can do already with the user
 namespace, yes.  I'm still worried it's going to come as a surprise
 for some configurations because it's a change from what was allowed
 historically.  Then again, pretty much all of the tripwire and rootkit
 scanners won't notice a setuid program that uses capabilities
 instead of the traditional setuid bit, and most sysadmins won't think
 to check for an executable with a forced capability mask, so this
 isn't exactly a new problem

We certainly have introduced orthogonal APIs in various areas before,
such that applications written prior to those APIs may interact
interestingly with them; we don't allow *breaking* those applications,
or introducing security holes, but the existence of applications
designed to block one particular way to do something doesn't
*automatically* rule out the possibility of adding another way to do it.
It does require some careful thought, though.

When we introduced seccomp filters for syscalls, we tied them to
no_new_privs to prevent any possible security holes caused by selective
syscall denial/filtration.

In this case, I'm indifferent about whether unprivileged setgroups works
without no_new_privs; if people are comfortable with that, fine, and if
people would prefer no_new_privs (or for that matter a sysctl, a
compile-time option, or any other means of making the behavior
optional), I can do that too.  The security model of having a group
gives you less privilege than not having it seems crazy, but
nonetheless I can see a couple of easy ways that we can avoid breaking
that pattern, no_new_privs being one of them.  I'd like to make sure
that nobody sees any other real-world corner case that unprivileged
setgroups would break.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] groups: Allow unprivileged processes to use setgroups to drop groups

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 09:08:07PM -0600, Eric W. Biederman wrote:
 Josh Triplett j...@joshtriplett.org writes:
  On November 15, 2014 6:05:11 PM PST, Theodore Ts'o ty...@mit.edu wrote:
 On Sat, Nov 15, 2014 at 12:20:42PM -0800, Josh Triplett wrote:
   However, sudoers seems to allow negative group matches.  So maybe
   allowing this only with no_new_privs already set would make sense.
  
  Sigh, bad sudo.  Sure, restricting this to no_new_privs only seems
 fine.
  I'll do that in v2, and document that in the manpage.
 
 I've also seen use cases (generally back in the bad old days of big
 timesharing VAX 750's :-) where the system admin might assign someone
 to the games-abusers group, and then set /usr/games to mode 705
 root:games-abusers --- presumably because it's easier to add a few
 people to the deny list rather than having to add all of the EECS
 department to the games group minus the abusers.
 
 So arbitrarily anyone to drop groups from their supplemental group
 list will result in a change from both existing practice and legacy
 Unix systems, and it could potentially lead to a security exposure.
 
  As Andy pointed out, you can already do that with a user namespace,
 
 That may be a bug with the user namespace permission check.  Perhaps we
 shouldn't allow dropping groups that aren't mapped in the user
 namespace.

Changing that would break containers for users on many common Linux
distributions, which put users in various supplementary groups by
default.

I do actually have another patch planned, to allow users to set up gid
maps for groups they currently have permission for, not just for their
primary group.  But even with that change, breaking the ability for
container-root to use setgroups to drop all its supplementary groups
seems very wrong, and seems highly likely to break real-world software
running in containers.

 To add to the discussion there are also sg and newgroup, though I don't
 think they touch supplementary groups.

Both sg and newgrp do prove a different point, though: an unprivileged
user should be able to change their gid to one of their supplementary
groups.  That's another patch I planned to submit after this one:
allow setgid/setresgid/etc to a supplementary group ID.

  for any case not involving a setuid or setgid (or otherwise
  privilege-gaining) program.  And requiring no_new_privs handles that.
 
 Frankly adding a no_new_privs check to allow something/anything scares
 me.  That converts no_new_privs from something simple and easy to
 understand to something much more complicated.

no_new_privs already exists in large part to support such use cases,
such as setting seccomp filters, which would be dangerous for
privilege-escalating programs to inherit.

  Given the combination of those two things, do you still see any
  problematic cases?
 
 Josh I think it is time to stop and make certain you understand how
 unix groups work in the large.

 It seems to me that either supplementary groups can be dropped or
 supplementary groups can not be dropped.  If they can be dropped we
 shouldn't need complicated checks to allow them to be dropped in some
 circumstances but not others.
 
 Unix groups are an old interface and they have been used in a lot of
 ways over the years.  We need to be careful any change we make is good
 and truly backwards compatible and that we do our darndest not to
 introduce security holes.

Seeking the decades-old precedents and corner cases that might
contradict the most obvious semantics is a large part of what I'm
looking for out of this thread.

However, I don't necessarily think that we must permit unprivileged
setgroups in *all* cases if we permit it in *any*; it may well make
sense to reduce the surface area of the change by limiting the
circumstances in which you can do this.  Whether it does end up making
sense to do so depends on the outcome of this discussion.

I added the no_new_privs check at Andy's suggestion, since the use case
I had in mind will work just fine with that restriction.  I'm also open
to just allowing setgroups in all cases, if that semantic seems
preferable, or to adding some separate check specific to this case if
that really seems necessary.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH manpages] Document supplementary user IDs

2014-11-15 Thread Josh Triplett

Add new manpages for getusers(2) and setusers(2).  Discuss supplementary
UIDs in credentials(7).  Update manpages for seteuid(2), setfsuid(2),
setresuid(2), setreuid(2), and setuid(2).

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
 man2/getusers.2| 177 +
 man2/seteuid.2 |  10 ++-
 man2/setfsuid.2|   4 +-
 man2/setresuid.2   |   8 ++-
 man2/setreuid.2|  12 +++-
 man2/setuid.2  |   7 ++-
 man2/setusers.2|   1 +
 man7/credentials.7 |  21 ++-
 8 files changed, 229 insertions(+), 11 deletions(-)
 create mode 100644 man2/getusers.2
 create mode 100644 man2/setusers.2

diff --git a/man2/getusers.2 b/man2/getusers.2
new file mode 100644
index 000..e5dd371
--- /dev/null
+++ b/man2/getusers.2
@@ -0,0 +1,177 @@
+.\ Copyright 2014 Josh Triplett j...@joshtriplett.org
+.\ Based on getgroups.2:
+.\ Copyright 1993 Rickard E. Faith (fa...@cs.unc.edu)
+.\
+.\ %%%LICENSE_START(VERBATIM)
+.\ Permission is granted to make and distribute verbatim copies of this
+.\ manual provided the copyright notice and this permission notice are
+.\ preserved on all copies.
+.\
+.\ Permission is granted to copy and distribute modified versions of this
+.\ manual under the conditions for verbatim copying, provided that the
+.\ entire resulting derived work is distributed under the terms of a
+.\ permission notice identical to this one.
+.\
+.\ Since the Linux kernel and libraries are constantly changing, this
+.\ manual page may be incorrect or out-of-date.  The author(s) assume no
+.\ responsibility for errors or omissions, or for damages resulting from
+.\ the use of the information contained herein.  The author(s) may not
+.\ have taken the same level of care in the production of this manual,
+.\ which is licensed free of charge, as they might when working
+.\ professionally.
+.\
+.\ Formatted or processed versions of this manual, if unaccompanied by
+.\ the source, must acknowledge the copyright and authors of this work.
+.\ %%%LICENSE_END
+.\
+.\ Modified Thu Oct 31 12:04:29 1996 by Eric S. Raymond e...@thyrsus.com
+.\ Modified, 27 May 2004, Michael Kerrisk mtk.manpa...@gmail.com
+.\ Added notes on capability requirements
+.\ 2008-05-03, mtk, expanded and rewrote parts of DESCRIPTION and RETURN
+.\ VALUE, made style of page more consistent with man-pages style.
+.\
+.TH GETUSERS 2 2014-10-20 Linux Linux Programmer's Manual
+.SH NAME
+getusers, setusers \- get/set list of supplementary user IDs
+.SH SYNOPSIS
+.B #include sys/types.h
+.br
+.B #include unistd.h
+.sp
+.BI int getusers(int  size , uid_t * list );
+.sp
+.BI int setusers(int  size , const uid_t * list );
+.sp
+.in -4n
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.in
+.sp
+.BR getusers (),
+.BR setusers ():
+_GNU_SOURCE
+.SH DESCRIPTION
+.PP
+.BR getusers ()
+returns the supplementary user IDs of the calling process in
+.IR list .
+The argument
+.I size
+should be set to the maximum number of items that can be stored in the
+buffer pointed to by
+.IR list .
+If the calling process has more than
+.I size
+supplementary user IDs,
+.BR getusers ()
+returns an error.
+The returned list may or may not include the real, effective, saved, or 
filesystem user ID of the calling process.
+
+If
+.I size
+is zero,
+.I list
+is not modified, but the total number of supplementary user IDs for the
+process is returned.
+This allows the caller to determine the size of a dynamically allocated
+.I list
+to be used in a further call to
+.BR getusers ().
+.PP
+.BR setusers ()
+sets the supplementary user IDs for the calling process.
+The
+.I size
+argument specifies the number of supplementary user IDs
+in the buffer pointed to by
+.IR list .
+
+Any process may drop user IDs from its supplementary list; however, adding user
+IDs not already present in the supplementary list or in the real, effective, or
+saved user ID requires the
+.B CAP_SETUID
+capability.
+.SH RETURN VALUE
+On success,
+.BR getusers ()
+returns the number of supplementary user IDs.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+
+On success,
+.BR setusers ()
+returns 0.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EFAULT
+.I list
+has an invalid address.
+.PP
+.BR getusers ()
+can additionally fail with the following error:
+.TP
+.B EINVAL
+.I size
+is less than the number of supplementary user IDs, but is not zero.
+.PP
+.BR setusers ()
+can additionally fail with the following errors:
+.TP
+.B EINVAL
+.I size
+is greater than
+.B NUSERS_MAX
+(65536).
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B EPERM
+.I list
+contains user IDs not present in the current supplemental user IDs list,
+and not equal to the real, effective, or saved user ID, and the calling
+process does not have
+.BR CAP_SETUID .
+.SH CONFORMING TO
+.BR getusers (),
+.BR setusers (),
+and supplementary user IDs in general are Linux-specific and should not
+be used

[PATCH] Add supplementary UIDs, and getusers/setusers system calls

2014-11-15 Thread Josh Triplett

Analogous to the supplementary GID list, the supplementary UID list
provides a set of additional user credentials that a process can act as.
A process with CAP_SETUID can set its UID list arbitrarily; a process
without CAP_SETUID can only reduce its UID list.

This allows each user to have a set of UIDs that they can then use to
further sandbox individual child processes without first escalating to
root to change UIDs.  For instance, a PAM module could give each user a
block of UIDs to work with.

Tested via the following test program:

#include err.h
#include stdio.h
#include sys/syscall.h
#include unistd.h

static int getusers(int count, uid_t *uids)
{
return syscall(322, count, uids);
}

static int setusers(int count, const uid_t *uids)
{
return syscall(323, count, uids);
}

static void show_users(void)
{
uid_t uids[65536];
int i, count = getusers(65536, uids);
if (count  0)
err(1, getusers);
printf(UIDs:);
for (i = 0; i  count; i++)
printf( %u, (unsigned)uids[i]);
printf(\n);
}

int main(void)
{
uid_t list1[] = { 1, 2, 3, 4, 5 };
uid_t list2[] = { 1, 2, 3, 4 };
uid_t list3[] = { 2, 3, 4 };
show_users();
if (setusers(5, list1)  0)
err(1, setusers 1);
show_users();
if (setresgid(1, 1, 1)  0)
err(1, setresgid);
if (setresuid(1, 1, 1)  0)
err(1, setresuid);
if (setusers(4, list2)  0)
err(1, setusers 2);
show_users();
if (setusers(3, list3)  0)
err(1, setusers 3);
show_users();
if (setusers(4, list2)  0)
err(1, setusers 4);
show_users();
if (setresuid(2, 2, 2)  0)
err(1, setresuid 2);
if (setusers(5, list1)  0)
err(1, setusers 5);
show_users();

return 0;
}

In this test, all but the last call to setusers succeeds; the last call
fails with EPERM because the unprivileged process attempts to add UID 5
to the supplementary UID list, which it does not currently have.

Signed-off-by: Josh Triplett j...@joshtriplett.org
---
 arch/x86/syscalls/syscall_32.tbl  |   2 +
 arch/x86/syscalls/syscall_64.tbl  |   2 +
 include/linux/cred.h  |  66 +++
 include/linux/syscalls.h  |   2 +
 include/uapi/asm-generic/unistd.h |   6 +-
 include/uapi/linux/limits.h   |   1 +
 init/Kconfig  |   9 ++
 kernel/cred.c |   4 +
 kernel/groups.c   | 173 ++
 kernel/sys.c  |  21 +++--
 kernel/sys_ni.c   |   2 +
 11 files changed, 280 insertions(+), 8 deletions(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..55717d7 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,5 @@
 355i386getrandom   sys_getrandom
 356i386memfd_createsys_memfd_create
 357i386bpf sys_bpf
+358i386getuserssys_getusers
+359i386setuserssys_setusers
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..5572e67 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,8 @@
 319common  memfd_createsys_memfd_create
 320common  kexec_file_load sys_kexec_file_load
 321common  bpf sys_bpf
+322common  getuserssys_getusers
+323common  setuserssys_setusers
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/cred.h b/include/linux/cred.h
index b2d0820..31169fe 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -76,6 +76,8 @@ extern int groups_search(const struct group_info *, kgid_t);
 extern int in_group_p(kgid_t);
 extern int in_egroup_p(kgid_t);
 
+struct user_info;
+
 /*
  * The security context of a task
  *
@@ -135,6 +137,12 @@ struct cred {
struct user_struct *user;   /* real user ID subscription */
struct user_namespace *user_ns; /* user_ns the caps and keyrings are 
relative to. */
struct group_info *group_info;  /* supplementary groups for euid/fsgid 
*/
+#ifdef CONFIG_SUPPLEMENTARY_UIDS
+   struct user_info *user_info;/* supplementary users */
+#define INIT_USER_INFO .user_info = init_users,
+#else
+#define INIT_USER_INFO
+#endif
struct rcu_head rcu;/* RCU deletion hook */
 };
 
@@ -381,4 +389,62 @@ do {   \
*(_fsgid) = __cred-fsgid;  \
 } while(0)
 
+#ifdef CONFIG_SUPPLEMENTARY_UIDS
+struct user_info {
+   atomic_tusage;
+   int nusers;
+   int nblocks;
+   kuid_t  small_block[NGROUPS_SMALL];
+   kuid_t  *blocks[0];
+};
+
+#define USER_AT(ui, i) GROUP_AT(ui, i)
+extern struct user_info init_users

Re: [PATCH] Add supplementary UIDs, and getusers/setusers system calls

2014-11-15 Thread Josh Triplett

On Sat, Nov 15, 2014 at 11:08:31PM -0800, Josh Triplett wrote:
  asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist);
 +asmlinkage long sys_setusers(int uidsetsize, uid_t __user *grouplist);

Obvious typo here: s/grouplist/userlist/.  Will fix in a v2, but I'll
wait for other feedback on v1 first.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build warnings after merge of the tiny tree

2014-11-14 Thread Josh Triplett

On Tue, Nov 11, 2014 at 05:43:04PM +1100, Stephen Rothwell wrote:
> Hi Josh,
> 
> After merging the tiny tree, today's linux-next build (powerpc64
> allnoconfig) produced these warnings:
> 
> kernel/compat.c:33:12: warning: 'compat_get_timex' defined but not used 
> [-Wunused-function]
>  static int compat_get_timex(struct timex *txc, struct compat_timex __user 
> *utp)
> ^
> kernel/compat.c:63:12: warning: 'compat_put_timex' defined but not used 
> [-Wunused-function]
>  static int compat_put_timex(struct compat_timex __user *utp, struct timex 
> *txc)
> ^
> 
> Introduced by commit 7beb114f18e0 ("kernel: time: Compile out NTP
> support").  Neither of these functions is needed if CONFIG_NTP is not
> set.

Fixed.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build warnings after merge of the tiny tree

2014-11-14 Thread Josh Triplett

On Tue, Nov 11, 2014 at 05:43:04PM +1100, Stephen Rothwell wrote:
 Hi Josh,
 
 After merging the tiny tree, today's linux-next build (powerpc64
 allnoconfig) produced these warnings:
 
 kernel/compat.c:33:12: warning: 'compat_get_timex' defined but not used 
 [-Wunused-function]
  static int compat_get_timex(struct timex *txc, struct compat_timex __user 
 *utp)
 ^
 kernel/compat.c:63:12: warning: 'compat_put_timex' defined but not used 
 [-Wunused-function]
  static int compat_put_timex(struct compat_timex __user *utp, struct timex 
 *txc)
 ^
 
 Introduced by commit 7beb114f18e0 (kernel: time: Compile out NTP
 support).  Neither of these functions is needed if CONFIG_NTP is not
 set.

Fixed.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with Linus' tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 02:28:03PM +1100, Stephen Rothwell wrote:
> Hi Josh,
> 
> Today's linux-next merge of the tiny tree got a conflict in net/Kconfig
> between commit f89b7755f517 ("bpf: split eBPF out of NET") from Linus'
> tree and commit 4ecea0db79ef ("lib: rhashtable: Make rhashtable.c
> optional") from the tiny tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

This resolution looks correct to me.

- Josh Triplett

> diff --cc net/Kconfig
> index 99815b5454bf,02badd46823f..
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@@ -6,7 -6,8 +6,8 @@@ menuconfig NE
>   bool "Networking support"
>   select NLATTR
>   select GENERIC_NET_UTILS
>  -select ANON_INODES
>  +select BPF
> + select RHASHTABLE
>   ---help---
> Unless you really know what you are doing, you should say Y here.
> The reason is that some programs need kernel networking support even


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the akpm-current tree with the tiny tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 07:59:12PM +1100, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in
> lib/Kconfig between commit 3a8aeefbd4e9 ("lib: halfmd4: Make halfmd4.c
> optional") from the tiny tree and commit c7f379b518a3 ("lib: Add CRC64
> ECMA module") from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good.

- Josh Triplett

> diff --cc lib/Kconfig
> index b8c97f05140d,2faf7b2de5b3..
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@@ -186,9 -180,13 +186,16 @@@ config CRC
> when they need to do cyclic redundancy check according CRC8
> algorithm. Module will be called crc8.
>   
> + config CRC64_ECMA
> + tristate "CRC64 ECMA function"
> + help
> +   This option provides CRC64 ECMA function. Drivers may select this
> +   when they need to do cyclic redundancy check according to the CRC64
> +   ECMA algorithm.
> + 
>  +config HALFMD4
>  +bool
>  +
>   config AUDIT_GENERIC
>   bool
>   depends on AUDIT && !AUDIT_ARCH


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the net-next tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 02:25:11PM +1100, Stephen Rothwell wrote:
> Hi Josh,
> 
> Today's linux-next merge of the tiny tree got a conflict in
> lib/Makefile between commit e5a2c8999576 ("fast_hash: avoid indirect
> function calls") from the net-next tree and commit 4ecea0db79ef ("lib:
> rhashtable: Make rhashtable.c optional") from the tiny tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

This resolution looks correct to me.

- Josh Triplett

> diff --cc lib/Makefile
> index 04e53dd16070,47b8305288e2..
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@@ -22,11 -22,14 +22,14 @@@ lib-$(CONFIG_SMP) += cpumask.
>   lib-y   += kobject.o klist.o
>   obj-y   += lockref.o
>   
> - obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
> + obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
>bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
> -  gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
> +  gcd.o lcm.o list_sort.o uuid.o iovec.o clz_ctz.o \
>bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
> -  percpu-refcount.o percpu_ida.o rhashtable.o
>  - percpu-refcount.o percpu_ida.o hash.o
> ++ percpu-refcount.o percpu_ida.o
> + obj-$(CONFIG_FLEX_ARRAY) += flex_array.o
> + obj-$(CONFIG_HALFMD4) += halfmd4.o
> + obj-$(CONFIG_RHASHTABLE) += rhashtable.o
>   obj-y += string_helpers.o
>   obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>   obj-y += kstrtox.o


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the tiny tree

2014-11-10 Thread Josh Triplett

On Sun, Nov 09, 2014 at 11:56:04PM -0800, Josh Triplett wrote:
> On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote:
> > Hi Josh,
> > 
> > After merging the tiny tree, today's linux-next build (powerpc64 
> > allnoconfig)
> > failed like this:
> > 
> > arch/powerpc/kernel/built-in.o: In function `sys_call_table':
> > (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex'
> > arch/powerpc/kernel/built-in.o: In function `sys_call_table':
> > (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime'
> > 
> > Caused by commit be1e48a8e48f ("kernel: time: Compile out NTP support").
> 
> Good catch, thanks.  Not clear why that build failure didn't show up on
> x86.
> 
> Looks like the two compat syscalls need cond_syscall lines as well.
> I'll add those to the patch.

Done in current tiny/next.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the tiny tree

2014-11-10 Thread Josh Triplett

On Sun, Nov 09, 2014 at 11:56:04PM -0800, Josh Triplett wrote:
 On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote:
  Hi Josh,
  
  After merging the tiny tree, today's linux-next build (powerpc64 
  allnoconfig)
  failed like this:
  
  arch/powerpc/kernel/built-in.o: In function `sys_call_table':
  (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex'
  arch/powerpc/kernel/built-in.o: In function `sys_call_table':
  (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime'
  
  Caused by commit be1e48a8e48f (kernel: time: Compile out NTP support).
 
 Good catch, thanks.  Not clear why that build failure didn't show up on
 x86.
 
 Looks like the two compat syscalls need cond_syscall lines as well.
 I'll add those to the patch.

Done in current tiny/next.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with the net-next tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 02:25:11PM +1100, Stephen Rothwell wrote:
 Hi Josh,
 
 Today's linux-next merge of the tiny tree got a conflict in
 lib/Makefile between commit e5a2c8999576 (fast_hash: avoid indirect
 function calls) from the net-next tree and commit 4ecea0db79ef (lib:
 rhashtable: Make rhashtable.c optional) from the tiny tree.
 
 I fixed it up (see below) and can carry the fix as necessary (no action
 is required).

This resolution looks correct to me.

- Josh Triplett

 diff --cc lib/Makefile
 index 04e53dd16070,47b8305288e2..
 --- a/lib/Makefile
 +++ b/lib/Makefile
 @@@ -22,11 -22,14 +22,14 @@@ lib-$(CONFIG_SMP) += cpumask.
   lib-y   += kobject.o klist.o
   obj-y   += lockref.o
   
 - obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 + obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 -  gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
 +  gcd.o lcm.o list_sort.o uuid.o iovec.o clz_ctz.o \
bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
 -  percpu-refcount.o percpu_ida.o rhashtable.o
  - percpu-refcount.o percpu_ida.o hash.o
 ++ percpu-refcount.o percpu_ida.o
 + obj-$(CONFIG_FLEX_ARRAY) += flex_array.o
 + obj-$(CONFIG_HALFMD4) += halfmd4.o
 + obj-$(CONFIG_RHASHTABLE) += rhashtable.o
   obj-y += string_helpers.o
   obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
   obj-y += kstrtox.o


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the akpm-current tree with the tiny tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 07:59:12PM +1100, Stephen Rothwell wrote:
 Hi Andrew,
 
 Today's linux-next merge of the akpm-current tree got a conflict in
 lib/Kconfig between commit 3a8aeefbd4e9 (lib: halfmd4: Make halfmd4.c
 optional) from the tiny tree and commit c7f379b518a3 (lib: Add CRC64
 ECMA module) from the akpm-current tree.
 
 I fixed it up (see below) and can carry the fix as necessary (no action
 is required).

Looks good.

- Josh Triplett

 diff --cc lib/Kconfig
 index b8c97f05140d,2faf7b2de5b3..
 --- a/lib/Kconfig
 +++ b/lib/Kconfig
 @@@ -186,9 -180,13 +186,16 @@@ config CRC
 when they need to do cyclic redundancy check according CRC8
 algorithm. Module will be called crc8.
   
 + config CRC64_ECMA
 + tristate CRC64 ECMA function
 + help
 +   This option provides CRC64 ECMA function. Drivers may select this
 +   when they need to do cyclic redundancy check according to the CRC64
 +   ECMA algorithm.
 + 
  +config HALFMD4
  +bool
  +
   config AUDIT_GENERIC
   bool
   depends on AUDIT  !AUDIT_ARCH


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the tiny tree with Linus' tree

2014-11-10 Thread Josh Triplett

On Mon, Nov 10, 2014 at 02:28:03PM +1100, Stephen Rothwell wrote:
 Hi Josh,
 
 Today's linux-next merge of the tiny tree got a conflict in net/Kconfig
 between commit f89b7755f517 (bpf: split eBPF out of NET) from Linus'
 tree and commit 4ecea0db79ef (lib: rhashtable: Make rhashtable.c
 optional) from the tiny tree.
 
 I fixed it up (see below) and can carry the fix as necessary (no action
 is required).

This resolution looks correct to me.

- Josh Triplett

 diff --cc net/Kconfig
 index 99815b5454bf,02badd46823f..
 --- a/net/Kconfig
 +++ b/net/Kconfig
 @@@ -6,7 -6,8 +6,8 @@@ menuconfig NE
   bool Networking support
   select NLATTR
   select GENERIC_NET_UTILS
  -select ANON_INODES
  +select BPF
 + select RHASHTABLE
   ---help---
 Unless you really know what you are doing, you should say Y here.
 The reason is that some programs need kernel networking support even


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the tiny tree

2014-11-09 Thread Josh Triplett

On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote:
> Hi Josh,
> 
> After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig)
> failed like this:
> 
> arch/powerpc/kernel/built-in.o: In function `sys_call_table':
> (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex'
> arch/powerpc/kernel/built-in.o: In function `sys_call_table':
> (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime'
> 
> Caused by commit be1e48a8e48f ("kernel: time: Compile out NTP support").

Good catch, thanks.  Not clear why that build failure didn't show up on
x86.

Looks like the two compat syscalls need cond_syscall lines as well.
I'll add those to the patch.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the tiny tree

2014-11-09 Thread Josh Triplett

On Mon, Nov 10, 2014 at 06:17:16PM +1100, Stephen Rothwell wrote:
 Hi Josh,
 
 After merging the tiny tree, today's linux-next build (powerpc64 allnoconfig)
 failed like this:
 
 arch/powerpc/kernel/built-in.o: In function `sys_call_table':
 (.rodata+0xd58): undefined reference to `.compat_sys_adjtimex'
 arch/powerpc/kernel/built-in.o: In function `sys_call_table':
 (.rodata+0x1b48): undefined reference to `.compat_sys_clock_adjtime'
 
 Caused by commit be1e48a8e48f (kernel: time: Compile out NTP support).

Good catch, thanks.  Not clear why that build failure didn't show up on
x86.

Looks like the two compat syscalls need cond_syscall lines as well.
I'll add those to the patch.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 12:49:29PM -0800, Joe Perches wrote:
> On Sat, 2014-11-08 at 11:01 -0800, Josh Triplett wrote:
> > On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote:
> > >  CREDITS |  7 +++
> > >  MAINTAINERS | 10 ++
> > >  2 files changed, 17 insertions(+)
> > 
> > Given the wide variety of folks who patch CREDITS and MAINTAINERS, might
> > I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS
> > file (perhaps in the same entry as get_maintainer.pl), so that
> > get_maintainer.pl does not fall back to git history for them?
> 
> Thanks but:
> 
> There is no MAINTAINERS/CREDITS maintainer.
> 
> I'm not volunteering.

Oh, I thought from all your work on file patterns and other such
improvements that you were the de-facto MAINTAINERS maintainer.

> Maybe you?

No thank you.

> But maybe something like this would help:
> 
> It has to have both an M: and an S: entry
> with maintained or supported to stop get_maintainers
> from doing a git history lookup.
> 
> ---
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0662378..584fd69 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5881,6 +5881,12 @@ F: drivers/mailbox/
>  F:   include/linux/mailbox_client.h
>  F:   include/linux/mailbox_controller.h
>  
> +MAINTAINERS/CREDITS
> +M:   linux-kernel@vger.kernel.org
> +S:   Maintained
> +F:   MAINTAINERS
> +F:   CREDITS

Sending something only to LKML seems likely to go un-noticed.  I think
we need a better solution than this.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] scripts/ksize: Add kernel build size report

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 03:18:18PM -0800, Darren Hart wrote:
> ksize generates hierarchical build size reports from vmlinux, *.o, and
> built-in.o files.
> 
> ksize is useful in preparing minimal configurations and comparing
> similar configurations across kernel versions.
> 
> Signed-off-by: Darren Hart 
> Cc: Josh Triplett 

One comment below; with that addressed:
Reviewed-by: Josh Triplett 

> +def main(argv):
> +try:
> +opts, args = getopt.getopt(argv[1:], "dh", ["help"])
> +except getopt.GetoptError, err:
> +print '%s' % str(err)
> +usage()
> +return 2
> +
> +driver_detail = False
> +for o, a in opts:
> +if o == '-d':
> +driver_detail = True
> +elif o in ('-h', '--help'):
> +usage()
> +return 0
> +else:
> +assert False, "unhandled option"
> +
> +cols = term_width()
> +
> +# Determine kernel version
> +p = Popen("strings vmlinux | grep 'Linux version' | cut -d ' ' -f 3",
> +  shell=True, stdout=PIPE, stderr=PIPE)
> +version = p.communicate()[0].strip()

This seems like a very fragile, Perl-y way to obtain the kernel version.

I'd suggest either not including the version (just as bloat-o-meter
doesn't), or parsing it out using objdump -h -t and file offsetting
(looking for the offset and size of linux_banner).  Personally I'd go
with the former.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote:
>  CREDITS |  7 +++
>  MAINTAINERS | 10 ++
>  2 files changed, 17 insertions(+)

Given the wide variety of folks who patch CREDITS and MAINTAINERS, might
I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS
file (perhaps in the same entry as get_maintainer.pl), so that
get_maintainer.pl does not fall back to git history for them?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote:
  CREDITS |  7 +++
  MAINTAINERS | 10 ++
  2 files changed, 17 insertions(+)

Given the wide variety of folks who patch CREDITS and MAINTAINERS, might
I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS
file (perhaps in the same entry as get_maintainer.pl), so that
get_maintainer.pl does not fall back to git history for them?

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] scripts/ksize: Add kernel build size report

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 03:18:18PM -0800, Darren Hart wrote:
 ksize generates hierarchical build size reports from vmlinux, *.o, and
 built-in.o files.
 
 ksize is useful in preparing minimal configurations and comparing
 similar configurations across kernel versions.
 
 Signed-off-by: Darren Hart dvh...@linux.intel.com
 Cc: Josh Triplett j...@joshtriplett.org

One comment below; with that addressed:
Reviewed-by: Josh Triplett j...@joshtriplett.org

 +def main(argv):
 +try:
 +opts, args = getopt.getopt(argv[1:], dh, [help])
 +except getopt.GetoptError, err:
 +print '%s' % str(err)
 +usage()
 +return 2
 +
 +driver_detail = False
 +for o, a in opts:
 +if o == '-d':
 +driver_detail = True
 +elif o in ('-h', '--help'):
 +usage()
 +return 0
 +else:
 +assert False, unhandled option
 +
 +cols = term_width()
 +
 +# Determine kernel version
 +p = Popen(strings vmlinux | grep 'Linux version' | cut -d ' ' -f 3,
 +  shell=True, stdout=PIPE, stderr=PIPE)
 +version = p.communicate()[0].strip()

This seems like a very fragile, Perl-y way to obtain the kernel version.

I'd suggest either not including the version (just as bloat-o-meter
doesn't), or parsing it out using objdump -h -t and file offsetting
(looking for the offset and size of linux_banner).  Personally I'd go
with the former.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 06/24] Update MAINTAINERS and CREDITS files with amdkfd info

2014-11-08 Thread Josh Triplett

On Sat, Nov 08, 2014 at 12:49:29PM -0800, Joe Perches wrote:
 On Sat, 2014-11-08 at 11:01 -0800, Josh Triplett wrote:
  On Sat, Nov 08, 2014 at 08:37:27PM +0200, Oded Gabbay wrote:
CREDITS |  7 +++
MAINTAINERS | 10 ++
2 files changed, 17 insertions(+)
  
  Given the wide variety of folks who patch CREDITS and MAINTAINERS, might
  I suggest adding CREDITS and MAINTAINERS themselves to the MAINTAINERS
  file (perhaps in the same entry as get_maintainer.pl), so that
  get_maintainer.pl does not fall back to git history for them?
 
 Thanks but:
 
 There is no MAINTAINERS/CREDITS maintainer.
 
 I'm not volunteering.

Oh, I thought from all your work on file patterns and other such
improvements that you were the de-facto MAINTAINERS maintainer.

 Maybe you?

No thank you.

 But maybe something like this would help:
 
 It has to have both an M: and an S: entry
 with maintained or supported to stop get_maintainers
 from doing a git history lookup.
 
 ---
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 0662378..584fd69 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -5881,6 +5881,12 @@ F: drivers/mailbox/
  F:   include/linux/mailbox_client.h
  F:   include/linux/mailbox_controller.h
  
 +MAINTAINERS/CREDITS
 +M:   linux-kernel@vger.kernel.org
 +S:   Maintained
 +F:   MAINTAINERS
 +F:   CREDITS

Sending something only to LKML seems likely to go un-noticed.  I think
we need a better solution than this.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] lib/flex_array: make build optional

2014-11-06 Thread Josh Triplett

On Thu, Nov 06, 2014 at 04:50:56PM -0500, Aristeu Rozanski wrote:
> On Thu, Nov 06, 2014 at 01:44:54PM -0800, Dave Hansen wrote:
> > On 11/06/2014 01:33 PM, Dave Hansen wrote:
> > > On 11/06/2014 01:27 PM, Aristeu Rozanski wrote:
> > >> +config FLEX_ARRAY
> > >> +bool "Flexible array"
> > >> +default n
> > >> +help
> > >> +  This option enables an implementation of flexible arrays which
> > >> +  allows creating arrays of fixed size elements with an 
> > >> arbritrary
> > >> +  size without requiring the single allocation of a contiguous 
> > >> area.
> > >> +
> > >> +  See Documentation/flexible-arrays.txt
> > > 
> > > Is there any reason to expose this to the user via Kconfig?
> > > 
> > > No sane person would even turn it on if they don't need it.
> > 
> > IOW, I think you should just make it:
> > 
> > config FLEX_ARRAY
> > def_bool n
> 
> Joe Pershes complained on a similar patch on making it default to 'n'.
> Will rework the patches this way.

https://git.kernel.org/cgit/linux/kernel/git/josh/linux.git/commit/?h=tiny/unflex-array=6631d5fb4cf395ebd2dc0f2da05525b9d3436a3f

Already done.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] lib/flex_array: make build optional

2014-11-06 Thread Josh Triplett

On Thu, Nov 06, 2014 at 04:50:56PM -0500, Aristeu Rozanski wrote:
 On Thu, Nov 06, 2014 at 01:44:54PM -0800, Dave Hansen wrote:
  On 11/06/2014 01:33 PM, Dave Hansen wrote:
   On 11/06/2014 01:27 PM, Aristeu Rozanski wrote:
   +config FLEX_ARRAY
   +bool Flexible array
   +default n
   +help
   +  This option enables an implementation of flexible arrays which
   +  allows creating arrays of fixed size elements with an 
   arbritrary
   +  size without requiring the single allocation of a contiguous 
   area.
   +
   +  See Documentation/flexible-arrays.txt
   
   Is there any reason to expose this to the user via Kconfig?
   
   No sane person would even turn it on if they don't need it.
  
  IOW, I think you should just make it:
  
  config FLEX_ARRAY
  def_bool n
 
 Joe Pershes complained on a similar patch on making it default to 'n'.
 Will rework the patches this way.

https://git.kernel.org/cgit/linux/kernel/git/josh/linux.git/commit/?h=tiny/unflex-arrayid=6631d5fb4cf395ebd2dc0f2da05525b9d3436a3f

Already done.

- Josh Triplett
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] CodingStyle: Add a chapter on conditional compilation

2014-11-03 Thread Josh Triplett

On Mon, Nov 03, 2014 at 09:47:40AM -0800, Joe Perches wrote:
> On Mon, 2014-11-03 at 11:46 -0500, Jonathan Corbet wrote:
> > On Wed, 29 Oct 2014 11:15:17 -0700
> > Josh Triplett  wrote:
> > 
> > > Document several common practices and conventions regarding conditional
> > > compilation, most notably the preference for ifdefs in headers rather
> > > than .c files.
> > 
> > OK, I've picked this one up for my 3.19 docs pull.
> 
> I think that Al Viro's suggestion from awhile ago:
> 
> https://lkml.org/lkml/2013/3/20/388
> 
> could still be in CodingStyle somewhere or in
> another document like CodingStyleSuggestions.

I think that text needs some cleanup to better fit CodingStyle, but the
intent and recommendations definitely ought to go in.  A few of those
seem too far down the road of "don't stuff beans up your nose", and some
of them need shortening (just "don't put an else after an if condition
ending with break or return; remember to handle errors via break,
return, or continue, and outdent the subsequent code").

> Another thing that could go is the suggestion to
> use Lindent.
> 
> https://lkml.org/lkml/2013/2/11/390

Agreed completely.  We might consider coming up with settings for
clang-format, which seems like a far more capable replacement that
actually understands C.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] compiler: Correct macro parameter expansion problem

2014-11-03 Thread Josh Triplett

On Mon, Nov 03, 2014 at 06:34:13AM -0800, Jeff Kirsher wrote:
> From: Mark Rustad 
> 
> The macro __compiletime_error_fallback has an error in that it
> lacks parens around the expansion of an expression. It also
> lacks a conversion to a boolean value. The first problem can
> result in a mis-evaluation. The second problem can also result
> in an error if the value comes out negative. That would thwart
> the intended error generation. That is, the error being asserted
> would not in fact generate an error. Fix both problems by adding
> !!() to the expansion.
> 
> Signed-off-by: Mark Rustad 
> Tested-by: Aaron Brown 
> Signed-off-by: Jeff Kirsher 

Good catch.

Reviewed-by: Josh Triplett 

>  include/linux/compiler.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index d5ad7b1..59dc611 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -331,7 +331,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, 
> int val, int expect);
>   */
>  # ifndef __CHECKER__
>  #  define __compiletime_error_fallback(condition) \
> - do { ((void)sizeof(char[1 - 2 * condition])); } while (0)
> + do { ((void)sizeof(char[1 - 2 * !!(condition)])); } while (0)
>  # endif
>  #endif
>  #ifndef __compiletime_error_fallback
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 10/10] x86: Support compiling out userspace IO (iopl and ioperm)

2014-11-03 Thread Josh Triplett

On Mon, Nov 03, 2014 at 12:10:49PM +, One Thousand Gnomes wrote:
> On Sun, 2 Nov 2014 09:33:01 -0800
> Josh Triplett  wrote:
> 
> > On the vast majority of modern systems, no processes will use the
> > userspsace IO syscalls, iopl and ioperm.  Add a new config option,
> > CONFIG_X86_IOPORT, to support configuring them out of the kernel
> > entirely.  Most current systems do not run programs using these
> > syscalls, so X86_IOPORT does not depend on EXPERT, though it does still
> > default to y.
> 
> This isn't unreasonable but there are drivers with userspace helpers that
> use iopl/ioperm type functionality where you should be doing a SELECT of
> X86_IOPORT. The one that comes to mind is the uvesa driver. From a quick
> scan it may these days be the only mainstream one that needs the select
> adding.

Should kernel drivers really express dependencies that only their
(current instances of) corresponding userspace components need?
Something seems wrong about that.

> Some X servers for legacy cards still use io port access.

Sure, X servers using UMS rather than KMS seem like a common reason to
need this.

> There are also
> a couple of other highly non-obvious userspace users that hang on for
> some systems - eg some older servers DMI and error records can only by
> read via a real mode BIOS call so management tools have no choice but to
> go the lrmi/io path.

As with any userspace interface, some callers may potentially still
exist.  And this still has "default y", too, to avoid user surprises.

> Still makes sense IMHO.
> 
> From a code perspective however you could define IO_BITMAP_LONGS to 0,
> add an IO_BITMAP_SIZE (defined as LONGS + 1 or 0) and as far as I can see
> gcc would then optimise out a lot of the code you are ifdeffing

IO_BITMAP_LONGS already gets defined to (0/sizeof(long)).  And as far as
I can tell, that would only work for init_tss_io, not anything else.
Even then, that would only work with a zero-size array left around in
tss_struct, which doesn't seem appropriate.  The remaining ifdefs wrap
code that GCC could not constant-fold away, and making that code
constant-foldable seems significantly more invasive than the ifdefs.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

< 4 5 6 7 8 9 10 11 12 13 >

801 - 900 of 2693 matches

Mail list logo