Re: getsockopt(2)

2018-03-05 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>What is wrong with your idea of updatesockopt(2) ? Or maybe call it
>>getsockopt2() and only use it for the "get with extra argument" case.
>
>I guess getsockopt2/updatesockopt is not that bad after all. Perhaps
>we should go with that?

This is working for me:

Index: uipc_syscalls.c
===
RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.191
diff -u -r1.191 uipc_syscalls.c
--- uipc_syscalls.c 12 Feb 2018 16:01:35 -  1.191
+++ uipc_syscalls.c 6 Mar 2018 00:47:02 -
@@ -1266,6 +1266,68 @@
return error;
 }
 
+int
+sys_getsockopt2(struct lwp *l, const struct sys_getsockopt2_args *uap,
+register_t *retval)
+{
+   /* {
+   syscallarg(int) s;
+   syscallarg(int) level;
+   syscallarg(int) name;
+   syscallarg(void *)  val;
+   syscallarg(unsigned int *)  avalsize;
+   } */
+   struct sockopt  sopt;
+   struct socket   *so;
+   file_t  *fp;
+   unsigned intvalsize, len;
+   int error;
+
+   if (SCARG(uap, val) != NULL) {
+   error = copyin(SCARG(uap, avalsize), &valsize, sizeof(valsize));
+   if (error)
+   return error;
+   } else
+   valsize = 0;
+
+   if (valsize > MCLBYTES)
+   return EINVAL;
+
+   if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
+   return (error);
+
+   sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
+   if (valsize > 0) {
+   error = copyin(SCARG(uap, val), sopt.sopt_data, valsize);
+   if (error)
+   goto out;
+   }
+
+   if (fp->f_flag & FNOSIGPIPE)
+   so->so_options |= SO_NOSIGPIPE;
+   else
+   so->so_options &= ~SO_NOSIGPIPE;
+   error = sogetopt(so, &sopt);
+   if (error)
+   goto out;
+
+   if (valsize > 0) {
+   len = min(valsize, sopt.sopt_size);
+   error = copyout(sopt.sopt_data, SCARG(uap, val), len);
+   if (error)
+   goto out;
+
+   error = copyout(&len, SCARG(uap, avalsize), sizeof(len));
+   if (error)
+   goto out;
+   }
+
+ out:
+   sockopt_destroy(&sopt);
+   fd_putfile(SCARG(uap, s));
+   return error;
+}
+
 #ifdef PIPE_SOCKETPAIR
 
 int
Index: syscalls.master
===
RCS file: /cvsroot/src/sys/kern/syscalls.master,v
retrieving revision 1.291
diff -u -r1.291 syscalls.master
--- syscalls.master 6 Jan 2018 16:41:23 -   1.291
+++ syscalls.master 6 Mar 2018 00:47:03 -
@@ -986,3 +986,5 @@
siginfo_t *info); }
 482STD { int|sys||clock_getcpuclockid2(idtype_t idtype, \
id_t id, clockid_t *clock_id); }
+483STD RUMP{ int|sys||getsockopt2(int s, int level, int name, \
+   void *val, socklen_t *avalsize); }


Re: getsockopt(2)

2017-12-23 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>What is wrong with your idea of updatesockopt(2) ? Or maybe call it
>getsockopt2() and only use it for the "get with extra argument" case.

I guess getsockopt2/updatesockopt is not that bad after all. Perhaps
we should go with that?

christos



Re: getsockopt(2)

2017-12-23 Thread Robert Swindells

chris...@astron.com (Christos Zoulas)
>In article ,
>Robert Swindells   wrote:
>>There are about 40 options defined by SCTP. I would need to go through
>>the list to see how many of them were of which type.

FreeBSD has 32 options that require an input parameter to be able to
get them, we don't implement all of them yet.

There are only a couple of options that are really new socket
operations, they are used to implement the sctp_connectx() function
described in the lkml thread. They copy in an array of sockaddr structs
and return a uint32_t.

  Selectively copy out optval for setsockopt(2):
>>>
>>>I don't like breaking the API, or changing the other protocols. Too 
>>>intrusive.
>>
>>I'm not really in favour of this myself, I was just listing all the
>>alternatives.
>
>So we don't want this. But it is starting to become the only viable option.

What is wrong with your idea of updatesockopt(2) ? Or maybe call it
getsockopt2() and only use it for the "get with extra argument" case.

All the calls to it would be hidden in the userland SCTP library
functions.

  Add new syscall to read/write optval:
>>
>>It would be messy within the kernel to have to use setsockopt(2) to set
>>them and ioctl(2) to get them.
>
>Yes the asymmetry sucks... I meant use ioctl for both get and set then.

The API defined in RFC 6458 only provides an alternative function to
"get" options, to set them you have to use setsockopt(2), it isn't
possible to avoid the asymmetry.

>>One possible problem is that some of them are defined as taking variable
>>sized arguments, are there other ioctl(2) requests that do this ?
>
>That can be done. You can pass a struct to it and the struct can contain
>pointers... 

I found an example to copy in sys/dev/ic/ath.c.

  sctp_peeloff():
>>>
>>>Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall.
>>
>>I think this will work.
>
>That's the easy part though.

I have done this bit. Not tested yet as it needs the other stuff.



Re: getsockopt(2)

2017-12-18 Thread Robert Swindells

David Holland  wrote:
>On Mon, Dec 18, 2017 at 07:21:33PM +, David Holland wrote:
> > On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote:
> >  > I wrote:
> >  >> Does anyone know why we don't use the input 'optlen' parameter to the
> >  >> getsockopt(2) syscall, we do write back to it on return.
> >  >>  [...]
> >  >> 
> >  >> There are also lots of places in sctp_usrreq that want to use it.
> >  >> 
> >  >> I can set it with the following patch (line numbers will be slightly
> >  >> out), but wondered if there was a reason for the current behaviour.
> >  >>  [...]
> >  > 
> >  > Has anyone had any other thoughts on this ?
> >  > 
> >  > I still think that the one line change shown here is correct, it will 
> >  > allocate a buffer that gets filled by the stuff to be copied back to
> >  > userspace.
> > 
> > The man page for getsockopt says [irrelevant stuff]
>
>Oops, I guess I am missing something.

The current code isn't wrong.

In the "get" path, sockopt_set() or sockopt_setmbuf() will allocate a
buffer if needed, most other values are ints and can go in the 4 byte
buffer in the sockopt.

I guess it avoids needing to free the buffer if there is an error while
building up the data to return.


Re: getsockopt(2)

2017-12-18 Thread David Holland
On Mon, Dec 18, 2017 at 07:21:33PM +, David Holland wrote:
 > On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote:
 >  > I wrote:
 >  >> Does anyone know why we don't use the input 'optlen' parameter to the
 >  >> getsockopt(2) syscall, we do write back to it on return.
 >  >>  [...]
 >  >> 
 >  >> There are also lots of places in sctp_usrreq that want to use it.
 >  >> 
 >  >> I can set it with the following patch (line numbers will be slightly
 >  >> out), but wondered if there was a reason for the current behaviour.
 >  >>  [...]
 >  > 
 >  > Has anyone had any other thoughts on this ?
 >  > 
 >  > I still think that the one line change shown here is correct, it will 
 >  > allocate a buffer that gets filled by the stuff to be copied back to
 >  > userspace.
 > 
 > The man page for getsockopt says [irrelevant stuff]

Oops, I guess I am missing something.

-- 
David A. Holland
dholl...@netbsd.org


Re: getsockopt(2)

2017-12-18 Thread David Holland
On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote:
 > I wrote:
 >> Does anyone know why we don't use the input 'optlen' parameter to the
 >> getsockopt(2) syscall, we do write back to it on return.
 >>  [...]
 >> 
 >> There are also lots of places in sctp_usrreq that want to use it.
 >> 
 >> I can set it with the following patch (line numbers will be slightly
 >> out), but wondered if there was a reason for the current behaviour.
 >>  [...]
 > 
 > Has anyone had any other thoughts on this ?
 > 
 > I still think that the one line change shown here is correct, it will 
 > allocate a buffer that gets filled by the stuff to be copied back to
 > userspace.

The man page for getsockopt says that the optlen argument should be
initialized with the size before calling getsockopt:

 The parameters optval and optlen are used to access option values for
 setsockopt().  For getsockopt() they identify a buffer in which the value
 for the requested option(s) are to be returned.  For getsockopt(), optlen
 is a value-result parameter, initially containing the size of the buffer
 pointed to by optval, and modified on return to indicate the actual size
 of the value returned.

This is also the traditional behavior of this type of call (compare
e.g. getpeername) and there's no particular reason getsockopt should
be different.

So I think (a) any code we have that doesn't initialize the value is
wrong and should be fixed; (b) any existing kernel code that blats
values out to the buffer without checking the user's length is wrong
and should be fixed; and (c) if this is an endemic problem we should
fix all the wrong user code and then version the syscall before
changing the kernel to behave as documented. (but hopefully that isn't
necessary)

Am I missing something? I'm not sure how the subsequent discussion
morphed into ways to hack setsockopt to serve as getsockopt...

-- 
David A. Holland
dholl...@netbsd.org


Re: getsockopt(2)

2017-12-17 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>There are about 40 options defined by SCTP. I would need to go through
>the list to see how many of them were of which type.
>
>There isn't any source shared between FreeBSD and NetBSD anymore, their
>stack has diverged a lot from KAME.

Ok.

>
>>>  Selectively copy out optval for setsockopt(2):
>>
>>I don't like breaking the API, or changing the other protocols. Too intrusive.
>
>I'm not really in favour of this myself, I was just listing all the
>alternatives.

So we don't want this. But it is starting to become the only viable option.

>>>  Add new syscall to read/write optval:
>
>It would be messy within the kernel to have to use setsockopt(2) to set
>them and ioctl(2) to get them.

Yes the asymmetry sucks... I meant use ioctl for both get and set then.

>One possible problem is that some of them are defined as taking variable
>sized arguments, are there other ioctl(2) requests that do this ?

That can be done. You can pass a struct to it and the struct can contain
pointers... 

>>>  sctp_peeloff():
>>
>>Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall.
>
>I think this will work.

That's the easy part though.

christos



Re: getsockopt(2)

2017-12-17 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>Options for NetBSD:
>>
>>  Copy in optval for getsockopt(2):
>>
>>Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API.
>>Con: Diverges from standard. Broken KAME API hasn't been a problem.
>>
>>Wouldn't be too much work to move SCTP operations that were
>>triggered by setsockopt() requests in KAME sources to be triggered
>>by getsockopt() requests instead.
>
>Well, are the semantics though such as to "set" things? While the getsockopt
>call change the state? If yes, perhaps this is not the most intuitive way.
>How many calls are we talking about? I think leaving the source the same
>is a big plus, or coming up with a macro that works for both NetBSD and
>FreeBSD.

There are about 40 options defined by SCTP. I would need to go through
the list to see how many of them were of which type.

There isn't any source shared between FreeBSD and NetBSD anymore, their
stack has diverged a lot from KAME.

>>  Selectively copy out optval for setsockopt(2):
>>
>>Pro: no change to SCTP code.
>>Con: Breaks standard API. Need small change to all other protocols.
>
>I don't like breaking the API, or changing the other protocols. Too intrusive.

I'm not really in favour of this myself, I was just listing all the
alternatives.

>>  Add new syscall to read/write optval:
>>
>>Pro: Doesn't break standard for [set/get]sockopt() API.
>>Con: Doesn't fix IPSEC userland API.
>>
>>Could map onto PRCO_GETOPT within the kernel to minimize changes
>>to protocol code.
>
>We could use ioctl instead too :-)

I would need to check which options are defined to work with
setsockopt(2).

It would be messy within the kernel to have to use setsockopt(2) to set
them and ioctl(2) to get them.

One possible problem is that some of them are defined as taking variable
sized arguments, are there other ioctl(2) requests that do this ?

>>  sctp_peeloff():
>>
>>Add syscall for it:
>>
>>  Pro: Code from KAME builds under -current and is ready to commit.
>>  Con: Adds a new syscall.
>>
>>Map it onto a [set/get/update]sockopt() request:
>>
>>  Pro: Doesn't need a new syscall.
>>  Con: Would need to rewrite implementation. Operation needs to do
>>   file descriptor manipulation which doesn't belong in network
>>   protocol code.
>
>Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall.

I think this will work.


Re: getsockopt(2)

2017-12-14 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>chris...@astron.com (Christos Zoulas) wrote:
>>In article ,
>>Robert Swindells   wrote:
>>>
>>>chris...@astron.com (Christos Zoulas) wrote:
>>>>In article ,
>>>>Robert Swindells   wrote:
>>>>>
>>>>>chris...@astron.com (Christos Zoulas) wrote:
>>>>>>In article ,
>>>>>>
>>>>>>So depending on the contents of the sockval we do something different?
>>>>>
>>>>>FreeBSD does. The calls to copy in or out the data are scattered
>>>>>throughout the network stack.
>>>>
>>>>Is sockval always an int? 
>>>
>>>Do you mean optname or optval ?
>>
>>optval.
>
>Ok. No it isn't always an int.
>
>An example of setting something bigger than this would be
>IP_IPSEC_POLICY in sys/netinet/ip_output.c line 1206.
>
>The data for accept filters is bigger than an int too.
>
>>>The operation required for setsockopt() is to mostly set things but in
>>>certain cases to return an int value that results from setting stuff.
>>
>>I've lost track of what we are trying to achieve :-( Can you please
>>post an example?
>
>There are two categories of extra things that we need to support:
>
>1) Some "get" operations require an extra parameter to select the
>correct thing to return.
>
>The "get" side of IP_IPSEC_POLICY is currently wrapped in #ifdef 0
>as it won't work without an extra parameter.
>
>SCTP can have multiple endpoints defined for each socket, identified
>by an "association" handle, in order to get the correct socket value(s)
>the association needs to be passed in as well as the optname that
>identifies which type of value is required.
>
>2) In the SCTP code there are extra operations defined that are really
>extensions to the socket API, these could be mapped onto new syscalls
>but have mostly been implemented by being triggered by setsockopt(2)
>requests.
>
>>Also lets compile a table of choices that has:
>>
>>- description of what's done
>>- which os implemented it (if any)
>>- pros
>>- cons
>
>KAME:
>
>  Added code to sys/kern/uipc_syscalls.c to copy optval contents in and
>  out for both setsockopt(2) and getsockopt(2) when built for NetBSD.
>
>  Implements draft of SCTP API.
>
>  Added syscall for sctp_peeloff() operation.
>
>  Other SCTP operations map onto setsockopt(2) requests.
>
>FreeBSD
>
>  Copies optval into kernel as well as out for some getsockopt() and
>  setsockopt() requests, code is scattered throughout network stack.
>
>  Manpages don't match implementation, optval shown as 'const' in
>  manpage but isn't in kernel source.
>
>  Implements RFC 6458 SCTP API.
>
>  Has sctp_peeloff() as syscall.
>
>  Other SCTP operations map onto setsockopt(2) requests.
>
>Linux:
>
>  Copies optval into kernel as well as out for getsockopt() but only
>  for SCTP protocol sockets.
>  
>  Implements RFC 6458 SCTP API.
>
>  sctp_peeloff() is done through setsockopt() not a syscall.
>
>  Return value of setsockopt(2) used to copy back an int value
>  for extra SCTP socket operations:
>  <https://sourceforge.net/p/lksctp/mailman/message/18962336/>
>
>  Manpages nearly match implementation, missing is description of
>  meaning of positive return value from setsockopt(2) and that
>  getsockopt(2) can sometimes copy in optval.
>
>NetBSD:
>
>  Conforms strictly to standard for getsockopt(2) and setsockopt(2).
>
>  Manpages match implementation.
>
>  I suspect that the "get" of IP_IPSEC_POLICY has never worked since we
>  imported the IPSEC code.
>
>  Doesn't yet have way to pass extra parameter to "get" socket options.
>
>  Doesn't yet have sctp_peeloff().
>
>  Doesn't have a way of triggering extra SCTP socket operations and
>  getting results back to userland.


Thanks for documenting all of this!

>
>
>
>Options for NetBSD:
>
>  Copy in optval for getsockopt(2):
>
>Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API.
>Con: Diverges from standard. Broken KAME API hasn't been a problem.
>
>Wouldn't be too much work to move SCTP operations that were
>triggered by setsockopt() requests in KAME sources to be triggered
>by getsockopt() requests instead.

Well, are the semantics though such as to "set" things? While the getsockopt
call change the state? If yes, perhaps this is not the most intuitive way.
How many calls are we talking about

Re: getsockopt(2)

2017-12-13 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>chris...@astron.com (Christos Zoulas) wrote:
>>>In article ,
>>>Robert Swindells   wrote:
>>>>
>>>>chris...@astron.com (Christos Zoulas) wrote:
>>>>>In article ,
>>>>>
>>>>>So depending on the contents of the sockval we do something different?
>>>>
>>>>FreeBSD does. The calls to copy in or out the data are scattered
>>>>throughout the network stack.
>>>
>>>Is sockval always an int? 
>>
>>Do you mean optname or optval ?
>
>optval.

Ok. No it isn't always an int.

An example of setting something bigger than this would be
IP_IPSEC_POLICY in sys/netinet/ip_output.c line 1206.

The data for accept filters is bigger than an int too.

>>The operation required for setsockopt() is to mostly set things but in
>>certain cases to return an int value that results from setting stuff.
>
>I've lost track of what we are trying to achieve :-( Can you please
>post an example?

There are two categories of extra things that we need to support:

1) Some "get" operations require an extra parameter to select the
correct thing to return.

The "get" side of IP_IPSEC_POLICY is currently wrapped in #ifdef 0
as it won't work without an extra parameter.

SCTP can have multiple endpoints defined for each socket, identified
by an "association" handle, in order to get the correct socket value(s)
the association needs to be passed in as well as the optname that
identifies which type of value is required.

2) In the SCTP code there are extra operations defined that are really
extensions to the socket API, these could be mapped onto new syscalls
but have mostly been implemented by being triggered by setsockopt(2)
requests.

>Also lets compile a table of choices that has:
>
>- description of what's done
>- which os implemented it (if any)
>- pros
>- cons

KAME:

  Added code to sys/kern/uipc_syscalls.c to copy optval contents in and
  out for both setsockopt(2) and getsockopt(2) when built for NetBSD.

  Implements draft of SCTP API.

  Added syscall for sctp_peeloff() operation.

  Other SCTP operations map onto setsockopt(2) requests.

FreeBSD

  Copies optval into kernel as well as out for some getsockopt() and
  setsockopt() requests, code is scattered throughout network stack.

  Manpages don't match implementation, optval shown as 'const' in
  manpage but isn't in kernel source.

  Implements RFC 6458 SCTP API.

  Has sctp_peeloff() as syscall.

  Other SCTP operations map onto setsockopt(2) requests.

Linux:

  Copies optval into kernel as well as out for getsockopt() but only
  for SCTP protocol sockets.
  
  Implements RFC 6458 SCTP API.

  sctp_peeloff() is done through setsockopt() not a syscall.

  Return value of setsockopt(2) used to copy back an int value
  for extra SCTP socket operations:
  <https://sourceforge.net/p/lksctp/mailman/message/18962336/>

  Manpages nearly match implementation, missing is description of
  meaning of positive return value from setsockopt(2) and that
  getsockopt(2) can sometimes copy in optval.

NetBSD:

  Conforms strictly to standard for getsockopt(2) and setsockopt(2).

  Manpages match implementation.

  I suspect that the "get" of IP_IPSEC_POLICY has never worked since we
  imported the IPSEC code.

  Doesn't yet have way to pass extra parameter to "get" socket options.

  Doesn't yet have sctp_peeloff().

  Doesn't have a way of triggering extra SCTP socket operations and
  getting results back to userland.



Options for NetBSD:

  Copy in optval for getsockopt(2):

Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API.
Con: Diverges from standard. Broken KAME API hasn't been a problem.

Wouldn't be too much work to move SCTP operations that were
triggered by setsockopt() requests in KAME sources to be triggered
by getsockopt() requests instead.

  Selectively copy out optval for setsockopt(2):

Pro: no change to SCTP code.
Con: Breaks standard API. Need small change to all other protocols.

  Add new syscall to read/write optval:

Pro: Doesn't break standard for [set/get]sockopt() API.
Con: Doesn't fix IPSEC userland API.

Could map onto PRCO_GETOPT within the kernel to minimize changes
to protocol code.

  sctp_peeloff():

Add syscall for it:

  Pro: Code from KAME builds under -current and is ready to commit.
  Con: Adds a new syscall.

Map it onto a [set/get/update]sockopt() request:

  Pro: Doesn't need a new syscall.
  Con: Would need to rewrite implementation. Operation needs to do
   file descriptor manipulation which doesn't belong in network
   protocol code.



Re: getsockopt(2)

2017-12-12 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>chris...@astron.com (Christos Zoulas) wrote:
>>In article ,
>>Robert Swindells   wrote:
>>>
>>>chris...@astron.com (Christos Zoulas) wrote:
In article ,

So depending on the contents of the sockval we do something different?
>>>
>>>FreeBSD does. The calls to copy in or out the data are scattered
>>>throughout the network stack.
>>
>>Is sockval always an int? 
>
>Do you mean optname or optval ?

optval.

>The operation required for setsockopt() is to mostly set things but in
>certain cases to return an int value that results from setting stuff.

I've lost track of what we are trying to achieve :-( Can you please
post an example?


Also lets compile a table of choices that has:

- description of what's done
- which os implemented it (if any)
- pros
- cons

Or something along those lines so we can make a decision?

Thanks,

christos



Re: getsockopt(2)

2017-12-11 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>chris...@astron.com (Christos Zoulas) wrote:
>>>In article ,
>>>
>>>So depending on the contents of the sockval we do something different?
>>
>>FreeBSD does. The calls to copy in or out the data are scattered
>>throughout the network stack.
>
>Is sockval always an int? 

Do you mean optname or optval ?

>>Linux seems to cheat and make use of the fact that their errors are
>>negative to return a positive integer word value from
>>setsockopt(). Their implementation of getsockopt() copies in and out.
>
>Hmm, that is not what the man page claims :-)

Linux doesn't really do man pages though, I looked at the source. The
FreeBSD man page is wrong too.

There is a thread on lkml describing what Linux did.

>>I would still prefer to change both getsockopt(2) and setsockopt(2).
>>
>>I would make getsockopt always copy in the supplied optval argument.
>
>Well, it has to copyout, so presumably that won't break existing code.

I have run a kernel for a while with this change to getsockopt() and
it seemed fine.

>>For setsockopt, the protocol code can update sopt_size to indicate the
>>amount of data to be copied back. I would add a line to the PRCO_SETOPT
>>handler in most protocols to set sopt_size to 0 so that nothing was
>>copied back.
>
>That could break things; grepping for sopt_size finds many checks for it.
>There is also sockopt_setmbuf that can potentially allocate larger than
>MCLBYTES socket option?

I meant that I would set sopt_size to zero after all the processing
of the input data has been finished.

The only place that I could see mbufs being used was in ipsec, which
doesn't work because it needs these changes too.

>>The syscall definition for setsockopt() would need to be changed to
>>remove the 'const' from the optval argument.
>
>That is arguably the worst part of the change. The API for it is part of
>the standard. Well, we can go the linux way and make getsockopt read and
>write, but that makes me cringe. Perhaps adding an updatesockopt() is better?

It is the FreeBSD way as well.

The operation required for getsockopt() isn't really *updating* the
socket options, it is more "get with an extra parameter".

The operation required for setsockopt() is to mostly set things but in
certain cases to return an int value that results from setting stuff.


Re: getsockopt(2)

2017-12-11 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>chris...@astron.com (Christos Zoulas) wrote:
>>In article ,
>>
>>So depending on the contents of the sockval we do something different?
>
>FreeBSD does. The calls to copy in or out the data are scattered
>throughout the network stack.

Is sockval always an int? 

>Linux seems to cheat and make use of the fact that their errors are
>negative to return a positive integer word value from
>setsockopt(). Their implementation of getsockopt() copies in and out.

Hmm, that is not what the man page claims :-)

>I would still prefer to change both getsockopt(2) and setsockopt(2).
>
>I would make getsockopt always copy in the supplied optval argument.

Well, it has to copyout, so presumably that won't break existing code.

>For setsockopt, the protocol code can update sopt_size to indicate the
>amount of data to be copied back. I would add a line to the PRCO_SETOPT
>handler in most protocols to set sopt_size to 0 so that nothing was
>copied back.

That could break things; grepping for sopt_size finds many checks for it.
There is also sockopt_setmbuf that can potentially allocate larger than
MCLBYTES socket option?

>The syscall definition for setsockopt() would need to be changed to
>remove the 'const' from the optval argument.

That is arguably the worst part of the change. The API for it is part of
the standard. Well, we can go the linux way and make getsockopt read and
write, but that makes me cringe. Perhaps adding an updatesockopt() is better?

christos



Re: getsockopt(2)

2017-12-11 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>I wrote:
>>>Does anyone know why we don't use the input 'optlen' parameter to the
>>>getsockopt(2) syscall, we do write back to it on return.
>>>
>>>In ip_output() there is this, which suggests that someone had come
>>>across this before.
>>>
>>>#if 0   /* defined(IPSEC) */
>>>case IP_IPSEC_POLICY:
>>>{
>>>struct mbuf *m = NULL;
>>>
>>>/* XXX this will return EINVAL as sopt is empty */
>>>error = ipsec4_get_policy(inp, sopt->sopt_data,
>>>sopt->sopt_size, &m);
>>>if (error == 0)
>>>error = sockopt_setmbuf(sopt, m);
>>>break;
>>>}   
>>>#endif /*IPSEC*/  
>>>
>>>There are also lots of places in sctp_usrreq that want to use it.
>>>
>>>I can set it with the following patch (line numbers will be slightly
>>>out), but wondered if there was a reason for the current behaviour.
>>>
>>>Index: uipc_syscalls.c
>>>===
>>>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
>>>retrieving revision 1.187
>>>diff -u -r1.187 uipc_syscalls.c
>>>--- uipc_syscalls.c  20 Jun 2017 20:34:49 -  1.187
>>>+++ uipc_syscalls.c  14 Oct 2017 21:33:09 -
>>>@@ -1235,7 +1240,7 @@
>>> if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
>>> return (error);
>>> 
>>>-sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
>>>+sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
>>> 
>>> if (fp->f_flag & FNOSIGPIPE)
>>> so->so_options |= SO_NOSIGPIPE;
>>
>>Has anyone had any other thoughts on this ?
>
>Yes, if you don't protect valsize against MCLBYTES like setsockopt does,
>it can be used to DoS kernel. Otherwise is can be done. Is 2K enough?

Yes, 2K would be enough. Have added the check in my tree.

>>I'm thinking of adding an extra socket option, maybe SO_PARAM, that you
>>would use with setsockopt(2) to set a selector to be used by a following
>>getsockopt(2) call.
>
>This would require some state keeping in the kernel and might not be
>reliable.

I know, it is the only alternative that I can think of to changing
setsockopt(2).

>>This wouldn't fix the IPSEC usage without changing userland though.
>>
>>The alternative is to make both setsockopt(2) and getsockopt(2)
>>bidirectional.
>
>So depending on the contents of the sockval we do something different?

FreeBSD does. The calls to copy in or out the data are scattered
throughout the network stack.

Linux seems to cheat and make use of the fact that their errors are
negative to return a positive integer word value from
setsockopt(). Their implementation of getsockopt() copies in and out.

I would still prefer to change both getsockopt(2) and setsockopt(2).

I would make getsockopt always copy in the supplied optval argument.

For setsockopt, the protocol code can update sopt_size to indicate the
amount of data to be copied back. I would add a line to the PRCO_SETOPT
handler in most protocols to set sopt_size to 0 so that nothing was
copied back.

The syscall definition for setsockopt() would need to be changed to
remove the 'const' from the optval argument.

>>I haven't checked in the userland implementation of sctp_opt_info(3)
>>yet. I took the one from FreeBSD but can modify it to work with a
>>different API.
>
>Ok.

This still requires an alternative API to use. I haven't found an
example of an OS that hides one in its version of sctp_opt_info().


Re: getsockopt(2)

2017-12-08 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>I wrote:
>>Does anyone know why we don't use the input 'optlen' parameter to the
>>getsockopt(2) syscall, we do write back to it on return.
>>
>>In ip_output() there is this, which suggests that someone had come
>>across this before.
>>
>>#if 0   /* defined(IPSEC) */
>>case IP_IPSEC_POLICY:
>>{
>>struct mbuf *m = NULL;
>>
>>/* XXX this will return EINVAL as sopt is empty */
>>error = ipsec4_get_policy(inp, sopt->sopt_data,
>>sopt->sopt_size, &m);
>>if (error == 0)
>>error = sockopt_setmbuf(sopt, m);
>>break;
>>}   
>>#endif /*IPSEC*/  
>>
>>There are also lots of places in sctp_usrreq that want to use it.
>>
>>I can set it with the following patch (line numbers will be slightly
>>out), but wondered if there was a reason for the current behaviour.
>>
>>Index: uipc_syscalls.c
>>===
>>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
>>retrieving revision 1.187
>>diff -u -r1.187 uipc_syscalls.c
>>--- uipc_syscalls.c   20 Jun 2017 20:34:49 -  1.187
>>+++ uipc_syscalls.c   14 Oct 2017 21:33:09 -
>>@@ -1235,7 +1240,7 @@
>>  if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
>>  return (error);
>> 
>>- sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
>>+ sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
>> 
>>  if (fp->f_flag & FNOSIGPIPE)
>>  so->so_options |= SO_NOSIGPIPE;
>
>Has anyone had any other thoughts on this ?

Yes, if you don't protect valsize against MCLBYTES like setsockopt does,
it can be used to DoS kernel. Otherwise is can be done. Is 2K enough?

>I'm thinking of adding an extra socket option, maybe SO_PARAM, that you
>would use with setsockopt(2) to set a selector to be used by a following
>getsockopt(2) call.

This would require some state keeping in the kernel and might not be
reliable.

>This wouldn't fix the IPSEC usage without changing userland though.
>
>The alternative is to make both setsockopt(2) and getsockopt(2)
>bidirectional.

So depending on the contents of the sockval we do something different?

>I haven't checked in the userland implementation of sctp_opt_info(3)
>yet. I took the one from FreeBSD but can modify it to work with a
>different API.

Ok.

christos



Re: getsockopt(2)

2017-12-07 Thread Robert Swindells

I wrote:
>Does anyone know why we don't use the input 'optlen' parameter to the
>getsockopt(2) syscall, we do write back to it on return.
>
>In ip_output() there is this, which suggests that someone had come
>across this before.
>
>#if 0   /* defined(IPSEC) */
>case IP_IPSEC_POLICY:
>{
>struct mbuf *m = NULL;
>
>/* XXX this will return EINVAL as sopt is empty */
>error = ipsec4_get_policy(inp, sopt->sopt_data,
>sopt->sopt_size, &m);
>if (error == 0)
>error = sockopt_setmbuf(sopt, m);
>break;
>}   
>#endif /*IPSEC*/  
>
>There are also lots of places in sctp_usrreq that want to use it.
>
>I can set it with the following patch (line numbers will be slightly
>out), but wondered if there was a reason for the current behaviour.
>
>Index: uipc_syscalls.c
>===
>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
>retrieving revision 1.187
>diff -u -r1.187 uipc_syscalls.c
>--- uipc_syscalls.c20 Jun 2017 20:34:49 -  1.187
>+++ uipc_syscalls.c14 Oct 2017 21:33:09 -
>@@ -1235,7 +1240,7 @@
>   if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
>   return (error);
> 
>-  sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
>+  sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
> 
>   if (fp->f_flag & FNOSIGPIPE)
>   so->so_options |= SO_NOSIGPIPE;

Has anyone had any other thoughts on this ?

I still think that the one line change shown here is correct, it will 
allocate a buffer that gets filled by the stuff to be copied back to
userspace.

The SCTP API that is defined in RFC 6458 does allow for the case where
getsockopt(2) and setsockopt(2) are both unidirectional. The API defines
a wrapper function sctp_opt_info(3) that can hide whatever mechanism
is used to pass into the kernel an extra parameter to be used to select
what is returned by getsockopt(2).

I'm thinking of adding an extra socket option, maybe SO_PARAM, that you
would use with setsockopt(2) to set a selector to be used by a following
getsockopt(2) call.

This wouldn't fix the IPSEC usage without changing userland though.

The alternative is to make both setsockopt(2) and getsockopt(2)
bidirectional.

I haven't checked in the userland implementation of sctp_opt_info(3)
yet. I took the one from FreeBSD but can modify it to work with a
different API.




Re: getsockopt(2)

2017-10-16 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>chris...@astron.com (Christos Zoulas) wrote:
>>>In article ,
>>>Robert Swindells   wrote:
>>>>
>>>>I wrote:
>>>>>Does anyone know why we don't use the input 'optlen' parameter to the
>>>>>getsockopt(2) syscall, we do write back to it on return.
>>
>>[snip]
>>
>>>Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to
>>>do so if it is useful to read the input to deduce the output. But that
>>>would make the call asymetric, so let's examine the examples where this
>>>is useful first.
>>
>>FreeBSD does copyin the value, but the code to do it is scattered around
>>the kernel, search for sooptcopyin() in their tree.
>
>Yes, aren't most of the in the sosetopt paths? I am looking for the sogetopt
>paths... I have not looked carefully to see if there are any.

It is done for TCP_CCALGOOPT and for all the SCTP options.

I had reused the FreeBSD userland for SCTP, I was trying to test it
before committing the final bits. I would prefer not to have to
redesign it.


Re: getsockopt(2)

2017-10-16 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>chris...@astron.com (Christos Zoulas) wrote:
>>In article ,
>>Robert Swindells   wrote:
>>>
>>>I wrote:
>>>>Does anyone know why we don't use the input 'optlen' parameter to the
>>>>getsockopt(2) syscall, we do write back to it on return.
>
>[snip]
>
>>Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to
>>do so if it is useful to read the input to deduce the output. But that
>>would make the call asymetric, so let's examine the examples where this
>>is useful first.
>
>FreeBSD does copyin the value, but the code to do it is scattered around
>the kernel, search for sooptcopyin() in their tree.

Yes, aren't most of the in the sosetopt paths? I am looking for the sogetopt
paths... I have not looked carefully to see if there are any.

christos



Re: getsockopt(2)

2017-10-16 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>I wrote:
>>>Does anyone know why we don't use the input 'optlen' parameter to the
>>>getsockopt(2) syscall, we do write back to it on return.

[snip]

>Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to
>do so if it is useful to read the input to deduce the output. But that
>would make the call asymetric, so let's examine the examples where this
>is useful first.

FreeBSD does copyin the value, but the code to do it is scattered around
the kernel, search for sooptcopyin() in their tree.

Robert Swindells


Re: getsockopt(2)

2017-10-15 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>I wrote:
>>Does anyone know why we don't use the input 'optlen' parameter to the
>>getsockopt(2) syscall, we do write back to it on return.
>>
>>In ip_output() there is this, which suggests that someone had come
>>across this before.
>>
>>#if 0   /* defined(IPSEC) */
>>case IP_IPSEC_POLICY:
>>{
>>struct mbuf *m = NULL;
>>
>>/* XXX this will return EINVAL as sopt is empty */
>>error = ipsec4_get_policy(inp, sopt->sopt_data,
>>sopt->sopt_size, &m);
>>if (error == 0)
>>error = sockopt_setmbuf(sopt, m);
>>break;
>>}   
>>#endif /*IPSEC*/  
>>
>>There are also lots of places in sctp_usrreq that want to use it.
>>
>>I can set it with the following patch (line numbers will be slightly
>>out), but wondered if there was a reason for the current behaviour.
>
>Version 2 of the patch is below, the uses of getsockopt(2) by IPSEC and
>SCTP expect that the optval data is copied into the kernel and back out
>again by the syscall.
>
>It looks like it was broken by this commit:
>
><http://mail-index.netbsd.org/source-changes/2008/08/06/msg208868.html>
>
>Index: uipc_syscalls.c
>===
>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
>retrieving revision 1.187
>diff -u -r1.187 uipc_syscalls.c
>--- uipc_syscalls.c20 Jun 2017 20:34:49 -  1.187
>+++ uipc_syscalls.c14 Oct 2017 22:24:15 -
>@@ -1235,7 +1240,13 @@
>   if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
>   return (error);
> 
>-  sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
>+  sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
>+
>+  if (valsize > 0) {
>+  error = copyin(SCARG(uap, val), sopt.sopt_data, valsize);
>+  if (error)
>+  goto out;
>+  }
> 
>   if (fp->f_flag & FNOSIGPIPE)
>   so->so_options |= SO_NOSIGPIPE;

Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to
do so if it is useful to read the input to deduce the output. But that
would make the call asymetric, so let's examine the examples where this
is useful first.

christos



Re: getsockopt(2)

2017-10-14 Thread Robert Swindells

I wrote:
>Does anyone know why we don't use the input 'optlen' parameter to the
>getsockopt(2) syscall, we do write back to it on return.
>
>In ip_output() there is this, which suggests that someone had come
>across this before.
>
>#if 0   /* defined(IPSEC) */
>case IP_IPSEC_POLICY:
>{
>struct mbuf *m = NULL;
>
>/* XXX this will return EINVAL as sopt is empty */
>error = ipsec4_get_policy(inp, sopt->sopt_data,
>sopt->sopt_size, &m);
>if (error == 0)
>error = sockopt_setmbuf(sopt, m);
>break;
>}   
>#endif /*IPSEC*/  
>
>There are also lots of places in sctp_usrreq that want to use it.
>
>I can set it with the following patch (line numbers will be slightly
>out), but wondered if there was a reason for the current behaviour.

Version 2 of the patch is below, the uses of getsockopt(2) by IPSEC and
SCTP expect that the optval data is copied into the kernel and back out
again by the syscall.

It looks like it was broken by this commit:

<http://mail-index.netbsd.org/source-changes/2008/08/06/msg208868.html>

Index: uipc_syscalls.c
===
RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.187
diff -u -r1.187 uipc_syscalls.c
--- uipc_syscalls.c 20 Jun 2017 20:34:49 -  1.187
+++ uipc_syscalls.c 14 Oct 2017 22:24:15 -
@@ -1235,7 +1240,13 @@
if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
return (error);
 
-   sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
+   sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
+
+   if (valsize > 0) {
+   error = copyin(SCARG(uap, val), sopt.sopt_data, valsize);
+   if (error)
+   goto out;
+   }
 
if (fp->f_flag & FNOSIGPIPE)
so->so_options |= SO_NOSIGPIPE;



getsockopt(2)

2017-10-14 Thread Robert Swindells

Does anyone know why we don't use the input 'optlen' parameter to the
getsockopt(2) syscall, we do write back to it on return.

In ip_output() there is this, which suggests that someone had come
across this before.

#if 0   /* defined(IPSEC) */
case IP_IPSEC_POLICY:
{
struct mbuf *m = NULL;

/* XXX this will return EINVAL as sopt is empty */
error = ipsec4_get_policy(inp, sopt->sopt_data,
sopt->sopt_size, &m);
if (error == 0)
error = sockopt_setmbuf(sopt, m);
break;
}   
#endif /*IPSEC*/  

There are also lots of places in sctp_usrreq that want to use it.

I can set it with the following patch (line numbers will be slightly
out), but wondered if there was a reason for the current behaviour.

Index: uipc_syscalls.c
===
RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.187
diff -u -r1.187 uipc_syscalls.c
--- uipc_syscalls.c 20 Jun 2017 20:34:49 -  1.187
+++ uipc_syscalls.c 14 Oct 2017 21:33:09 -
@@ -1235,7 +1240,7 @@
if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0)
return (error);
 
-   sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0);
+   sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize);
 
if (fp->f_flag & FNOSIGPIPE)
so->so_options |= SO_NOSIGPIPE;