Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>What is wrong with your idea of updatesockopt(2) ? Or maybe call it >>getsockopt2() and only use it for the "get with extra argument" case. > >I guess getsockopt2/updatesockopt is not that bad after all. Perhaps >we should go with that? This is working for me: Index: uipc_syscalls.c === RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.191 diff -u -r1.191 uipc_syscalls.c --- uipc_syscalls.c 12 Feb 2018 16:01:35 - 1.191 +++ uipc_syscalls.c 6 Mar 2018 00:47:02 - @@ -1266,6 +1266,68 @@ return error; } +int +sys_getsockopt2(struct lwp *l, const struct sys_getsockopt2_args *uap, +register_t *retval) +{ + /* { + syscallarg(int) s; + syscallarg(int) level; + syscallarg(int) name; + syscallarg(void *) val; + syscallarg(unsigned int *) avalsize; + } */ + struct sockopt sopt; + struct socket *so; + file_t *fp; + unsigned intvalsize, len; + int error; + + if (SCARG(uap, val) != NULL) { + error = copyin(SCARG(uap, avalsize), &valsize, sizeof(valsize)); + if (error) + return error; + } else + valsize = 0; + + if (valsize > MCLBYTES) + return EINVAL; + + if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) + return (error); + + sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); + if (valsize > 0) { + error = copyin(SCARG(uap, val), sopt.sopt_data, valsize); + if (error) + goto out; + } + + if (fp->f_flag & FNOSIGPIPE) + so->so_options |= SO_NOSIGPIPE; + else + so->so_options &= ~SO_NOSIGPIPE; + error = sogetopt(so, &sopt); + if (error) + goto out; + + if (valsize > 0) { + len = min(valsize, sopt.sopt_size); + error = copyout(sopt.sopt_data, SCARG(uap, val), len); + if (error) + goto out; + + error = copyout(&len, SCARG(uap, avalsize), sizeof(len)); + if (error) + goto out; + } + + out: + sockopt_destroy(&sopt); + fd_putfile(SCARG(uap, s)); + return error; +} + #ifdef PIPE_SOCKETPAIR int Index: syscalls.master === RCS file: /cvsroot/src/sys/kern/syscalls.master,v retrieving revision 1.291 diff -u -r1.291 syscalls.master --- syscalls.master 6 Jan 2018 16:41:23 - 1.291 +++ syscalls.master 6 Mar 2018 00:47:03 - @@ -986,3 +986,5 @@ siginfo_t *info); } 482STD { int|sys||clock_getcpuclockid2(idtype_t idtype, \ id_t id, clockid_t *clock_id); } +483STD RUMP{ int|sys||getsockopt2(int s, int level, int name, \ + void *val, socklen_t *avalsize); }
Re: getsockopt(2)
In article , Robert Swindells wrote: > >What is wrong with your idea of updatesockopt(2) ? Or maybe call it >getsockopt2() and only use it for the "get with extra argument" case. I guess getsockopt2/updatesockopt is not that bad after all. Perhaps we should go with that? christos
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) >In article , >Robert Swindells wrote: >>There are about 40 options defined by SCTP. I would need to go through >>the list to see how many of them were of which type. FreeBSD has 32 options that require an input parameter to be able to get them, we don't implement all of them yet. There are only a couple of options that are really new socket operations, they are used to implement the sctp_connectx() function described in the lkml thread. They copy in an array of sockaddr structs and return a uint32_t. Selectively copy out optval for setsockopt(2): >>> >>>I don't like breaking the API, or changing the other protocols. Too >>>intrusive. >> >>I'm not really in favour of this myself, I was just listing all the >>alternatives. > >So we don't want this. But it is starting to become the only viable option. What is wrong with your idea of updatesockopt(2) ? Or maybe call it getsockopt2() and only use it for the "get with extra argument" case. All the calls to it would be hidden in the userland SCTP library functions. Add new syscall to read/write optval: >> >>It would be messy within the kernel to have to use setsockopt(2) to set >>them and ioctl(2) to get them. > >Yes the asymmetry sucks... I meant use ioctl for both get and set then. The API defined in RFC 6458 only provides an alternative function to "get" options, to set them you have to use setsockopt(2), it isn't possible to avoid the asymmetry. >>One possible problem is that some of them are defined as taking variable >>sized arguments, are there other ioctl(2) requests that do this ? > >That can be done. You can pass a struct to it and the struct can contain >pointers... I found an example to copy in sys/dev/ic/ath.c. sctp_peeloff(): >>> >>>Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall. >> >>I think this will work. > >That's the easy part though. I have done this bit. Not tested yet as it needs the other stuff.
Re: getsockopt(2)
David Holland wrote: >On Mon, Dec 18, 2017 at 07:21:33PM +, David Holland wrote: > > On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote: > > > I wrote: > > >> Does anyone know why we don't use the input 'optlen' parameter to the > > >> getsockopt(2) syscall, we do write back to it on return. > > >> [...] > > >> > > >> There are also lots of places in sctp_usrreq that want to use it. > > >> > > >> I can set it with the following patch (line numbers will be slightly > > >> out), but wondered if there was a reason for the current behaviour. > > >> [...] > > > > > > Has anyone had any other thoughts on this ? > > > > > > I still think that the one line change shown here is correct, it will > > > allocate a buffer that gets filled by the stuff to be copied back to > > > userspace. > > > > The man page for getsockopt says [irrelevant stuff] > >Oops, I guess I am missing something. The current code isn't wrong. In the "get" path, sockopt_set() or sockopt_setmbuf() will allocate a buffer if needed, most other values are ints and can go in the 4 byte buffer in the sockopt. I guess it avoids needing to free the buffer if there is an error while building up the data to return.
Re: getsockopt(2)
On Mon, Dec 18, 2017 at 07:21:33PM +, David Holland wrote: > On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote: > > I wrote: > >> Does anyone know why we don't use the input 'optlen' parameter to the > >> getsockopt(2) syscall, we do write back to it on return. > >> [...] > >> > >> There are also lots of places in sctp_usrreq that want to use it. > >> > >> I can set it with the following patch (line numbers will be slightly > >> out), but wondered if there was a reason for the current behaviour. > >> [...] > > > > Has anyone had any other thoughts on this ? > > > > I still think that the one line change shown here is correct, it will > > allocate a buffer that gets filled by the stuff to be copied back to > > userspace. > > The man page for getsockopt says [irrelevant stuff] Oops, I guess I am missing something. -- David A. Holland dholl...@netbsd.org
Re: getsockopt(2)
On Thu, Dec 07, 2017 at 03:04:32PM +, Robert Swindells wrote: > I wrote: >> Does anyone know why we don't use the input 'optlen' parameter to the >> getsockopt(2) syscall, we do write back to it on return. >> [...] >> >> There are also lots of places in sctp_usrreq that want to use it. >> >> I can set it with the following patch (line numbers will be slightly >> out), but wondered if there was a reason for the current behaviour. >> [...] > > Has anyone had any other thoughts on this ? > > I still think that the one line change shown here is correct, it will > allocate a buffer that gets filled by the stuff to be copied back to > userspace. The man page for getsockopt says that the optlen argument should be initialized with the size before calling getsockopt: The parameters optval and optlen are used to access option values for setsockopt(). For getsockopt() they identify a buffer in which the value for the requested option(s) are to be returned. For getsockopt(), optlen is a value-result parameter, initially containing the size of the buffer pointed to by optval, and modified on return to indicate the actual size of the value returned. This is also the traditional behavior of this type of call (compare e.g. getpeername) and there's no particular reason getsockopt should be different. So I think (a) any code we have that doesn't initialize the value is wrong and should be fixed; (b) any existing kernel code that blats values out to the buffer without checking the user's length is wrong and should be fixed; and (c) if this is an endemic problem we should fix all the wrong user code and then version the syscall before changing the kernel to behave as documented. (but hopefully that isn't necessary) Am I missing something? I'm not sure how the subsequent discussion morphed into ways to hack setsockopt to serve as getsockopt... -- David A. Holland dholl...@netbsd.org
Re: getsockopt(2)
In article , Robert Swindells wrote: > >There are about 40 options defined by SCTP. I would need to go through >the list to see how many of them were of which type. > >There isn't any source shared between FreeBSD and NetBSD anymore, their >stack has diverged a lot from KAME. Ok. > >>> Selectively copy out optval for setsockopt(2): >> >>I don't like breaking the API, or changing the other protocols. Too intrusive. > >I'm not really in favour of this myself, I was just listing all the >alternatives. So we don't want this. But it is starting to become the only viable option. >>> Add new syscall to read/write optval: > >It would be messy within the kernel to have to use setsockopt(2) to set >them and ioctl(2) to get them. Yes the asymmetry sucks... I meant use ioctl for both get and set then. >One possible problem is that some of them are defined as taking variable >sized arguments, are there other ioctl(2) requests that do this ? That can be done. You can pass a struct to it and the struct can contain pointers... >>> sctp_peeloff(): >> >>Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall. > >I think this will work. That's the easy part though. christos
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >>Options for NetBSD: >> >> Copy in optval for getsockopt(2): >> >>Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API. >>Con: Diverges from standard. Broken KAME API hasn't been a problem. >> >>Wouldn't be too much work to move SCTP operations that were >>triggered by setsockopt() requests in KAME sources to be triggered >>by getsockopt() requests instead. > >Well, are the semantics though such as to "set" things? While the getsockopt >call change the state? If yes, perhaps this is not the most intuitive way. >How many calls are we talking about? I think leaving the source the same >is a big plus, or coming up with a macro that works for both NetBSD and >FreeBSD. There are about 40 options defined by SCTP. I would need to go through the list to see how many of them were of which type. There isn't any source shared between FreeBSD and NetBSD anymore, their stack has diverged a lot from KAME. >> Selectively copy out optval for setsockopt(2): >> >>Pro: no change to SCTP code. >>Con: Breaks standard API. Need small change to all other protocols. > >I don't like breaking the API, or changing the other protocols. Too intrusive. I'm not really in favour of this myself, I was just listing all the alternatives. >> Add new syscall to read/write optval: >> >>Pro: Doesn't break standard for [set/get]sockopt() API. >>Con: Doesn't fix IPSEC userland API. >> >>Could map onto PRCO_GETOPT within the kernel to minimize changes >>to protocol code. > >We could use ioctl instead too :-) I would need to check which options are defined to work with setsockopt(2). It would be messy within the kernel to have to use setsockopt(2) to set them and ioctl(2) to get them. One possible problem is that some of them are defined as taking variable sized arguments, are there other ioctl(2) requests that do this ? >> sctp_peeloff(): >> >>Add syscall for it: >> >> Pro: Code from KAME builds under -current and is ready to commit. >> Con: Adds a new syscall. >> >>Map it onto a [set/get/update]sockopt() request: >> >> Pro: Doesn't need a new syscall. >> Con: Would need to rewrite implementation. Operation needs to do >> file descriptor manipulation which doesn't belong in network >> protocol code. > >Perhaps map it into ioctl, instead? Ioctl, the swiss knife syscall. I think this will work.
Re: getsockopt(2)
In article , Robert Swindells wrote: > >chris...@astron.com (Christos Zoulas) wrote: >>In article , >>Robert Swindells wrote: >>> >>>chris...@astron.com (Christos Zoulas) wrote: >>>>In article , >>>>Robert Swindells wrote: >>>>> >>>>>chris...@astron.com (Christos Zoulas) wrote: >>>>>>In article , >>>>>> >>>>>>So depending on the contents of the sockval we do something different? >>>>> >>>>>FreeBSD does. The calls to copy in or out the data are scattered >>>>>throughout the network stack. >>>> >>>>Is sockval always an int? >>> >>>Do you mean optname or optval ? >> >>optval. > >Ok. No it isn't always an int. > >An example of setting something bigger than this would be >IP_IPSEC_POLICY in sys/netinet/ip_output.c line 1206. > >The data for accept filters is bigger than an int too. > >>>The operation required for setsockopt() is to mostly set things but in >>>certain cases to return an int value that results from setting stuff. >> >>I've lost track of what we are trying to achieve :-( Can you please >>post an example? > >There are two categories of extra things that we need to support: > >1) Some "get" operations require an extra parameter to select the >correct thing to return. > >The "get" side of IP_IPSEC_POLICY is currently wrapped in #ifdef 0 >as it won't work without an extra parameter. > >SCTP can have multiple endpoints defined for each socket, identified >by an "association" handle, in order to get the correct socket value(s) >the association needs to be passed in as well as the optname that >identifies which type of value is required. > >2) In the SCTP code there are extra operations defined that are really >extensions to the socket API, these could be mapped onto new syscalls >but have mostly been implemented by being triggered by setsockopt(2) >requests. > >>Also lets compile a table of choices that has: >> >>- description of what's done >>- which os implemented it (if any) >>- pros >>- cons > >KAME: > > Added code to sys/kern/uipc_syscalls.c to copy optval contents in and > out for both setsockopt(2) and getsockopt(2) when built for NetBSD. > > Implements draft of SCTP API. > > Added syscall for sctp_peeloff() operation. > > Other SCTP operations map onto setsockopt(2) requests. > >FreeBSD > > Copies optval into kernel as well as out for some getsockopt() and > setsockopt() requests, code is scattered throughout network stack. > > Manpages don't match implementation, optval shown as 'const' in > manpage but isn't in kernel source. > > Implements RFC 6458 SCTP API. > > Has sctp_peeloff() as syscall. > > Other SCTP operations map onto setsockopt(2) requests. > >Linux: > > Copies optval into kernel as well as out for getsockopt() but only > for SCTP protocol sockets. > > Implements RFC 6458 SCTP API. > > sctp_peeloff() is done through setsockopt() not a syscall. > > Return value of setsockopt(2) used to copy back an int value > for extra SCTP socket operations: > <https://sourceforge.net/p/lksctp/mailman/message/18962336/> > > Manpages nearly match implementation, missing is description of > meaning of positive return value from setsockopt(2) and that > getsockopt(2) can sometimes copy in optval. > >NetBSD: > > Conforms strictly to standard for getsockopt(2) and setsockopt(2). > > Manpages match implementation. > > I suspect that the "get" of IP_IPSEC_POLICY has never worked since we > imported the IPSEC code. > > Doesn't yet have way to pass extra parameter to "get" socket options. > > Doesn't yet have sctp_peeloff(). > > Doesn't have a way of triggering extra SCTP socket operations and > getting results back to userland. Thanks for documenting all of this! > > > >Options for NetBSD: > > Copy in optval for getsockopt(2): > >Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API. >Con: Diverges from standard. Broken KAME API hasn't been a problem. > >Wouldn't be too much work to move SCTP operations that were >triggered by setsockopt() requests in KAME sources to be triggered >by getsockopt() requests instead. Well, are the semantics though such as to "set" things? While the getsockopt call change the state? If yes, perhaps this is not the most intuitive way. How many calls are we talking about
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>chris...@astron.com (Christos Zoulas) wrote: >>>In article , >>>Robert Swindells wrote: >>>> >>>>chris...@astron.com (Christos Zoulas) wrote: >>>>>In article , >>>>> >>>>>So depending on the contents of the sockval we do something different? >>>> >>>>FreeBSD does. The calls to copy in or out the data are scattered >>>>throughout the network stack. >>> >>>Is sockval always an int? >> >>Do you mean optname or optval ? > >optval. Ok. No it isn't always an int. An example of setting something bigger than this would be IP_IPSEC_POLICY in sys/netinet/ip_output.c line 1206. The data for accept filters is bigger than an int too. >>The operation required for setsockopt() is to mostly set things but in >>certain cases to return an int value that results from setting stuff. > >I've lost track of what we are trying to achieve :-( Can you please >post an example? There are two categories of extra things that we need to support: 1) Some "get" operations require an extra parameter to select the correct thing to return. The "get" side of IP_IPSEC_POLICY is currently wrapped in #ifdef 0 as it won't work without an extra parameter. SCTP can have multiple endpoints defined for each socket, identified by an "association" handle, in order to get the correct socket value(s) the association needs to be passed in as well as the optname that identifies which type of value is required. 2) In the SCTP code there are extra operations defined that are really extensions to the socket API, these could be mapped onto new syscalls but have mostly been implemented by being triggered by setsockopt(2) requests. >Also lets compile a table of choices that has: > >- description of what's done >- which os implemented it (if any) >- pros >- cons KAME: Added code to sys/kern/uipc_syscalls.c to copy optval contents in and out for both setsockopt(2) and getsockopt(2) when built for NetBSD. Implements draft of SCTP API. Added syscall for sctp_peeloff() operation. Other SCTP operations map onto setsockopt(2) requests. FreeBSD Copies optval into kernel as well as out for some getsockopt() and setsockopt() requests, code is scattered throughout network stack. Manpages don't match implementation, optval shown as 'const' in manpage but isn't in kernel source. Implements RFC 6458 SCTP API. Has sctp_peeloff() as syscall. Other SCTP operations map onto setsockopt(2) requests. Linux: Copies optval into kernel as well as out for getsockopt() but only for SCTP protocol sockets. Implements RFC 6458 SCTP API. sctp_peeloff() is done through setsockopt() not a syscall. Return value of setsockopt(2) used to copy back an int value for extra SCTP socket operations: <https://sourceforge.net/p/lksctp/mailman/message/18962336/> Manpages nearly match implementation, missing is description of meaning of positive return value from setsockopt(2) and that getsockopt(2) can sometimes copy in optval. NetBSD: Conforms strictly to standard for getsockopt(2) and setsockopt(2). Manpages match implementation. I suspect that the "get" of IP_IPSEC_POLICY has never worked since we imported the IPSEC code. Doesn't yet have way to pass extra parameter to "get" socket options. Doesn't yet have sctp_peeloff(). Doesn't have a way of triggering extra SCTP socket operations and getting results back to userland. Options for NetBSD: Copy in optval for getsockopt(2): Pro: Allows "get" of IP_IPSEC_POLICY using original KAME API. Con: Diverges from standard. Broken KAME API hasn't been a problem. Wouldn't be too much work to move SCTP operations that were triggered by setsockopt() requests in KAME sources to be triggered by getsockopt() requests instead. Selectively copy out optval for setsockopt(2): Pro: no change to SCTP code. Con: Breaks standard API. Need small change to all other protocols. Add new syscall to read/write optval: Pro: Doesn't break standard for [set/get]sockopt() API. Con: Doesn't fix IPSEC userland API. Could map onto PRCO_GETOPT within the kernel to minimize changes to protocol code. sctp_peeloff(): Add syscall for it: Pro: Code from KAME builds under -current and is ready to commit. Con: Adds a new syscall. Map it onto a [set/get/update]sockopt() request: Pro: Doesn't need a new syscall. Con: Would need to rewrite implementation. Operation needs to do file descriptor manipulation which doesn't belong in network protocol code.
Re: getsockopt(2)
In article , Robert Swindells wrote: > >chris...@astron.com (Christos Zoulas) wrote: >>In article , >>Robert Swindells wrote: >>> >>>chris...@astron.com (Christos Zoulas) wrote: In article , So depending on the contents of the sockval we do something different? >>> >>>FreeBSD does. The calls to copy in or out the data are scattered >>>throughout the network stack. >> >>Is sockval always an int? > >Do you mean optname or optval ? optval. >The operation required for setsockopt() is to mostly set things but in >certain cases to return an int value that results from setting stuff. I've lost track of what we are trying to achieve :-( Can you please post an example? Also lets compile a table of choices that has: - description of what's done - which os implemented it (if any) - pros - cons Or something along those lines so we can make a decision? Thanks, christos
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>chris...@astron.com (Christos Zoulas) wrote: >>>In article , >>> >>>So depending on the contents of the sockval we do something different? >> >>FreeBSD does. The calls to copy in or out the data are scattered >>throughout the network stack. > >Is sockval always an int? Do you mean optname or optval ? >>Linux seems to cheat and make use of the fact that their errors are >>negative to return a positive integer word value from >>setsockopt(). Their implementation of getsockopt() copies in and out. > >Hmm, that is not what the man page claims :-) Linux doesn't really do man pages though, I looked at the source. The FreeBSD man page is wrong too. There is a thread on lkml describing what Linux did. >>I would still prefer to change both getsockopt(2) and setsockopt(2). >> >>I would make getsockopt always copy in the supplied optval argument. > >Well, it has to copyout, so presumably that won't break existing code. I have run a kernel for a while with this change to getsockopt() and it seemed fine. >>For setsockopt, the protocol code can update sopt_size to indicate the >>amount of data to be copied back. I would add a line to the PRCO_SETOPT >>handler in most protocols to set sopt_size to 0 so that nothing was >>copied back. > >That could break things; grepping for sopt_size finds many checks for it. >There is also sockopt_setmbuf that can potentially allocate larger than >MCLBYTES socket option? I meant that I would set sopt_size to zero after all the processing of the input data has been finished. The only place that I could see mbufs being used was in ipsec, which doesn't work because it needs these changes too. >>The syscall definition for setsockopt() would need to be changed to >>remove the 'const' from the optval argument. > >That is arguably the worst part of the change. The API for it is part of >the standard. Well, we can go the linux way and make getsockopt read and >write, but that makes me cringe. Perhaps adding an updatesockopt() is better? It is the FreeBSD way as well. The operation required for getsockopt() isn't really *updating* the socket options, it is more "get with an extra parameter". The operation required for setsockopt() is to mostly set things but in certain cases to return an int value that results from setting stuff.
Re: getsockopt(2)
In article , Robert Swindells wrote: > >chris...@astron.com (Christos Zoulas) wrote: >>In article , >> >>So depending on the contents of the sockval we do something different? > >FreeBSD does. The calls to copy in or out the data are scattered >throughout the network stack. Is sockval always an int? >Linux seems to cheat and make use of the fact that their errors are >negative to return a positive integer word value from >setsockopt(). Their implementation of getsockopt() copies in and out. Hmm, that is not what the man page claims :-) >I would still prefer to change both getsockopt(2) and setsockopt(2). > >I would make getsockopt always copy in the supplied optval argument. Well, it has to copyout, so presumably that won't break existing code. >For setsockopt, the protocol code can update sopt_size to indicate the >amount of data to be copied back. I would add a line to the PRCO_SETOPT >handler in most protocols to set sopt_size to 0 so that nothing was >copied back. That could break things; grepping for sopt_size finds many checks for it. There is also sockopt_setmbuf that can potentially allocate larger than MCLBYTES socket option? >The syscall definition for setsockopt() would need to be changed to >remove the 'const' from the optval argument. That is arguably the worst part of the change. The API for it is part of the standard. Well, we can go the linux way and make getsockopt read and write, but that makes me cringe. Perhaps adding an updatesockopt() is better? christos
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>I wrote: >>>Does anyone know why we don't use the input 'optlen' parameter to the >>>getsockopt(2) syscall, we do write back to it on return. >>> >>>In ip_output() there is this, which suggests that someone had come >>>across this before. >>> >>>#if 0 /* defined(IPSEC) */ >>>case IP_IPSEC_POLICY: >>>{ >>>struct mbuf *m = NULL; >>> >>>/* XXX this will return EINVAL as sopt is empty */ >>>error = ipsec4_get_policy(inp, sopt->sopt_data, >>>sopt->sopt_size, &m); >>>if (error == 0) >>>error = sockopt_setmbuf(sopt, m); >>>break; >>>} >>>#endif /*IPSEC*/ >>> >>>There are also lots of places in sctp_usrreq that want to use it. >>> >>>I can set it with the following patch (line numbers will be slightly >>>out), but wondered if there was a reason for the current behaviour. >>> >>>Index: uipc_syscalls.c >>>=== >>>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v >>>retrieving revision 1.187 >>>diff -u -r1.187 uipc_syscalls.c >>>--- uipc_syscalls.c 20 Jun 2017 20:34:49 - 1.187 >>>+++ uipc_syscalls.c 14 Oct 2017 21:33:09 - >>>@@ -1235,7 +1240,7 @@ >>> if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) >>> return (error); >>> >>>-sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); >>>+sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); >>> >>> if (fp->f_flag & FNOSIGPIPE) >>> so->so_options |= SO_NOSIGPIPE; >> >>Has anyone had any other thoughts on this ? > >Yes, if you don't protect valsize against MCLBYTES like setsockopt does, >it can be used to DoS kernel. Otherwise is can be done. Is 2K enough? Yes, 2K would be enough. Have added the check in my tree. >>I'm thinking of adding an extra socket option, maybe SO_PARAM, that you >>would use with setsockopt(2) to set a selector to be used by a following >>getsockopt(2) call. > >This would require some state keeping in the kernel and might not be >reliable. I know, it is the only alternative that I can think of to changing setsockopt(2). >>This wouldn't fix the IPSEC usage without changing userland though. >> >>The alternative is to make both setsockopt(2) and getsockopt(2) >>bidirectional. > >So depending on the contents of the sockval we do something different? FreeBSD does. The calls to copy in or out the data are scattered throughout the network stack. Linux seems to cheat and make use of the fact that their errors are negative to return a positive integer word value from setsockopt(). Their implementation of getsockopt() copies in and out. I would still prefer to change both getsockopt(2) and setsockopt(2). I would make getsockopt always copy in the supplied optval argument. For setsockopt, the protocol code can update sopt_size to indicate the amount of data to be copied back. I would add a line to the PRCO_SETOPT handler in most protocols to set sopt_size to 0 so that nothing was copied back. The syscall definition for setsockopt() would need to be changed to remove the 'const' from the optval argument. >>I haven't checked in the userland implementation of sctp_opt_info(3) >>yet. I took the one from FreeBSD but can modify it to work with a >>different API. > >Ok. This still requires an alternative API to use. I haven't found an example of an OS that hides one in its version of sctp_opt_info().
Re: getsockopt(2)
In article , Robert Swindells wrote: > >I wrote: >>Does anyone know why we don't use the input 'optlen' parameter to the >>getsockopt(2) syscall, we do write back to it on return. >> >>In ip_output() there is this, which suggests that someone had come >>across this before. >> >>#if 0 /* defined(IPSEC) */ >>case IP_IPSEC_POLICY: >>{ >>struct mbuf *m = NULL; >> >>/* XXX this will return EINVAL as sopt is empty */ >>error = ipsec4_get_policy(inp, sopt->sopt_data, >>sopt->sopt_size, &m); >>if (error == 0) >>error = sockopt_setmbuf(sopt, m); >>break; >>} >>#endif /*IPSEC*/ >> >>There are also lots of places in sctp_usrreq that want to use it. >> >>I can set it with the following patch (line numbers will be slightly >>out), but wondered if there was a reason for the current behaviour. >> >>Index: uipc_syscalls.c >>=== >>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v >>retrieving revision 1.187 >>diff -u -r1.187 uipc_syscalls.c >>--- uipc_syscalls.c 20 Jun 2017 20:34:49 - 1.187 >>+++ uipc_syscalls.c 14 Oct 2017 21:33:09 - >>@@ -1235,7 +1240,7 @@ >> if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) >> return (error); >> >>- sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); >>+ sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); >> >> if (fp->f_flag & FNOSIGPIPE) >> so->so_options |= SO_NOSIGPIPE; > >Has anyone had any other thoughts on this ? Yes, if you don't protect valsize against MCLBYTES like setsockopt does, it can be used to DoS kernel. Otherwise is can be done. Is 2K enough? >I'm thinking of adding an extra socket option, maybe SO_PARAM, that you >would use with setsockopt(2) to set a selector to be used by a following >getsockopt(2) call. This would require some state keeping in the kernel and might not be reliable. >This wouldn't fix the IPSEC usage without changing userland though. > >The alternative is to make both setsockopt(2) and getsockopt(2) >bidirectional. So depending on the contents of the sockval we do something different? >I haven't checked in the userland implementation of sctp_opt_info(3) >yet. I took the one from FreeBSD but can modify it to work with a >different API. Ok. christos
Re: getsockopt(2)
I wrote: >Does anyone know why we don't use the input 'optlen' parameter to the >getsockopt(2) syscall, we do write back to it on return. > >In ip_output() there is this, which suggests that someone had come >across this before. > >#if 0 /* defined(IPSEC) */ >case IP_IPSEC_POLICY: >{ >struct mbuf *m = NULL; > >/* XXX this will return EINVAL as sopt is empty */ >error = ipsec4_get_policy(inp, sopt->sopt_data, >sopt->sopt_size, &m); >if (error == 0) >error = sockopt_setmbuf(sopt, m); >break; >} >#endif /*IPSEC*/ > >There are also lots of places in sctp_usrreq that want to use it. > >I can set it with the following patch (line numbers will be slightly >out), but wondered if there was a reason for the current behaviour. > >Index: uipc_syscalls.c >=== >RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v >retrieving revision 1.187 >diff -u -r1.187 uipc_syscalls.c >--- uipc_syscalls.c20 Jun 2017 20:34:49 - 1.187 >+++ uipc_syscalls.c14 Oct 2017 21:33:09 - >@@ -1235,7 +1240,7 @@ > if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) > return (error); > >- sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); >+ sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); > > if (fp->f_flag & FNOSIGPIPE) > so->so_options |= SO_NOSIGPIPE; Has anyone had any other thoughts on this ? I still think that the one line change shown here is correct, it will allocate a buffer that gets filled by the stuff to be copied back to userspace. The SCTP API that is defined in RFC 6458 does allow for the case where getsockopt(2) and setsockopt(2) are both unidirectional. The API defines a wrapper function sctp_opt_info(3) that can hide whatever mechanism is used to pass into the kernel an extra parameter to be used to select what is returned by getsockopt(2). I'm thinking of adding an extra socket option, maybe SO_PARAM, that you would use with setsockopt(2) to set a selector to be used by a following getsockopt(2) call. This wouldn't fix the IPSEC usage without changing userland though. The alternative is to make both setsockopt(2) and getsockopt(2) bidirectional. I haven't checked in the userland implementation of sctp_opt_info(3) yet. I took the one from FreeBSD but can modify it to work with a different API.
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>chris...@astron.com (Christos Zoulas) wrote: >>>In article , >>>Robert Swindells wrote: >>>> >>>>I wrote: >>>>>Does anyone know why we don't use the input 'optlen' parameter to the >>>>>getsockopt(2) syscall, we do write back to it on return. >> >>[snip] >> >>>Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to >>>do so if it is useful to read the input to deduce the output. But that >>>would make the call asymetric, so let's examine the examples where this >>>is useful first. >> >>FreeBSD does copyin the value, but the code to do it is scattered around >>the kernel, search for sooptcopyin() in their tree. > >Yes, aren't most of the in the sosetopt paths? I am looking for the sogetopt >paths... I have not looked carefully to see if there are any. It is done for TCP_CCALGOOPT and for all the SCTP options. I had reused the FreeBSD userland for SCTP, I was trying to test it before committing the final bits. I would prefer not to have to redesign it.
Re: getsockopt(2)
In article , Robert Swindells wrote: > >chris...@astron.com (Christos Zoulas) wrote: >>In article , >>Robert Swindells wrote: >>> >>>I wrote: >>>>Does anyone know why we don't use the input 'optlen' parameter to the >>>>getsockopt(2) syscall, we do write back to it on return. > >[snip] > >>Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to >>do so if it is useful to read the input to deduce the output. But that >>would make the call asymetric, so let's examine the examples where this >>is useful first. > >FreeBSD does copyin the value, but the code to do it is scattered around >the kernel, search for sooptcopyin() in their tree. Yes, aren't most of the in the sosetopt paths? I am looking for the sogetopt paths... I have not looked carefully to see if there are any. christos
Re: getsockopt(2)
chris...@astron.com (Christos Zoulas) wrote: >In article , >Robert Swindells wrote: >> >>I wrote: >>>Does anyone know why we don't use the input 'optlen' parameter to the >>>getsockopt(2) syscall, we do write back to it on return. [snip] >Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to >do so if it is useful to read the input to deduce the output. But that >would make the call asymetric, so let's examine the examples where this >is useful first. FreeBSD does copyin the value, but the code to do it is scattered around the kernel, search for sooptcopyin() in their tree. Robert Swindells
Re: getsockopt(2)
In article , Robert Swindells wrote: > >I wrote: >>Does anyone know why we don't use the input 'optlen' parameter to the >>getsockopt(2) syscall, we do write back to it on return. >> >>In ip_output() there is this, which suggests that someone had come >>across this before. >> >>#if 0 /* defined(IPSEC) */ >>case IP_IPSEC_POLICY: >>{ >>struct mbuf *m = NULL; >> >>/* XXX this will return EINVAL as sopt is empty */ >>error = ipsec4_get_policy(inp, sopt->sopt_data, >>sopt->sopt_size, &m); >>if (error == 0) >>error = sockopt_setmbuf(sopt, m); >>break; >>} >>#endif /*IPSEC*/ >> >>There are also lots of places in sctp_usrreq that want to use it. >> >>I can set it with the following patch (line numbers will be slightly >>out), but wondered if there was a reason for the current behaviour. > >Version 2 of the patch is below, the uses of getsockopt(2) by IPSEC and >SCTP expect that the optval data is copied into the kernel and back out >again by the syscall. > >It looks like it was broken by this commit: > ><http://mail-index.netbsd.org/source-changes/2008/08/06/msg208868.html> > >Index: uipc_syscalls.c >=== >RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v >retrieving revision 1.187 >diff -u -r1.187 uipc_syscalls.c >--- uipc_syscalls.c20 Jun 2017 20:34:49 - 1.187 >+++ uipc_syscalls.c14 Oct 2017 22:24:15 - >@@ -1235,7 +1240,13 @@ > if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) > return (error); > >- sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); >+ sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); >+ >+ if (valsize > 0) { >+ error = copyin(SCARG(uap, val), sopt.sopt_data, valsize); >+ if (error) >+ goto out; >+ } > > if (fp->f_flag & FNOSIGPIPE) > so->so_options |= SO_NOSIGPIPE; Neither FreeBSD or OpenBSD copyin the value, but I think it is ok to do so if it is useful to read the input to deduce the output. But that would make the call asymetric, so let's examine the examples where this is useful first. christos
Re: getsockopt(2)
I wrote: >Does anyone know why we don't use the input 'optlen' parameter to the >getsockopt(2) syscall, we do write back to it on return. > >In ip_output() there is this, which suggests that someone had come >across this before. > >#if 0 /* defined(IPSEC) */ >case IP_IPSEC_POLICY: >{ >struct mbuf *m = NULL; > >/* XXX this will return EINVAL as sopt is empty */ >error = ipsec4_get_policy(inp, sopt->sopt_data, >sopt->sopt_size, &m); >if (error == 0) >error = sockopt_setmbuf(sopt, m); >break; >} >#endif /*IPSEC*/ > >There are also lots of places in sctp_usrreq that want to use it. > >I can set it with the following patch (line numbers will be slightly >out), but wondered if there was a reason for the current behaviour. Version 2 of the patch is below, the uses of getsockopt(2) by IPSEC and SCTP expect that the optval data is copied into the kernel and back out again by the syscall. It looks like it was broken by this commit: <http://mail-index.netbsd.org/source-changes/2008/08/06/msg208868.html> Index: uipc_syscalls.c === RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.187 diff -u -r1.187 uipc_syscalls.c --- uipc_syscalls.c 20 Jun 2017 20:34:49 - 1.187 +++ uipc_syscalls.c 14 Oct 2017 22:24:15 - @@ -1235,7 +1240,13 @@ if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) return (error); - sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); + sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); + + if (valsize > 0) { + error = copyin(SCARG(uap, val), sopt.sopt_data, valsize); + if (error) + goto out; + } if (fp->f_flag & FNOSIGPIPE) so->so_options |= SO_NOSIGPIPE;
getsockopt(2)
Does anyone know why we don't use the input 'optlen' parameter to the getsockopt(2) syscall, we do write back to it on return. In ip_output() there is this, which suggests that someone had come across this before. #if 0 /* defined(IPSEC) */ case IP_IPSEC_POLICY: { struct mbuf *m = NULL; /* XXX this will return EINVAL as sopt is empty */ error = ipsec4_get_policy(inp, sopt->sopt_data, sopt->sopt_size, &m); if (error == 0) error = sockopt_setmbuf(sopt, m); break; } #endif /*IPSEC*/ There are also lots of places in sctp_usrreq that want to use it. I can set it with the following patch (line numbers will be slightly out), but wondered if there was a reason for the current behaviour. Index: uipc_syscalls.c === RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.187 diff -u -r1.187 uipc_syscalls.c --- uipc_syscalls.c 20 Jun 2017 20:34:49 - 1.187 +++ uipc_syscalls.c 14 Oct 2017 21:33:09 - @@ -1235,7 +1240,7 @@ if ((error = fd_getsock1(SCARG(uap, s), &so, &fp)) != 0) return (error); - sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), 0); + sockopt_init(&sopt, SCARG(uap, level), SCARG(uap, name), valsize); if (fp->f_flag & FNOSIGPIPE) so->so_options |= SO_NOSIGPIPE;