Re: Buffer overflow in jk_connect.c::nb_connect
Am 19.07.2016 um 00:20 schrieb Christopher Schultz: On 7/18/16 5:48 PM, Rainer Jung wrote: Am 18.07.2016 um 17:02 schrieb Christopher Schultz: All, Michael Deiner found a buffer overflow in the call to FD_SET macro on line 291 of jk_connect.c: 280> do { 281>rc = connect(sd, (const struct sockaddr *)>sa.sin, addr->salen); 282>} while (rc == -1 && errno == EINTR); 283> 284>if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY) 285> && (timeout > 0)) { 286>fd_set wfdset; 287>struct timeval tv; 288>socklen_t rclen = (socklen_t)sizeof(rc); 289> 290>FD_ZERO(); *291>FD_SET(sd, );* 292>tv.tv_sec = timeout / 1000; 293>tv.tv_usec = (timeout % 1000) * 1000; 294>rc = select(sd + 1, NULL, , NULL, ); I'd like to fix this so it won't bring-down the server :) But it quickly gets complicated. The method itself takes "sd" (a jk_sock_t) as an argument to the function, and we can check immediately whether it will cause FD_SET to overflow -- easy: just check to see if the value is too large -- but what should we do in that case? This function should be connecting to a back-end Tomcat server, but if we have too many outbound connections, we'll fail. I'm not sure it makes any sense to let things get this far. The proposed solution[1] is to use poll() instead of select(), but that won't work on every platform, plus I'd like to be able to fix the buffer overflow right away while we work-out a solution for poll() that will work in most environments. I think if the connection_pool_size exceeds FD_SETSIZE we should refuse to start. Any other behavior will eventually cause errors. +1 in principal. Unfortunately on Windows it seems the default for FD_SETSIZE is only 64. That's probably too small but it seems it is allowed on Windows to increase this limit during compilation: The variable FD_SETSIZE determines the maximum number of descriptors in a set. (The default value of FD_SETSIZE is 64, which can be modified by defining FD_SETSIZE to another value before including Winsock2.h.) Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque. That's ... weird. Okay. So we should IMHO aim for a) check connection pool size against FD_SETSIZE and fail during startup if too big - or we decrease it to the max value and log a warning? On *NIX, that value cannot reasonably be changed. I think we need to make all our decisions at compile-time and fail-fast at runtime. Yes, with "decrease it" I meant decreasing the configured pool size, just as you assumed below. Lowering to a reasonable maximum value is probably okay. I'm not sure which would be worse: requiring the administrator to fix a configuration problem before the server can even start (imagine a server that's been working for years without this config, now it requires some change) or auto-reconfiguring based upon a value the admin hasn't set. Actually... in cases where this would have affected users, the result would have been that everything is fine until there is a buffer overflow. Hopefully, the buffer overflow is fatal, but it might not be. So, lowering to a smaller value if connection_pool_size is too big sounds good to me. Log with severity=WARN is a good option for notification. b) define 1024 as the compile time FD_SETSIZE on Windows (same value as the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as the default connection pool size. +1 c) allow to increase FD_SETSIZE when building on Windows because it is supported there. +1 We probably want something like JK_FD_SETSIZE defaults to 1024 and then FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how on earth to do that for our Windows builds. d) use the existing macro HAVE_POLL to offer a poll based code path if poll was detected. I don't think HAVE_POLL is any kind of standard. I poked-around my Debian Linux environment and HAVE_POLL was defined in a number of header files, but it was unconditionally-defined to be "1" in files such as postgresql/pg_config.h, so I think the package-maintainers must have just said "this system has poll.h, let's just set HAVE_POLL=1 and call it a day". It is a define that our mod_jk build system sets when configure is used and detects during the configure run, that poll() is available. This define is already used in common/jk_connect.c at some places and could be used in nb_connect() as well. It only makes sense for platforms where configure is being run, ie. not on Windows and Netware, but nb_connect() already has different implementation lines for the three platform types (Windows, Netware, *Nix). Using poll() if HAVE_POLL is defined (ie. poll() is available), gives us a clean solution on most platforms except Windows, and if we allow to increase the FD_SETSIZE on Windows (and choose a sane default ourselves) people
Re: Buffer overflow in jk_connect.c::nb_connect
Rainer, On 7/18/16 5:48 PM, Rainer Jung wrote: > Hi Chris, > > thanks for picking this topic. > > Am 18.07.2016 um 17:02 schrieb Christopher Schultz: >> All, >> >> Michael Deiner found a buffer overflow in the call to FD_SET macro on >> line 291 of jk_connect.c: >> >> 280> do { >> 281>rc = connect(sd, (const struct sockaddr *)>sa.sin, >> addr->salen); >> 282>} while (rc == -1 && errno == EINTR); >> 283> >> 284>if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY) >> 285> && (timeout > 0)) { >> 286>fd_set wfdset; >> 287>struct timeval tv; >> 288>socklen_t rclen = (socklen_t)sizeof(rc); >> 289> >> 290>FD_ZERO(); >> *291>FD_SET(sd, );* >> 292>tv.tv_sec = timeout / 1000; >> 293>tv.tv_usec = (timeout % 1000) * 1000; >> 294>rc = select(sd + 1, NULL, , NULL, ); >> >> I'd like to fix this so it won't bring-down the server :) >> >> But it quickly gets complicated. >> >> The method itself takes "sd" (a jk_sock_t) as an argument to the >> function, and we can check immediately whether it will cause FD_SET to >> overflow -- easy: just check to see if the value is too large -- but >> what should we do in that case? >> >> This function should be connecting to a back-end Tomcat server, but if >> we have too many outbound connections, we'll fail. >> >> I'm not sure it makes any sense to let things get this far. >> >> The proposed solution[1] is to use poll() instead of select(), but that >> won't work on every platform, plus I'd like to be able to fix the buffer >> overflow right away while we work-out a solution for poll() that will >> work in most environments. >> >> I think if the connection_pool_size exceeds FD_SETSIZE we should refuse >> to start. Any other behavior will eventually cause errors. > > +1 in principal. Unfortunately on Windows it seems the default for > FD_SETSIZE is only 64. That's probably too small but it seems it is > allowed on Windows to increase this limit during compilation: > > > The variable FD_SETSIZE determines the maximum number of descriptors in > a set. (The default value of FD_SETSIZE is 64, which can be modified by > defining FD_SETSIZE to another value before including Winsock2.h.) > Internally, socket handles in an fd_set structure are not represented as > bit flags as in Berkeley Unix. Their data representation is opaque. > That's ... weird. Okay. > So we should IMHO aim for > > a) check connection pool size against FD_SETSIZE and fail during startup > if too big - or we decrease it to the max value and log a warning? On *NIX, that value cannot reasonably be changed. I think we need to make all our decisions at compile-time and fail-fast at runtime. Lowering to a reasonable maximum value is probably okay. I'm not sure which would be worse: requiring the administrator to fix a configuration problem before the server can even start (imagine a server that's been working for years without this config, now it requires some change) or auto-reconfiguring based upon a value the admin hasn't set. Actually... in cases where this would have affected users, the result would have been that everything is fine until there is a buffer overflow. Hopefully, the buffer overflow is fatal, but it might not be. So, lowering to a smaller value if connection_pool_size is too big sounds good to me. Log with severity=WARN is a good option for notification. > b) define 1024 as the compile time FD_SETSIZE on Windows (same value as > the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as > the default connection pool size. +1 > c) allow to increase FD_SETSIZE when building on Windows because it is > supported there. +1 We probably want something like JK_FD_SETSIZE defaults to 1024 and then FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how on earth to do that for our Windows builds. > d) use the existing macro HAVE_POLL to offer a poll based code path if > poll was detected. I don't think HAVE_POLL is any kind of standard. I poked-around my Debian Linux environment and HAVE_POLL was defined in a number of header files, but it was unconditionally-defined to be "1" in files such as postgresql/pg_config.h, so I think the package-maintainers must have just said "this system has poll.h, let's just set HAVE_POLL=1 and call it a day". > Concerning a) the code that reads the pool size is in > common/jk_ajp_common.c in function ajp_init(): > > p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache); > > The function gets a logger as an argument, so if we want we can easily > log stuff there. Correcting a cacxhe size that's too big to the max > value and log would be easiest. Terminating the startup is more > difficult. We do it e.g. for Apache using jk_error_exit() but it will be > a bit tedious to propagate the error from jk_ajp_common.c to mod_jk.c. > Furthermore you need a similar solution for ISAPI. Therefore I suggest > to choose the
Re: Buffer overflow in jk_connect.c::nb_connect
Hi Chris, thanks for picking this topic. Am 18.07.2016 um 17:02 schrieb Christopher Schultz: All, Michael Deiner found a buffer overflow in the call to FD_SET macro on line 291 of jk_connect.c: 280> do { 281>rc = connect(sd, (const struct sockaddr *)>sa.sin, addr->salen); 282>} while (rc == -1 && errno == EINTR); 283> 284>if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY) 285> && (timeout > 0)) { 286>fd_set wfdset; 287>struct timeval tv; 288>socklen_t rclen = (socklen_t)sizeof(rc); 289> 290>FD_ZERO(); *291>FD_SET(sd, );* 292>tv.tv_sec = timeout / 1000; 293>tv.tv_usec = (timeout % 1000) * 1000; 294>rc = select(sd + 1, NULL, , NULL, ); I'd like to fix this so it won't bring-down the server :) But it quickly gets complicated. The method itself takes "sd" (a jk_sock_t) as an argument to the function, and we can check immediately whether it will cause FD_SET to overflow -- easy: just check to see if the value is too large -- but what should we do in that case? This function should be connecting to a back-end Tomcat server, but if we have too many outbound connections, we'll fail. I'm not sure it makes any sense to let things get this far. The proposed solution[1] is to use poll() instead of select(), but that won't work on every platform, plus I'd like to be able to fix the buffer overflow right away while we work-out a solution for poll() that will work in most environments. I think if the connection_pool_size exceeds FD_SETSIZE we should refuse to start. Any other behavior will eventually cause errors. +1 in principal. Unfortunately on Windows it seems the default for FD_SETSIZE is only 64. That's probably too small but it seems it is allowed on Windows to increase this limit during compilation: The variable FD_SETSIZE determines the maximum number of descriptors in a set. (The default value of FD_SETSIZE is 64, which can be modified by defining FD_SETSIZE to another value before including Winsock2.h.) Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque. So we should IMHO aim for a) check connection pool size against FD_SETSIZE and fail during startup if too big - or we decrease it to the max value and log a warning? b) define 1024 as the compile time FD_SETSIZE on Windows (same value as the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as the default connection pool size. c) allow to increase FD_SETSIZE when building on Windows because it is supported there. d) use the existing macro HAVE_POLL to offer a poll based code path if poll was detected. Concerning a) the code that reads the pool size is in common/jk_ajp_common.c in function ajp_init(): p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache); The function gets a logger as an argument, so if we want we can easily log stuff there. Correcting a cacxhe size that's too big to the max value and log would be easiest. Terminating the startup is more difficult. We do it e.g. for Apache using jk_error_exit() but it will be a bit tedious to propagate the error from jk_ajp_common.c to mod_jk.c. Furthermore you need a similar solution for ISAPI. Therefore I suggest to choose the "correct the config and warn" attempt. Thoughts? ... and if we want to refuse to start... where is the best place to do that and how? -chris [1] https://lists.apache.org/thread.html/839da944652d028431bebbf74ce0635d02666868944de49c05d81057@%3Cusers.tomcat.apache.org%3E Regards, Rainer - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org