Am 19.07.2016 um 00:20 schrieb Christopher Schultz:
On 7/18/16 5:48 PM, Rainer Jung wrote:
Am 18.07.2016 um 17:02 schrieb Christopher Schultz:
All,

Michael Deiner found a buffer overflow in the call to FD_SET macro on
line 291 of jk_connect.c:

280>   do {
281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin,
addr->salen);
282>    } while (rc == -1 && errno == EINTR);
283>
284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
285>                   && (timeout > 0)) {
286>        fd_set wfdset;
287>        struct timeval tv;
288>        socklen_t rclen = (socklen_t)sizeof(rc);
289>
290>        FD_ZERO(&wfdset);
*291>        FD_SET(sd, &wfdset);*
292>        tv.tv_sec = timeout / 1000;
293>        tv.tv_usec = (timeout % 1000) * 1000;
294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);

I'd like to fix this so it won't bring-down the server :)

But it quickly gets complicated.

The method itself takes "sd" (a jk_sock_t) as an argument to the
function, and we can check immediately whether it will cause FD_SET to
overflow -- easy: just check to see if the value is too large -- but
what should we do in that case?

This function should be connecting to a back-end Tomcat server, but if
we have too many outbound connections, we'll fail.

I'm not sure it makes any sense to let things get this far.

The proposed solution[1] is to use poll() instead of select(), but that
won't work on every platform, plus I'd like to be able to fix the buffer
overflow right away while we work-out a solution for poll() that will
work in most environments.

I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
to start. Any other behavior will eventually cause errors.

+1 in principal. Unfortunately on Windows it seems the default for
FD_SETSIZE is only 64. That's probably too small but it seems it is
allowed on Windows to increase this limit during compilation:

<Quote>
The variable FD_SETSIZE determines the maximum number of descriptors in
a set. (The default value of FD_SETSIZE is 64, which can be modified by
defining FD_SETSIZE to another value before including Winsock2.h.)
Internally, socket handles in an fd_set structure are not represented as
bit flags as in Berkeley Unix. Their data representation is opaque.
</Quote>

That's ... weird. Okay.

So we should IMHO aim for

a) check connection pool size against FD_SETSIZE and fail during startup
if too big - or we decrease it to the max value and log a warning?

On *NIX, that value cannot reasonably be changed. I think we need to
make all our decisions at compile-time and fail-fast at runtime.

Yes, with "decrease it" I meant decreasing the configured pool size, just as you assumed below.

Lowering to a reasonable maximum value is probably okay. I'm not sure
which would be worse: requiring the administrator to fix a configuration
problem before the server can even start (imagine a server that's been
working for years without this config, now it requires some change) or
auto-reconfiguring based upon a value the admin hasn't set.

Actually... in cases where this would have affected users, the result
would have been that everything is fine until there is a buffer
overflow. Hopefully, the buffer overflow is fatal, but it might not be.

So, lowering to a smaller value if connection_pool_size is too big
sounds good to me. Log with severity=WARN is a good option for notification.

b) define 1024 as the compile time FD_SETSIZE on Windows (same value as
the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as
the default connection pool size.

+1

c) allow to increase FD_SETSIZE when building on Windows because it is
supported there.

+1

We probably want something like JK_FD_SETSIZE defaults to 1024 and then
FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how
on earth to do that for our Windows builds.

d) use the existing macro HAVE_POLL to offer a poll based code path if
poll was detected.

I don't think HAVE_POLL is any kind of standard. I poked-around my
Debian Linux environment and HAVE_POLL was defined in a number of header
files, but it was unconditionally-defined to be "1" in files such as
postgresql/pg_config.h, so I think the package-maintainers must have
just said "this system has poll.h, let's just set HAVE_POLL=1 and call
it a day".

It is a define that our mod_jk build system sets when configure is used and detects during the configure run, that poll() is available. This define is already used in common/jk_connect.c at some places and could be used in nb_connect() as well. It only makes sense for platforms where configure is being run, ie. not on Windows and Netware, but nb_connect() already has different implementation lines for the three platform types (Windows, Netware, *Nix).

Using poll() if HAVE_POLL is defined (ie. poll() is available), gives us a clean solution on most platforms except Windows, and if we allow to increase the FD_SETSIZE on Windows (and choose a sane default ourselves) people can work around the problem there as well.

I'll have a look at the Windows build files concerning setting FD_SETSIZE to 1024 and probably allow to change the value during compilation.

Concerning a) the code that reads the pool size is in
common/jk_ajp_common.c in function ajp_init():

p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);

The function gets a logger as an argument, so if we want we can easily
log stuff there. Correcting a cacxhe size that's too big to the max
value and log would be easiest. Terminating the startup is more
difficult. We do it e.g. for Apache using jk_error_exit() but it will be
a bit tedious to propagate the error from jk_ajp_common.c to mod_jk.c.
Furthermore you need a similar solution for ISAPI. Therefore I suggest
to choose the "correct the config and warn" attempt.

Thoughts?

I like correct-and-warn for a number of reasons.

So forgetting poll() for the time being, the correct-and-warn would just
be a check for connection_pool_size > FD_SETSIZE, then set
connection_pool_size = FD_SETSIZE and warn, right?

Correct, BUT: I'm not totally convinced, that a limit on the pool size suffices. I think file descriptor numbers are global to each process. So if you have various workers, say 20 workers each with a pool size of 100, then you could easily get file descriptor numbers from 0 to 1999. Furthermore I expect other (non-mod_jk resp. non-ISAPI) file descriptors in the process count as well:

- open log files
- incoming client to web server connections
- more?

If that is true, the attempt to work around the problem by limiting pools wouldn't work. So Maybe we need to change the strategy to

- introduce poll() when available
- make FD_SETSIZE configurable (plus sane default) on Windows
- fail during connection attempt if file descriptor number for new socket is to big

WDYT?

Seems simple enough :)

Hmmmm ...

Rainer


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to