Re: Buffer overflow in jk_connect.c::nb_connect

2016-07-19 Thread Rainer Jung

On 19.07.2016 at 00:20, Christopher Schultz wrote:

On 7/18/16 5:48 PM, Rainer Jung wrote:

On 18.07.2016 at 17:02, Christopher Schultz wrote:

All,

Michael Deiner found a buffer overflow in the call to FD_SET macro on
line 291 of jk_connect.c:

280>    do {
281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin,
                         addr->salen);
282>    } while (rc == -1 && errno == EINTR);
283>
284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
285>        && (timeout > 0)) {
286>        fd_set wfdset;
287>        struct timeval tv;
288>        socklen_t rclen = (socklen_t)sizeof(rc);
289>
290>        FD_ZERO(&wfdset);
*291>       FD_SET(sd, &wfdset);*
292>        tv.tv_sec = timeout / 1000;
293>        tv.tv_usec = (timeout % 1000) * 1000;
294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);

I'd like to fix this so it won't bring-down the server :)

But it quickly gets complicated.

The method itself takes "sd" (a jk_sock_t) as an argument to the
function, and we can check immediately whether it will cause FD_SET to
overflow -- easy: just check to see if the value is too large -- but
what should we do in that case?

This function should be connecting to a back-end Tomcat server, but if
we have too many outbound connections, we'll fail.

I'm not sure it makes any sense to let things get this far.

The proposed solution[1] is to use poll() instead of select(), but that
won't work on every platform, plus I'd like to be able to fix the buffer
overflow right away while we work-out a solution for poll() that will
work in most environments.

I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
to start. Any other behavior will eventually cause errors.


+1 in principle. Unfortunately on Windows it seems the default for
FD_SETSIZE is only 64. That's probably too small but it seems it is
allowed on Windows to increase this limit during compilation:


The variable FD_SETSIZE determines the maximum number of descriptors in
a set. (The default value of FD_SETSIZE is 64, which can be modified by
defining FD_SETSIZE to another value before including Winsock2.h.)
Internally, socket handles in an fd_set structure are not represented as
bit flags as in Berkeley Unix. Their data representation is opaque.



That's ... weird. Okay.


So we should IMHO aim for

a) check connection pool size against FD_SETSIZE and fail during startup
if too big - or we decrease it to the max value and log a warning?


On *NIX, that value cannot reasonably be changed. I think we need to
make all our decisions at compile-time and fail-fast at runtime.


Yes, with "decrease it" I meant decreasing the configured pool size, 
just as you assumed below.



Lowering to a reasonable maximum value is probably okay. I'm not sure
which would be worse: requiring the administrator to fix a configuration
problem before the server can even start (imagine a server that's been
working for years without this config, now it requires some change) or
auto-reconfiguring based upon a value the admin hasn't set.

Actually... in cases where this would have affected users, the result
would have been that everything is fine until there is a buffer
overflow. Hopefully, the buffer overflow is fatal, but it might not be.

So, lowering to a smaller value if connection_pool_size is too big
sounds good to me. Log with severity=WARN is a good option for notification.


b) define 1024 as the compile time FD_SETSIZE on Windows (same value as
the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as
the default connection pool size.


+1


c) allow to increase FD_SETSIZE when building on Windows because it is
supported there.


+1

We probably want something like JK_FD_SETSIZE defaults to 1024 and then
FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how
on earth to do that for our Windows builds.
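On the source side, that JK_FD_SETSIZE idea might amount to a fragment like this (a sketch; JK_FD_SETSIZE is the hypothetical build-level knob discussed above, and where exactly it lives in the Windows build is the open question):

```c
/* Sketch: in the Windows build, before any include of Winsock2.h.
 * Winsock allows redefining FD_SETSIZE at compile time; 1024 matches
 * the common default on Linux and 32-bit Solaris. */
#ifndef JK_FD_SETSIZE
#define JK_FD_SETSIZE 1024
#endif
#define FD_SETSIZE JK_FD_SETSIZE
#include <winsock2.h>
```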


d) use the existing macro HAVE_POLL to offer a poll based code path if
poll was detected.


I don't think HAVE_POLL is any kind of standard. I poked-around my
Debian Linux environment and HAVE_POLL was defined in a number of header
files, but it was unconditionally-defined to be "1" in files such as
postgresql/pg_config.h, so I think the package-maintainers must have
just said "this system has poll.h, let's just set HAVE_POLL=1 and call
it a day".


It is a define that our mod_jk build system sets when configure is used 
and detects during the configure run that poll() is available. This 
define is already used in common/jk_connect.c in some places and could 
be used in nb_connect() as well. It only makes sense for platforms where 
configure is being run, i.e. not on Windows and Netware, but nb_connect() 
already has different implementations for the three platform types 
(Windows, Netware, *NIX).


Using poll() if HAVE_POLL is defined (i.e. poll() is available) gives us 
a clean solution on most platforms except Windows, and if we allow 
increasing the FD_SETSIZE on Windows (and choose a sane default 
ourselves) people 

Re: Buffer overflow in jk_connect.c::nb_connect

2016-07-18 Thread Christopher Schultz
Rainer,

On 7/18/16 5:48 PM, Rainer Jung wrote:
> Hi Chris,
> 
> thanks for picking this topic.
> 
> On 18.07.2016 at 17:02, Christopher Schultz wrote:
>> All,
>>
>> Michael Deiner found a buffer overflow in the call to FD_SET macro on
>> line 291 of jk_connect.c:
>>
>> 280>    do {
>> 281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin,
>>                          addr->salen);
>> 282>    } while (rc == -1 && errno == EINTR);
>> 283>
>> 284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
>> 285>        && (timeout > 0)) {
>> 286>        fd_set wfdset;
>> 287>        struct timeval tv;
>> 288>        socklen_t rclen = (socklen_t)sizeof(rc);
>> 289>
>> 290>        FD_ZERO(&wfdset);
>> *291>       FD_SET(sd, &wfdset);*
>> 292>        tv.tv_sec = timeout / 1000;
>> 293>        tv.tv_usec = (timeout % 1000) * 1000;
>> 294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);
>>
>> I'd like to fix this so it won't bring-down the server :)
>>
>> But it quickly gets complicated.
>>
>> The method itself takes "sd" (a jk_sock_t) as an argument to the
>> function, and we can check immediately whether it will cause FD_SET to
>> overflow -- easy: just check to see if the value is too large -- but
>> what should we do in that case?
>>
>> This function should be connecting to a back-end Tomcat server, but if
>> we have too many outbound connections, we'll fail.
>>
>> I'm not sure it makes any sense to let things get this far.
>>
>> The proposed solution[1] is to use poll() instead of select(), but that
>> won't work on every platform, plus I'd like to be able to fix the buffer
>> overflow right away while we work-out a solution for poll() that will
>> work in most environments.
>>
>> I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
>> to start. Any other behavior will eventually cause errors.
> 
> +1 in principle. Unfortunately on Windows it seems the default for
> FD_SETSIZE is only 64. That's probably too small but it seems it is
> allowed on Windows to increase this limit during compilation:
> 
> 
> The variable FD_SETSIZE determines the maximum number of descriptors in
> a set. (The default value of FD_SETSIZE is 64, which can be modified by
> defining FD_SETSIZE to another value before including Winsock2.h.)
> Internally, socket handles in an fd_set structure are not represented as
> bit flags as in Berkeley Unix. Their data representation is opaque.
> 

That's ... weird. Okay.

> So we should IMHO aim for
> 
> a) check connection pool size against FD_SETSIZE and fail during startup
> if too big - or we decrease it to the max value and log a warning?

On *NIX, that value cannot reasonably be changed. I think we need to
make all our decisions at compile-time and fail-fast at runtime.

Lowering to a reasonable maximum value is probably okay. I'm not sure
which would be worse: requiring the administrator to fix a configuration
problem before the server can even start (imagine a server that's been
working for years without this config, now it requires some change) or
auto-reconfiguring based upon a value the admin hasn't set.

Actually... in cases where this would have affected users, the result
would have been that everything is fine until there is a buffer
overflow. Hopefully, the buffer overflow is fatal, but it might not be.

So, lowering to a smaller value if connection_pool_size is too big
sounds good to me. Log with severity=WARN is a good option for notification.

> b) define 1024 as the compile time FD_SETSIZE on Windows (same value as
> the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as
> the default connection pool size.

+1

> c) allow to increase FD_SETSIZE when building on Windows because it is
> supported there.

+1

We probably want something like JK_FD_SETSIZE defaults to 1024 and then
FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how
on earth to do that for our Windows builds.

> d) use the existing macro HAVE_POLL to offer a poll based code path if
> poll was detected.

I don't think HAVE_POLL is any kind of standard. I poked-around my
Debian Linux environment and HAVE_POLL was defined in a number of header
files, but it was unconditionally-defined to be "1" in files such as
postgresql/pg_config.h, so I think the package-maintainers must have
just said "this system has poll.h, let's just set HAVE_POLL=1 and call
it a day".

> Concerning a) the code that reads the pool size is in
> common/jk_ajp_common.c in function ajp_init():
> 
> p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);
> 
> The function gets a logger as an argument, so if we want we can easily
> log stuff there. Correcting a cacxhe size that's too big to the max
> value and log would be easiest. Terminating the startup is more
> difficult. We do it e.g. for Apache using jk_error_exit() but it will be
> a bit tedious to propagate the error from jk_ajp_common.c to mod_jk.c.
> Furthermore you need a similar solution for ISAPI. Therefore I suggest
> to choose the "correct the config and warn" attempt.

Re: Buffer overflow in jk_connect.c::nb_connect

2016-07-18 Thread Rainer Jung

Hi Chris,

thanks for picking this topic.

On 18.07.2016 at 17:02, Christopher Schultz wrote:

All,

Michael Deiner found a buffer overflow in the call to FD_SET macro on
line 291 of jk_connect.c:

280>    do {
281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin,
                         addr->salen);
282>    } while (rc == -1 && errno == EINTR);
283>
284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
285>        && (timeout > 0)) {
286>        fd_set wfdset;
287>        struct timeval tv;
288>        socklen_t rclen = (socklen_t)sizeof(rc);
289>
290>        FD_ZERO(&wfdset);
*291>       FD_SET(sd, &wfdset);*
292>        tv.tv_sec = timeout / 1000;
293>        tv.tv_usec = (timeout % 1000) * 1000;
294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);

I'd like to fix this so it won't bring-down the server :)

But it quickly gets complicated.

The method itself takes "sd" (a jk_sock_t) as an argument to the
function, and we can check immediately whether it will cause FD_SET to
overflow -- easy: just check to see if the value is too large -- but
what should we do in that case?

This function should be connecting to a back-end Tomcat server, but if
we have too many outbound connections, we'll fail.

I'm not sure it makes any sense to let things get this far.

The proposed solution[1] is to use poll() instead of select(), but that
won't work on every platform, plus I'd like to be able to fix the buffer
overflow right away while we work-out a solution for poll() that will
work in most environments.

I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
to start. Any other behavior will eventually cause errors.


+1 in principle. Unfortunately on Windows it seems the default for 
FD_SETSIZE is only 64. That's probably too small but it seems it is 
allowed on Windows to increase this limit during compilation:



The variable FD_SETSIZE determines the maximum number of descriptors in 
a set. (The default value of FD_SETSIZE is 64, which can be modified by 
defining FD_SETSIZE to another value before including Winsock2.h.) 
Internally, socket handles in an fd_set structure are not represented as 
bit flags as in Berkeley Unix. Their data representation is opaque.



So we should IMHO aim for

a) check connection pool size against FD_SETSIZE and fail during startup 
if too big - or we decrease it to the max value and log a warning?


b) define 1024 as the compile time FD_SETSIZE on Windows (same value as 
the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as 
the default connection pool size.


c) allow to increase FD_SETSIZE when building on Windows because it is 
supported there.


d) use the existing macro HAVE_POLL to offer a poll based code path if 
poll was detected.


Concerning a) the code that reads the pool size is in 
common/jk_ajp_common.c in function ajp_init():


p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);

The function gets a logger as an argument, so if we want we can easily 
log stuff there. Correcting a cache size that's too big to the max 
value and log would be easiest. Terminating the startup is more 
difficult. We do it e.g. for Apache using jk_error_exit() but it will be 
a bit tedious to propagate the error from jk_ajp_common.c to mod_jk.c. 
Furthermore you need a similar solution for ISAPI. Therefore I suggest 
to choose the "correct the config and warn" attempt.



Thoughts?

... and if we want to refuse to start... where is the best place to do
that and how?

-chris

[1]
https://lists.apache.org/thread.html/839da944652d028431bebbf74ce0635d02666868944de49c05d81057@%3Cusers.tomcat.apache.org%3E


Regards,

Rainer


-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org