Thank you for spending time on this issue.
I spent some time to work on this ldapd(8) hang issue again. It would be
appreciated
if I could get some clarification on my modification. With my modification,
test2.sh from Robert Klein works without any hangs from 1 to 16500 bytes of
data.
This issue seems to be caused by a size mismatch between ber_read_element() on
ber.c and the size of LDAP messages that ldapd(8) receives from the client such
as ldapadd(1). This happens once tls_read() in evtls_read() returns
TLS_WANT_POLLIN. The buffer to receive the LDAP messages is set by ioctl(2) in
evtls_read().
if (ioctl(fd, FIONREAD, &n) == -1 || n <= 0) {
...
if (howmuch < 0 || howmuch > n)
howmuch = n;
...
/* If we don't have FIONREAD, we might waste some space here */
if (evbuffer_expand(buf, howmuch) == -1)
return (-1);
...
/* We can append new data at this point */
p = buf->buffer + buf->off;
...
n = tls_read(ctx, p, howmuch);
To see the mismatch, I put several printf() calls on conn.c and
evbuffer_tls.c. The outputs are like below and the lines starting with '*'
are from the printf().
May 5 12:13:05.512 [82850] conn_tls_init: switching to TLS
* n: 46 howmuch: 75 for evbuffer_expand() and tls_read() in
evtls_read,333,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 46 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
read ber elements, root 0x19b04aae7700
...
May 5 12:13:05.539 [82850] consumed 46 bytes
...
write ber element of 14 bytes length
*EVBUFFER_LENGTH(input): 46 nused: 46 before evbuffer_drain() in
conn_read,210,conn.c
*EVBUFFER_LENGTH(input): 0 nused: 46 after evbuffer_drain() in
conn_read,213,conn.c
* n: -2 howmuch: 2856 for evbuffer_expand() and tls_read() in
evtls_read,333,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 0 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
*res from evtls_read() == TLS_WANT_POLLIN in buffertls_readcb,105,evbuffer_tls.c
* n: 1066 howmuch: 1066 for evbuffer_expand() and tls_read() in
evtls_read,333,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 1066 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
read ber elements, root 0x19b04aae7e00
ber read got class 0 type 16, constructive
ber read element size 3889
*EVBUFFER_LENGTH(input): 1066 nused: 0 before evbuffer_drain() in
conn_read,210,conn.c
*EVBUFFER_LENGTH(input): 1066 nused: 0 after evbuffer_drain() in
conn_read,213,conn.c
^CMay 5 12:14:05.967 [82850] May 5 12:14:05.967 [47727] ldape: got signal 2
...
May 5 12:14:05.968 [82850] ldape: exiting
The diffs for the printf() calls are
Index: conn.c
===================================================================
RCS file: /cvs/src/usr.sbin/ldapd/conn.c,v
retrieving revision 1.14
diff -u -p -r1.14 conn.c
--- conn.c 20 Jan 2017 11:55:08 -0000 1.14
+++ conn.c 5 May 2017 14:33:50 -0000
@@ -206,7 +206,11 @@ conn_read(struct bufferevent *bev, void
break;
}
+ printf("*EVBUFFER_LENGTH(input): %zu nused: %zu before evbuffer_drain()
in %s,%d,%s\n",
+ EVBUFFER_LENGTH(input),nused,__func__,__LINE__,__FILE__);
evbuffer_drain(input, nused);
+ printf("*EVBUFFER_LENGTH(input): %zu nused: %zu after evbuffer_drain()
in %s,%d,%s\n",
+ EVBUFFER_LENGTH(input),nused,__func__,__LINE__,__FILE__);
}
Index: evbuffer_tls.c
===================================================================
RCS file: /cvs/src/usr.sbin/ldapd/evbuffer_tls.c,v
retrieving revision 1.2
diff -u -p -r1.2 evbuffer_tls.c
--- evbuffer_tls.c 3 Mar 2017 20:26:23 -0000 1.2
+++ evbuffer_tls.c 5 May 2017 14:33:50 -0000
@@ -97,8 +97,12 @@ buffertls_readcb(int fd, short event, vo
}
res = evtls_read(bufev->input, fd, howmuch, ctx);
+ printf("*EVBUFFER_LENGTH(bufev->input): %zu fd: %d howmuch: %d in
%s,%d,%s\n",
+
EVBUFFER_LENGTH(bufev->input),fd,howmuch,__func__,__LINE__,__FILE__);
switch (res) {
case TLS_WANT_POLLIN:
+ printf("*res from evtls_read() == TLS_WANT_POLLIN in
%s,%d,%s\n",
+ __func__,__LINE__,__FILE__);
bufferevent_add(&bufev->ev_read, bufev->timeout_read);
return;
case TLS_WANT_POLLOUT:
@@ -324,6 +328,10 @@ evtls_read(struct evbuffer *buf, int fd,
p = buf->buffer + buf->off;
n = tls_read(ctx, p, howmuch);
+
+ printf("\n* n: %d howmuch: %d for evbuffer_expand() and tls_read() in
%s,%d,%s\n",
+ n,howmuch,__func__,__LINE__,__FILE__);
+
if (n <= 0)
return (n);
2856 bytes of data from the client was missing. It was not accumulated
correctly when
evtls_read() was called again. The LDAP client did send the data and ldapd(8)
did receive
the data but somehow ldapd(8) didn't store the data. As the result, the client
was waiting for
the response and ldapd(8) was also waiting for more data from the client to
read the entire
LDAP message.
*EVBUFFER_LENGTH(bufev->input): 1066 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
...
ber read element size 3889
*EVBUFFER_LENGTH(input): 1066 nused: 0 before evbuffer_drain() in
conn_read,210,conn.c
*EVBUFFER_LENGTH(input): 1066 nused: 0 after evbuffer_drain() in
conn_read,213,conn.c
ldapd(8) needed 3889 bytes of data but it had only 1066 bytes of data in the
buffer.
Then I started to change howmuch, the buffer size, in evtls_read() adding
1,10,100 and
others to the size of the immediately available data for reading. However I
still had hangs.
Eventually I used EVBUFFER_MAX_READ as the buffer size without ioctl(2) like
below.
It didn't hang at this time. The output with my modification is as follows:
May 5 12:28:58.766 [52462] conn_tls_init: switching to TLS
* n: 46 howmuch: 16384 for evbuffer_expand() and tls_read() in
evtls_read,319,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 46 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
read ber elements, root 0x1f0e5a06dc80
...
May 5 12:28:58.793 [52462] consumed 46 bytes
...
write ber element of 14 bytes length
*EVBUFFER_LENGTH(input): 46 nused: 46 before evbuffer_drain() in
conn_read,210,conn.c
*EVBUFFER_LENGTH(input): 0 nused: 46 after evbuffer_drain() in
conn_read,213,conn.c
* n: -2 howmuch: 16384 for evbuffer_expand() and tls_read() in
evtls_read,319,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 0 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
*res from evtls_read() == TLS_WANT_POLLIN in buffertls_readcb,105,evbuffer_tls.c
* n: 3893 howmuch: 16384 for evbuffer_expand() and tls_read() in
evtls_read,319,evbuffer_tls.c
*EVBUFFER_LENGTH(bufev->input): 3893 fd: 17 howmuch: -1 in
buffertls_readcb,101,evbuffer_tls.c
read ber elements, root 0x1f0e82f35780
ber read got class 0 type 16, constructive
ber read element size 3889
...
May 5 12:28:58.795 [52462] consumed 3893 bytes
May 5 12:28:58.795 [52462] got request type 8, id 2
May 5 12:28:58.795 [52462] adding entry cn=robert 3
smith,ou=people,dc=example,dc=com
btree_txn_begin:666: taking write lock on txn 0x1f0e405804a0
...
*EVBUFFER_LENGTH(input): 3893 nused: 3893 before evbuffer_drain() in
conn_read,210,conn.c
*EVBUFFER_LENGTH(input): 0 nused: 3893 after evbuffer_drain() in
conn_read,213,conn.c
I'm not quite sure if this is the right way to fix the issue but it looks like
that
this issue is related to how ldapd(8) buffers LDAP messages from the client.
Index: evbuffer_tls.c
===================================================================
RCS file: /cvs/src/usr.sbin/ldapd/evbuffer_tls.c,v
retrieving revision 1.2
diff -u -p -r1.2 evbuffer_tls.c
--- evbuffer_tls.c 3 Mar 2017 20:26:23 -0000 1.2
+++ evbuffer_tls.c 5 May 2017 16:03:04 -0000
@@ -296,25 +296,9 @@ evtls_read(struct evbuffer *buf, int fd,
{
u_char *p;
size_t oldoff = buf->off;
- int n = EVBUFFER_MAX_READ;
- if (ioctl(fd, FIONREAD, &n) == -1 || n <= 0) {
- n = EVBUFFER_MAX_READ;
- } else if (n > EVBUFFER_MAX_READ && n > howmuch) {
- /*
- * It's possible that a lot of data is available for
- * reading. We do not want to exhaust resources
- * before the reader has a chance to do something
- * about it. If the reader does not tell us how much
- * data we should read, we artifically limit it.
- */
- if ((size_t)n > buf->totallen << 2)
- n = buf->totallen << 2;
- if (n < EVBUFFER_MAX_READ)
- n = EVBUFFER_MAX_READ;
- }
- if (howmuch < 0 || howmuch > n)
- howmuch = n;
+ if (howmuch < 0 || howmuch > EVBUFFER_MAX_READ)
+ howmuch = EVBUFFER_MAX_READ;
/* If we don't have FIONREAD, we might waste some space here */
if (evbuffer_expand(buf, howmuch) == -1)
@@ -323,7 +307,7 @@ evtls_read(struct evbuffer *buf, int fd,
/* We can append new data at this point */
p = buf->buffer + buf->off;
- n = tls_read(ctx, p, howmuch);
+ int n = tls_read(ctx, p, howmuch);
if (n <= 0)
return (n);
Thank you
Seiya
________________________________________
From: Robert Klein [[email protected]]
Sent: Friday, March 03, 2017 4:45 AM
To: Alexander Bluhm
Cc: Gleydson Soares; Seiya Kawashima; [email protected]; [email protected]
Subject: Re: ldapd(8) hangs when receiving large data
On Thu, 2 Mar 2017 22:43:41 +0100
Alexander Bluhm <[email protected]> wrote:
> On Thu, Mar 02, 2017 at 06:08:45PM -0300, Gleydson Soares wrote:
> > Yes, it's using libtls and evbuffer_tls(came from syslogd).
>
> I have copied it from libevent and adapted it to tls.
>
> > I did some tests and I think this hangs with jpeg >=4k and <=15k,
> > other image sizes not triggers the issue.
> > ..weird
>
> libevent uses 4k read blocks and libssl 16k blocks internally.
> This is at least inefficent and I guess could lead to hangs.
>
> Could you test this diff?
>
The issue still shows up, but more erratically: When you can't ldapadd
with a certain image size it may well work when you try a second time
(at least for small size; see below).
First, when using ldap instead of ldaps all image sizes up to at least
70000 bytes succeed (that is the maximum size I tested).
When using ldaps it hangs at different image sizes. I had hangs at:
4339, 4603, 3600, 3179, 3414, 3700, and 5239 byte-sized images using the
script attached, using the default values.
When trying image sizes from upwards down, e.g.:
FIRST_IMAGE_SIZE=48750 LAST_IMAGE_SIZE=10000 DEBUG= ./test2.sh
(test script attached)
I found there seems to be a boundary 16384*n - 409 and less for 16K
sizes where ldapadd may fail:
size 16500 and lower had failures at image sizes of:
- 15974
- 15975 (almost always 15975)
size 33000 and lower (I lowered the upper limit after the first couple
of times):
- 32358
- 32359 (almost always 32358)
size 49152 and lower:
- 48743 (thrice)
- 48742 (twice)
- 48733 (once)
- 48742 (once)
size 65536 and lower (order of appearance):
- 65126
- 65121
- 65126
- 65127
- 65117
- 65124
- 65112
- 65117
- 65122
- 65124
- 65119
- 65125
- 65126
- 65116
- 65117
I also checked sizes around 8K; failures above and below (8163,
8297, 8187) with a command line of
FIRST_IMAGE_SIZE=8200 LAST_IMAGE_SIZE=1 DEBUG= ./test2.sh
ldapadd trace always shows a repeated output of
--- snip ---
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
--- snip ---
or, with debugging enabled:
--- snip ---
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
write(2, "ldap_result ld 0x563bac634770 ms"..., 38) = 38
write(2, "wait4msg ld 0x563bac634770 msgid"..., 57) = 57
write(2, "wait4msg continue ld 0x563bac634"..., 50) = 50
write(2, "** ld 0x563bac634770 Connections"..., 34) = 34
write(2, "* host: r310 port: 636 (defaul"..., 35) = 35
write(2, " refcnt: 2 status: Connected\n", 31) = 31
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
write(2, " last used: Fri Mar 3 08:26:00"..., 39) = 39
write(2, "\n", 1) = 1
write(2, "** ld 0x563bac634770 Outstanding"..., 43) = 43
write(2, " * msgid 2, origid 2, status In"..., 41) = 41
write(2, " outstanding referrals 0, pare"..., 43) = 43
write(2, " ld 0x563bac634770 request coun"..., 50) = 50
write(2, "** ld 0x563bac634770 Response Qu"..., 37) = 37
write(2, " Empty\n", 9) = 9
write(2, " ld 0x563bac634770 response cou"..., 37) = 37
write(2, "ldap_chkResponseList ld 0x563bac"..., 53) = 53
write(2, "ldap_chkResponseList returns ld "..., 52) = 52
write(2, "ldap_int_select\n", 16) = 16
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 100^CProcess 22140 detached
--- snip ---
Best regards
Robert
> bluhm
>
> Index: usr.sbin/ldapd/evbuffer_tls.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/ldapd/evbuffer_tls.c,v
> retrieving revision 1.1
> diff -u -p -r1.1 evbuffer_tls.c
> --- usr.sbin/ldapd/evbuffer_tls.c 1 May 2016 00:32:37
> -0000 1.1 +++ usr.sbin/ldapd/evbuffer_tls.c 2 Mar 2017
> 21:36:35 -0000 @@ -289,7 +289,7 @@ buffertls_connect(struct buffertls
> *buft
> * Reads data from a file descriptor into a buffer.
> */
>
> -#define EVBUFFER_MAX_READ 4096
> +#define EVBUFFER_MAX_READ 16384
>
> int
> evtls_read(struct evbuffer *buf, int fd, int howmuch, struct tls
> *ctx)
>