On Mon Nov 24 21:52:04 2014, prav...@viptela.com wrote: > * state = 4384,*
This is SSL3_ST_CR_SRVR_HELLO_A, i.e. we are trying to read a ServerHello. This confirms what we expected. > > So if s->init_num is 0 then frag_len is 0 and frag->fragment gets > set to > > NULL. What I missed in the above is that there are some OPENSSL_assert calls in dtls_buffer_message that check init_num, so it cannot be 0. Something else is happening. > *Agreed. All good points. Just another data point, is that we ran > valgrind > on another node, saw a leak in this related code. See if this helps > you.* > > *==697== HEAP SUMMARY: > ==697== in use at exit: 1,282,108 bytes in 20,788 blocks > ==697== total heap usage: 664,349 allocs, 643,561 frees, 105,419,006 > bytes allocated > ==697== > ==697== 120 bytes in 1 blocks are definitely lost in loss record 27 of > 96 > ==697== at 0x4A05F80: malloc (vg_replace_malloc.c:296) > ==697== by 0x5BFBC86: default_malloc_ex (mem.c:79) > ==697== by 0x5BFC315: CRYPTO_malloc (mem.c:308) > ==697== by 0x5955875: dtls1_hm_fragment_new (d1_both.c:199) > ==697== by 0x5956817: dtls1_reassemble_fragment (d1_both.c:625) > ==697== by 0x595720A: dtls1_get_message_fragment (d1_both.c:852) > ==697== by 0x5956174: dtls1_get_message (d1_both.c:443) > ==697== by 0x59504DA: dtls1_get_hello_verify (d1_clnt.c:918) > ==697== by 0x594F5AB: dtls1_connect (d1_clnt.c:360) > ==697== by 0x595B591: SSL_connect (ssl_lib.c:949) > ==697== by 0x430409: ssl_connect_timer_cb (vdaemon_peer.c:303) > ==697== by 0x48573E: timer_exec_pri (timer.c:612) > ==697== > ==697== LEAK SUMMARY: > ==697== definitely lost: 120 bytes in 1 blocks > ==697== indirectly lost: 0 bytes in 0 blocks > ==697== possibly lost: 0 bytes in 0 blocks > ==697== still reachable: 1,281,988 bytes in 20,787 blocks > ==697== suppressed: 0 bytes in 0 blocks > ==697== Reachable blocks (those to which a pointer was found) are not > shown. > ==697== To see them, rerun with: --leak-check=full > --show-leak-kinds=all > ==697== > ==697== For counts of detected and suppressed errors, rerun with: -v > ==697== Use --track-origins=yes to see where uninitialised values come > from * > > *==697== ERROR SUMMARY: 126394 errors from 117 contexts (suppressed: 1 > from > 1)* > That's very interesting. I've tracked that down to a problem in dtls1_clear_queues which is failing to correct free bufferred fragments. I've attached a patch. Please let me know if you have any problems with it. Unfortunately I think this is unconnected to your main problem. > > > > If I sent you some instrumented code would you be able to apply it > and see > > if > > that helps us narrow down what's going on? > > > > *[viptela.com <http://viptela.com>] * > > *Ofcourse. But as I mentioned earlier, we dont know the likelyhood of > this > happening again. Please send me any instrumented patch. We will keep > trying.* Ok, thanks. I've attached a second patch which adds a number of OPENSSL_assert calls at various points to check that frag->fragment is not null. I'm hoping it will help us track down why its not being correctly set. If you get another crash with this patch applied, then please capture the core and let me know what you find out. Thanks Matt
>From 90e37eb304a697e37ebd857ea5456435fa236bc9 Mon Sep 17 00:00:00 2001 From: Matt Caswell <m...@openssl.org> Date: Tue, 25 Nov 2014 13:36:00 +0000 Subject: [PATCH] Fixed memory leak due to incorrect freeing of DTLS reassembly bit mask --- ssl/d1_both.c | 3 +-- ssl/d1_lib.c | 6 ++---- ssl/ssl_locl.h | 1 + 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/ssl/d1_both.c b/ssl/d1_both.c index 2e4250f..b58fdc2 100644 --- a/ssl/d1_both.c +++ b/ssl/d1_both.c @@ -211,8 +211,7 @@ dtls1_hm_fragment_new(unsigned long frag_len, int reassembly) return frag; } -static void -dtls1_hm_fragment_free(hm_fragment *frag) +void dtls1_hm_fragment_free(hm_fragment *frag) { if (frag->msg_header.is_ccs) diff --git a/ssl/d1_lib.c b/ssl/d1_lib.c index 82ca653..f7d681b 100644 --- a/ssl/d1_lib.c +++ b/ssl/d1_lib.c @@ -161,16 +161,14 @@ static void dtls1_clear_queues(SSL *s) while( (item = pqueue_pop(s->d1->buffered_messages)) != NULL) { frag = (hm_fragment *)item->data; - OPENSSL_free(frag->fragment); - OPENSSL_free(frag); + dtls1_hm_fragment_free(frag); pitem_free(item); } while ( (item = pqueue_pop(s->d1->sent_messages)) != NULL) { frag = (hm_fragment *)item->data; - OPENSSL_free(frag->fragment); - OPENSSL_free(frag); + dtls1_hm_fragment_free(frag); pitem_free(item); } diff --git a/ssl/ssl_locl.h b/ssl/ssl_locl.h index 98888d2..d97d4ca 100644 --- a/ssl/ssl_locl.h +++ b/ssl/ssl_locl.h @@ -998,6 +998,7 @@ int dtls1_is_timer_expired(SSL *s); void dtls1_double_timeout(SSL *s); int dtls1_send_newsession_ticket(SSL *s); unsigned int dtls1_min_mtu(void); +void dtls1_hm_fragment_free(hm_fragment *frag); /* some client-only functions */ int ssl3_client_hello(SSL *s); -- 2.1.0
>From 6dc8cf337fe41b9311d8d15fe70f9ea4e07c7b06 Mon Sep 17 00:00:00 2001 From: Matt Caswell <m...@openssl.org> Date: Tue, 25 Nov 2014 14:15:47 +0000 Subject: [PATCH] Liberal sprinkling of asserts to try and catch the frag->fragment problem --- ssl/d1_both.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ssl/d1_both.c b/ssl/d1_both.c index 2e4250f..b082a05 100644 --- a/ssl/d1_both.c +++ b/ssl/d1_both.c @@ -1170,6 +1170,7 @@ dtls1_retransmit_buffered_messages(SSL *s) for ( item = pqueue_next(&iter); item != NULL; item = pqueue_next(&iter)) { frag = (hm_fragment *)item->data; + OPENSSL_assert(frag->fragment != NULL); if ( dtls1_retransmit_message(s, (unsigned short)dtls1_get_queue_priority(frag->msg_header.seq, frag->msg_header.is_ccs), 0, &found) <= 0 && found) @@ -1193,7 +1194,11 @@ dtls1_buffer_message(SSL *s, int is_ccs) * been serialized */ OPENSSL_assert(s->init_off == 0); + /* Sanity checks */ + OPENSSL_assert(s->init_num != 0); frag = dtls1_hm_fragment_new(s->init_num, 0); + OPENSSL_assert(frag->fragment != NULL); + if (!frag) return 0; @@ -1244,6 +1249,9 @@ dtls1_buffer_message(SSL *s, int is_ccs) #endif pqueue_insert(s->d1->sent_messages, item); + + /* Check nothing untoward happened since we created it! */ + OPENSSL_assert(frag->fragment != NULL); return 1; } @@ -1280,6 +1288,7 @@ dtls1_retransmit_message(SSL *s, unsigned short seq, unsigned long frag_off, *found = 1; frag = (hm_fragment *)item->data; + OPENSSL_assert(frag->fragment != NULL); if ( frag->msg_header.is_ccs) header_length = DTLS1_CCS_HEADER_LENGTH; -- 2.1.0