Frans,

The mistake in your original code is largely due to the

  BIO_set_mem_eof_return(mem, 0);

call at the start as that one prevents the bio chain from signaling
'should retry' upon error conditions (such as BIO_mem becoming empty,
due to BIO_read pulling the data out of it).
Instead, things should've gone rather better with:

  BIO_set_mem_eof_return(mem, -1);

... though then still the 'should retry' checking code would be
lacking from the code (which is the second part causing your agony).


;-) No sweat: I had to debug the bugger to find out it had to be
BIO_set_mem_eof_return(mem, -1) instead of BIO_set_mem_eof_return(mem,
0). By simply reading your code I didn't spot the issue. Hence I wrote
the next section as much for you as for myself and others, to 'recall'
how it should be.

------------
Summary:

BIO_read --> ret==0: check retry/should flags, otherwise the end.
BIO_read --> ret<0: check retry/should flags, otherwise error.

If you want to 'auto-recover' from (temporary) EOD/no more data in the BIO
source, then make sure the BIO source spits out a negative value on
such 'end' BIO_read() calls and has set the appropriate retry/want
flags.
BIO_s_mem() and BIO_pair (see second function in attached sample code)
do the latter out of the box (= signal retry on EOD).

-----------
Elaboration:

Okay, what's the issue here: you fill the BIO_mem, fine, then
BIO_read() with a BIO_F_base64 filter in its chain on top of BIO_mem
fetches that data from the BIO_mem source/sink again (which is thus
used as an intermediate buffer store). As the BIO_read tries to read
as many bytes ('inlen') as there currently are in BIO_mem 'raw
storage', this will run into an 'EOF' signal from BIO_mem.


Why?

Because the chain is set up to DECODE BASE64, i.e. the 'raw input'
is assumed to be base64 enc'ed stuff and BIO_read() should produce the
literal, unencoded bytes. BASE64 encoding clocks in at a conversion
ratio of 4:3, i.e. 4 enc'ed bytes produce 3 decoded bytes. Hence, the
bio chain won't be able to produce more than inlen*3/4 bytes, best case,
per round of fread/BIO_mem::write/base64::BIO_read.

And just because I said 'because' up there, it doesn't mean this
is the problem. This behaviour is not a problem; it is rather to be
expected, and given the variable inputs accepted for BASE64, the 'best
case' in the paragraph above is a sure hint you won't ever be able to
nail the number of decoded/read bytes per round to a sure-fire fixed
number. And if you even would/could, such would lead to very
inflexible, brittle software.
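
For the arithmetic only, a tiny sketch of that 'best case'. The function
name is hypothetical, and the result is an upper bound over the whole
input, not a prediction for any single round: line breaks and '=' padding
lower the real count, and the filter may hold back an incomplete 4-byte
group until the next round.

```c
#include <stddef.h>

/*
 * Upper bound on the number of decoded bytes that 'encoded_len' bytes of
 * base64 text can yield: every complete group of 4 input characters
 * decodes to at most 3 output bytes.
 */
static size_t b64_max_decoded(size_t encoded_len)
{
    return (encoded_len / 4) * 3;
}
```

So a full 500-byte fread() chunk can never account for more than 375
decoded bytes, which is why sizing the BIO_read() request after the number
of bytes just written is guesswork at best.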


WARNING: beginners' mistake #1: trying to 'tweak' the bufsize argument
in BIO_read(...., bufsize) to 'make it just work'. I've seen them try
it, then panic, thus try some more, and there's a special shotgun
waiting in my drawer for those IT 'professionals', loaded with
hollow-point silver tips. ;-) (No worries, mate, drawer's not going to
open today.)

Why is this beginners' mistake a grave one? Because it's trying to
plug a hole by fiddling with a /symptom/ rather than curing the
(hidden) error.


Generally speaking for streams, and BIOs are a particular brand of
those, halting reads (i.e. reads which won't deliver anything anymore
after a certain time/size) happen most often because somewhere down
the line the system has concluded it's stream closing time. If you
don't want that, the question becomes how to prevent the streams from
closing (== signaling 'End Of Stream Data').

Given your choice to use a BIO_mem source/sink for buffering purposes
(I like the idea, though note the caveats!) you have chosen to create
a read-stream with an implicit 'hickuppy EOF' behaviour: every time
you try to read more data than currently resides in the BIO_mem memory
space, you'll get the sensible response from the BIO_mem source/sink:
EOF reached.

To 'lift' the EOF blockade again, once you've stored some more, fresh
data in BIO_mem, you need to make that previous EOF signal provide a
little extra info: 'please retry later'. The wicked bit here is that,
on the outside, it does not so much depend on BIO_read() producing a 0
or -1 return value (0 usually assumed to mean 'EOF reached' in other
systems and OPENSSL generally adheres to the same assumption), but
you've got to check these flags as well to see if it's the end for
real or if the system is somehow aware there might be some more at a
later time:

            BIO_should_retry(bio)
            BIO_should_read(bio)

In your case checking for BIO_should_read() is not really necessary as
you only use the read I/O direction, but I suggest you check it
anyhow (once you move on to SSL-enabled I/O, read and write are not all
that separate anymore: write can trigger BIO_should_read and vice
versa).

The bugger in there is now how to make BIO_mem report BIO_should_retry
every time it runs into 'End Of Data' due to an oversized read. (And
note that those 'BIO_should_retry' and other signals propagate up the
BIO chain!)


Turns out that BIO_mem acts correctly, but my mind had the wrong idea:

normally an EOF/EOD is suggested by a read returning 0. The same goes
for BIO_mem. If you however instruct BIO_mem to report an ERROR signal
(-1) upon BIO_read hitting EOF/EOD instead, that error state is
augmented with should_retry. And that's how all the OPENSSL BIO's act:
BIO_read/BIO_write return error code (negative value, -1)? Then you
should _always_ check those should_retry/want_xyz attributes ('flags')
before you go and treat that negative 'error' return value as a real
error code (with detailed error report available through the
per-thread ERR stack).

You should check those retry flags in all situations where you didn't
get what you expected, i.e. both when read returns 0 or -1, hence:

        if (inlen <= 0)
        {
            if (!BIO_should_retry(bo)
               || !BIO_should_read(bo))
            {
                break;  /* no retry requested: real EOF or error */
            }
        }

in the code.


Side notes:
-------------

1) BIO_mem also sets should_read/should_retry when BIO_gets() didn't
find a \n terminated line in there, i.e. you don't only need to check
those retry/should flags on 'error', but also when you received actual
data from the BIO chain.

2) several BIO filters will propagate the retry flag, while they
return (remaining) valid data from their own buffers. Again, that
means you can have should_retry while you got data from your
BIO_read().


WARNING: mistake #2: while you most often see 'in situ' that BIO_read
returns -1 while BIO_should_retry is flagged, it is a mistake to check
for those flags only when you get a -1, because there are spurious
cases where the flag is purposely set for another BIO_read/BIO_write
return value.




-----------------------

And that's what the corrected, attached code does. As it's based on a
few other tests of mine, it includes a few more examples how to tackle
base64 decoding + one way to encode to base64 to complete the test
code: see attachment.

Enjoy.


Code has been tested on augmented OpenSSL 0.9.9 HEAD stack on Win32 +
Win64 platforms, compiled with MSVC2005. You'll have to add your own
makefile/project poison to this to build it in your environment.

NOTE: Code still lacks proper error handling for clarity of example.


-- 
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------
#include <stdio.h>
#include <stdlib.h>   /* atoi() */
#include <openssl/err.h>
#include <openssl/bio.h>
#include <openssl/evp.h>

/*
 * [i_a] Side note here: BIO_mem::write() will cause the
 *       internal BIO_mem buffer to expand (grow) once, but
 *       after that the mem buffer will remain as is as
 *       BIO_write() is interleaved with emptying the mem
 *       space through BIO_read().
 *
 *
 * Alternative ways of doing things
 * ================================
 *
 * In those cases where you only need an intermediate
 * (semi-)fixed sized buffer, you may consider using a
 * special source/sink BIO instead: BIO_s_bio(), i.e.
 * BIO_new_bio_pair().
 *
 * The benefit of using a BIO_pair is that you'll get
 * bidirectional buffered I/O, where BIO_mem essentially
 * SHARES its mem space (i.e. buffer) between the read
 * and write channels. The latter is fine here, but it's
 * worth a note nonetheless (also see the BIO_f_base64
 * caveats at the bottom).
 *
 * For an equivalent functionality example using BIO_pair,
 * see the BioPairB64_Test() code further below.
 *
 *
 * But wait! Why R/W into and out of an intermediate
 * memory buffer when we can just drop the Base64 filter
 * in the pull (read) or push (write) chain? (Okay, that 
 * wasn't exactly what Frans was looking for, but we show
 * it here for completeness sake.)
 *
 * In this case (we want to decode BASE64 data) it should
 * sit in the pull/read line, as BIO_f_base64 DECODES on
 * read and ENCODES on write.
 *
 * See for those the third example below.
 *
 *
 * CAVEAT for BIO_f_base64() to note: the filter
 * implementation won't play nice with full duplex
 * I/O as a BIO_read() or BIO_write() will discard/clear
 * the internals if they still carry data for the
 * other direction, before resetting/re-init-ing the
 * internals for the new I/O direction. That is why
 * a BIO_write() MUST be followed by a BIO_flush()
 * when the next thing which can happen includes a
 * BIO_read().
 *
 * The 'read' side is a little 'funny' as it doesn't
 * come with an action-terminating EVP_DecodeFinal()
 * in there (BIO_f_base64()) which means BIO_f_base64
 * EXPECTS a [series of] BIO_read() to completely
 * decode incoming data, before I/O direction is
 * switched back to BIO_write() again.
 *
 * All this wickedness is due to the fact that read and
 * write directions in BIO_f_base64 share a common
 * BIO_B64_CTX instance.
 *
 * There are two resolutions for this, both using an
 * 'augmented' version of BIO_f_base64: either fix
 * the encode/decode direction on first use or through a
 * BIO_ctrl()-based preset, or create a base64 BIO which
 * has separate BIO_B64_CTX structs for both I/O
 * directions (read / write).
 */


/*
 * Sample BASE64 data for test: (copy & paste for your use
 * between the '-' lines...)
 *
 * base64_in:
 * -start-------------------------------------------------------
UmVtYXJrcw0KVGhlIGNsZWFyZXJyIGZ1bmN0aW9uIHJlc2V0cyB0aGUgZXJyb3Ig
aW5kaWNhdG9yIGFuZCBlbmQtb2YtZmlsZSBpbmRpY2F0b3IgZm9yIHN0cmVhbS4g
RXJyb3IgaW5kaWNhdG9ycyBhcmUgbm90IGF1dG9tYXRpY2FsbHkgY2xlYXJlZDsg
b25jZSB0aGUgZXJyb3IgaW5kaWNhdG9yIGZvciBhIHNwZWNpZmllZCBzdHJlYW0g
aXMgc2V0LCBvcGVyYXRpb25zIG9uIHRoYXQgc3RyZWFtIGNvbnRpbnVlIHRvIHJl
dHVybiBhbiBlcnJvciB2YWx1ZSB1bnRpbCBjbGVhcmVyciwgZnNlZWssIGZzZXRw
b3MsIG9yIHJld2luZCBpcyBjYWxsZWQuDQoNCklmIHN0cmVhbSBpcyBOVUxMLCB0
aGUgaW52YWxpZCBwYXJhbWV0ZXIgDQoNCg0KDQo=
 * -end-------------------------------------------------------
 *
 * raw_out (which is a bit of text ripped from MSVC2005 documentation
 * and UTTERLY irrelevant to this example; it's just the text I
 * used to produce a test sample (sample program #4 below: the 'push'
 * chain which does the encoding)):
 * -start-------------------------------------------------------
Remarks
The clearerr function resets the error indicator and end-of-file
indicator for stream. Error indicators are not automatically cleared;
once the error indicator for a specified stream is set, operations
on that stream continue to return an error value until clearerr,
fseek, fsetpos, or rewind is called.

If stream is NULL, the invalid parameter


 * -end-------------------------------------------------------
 * WARNING: the line wrapping of your decoded output will certainly
 * differ from the 'raw' sample text shown above.
 *
 *                                               Ger Hobbelt, 2009/jan
 */



/*
 * This first item is based on sample code by Frans B. Brokken as
 * posted to the OpenSSL-users mailing list; corrections are tagged
 * with '[i_a]'
 */
#ifdef MEGA_MONOLITH /* [i_a] */
int Issue20090104_Test(int argc, char *argv[])
#else
int main_1(int argc, char *argv[])
#endif
{
    BIO *bio, *b64;
    char inbuf[500];
    int inlen;
    BIO *mem;

    b64 = BIO_new(BIO_f_base64());     // define BIOs
    mem = BIO_new(BIO_s_mem());        // (the unused BIO_f_buffer was leaked)

    bio = BIO_push(b64, mem);          // set up the chain

    BIO_set_mem_eof_return(mem, -1);    // define s_mem eof:
    // [i_a] ^^^ must be set to a NON-ZERO value to ensure
    // you're getting a retry_read signal as well!

    // read info from some source
    while ((inlen = fread(inbuf, 1, 500, stdin)) > 0)
    {
        BIO_write(mem, inbuf, inlen);  // put it in the s_mem buffer
        /* BIO_flush(mem);   -- [i_a] not needed */

        // Doesn't work: [i_a] *does work!* it's the retry-read that's important
        while (1)
        {
            // read what's already available
            inlen = BIO_read(bio, inbuf, 500 /* inlen */);
            if (inlen <= 0)             // no more, then done
            {
                break;
            }

            // write decoded info to a dest.
            fwrite(inbuf, 1, inlen, stdout);
        }

        /*
         * [i_a] as the base64 decoder may gobble any amount of
         *       BIO_mem stored bytes, we can reach a premature
         *       EOF state in BIO_mem, so we'd better
         *       check our retry state before calling it quits!
         */
        if (inlen <= 0)             // no more, then done
        {
            if (!BIO_should_retry(bio)
               || !BIO_should_read(bio))
            {
                break;
            }
        }
    }

#if 0 /* [i_a] */
    // same procedure, but now write to the destination after first
    // reading all info into s_mem
    for (;;)
    {
        inlen = BIO_read(bio, inbuf, 200);
        if (inlen <= 0)
            break;
        fwrite(inbuf, 1, inlen, stdout);
    }
#endif

    BIO_free_all(bio);
    return 0;
}






#ifdef MEGA_MONOLITH /* [i_a] */
int BioPairB64_Test(int argc, char *argv[])
#else
int main_2(int argc, char *argv[])
#endif
{
    BIO *b64;
    BIO *bi;
    BIO *bo;
    char inbuf[500];
    int inlen;

    BIO_new_bio_pair(&bi, 1024, &bo, 1024);

    b64 = BIO_new(BIO_f_base64());     // define BIOs

    bo = BIO_push(b64, bo);          // set up the chain

    // read info from some source
    while ((inlen = fread(inbuf, 1, 500, stdin)) > 0)
    {
        BIO_write(bi, inbuf, inlen);  // put it in the BIO pair buffer

        while (1)
        {
            // read what's already available
            inlen = BIO_read(bo, inbuf, 500 /* inlen */);
            if (inlen <= 0)             // no more, then done
            {
                break;
            }

            // write decoded info to a dest.
            fwrite(inbuf, 1, inlen, stdout);
        }

        /*
         * [i_a] as the base64 decoder may gobble any amount of
         *       bytes stored in the BIO pair, we can reach a premature
         *       'no more data' state there, so we'd better
         *       check our retry state before calling it quits!
         */
        if (inlen <= 0)             // no more, then done
        {
            if (!BIO_should_retry(bo)
               || !BIO_should_read(bo))
            {
                break;
            }
        }
    }

    BIO_free_all(bi);
    BIO_free_all(bo);

    return 0;
}







#ifdef MEGA_MONOLITH /* [i_a] */
int B64pull_Test(int argc, char *argv[])
#else
int main_3(int argc, char *argv[])
#endif
{
    BIO *bi;
    BIO *bo;
    char inbuf[500];
    int inlen;

    clearerr(stdin);
    bi = BIO_new_fp(stdin, BIO_NOCLOSE);
    bi = BIO_push(BIO_new(BIO_f_base64()), bi);          // set up the chain

    bo = BIO_new_fp(stdout, BIO_NOCLOSE);  // for this example, do everything using BIO's

    // read info from some source
    while (!BIO_eof(bi))
    {
        inlen = BIO_read(bi, inbuf, 500);
        if (inlen <= 0)                     // no more, then done
        {
            /*
             * [i_a] as the base64 decoder may gobble any amount of
             *       bytes from the source BIO, we can reach a
             *       premature 'no more data' state there,
             *       so we'd better
             *       check our retry state before calling it quits!
             */
            if (!BIO_should_retry(bi)
               || !BIO_should_read(bi))
            {
                break;
            }
        }

        // write decoded info to a dest.
        if (inlen > 0)
        {
            BIO_write(bo, inbuf, inlen);
        }
    }

    // theoretically, you should also do this, before you go free_all():
    BIO_flush(bo);

    BIO_free_all(bi);
    BIO_free_all(bo);

    return 0;
}







#ifdef MEGA_MONOLITH /* [i_a] */
int B64push_Test(int argc, char *argv[])
#else
int main_4(int argc, char *argv[])
#endif
{
    BIO *bi;
    BIO *bo;
    char inbuf[500];
    int inlen;

    clearerr(stdin);
    bi = BIO_new_fp(stdin, BIO_NOCLOSE);

    bo = BIO_new_fp(stdout, BIO_NOCLOSE);                // for this example, do everything using BIO's
    bo = BIO_push(BIO_new(BIO_f_base64()), bo);          // set up the chain

    // read info from some source
    while (!BIO_eof(bi))
    {
        inlen = BIO_read(bi, inbuf, 500);
        if (inlen <= 0)                     // no more, then done
        {
            /*
             * [i_a] no base64 filter on the read side here, but the
             *       source BIO may still report a temporary
             *       'no more data' state,
             *       so we'd better
             *       check our retry state before calling it quits!
             */
            if (!BIO_should_retry(bi)
               || !BIO_should_read(bi))
            {
                break;
            }
        }

        // write raw info to the push chain, which ENCODES it on its way to the dest.
        if (inlen > 0)
        {
            BIO_write(bo, inbuf, inlen);
        }
    }

    // do this before free_all(): the base64 filter only emits its final (padded) block on flush:
    BIO_flush(bo);

    BIO_free_all(bi);
    BIO_free_all(bo);

    return 0;
}







#ifdef MEGA_MONOLITH /* [i_a] */
int ia4Base64demos_Test(int argc, char *argv[])
#else
int main(int argc, char *argv[])
#endif
{
    switch (atoi(argc == 2 ? argv[1] : "0"))
    {
    default:
        printf("%s [arg]\n"
               "\n"
               "Options:\n"
               "  arg:\n"
               "    1, 2, 3 or 4: pick your code example (1, 2 and 3 act the same)\n"
               "    where 1 uses a BIO_mem, 2 uses a BIO_pair and 3 is a BIO\n"
               "    chain in pull mode (read), while 4 is a BIO chain in push\n"
               "    mode (write) which is like 3 but ENCODES raw input to BASE64.\n"
               "\n"
               "Operation:\n"
               "  Examples 1-3 take BASE64-encoded input from stdin and output\n"
               "  the decoded result to stdout; example 4 does the reverse.\n"
              , argv[0]);
        return 0;

    case 1:
#ifdef MEGA_MONOLITH /* [i_a] */
        return Issue20090104_Test(argc, argv);
#else
        return main_1(argc, argv);
#endif

    case 2:
#ifdef MEGA_MONOLITH /* [i_a] */
        return BioPairB64_Test(argc, argv);
#else
        return main_2(argc, argv);
#endif

    case 3:
#ifdef MEGA_MONOLITH /* [i_a] */
        return B64pull_Test(argc, argv);
#else
        return main_3(argc, argv);
#endif

    case 4:
#ifdef MEGA_MONOLITH /* [i_a] */
        return B64push_Test(argc, argv);
#else
        return main_4(argc, argv);
#endif
    }
}
