hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>
and with time, implement base32 and base16: MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800> base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX() thanks guys 2013/9/17 Alexander Barkov <b...@mariadb.org> > Hi Monty, > > thanks for review. > > I have addressed most of your suggestions. See the new version attached, > and the detailed comments inline: > > > > On 09/12/2013 04:32 PM, Michael Widenius wrote: > >> >> Hi! >> >> Here is the review for the code that we should put into 10.0 >> >> First the base64: >> >> === modified file 'mysql-test/t/func_str.test' >>> --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 >>> +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 >>> @@ -1555,3 +1555,118 @@ drop table t1,t2; >>> --echo # End of 5.5 tests >>> --echo # >>> >>> + >>> +--echo # >>> +--echo # Start of 5.6 tests >>> +--echo # >>> >> >> Shouldn't this be start of 10.0 tests ? >> (I know that this code is from MySQL 5.6, but still for us this is >> 10.0...) >> > > This code is (almost) copy-and-paste from MySQL-5.6. > I think it's a good idea when looking inside a test file > to be able to see which tests are coming from MySQL-5.6, > and which tests are coming from MariaDB-10.0. > > I'd suggest to keep "Start of 5.6 tests" in this particular case, > and also when merging the tests for the other merged MySQL-5.6 features. > > > > >> === modified file 'mysys/base64.c' >>> --- mysys/base64.c 2011-06-30 15:46:53 +0000 >>> +++ mysys/base64.c 2013-03-09 06:22:59 +0000 >>> @@ -1,5 +1,4 @@ >>> -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. >>> - Use is subject to license terms. >>> +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights >>> reserved. >>> >> >> Removed 'all rights reserved'. You can't have that for GPL code. >> >> If there is any new code from us, please add a copyright message for the >> MariaDB foundation too! >> > > Removed 'all rights reserved' and added "MariaDB", as there are some > our own changes. > > > >> >>> This program is free software; you can redistribute it and/or modify >>> it under the terms of the GNU General Public License as published by >>> @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ >>> "abcdefghijklmnopqrstuvwxyz" >>> "0123456789+/"; >>> >>> +/** >>> + * Maximum length base64_needed_encoded_length() >>> + * can handle without overflow. >>> + */ >>> +int >>> +base64_encode_max_arg_length(**) >>> +{ >>> +#if (SIZEOF_INT == 8) >>> + /* >>> + 6827690988321067803 -> 9223372036854775805 >>> + 6827690988321067804 -> -9223372036854775807 >>> + */ >>> >> >> Please describe from where the following number comes from >> (so that anyone can recalculate and check the number if needed): >> > > I added more comments where these numbers come from. > > > >> + return 0x5EC0D4C77B03531BLL >>> +#else >>> + /* >>> + 1589695686 -> 2147483646 >>> + 1589695687 -> -2147483645 >>> + */ >>> + return 0x5EC0D4C6; >>> +#endif >>> +} >>> + >>> >>> int >>> base64_needed_encoded_length(**int length_of_data) >>> @@ -39,6 +60,21 @@ base64_needed_encoded_length(**int length_ >>> } >>> >>> >>> +/** >>> + * Maximum length base64_needed_decoded_length() >>> + * can handle without overflow. >>> + */ >>> +int >>> +base64_decode_max_arg_length(**) >>> +{ >>> >> >> The following also need a comment. >> > > Also added comments. > > > >> +#if (SIZEOF_INT == 8) >>> + return 0x2AAAAAAAAAAAAAAALL; >>> +#else >>> + return 0x2AAAAAAA; >>> +#endif >>> +} >>> + >>> + >>> >> >> <cut> >> >> +/** >>> + * Get the next character from a base64 encoded stream. >>> + * Skip spaces, then scan the next base64 character or a pad character >>> + * and collect bits into decoder->c. >>> + * >>> + * @param decoder Pointer to MY_BASE64_DECODER >>> + * @return >>> + * FALSE on success (a valid base64 encoding character found) >>> + * TRUE on error (unexpected character or unexpected end-of-input >>> found) >>> + */ >>> >> >> Add an empty line here. >> > > Done. > > > >> It would also be good to have a good description of the base64 format >> in this file so that one can understand what the functions are >> supposed to do (or at least a link to the specification). >> >> Some thing I figured out by asking bar on IRC: >> >> - We never generate spaces >> - When decoding, we allow spaces anywhere. >> - The end of an 64 base decoded string is always padded with '=' >> so that the final length is dividable with 4. >> > > I added a reference to > http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64> > , > as well all added these your comments in proper places of the code. > > > > >> +static inline my_bool >>> +my_base64_decoder_getch(MY_**BASE64_DECODER *decoder) >>> >> >> Remove inline from the above; It's likely to make the code slower, not >> faster as this is a big function with lot of if's. >> > > Removed "inline". > > > >> { >>> - char b[3]; >>> - size_t i= 0; >>> - char *dst_base= (char *)dst; >>> - char const *src= src_base; >>> - char *d= dst_base; >>> - size_t j; >>> + if (my_base64_decoder_skip_**spaces(decoder)) >>> + return TRUE; /* End-of-input */ >>> >>> - while (i < len) >>> + if (!my_base64_add(decoder)) /* Valid base64 character found */ >>> { >>> - unsigned c= 0; >>> - size_t mark= 0; >>> + if (decoder->mark) >>> + { >>> + /* If we have scanned '=' already, then only '=' is valid */ >>> + DBUG_ASSERT(decoder->state == 3); >>> + decoder->error= 1; >>> + decoder->src--; >>> + return TRUE; /* expected '=', but encoding character found */ >>> + } >>> + decoder->state++; >>> + return FALSE; >>> + } >>> >> >> If you want to have the funtion inline, then move the following to >> another not inline function; We don't need to generate code for this >> for every function call: >> >> + >>> + /* Process error */ >>> + switch (decoder->state) >>> + { >>> + case 0: >>> + case 1: >>> + decoder->src--; >>> + return TRUE; /* base64 character expected */ >>> + break; >>> + >>> + case 2: >>> + case 3: >>> + if (decoder->src[-1] == '=') >>> + { >>> + decoder->error= 0; /* Not an error - it's a pad character */ >>> + decoder->mark++; >>> + } >>> + else >>> + { >>> + decoder->src--; >>> + return TRUE; /* base64 character or '=' expected */ >>> + } >>> + break; >>> + default: >>> + DBUG_ASSERT(0); >>> + return TRUE; /* Wrong state, should not happen */ >>> + } >>> >> >> <cut> >> >> Add to the comment for base64_decode that the 'dst' buffer has to be >> at least 2 character longer than what is needed to decode src_base. >> > > I added this comment: > > * Note: 'dst' must have sufficient space to store the decoded data. > * Use base64_needed_decoded_length() to calculate the correct space size. > > > > > > >> +int >>> +base64_decode(const char *src_base, size_t len, >>> + void *dst, const char **end_ptr, int flags) >>> +{ >>> + if (my_base64_decoder_getch(&**decoder) || >>> + my_base64_decoder_getch(&**decoder) || >>> + my_base64_decoder_getch(&**decoder) || >>> + my_base64_decoder_getch(&**decoder)) >>> + break; >>> >> >> Generating the error handling with a switch for every of the >> above is wasteful. See previous comment. >> > > Right. my_base64_decoder_getch() is not inline anymore. > > > >> + /* Return error if there are more non-space characters */ >>> + decoder.state= 0; >>> + if (!my_base64_decoder_skip_**spaces(&decoder)) >>> + decoder.error= 1; >>> + >>> if (end_ptr != NULL) >>> >> >> Note that base64_needed_decoded_length() used ceil() when it's not needed. >> > > Removed. > > > > >> <cut> >> >> === modified file 'sql/item_strfunc.cc' >>> @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_**length_a >>> set_persist_maybe_null(1); >>> } >>> >>> + >>> +void Item_func_to_base64::fix_**length_and_dec() >>> +{ >>> + maybe_null= args[0]->maybe_null; >>> + collation.set(default_charset(**), DERIVATION_COERCIBLE, >>> MY_REPERTOIRE_ASCII); >>> >> >> Maybe better to cast both arguments to (ulonglong) as the rest of the code >> is depending on this. >> >> Also, why is base64_encode_max_arg_length() int instead of uint or >> even better size_t? >> > > This is because base64_encode() and base64_decode() are used > in the replication code using "int" as return value. > > When exposing TO_BASE64() and FROM_BASE64() to the SQL level, > I did not want to touch the replication code. > > > >> + if (args[0]->max_length > (uint) base64_encode_max_arg_length()**) >>> + { >>> + maybe_null= 1; >>> + fix_char_length_ulonglong((**ulonglong) >>> base64_encode_max_arg_length()**); >>> + } >>> + else >>> + { >>> + int length= base64_needed_encoded_length((**int) >>> args[0]->max_length); >>> + DBUG_ASSERT(length > 0); >>> >> >> Don't think assert is needed as the function gurantees it already. >> >> + fix_char_length_ulonglong((**ulonglong) length - 1); >>> + } >>> +} >>> >> >> <cut> >> >> +String *Item_func_from_base64::val_**str(String *str) >>> +{ >>> + String *res= args[0]->val_str_ascii(str); >>> + bool too_long= false; >>> + int length; >>> + const char *end_ptr; >>> + >>> + if (!res || >>> + res->length() > (uint) base64_decode_max_arg_length() || >>> + (too_long= >>> + ((uint) (length= base64_needed_decoded_length((**int) >>> res->length())) > >>> + current_thd->variables.max_**allowed_packet)) || >>> + tmp_value.alloc((uint) length) || >>> + (length= base64_decode(res->ptr(), (int) res->length(), >>> + (char *) tmp_value.ptr(), &end_ptr, 0)) < >>> 0 || >>> + end_ptr < res->ptr() + res->length()) >>> + { >>> >> >> Shouldn't we get a readable error or warning if the string contained wrong >> characters? >> Now it looks like we will just return null. >> >> Something like 'Malformed base64 string. Error at position %d" would >> be nice. >> > > Good idea. I have added a warning. Now it looks clear what went wrong > if FROM_BASE64() returns NULL. > > > > >> >> + null_value= 1; // NULL input, too long input, OOM, or badly formed >>> input >>> + if (too_long) >>> + { >>> + push_warning_printf(current_**thd, Sql_condition::WARN_LEVEL_** >>> WARN, >>> + ER_WARN_ALLOWED_PACKET_**OVERFLOWED, >>> + ER(ER_WARN_ALLOWED_PACKET_**OVERFLOWED), >>> func_name(), >>> + current_thd->variables.max_**allowed_packet); >>> + } >>> + return 0; >>> + } >>> + tmp_value.length((uint) length); >>> + null_value= 0; >>> + return &tmp_value; >>> +} >>> >> >> <cut> >> >> <cut> > > > _______________________________________________ > Mailing list: https://launchpad.net/~maria-developers > Post to : maria-developers@lists.launchpad.net > Unsubscribe : https://launchpad.net/~maria-developers > More help : https://help.launchpad.net/ListHelp > > -- Roberto Spadim SPAEmpresarial
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp