On 1/20/16, Pavel Stehule <pavel.steh...@gmail.com> wrote: > ... > New version is attached > > Regards > Pavel
Hello! 1. Why the function is marked as VOLATILE? It depends only on input value, it does change nothing in the DB. I guess it should be IMMUTABLE for efficient caching its result. 2. > text *arg = PG_GETARG_TEXT_PP(0); ... > str = text_to_cstring(arg); > Hmm... My question was _why_ you get TEXT and convert it instead of getting an argument of type cstring (not about usage of VARSIZE_ANY_EXHDR). It was enough to give me a link[1] and leave the usage of VARSIZE_ANY_EXHDR as is. I think all the other claims below are mostly cosmetic. Main behavior is OK (considering its usage will not be in heavy queries). === Documentation: Besides currently added row in a table I think it is worth to add detailed comment after a block "<function>pg_size_pretty</> can be used" similar to (but the best way is to get advice for a correct phrase): "pg_size_bytes can be used to get a size in bytes as bigint from a human-readable format for a comparison purposes (it is the opposite to pg_size_pretty function). The format is numeric with an optional size unit: kB, MB, GB or TB." (and delete unit sizes from the table row). === Tests: Since there was a letter with an explanation why units bigger than "TB" can't be used[2], it is a good reason to add that size units ("PB", "EB", "ZB") in tests with a note to update the documentation part when that unit are supported. === Code style: 1. When you declare pointers, you must align _names_ by the left border, i.e. asterisks must be at one position to the left from the aligned names, i.e. one to three spaces instead of TAB before them. 2. > int multiplier; One more TAB to align it with the name at the next line. === Code: > /* allow whitespace between integer and unit */ May be "numeric" instead of "integer"? > "\"%s\" is not in expected format" It is not clear what format is expected. > /* ignore plus symbol */ It seems it is not ignored, but copied... ;-) > to ger hintstr s/ger/get/ > Elsewhere is used space trimmed buffer. I'd write it as "Otherwise trimmed value is used." I'm not good at English, but that full block looks little odd for me. I tried to reword it in the attachment single_buffer.c, but I don't think it is enough. > We allow ending spaces. "trailing"? > but this message can be little bit no intuitive - better text is "is not a > valid number" It was a block with a comment "don't allow empty string", i.e. absence of a number... > Automatic memory deallocation doesn't cover all possible situations where > the function can be used - for example DirectFunctionCall - so explicit > deallocation can descrease a memory requirements when you call these > functions from C. DirectFunctionCall uses a memory context of the caller. For example, you don't free "num" allocated by numeric_in and numeric_mul, also mul_num allocated by int8_numeric... I'd ask experienced hackers for importance of freeing (or leaving) "str" and "buffer". > postgres=# select pg_size_bytes('+912+ kB'); > ERROR: invalid unit: "+ kB" > This is difficult - you have to divide string to two parts and first world is > number, second world is unit. Your code looks like a duplicated (not by lines, but by behavior). numeric_in has similar checks, let it do them for you. The goal of your function is just split mantissa and size unit, and if the last one is found, turn it into an exponent. See my example in the attachment check_mantissa_by_numeric_in.c. There is just searching non-space char that can't be part of numeric. Then all before it passes to numeric_in and let it check anything it should. I even left first spaces to show full numeric part to a user if an error occurs (for any case). The second attachment single_buffer.c is an attempt to avoid allocating the second buffer (the first is "str" allocated by text_to_cstring) and copying into it. I don't think it gives a big improvement, but nevertheless. === [1] http://www.postgresql.org/message-id/29618.1451882...@sss.pgh.pa.us [2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=dhb3o0pc-nx1v3xjszkn3z9kbexgcq...@mail.gmail.com -- Best regards, Vitaly Burovoy
/* * Convert human readable size to long int. * * Due to support for decimal values and case insensitivity of units * a function parse_int cannot be used. */ Datum pg_size_bytes(PG_FUNCTION_ARGS) { text *arg = PG_GETARG_TEXT_PP(0); char *str, *strptr; char *buffer, *bufptr; Numeric num; int64 result; size_t len; str = text_to_cstring(arg); strptr = str; /* Skip leading spaces */ while (isspace((unsigned char) *strptr)) strptr++; bufptr = strptr; for(;*strptr != '\0'; strptr++) { if (isdigit((unsigned char) *strptr) || isspace((unsigned char) *strptr) || *strptr == '.' || *strptr == '+' || *strptr == '-') continue; break; } /* don't allow empty string */ if (bufptr == strptr) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("\"%s\" is not a number", str))); /* prepare buffer for parts of parsed input string */ buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1); len = strptr - str; memcpy(buffer, str, len); buffer[len] = '\0'; num = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(buffer), 0, -1)); /* Handle possible unit */ if (*strptr != '\0') { int multiplier; Numeric mul_num; const char *hintmsg; const char *unitstr; bufptr = buffer; /* * unitstr holds complete string related to unit part. Used * as part of error message. The parse_memory_unit is called * with this value, when we detected invalid format, and we * would to emit error result to ger hintstr. Elsewhere is * used space trimmed buffer. */ unitstr = strptr; /* copy chars to buffer and stop on first space or end string */ while (*strptr != '\0' && !isspace((unsigned char) *strptr)) *bufptr++ = *strptr++; *bufptr = '\0'; /* We allow ending spaces. Skip all spaces. */ while (isspace((unsigned char) *strptr)) strptr++; /* Use buffer as unitstr, when we didn't find more words. */ if (*strptr == '\0') unitstr = buffer; /* Still, the unit can be invalid: too long, unknown */ if (!parse_memory_unit(unitstr, &multiplier, &hintmsg)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid unit: \"%s\"", unitstr), errhint("%s", _(hintmsg)))); /* * Now, the multiplier is in KB units. It should be multiplied by 1024 * before usage. */ mul_num = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum(multiplier * 1024L))); num = DatumGetNumeric(DirectFunctionCall2(numeric_mul, NumericGetDatum(mul_num), NumericGetDatum(num))); } result = DatumGetInt64(DirectFunctionCall1(numeric_int8, NumericGetDatum(num))); PG_RETURN_INT64(result); }
/* * Convert human readable size to long int. * * Due to support for decimal values and case insensitivity of units * a function parse_int cannot be used. */ Datum pg_size_bytes(PG_FUNCTION_ARGS) { text *arg = PG_GETARG_TEXT_PP(0); char *str, *strptr; Numeric num; int64 result; char savedchar; char *tmpptr; str = text_to_cstring(arg); strptr = str; /* Skip leading spaces */ while (isspace((unsigned char) *strptr)) strptr++; /* The first non-space char to check for emptiness later */ tmpptr = strptr; for(;*strptr != '\0'; strptr++) { if (isdigit((unsigned char) *strptr) || isspace((unsigned char) *strptr) || *strptr == '.' || *strptr == '+' || *strptr == '-') continue; break; } /* don't allow empty string */ if (tmpptr == strptr) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("\"%s\" is not a number", str))); /* * Save the first char after possible number with space chars after it * and set end-of-string mark to get cstring for "numeric_in". * If there is no unit part in the arg then savedchar is '\0'. */ savedchar = *strptr; *strptr = '\0'; num = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(str), 0, -1)); if (savedchar != '\0') { int multiplier; Numeric mul_num; const char *hintmsg; const char *unitstr; /* * unitstr points to a complete string related to a unit part. It is * used as a part of an error message. The parse_memory_unit is called * with this value, even if invalid format is detected to get hintstr * for the error message. Otherwise trimmed value is used. */ unitstr = strptr; *strptr = savedchar; /* Restore value which was replaced by '\0' */ while (*strptr != '\0' && !isspace((unsigned char) *strptr)) strptr++; tmpptr = strptr; /* The first space char to set there '\0' later */ while (isspace((unsigned char) *strptr)) strptr++; /* If there are only spaces after the unit part, trim them. */ if (*strptr == '\0') *tmpptr = '\0'; /* Still, the unit can be invalid: too long, unknown */ if (!parse_memory_unit(unitstr, &multiplier, &hintmsg)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid unit: \"%s\"", unitstr), errhint("%s", _(hintmsg)))); /* * Now, the multiplier is in KB units. It should be multiplied by 1024 * before usage. */ mul_num = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum(multiplier * 1024L))); num = DatumGetNumeric(DirectFunctionCall2(numeric_mul, NumericGetDatum(mul_num), NumericGetDatum(num))); } result = DatumGetInt64(DirectFunctionCall1(numeric_int8, NumericGetDatum(num))); PG_RETURN_INT64(result); }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers