Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-13 Thread Ranier Vilela
Em seg., 5 de set. de 2022 às 23:02, David Rowley 
escreveu:

> On Tue, 6 Sept 2022 at 13:52, Ranier Vilela  wrote:
> >
> > Em seg., 5 de set. de 2022 às 22:29, David Rowley 
> escreveu:
> >> It feels like it would be good if we had a way to detect a few of
> >> these issues much earlier than we are currently.  There's been a long
> >> series of commits fixing up this sort of thing.  If we had a tool to
> >> parse the .c files and look for things like a function call to
> >> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
> >> no va_arg arguments).
> >
> > StaticAssert could check va_arg no?
>
> I'm not sure exactly what you have in mind. If you think you have a
> way to make that work, it would be good to see a patch with it.
>
About this:

1. StaticAssertSmt can not help.
Although some posts on the web show that it is possible to calculate the
number of arguments,
I didn't get anything useful.
So I left this option.

2. Compiler supports
Best solution.
But currently does not allow the suggestion to use another function.

3.  Owner tool
Temporary solution.
Can help, until the compilers build support for it.

So, I made one very simple tool, can do the basics here.
Not meant to be some universal lint.
It only processes previously coded functions.

pg_check test1.c
line (1): should be appendPQExpBufferStr?
line (2): should be appendPQExpBufferChar?
line (4): should be appendPQExpBufferStr?
line (5): should be appendPQExpBufferStr?

I don't think it's anywhere near the quality to be considered Postgres, but
it could be a start.
If it helps, great, if not, fine.

regards,
Ranier Vilela
/*
 * pg_check.c
 *
 * A simple check syntax program for PostgreSQL
 * Originally written by Ranier Vilela and enhanced by many contributors.
 *
 * src/bin/pg_check/pg_check.c
 * Copyright (c) 2022, PostgreSQL Global Development Group
 * ALL RIGHTS RESERVED;
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIMS ANY WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
 * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 *
 */

#include 
#include 
#include 
#include 


static const char * progname = "pg_check";


struct S_CTX
{
int nline;
int nargs;
int nchars;
int nbrackets;
int nkeys;
int nperc;
int nsfmt;
bool quote;
bool finish;
};

typedef struct S_CTX s_ctx;

struct S_FUNC
{
char * function;
char * should_string;
char * should_char;
};

typedef struct S_FUNC s_func;


static const s_func pg_functions[] = {
{ "appendPQExpBuffer(", "should be appendPQExpBufferStr?", "should be 
appendPQExpBufferChar?" },
{ "appendStringInfo(", "should be appendStringInfoString?", "should be 
appendStringInfoChar?" },
NULL
};


void parser_function(s_ctx * ctx, const s_func * pg_function, const char * line)
{
const char * ptr;
size_t len;
size_t i;
char ch;

ctx->finish = false;
ptr = line;
len = strlen(line);
for(i = 0; i < len; i++)
{
ch = ptr[i];
if (!ctx->quote)
{
if (ch == '(')
ctx->nbrackets++;
else if (ch == ')')
ctx->nbrackets--;
else if (ch == '{')
ctx->nkeys++;
else if (ch == '}')
ctx->nkeys--;
else if (ch == ';')
{
ctx->finish = true;
break;
}
else if (ch == ',')
ctx->nargs++;
}
if (ctx->nargs > 0)
if (ch == '"')
ctx->quote = !ctx->quote;
if (ctx->quote)
{
if (ch != '"')
ctx->nchars++;
if (ch == '%')
ctx->nperc++;
 

Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-12 Thread Ranier Vilela
Em seg., 12 de set. de 2022 às 20:06, David Rowley 
escreveu:

> On Tue, 6 Sept 2022 at 13:29, David Rowley  wrote:
> > I'll hold off a few days before pushing the other patch.  Tom stamped
> > beta4 earlier, so I'll hold off until after the tag.
>
> I've now pushed this.
>
Thank you David.
But the correct thing is to put you also as author, after all, there's more
of your code there than mine.
Anyway, I appreciate the consideration.

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-12 Thread David Rowley
On Tue, 6 Sept 2022 at 13:29, David Rowley  wrote:
> I'll hold off a few days before pushing the other patch.  Tom stamped
> beta4 earlier, so I'll hold off until after the tag.

I've now pushed this.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-06 Thread David Rowley
On Tue, 6 Sept 2022 at 23:25, Ranier Vilela  wrote:
> But first, I would like to continue with this correction of using strings.
> In the following cases:
> fprintf -> fputs -> fputc
> printf -> puts -> putchar
>
> There are many occurrences, do you think it would be worth the effort?

I'm pretty unexcited about that. Quite a bit of churn and adding
another precedent that we currently have no good way to enforce or
maintain.

In addition to that, puts() is a fairly seldom used function, which
perhaps is because it's a bit quirky and appends a \n to the end of
the string. I'm just imagining all the bugs where we append an extra
newline. But, feel free to open another thread about it and see if you
can drum up any support.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-06 Thread Ranier Vilela
Em seg., 5 de set. de 2022 às 23:02, David Rowley 
escreveu:

> On Tue, 6 Sept 2022 at 13:52, Ranier Vilela  wrote:
> >
> > Em seg., 5 de set. de 2022 às 22:29, David Rowley 
> escreveu:
> >> It feels like it would be good if we had a way to detect a few of
> >> these issues much earlier than we are currently.  There's been a long
> >> series of commits fixing up this sort of thing.  If we had a tool to
> >> parse the .c files and look for things like a function call to
> >> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
> >> no va_arg arguments).
> >
> > StaticAssert could check va_arg no?
>
> I'm not sure exactly what you have in mind. If you think you have a
> way to make that work, it would be good to see a patch with it.
>
I will study the matter.
But first, I would like to continue with this correction of using strings.
In the following cases:
fprintf -> fputs -> fputc
printf -> puts -> putchar

There are many occurrences, do you think it would be worth the effort?

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread David Rowley
On Tue, 6 Sept 2022 at 13:52, Ranier Vilela  wrote:
>
> Em seg., 5 de set. de 2022 às 22:29, David Rowley  
> escreveu:
>> It feels like it would be good if we had a way to detect a few of
>> these issues much earlier than we are currently.  There's been a long
>> series of commits fixing up this sort of thing.  If we had a tool to
>> parse the .c files and look for things like a function call to
>> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
>> no va_arg arguments).
>
> StaticAssert could check va_arg no?

I'm not sure exactly what you have in mind. If you think you have a
way to make that work, it would be good to see a patch with it.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread Ranier Vilela
Em seg., 5 de set. de 2022 às 22:29, David Rowley 
escreveu:

> On Tue, 6 Sept 2022 at 06:07, Ranier Vilela  wrote:
> > I did a search and found a few more places.
> > v1 attached.
>
> Thanks.  I've done a bit more looking and found a few more places that
> we can improve and I've pushed the result.
>
Thanks.


>
> It feels like it would be good if we had a way to detect a few of
> these issues much earlier than we are currently.  There's been a long
> series of commits fixing up this sort of thing.  If we had a tool to
> parse the .c files and look for things like a function call to
> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
> no va_arg arguments).
>
StaticAssert could check va_arg no?

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread David Rowley
On Tue, 6 Sept 2022 at 06:07, Ranier Vilela  wrote:
> I did a search and found a few more places.
> v1 attached.

Thanks.  I've done a bit more looking and found a few more places that
we can improve and I've pushed the result.

It feels like it would be good if we had a way to detect a few of
these issues much earlier than we are currently.  There's been a long
series of commits fixing up this sort of thing.  If we had a tool to
parse the .c files and look for things like a function call to
appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
no va_arg arguments).

I'll hold off a few days before pushing the other patch.  Tom stamped
beta4 earlier, so I'll hold off until after the tag.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread Ranier Vilela
Em seg., 5 de set. de 2022 às 10:40, David Rowley 
escreveu:

> On Mon, 5 Sept 2022 at 22:15, David Rowley  wrote:
> > On Sat, 3 Sept 2022 at 00:37, Ranier Vilela  wrote:
> > > 6. Avoid overhead when using unnecessary StringInfoData to convert
> Datum a to Text b.
> >
> > I've ripped out #4 and #6 for now. I think we should do #6 in master
> > only, probably as part of a wider cleanup of StringInfo misusages.
>
> I've attached a patch which does various other string operation cleanups.
>
> * This changes cstring_to_text() to use cstring_to_text_with_len when
> we're working with a StringInfo and can just access the .len field.
> * Uses appendStringInfoString instead of appendStringInfo when there
> is special formatting.
> * Uses pstrdup(str) instead of psprintf("%s", str).  In many cases
> this will save a bit of memory
> * Uses appendPQExpBufferChar instead of appendPQExpBufferStr() when
> appending a 1 byte string.
> * Uses appendStringInfoChar() instead of appendStringInfo() when no
> formatting and string is 1 byte.
> * Uses appendStringInfoChar() instead of appendStringInfoString() when
> string is 1 byte.
> * Uses appendPQExpBuffer(b , ...) instead of appendPQExpBufferStr(b, "%s"
> ...)
>
> I'm aware there are a few other places that we could use
> cstring_to_text_with_len() instead of cstring_to_text(). For example,
> using the return value of snprintf() to obtain the length. I just
> didn't do that because we need to take care to check the return value
> isn't -1.
>
> My grep patterns didn't account for these function calls spanning
> multiple lines, so I may have missed a few.
>
I did a search and found a few more places.
v1 attached.

regards,
Ranier Vilela


v1-string_fixes.patch
Description: Binary data


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread David Rowley
On Mon, 5 Sept 2022 at 22:15, David Rowley  wrote:
> On Sat, 3 Sept 2022 at 00:37, Ranier Vilela  wrote:
> > 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a 
> > to Text b.
>
> I've ripped out #4 and #6 for now. I think we should do #6 in master
> only, probably as part of a wider cleanup of StringInfo misusages.

I've attached a patch which does various other string operation cleanups.

* This changes cstring_to_text() to use cstring_to_text_with_len when
we're working with a StringInfo and can just access the .len field.
* Uses appendStringInfoString instead of appendStringInfo when there
is special formatting.
* Uses pstrdup(str) instead of psprintf("%s", str).  In many cases
this will save a bit of memory
* Uses appendPQExpBufferChar instead of appendPQExpBufferStr() when
appending a 1 byte string.
* Uses appendStringInfoChar() instead of appendStringInfo() when no
formatting and string is 1 byte.
* Uses appendStringInfoChar() instead of appendStringInfoString() when
string is 1 byte.
* Uses appendPQExpBuffer(b , ...) instead of appendPQExpBufferStr(b, "%s" ...)

I'm aware there are a few other places that we could use
cstring_to_text_with_len() instead of cstring_to_text(). For example,
using the return value of snprintf() to obtain the length. I just
didn't do that because we need to take care to check the return value
isn't -1.

My grep patterns didn't account for these function calls spanning
multiple lines, so I may have missed a few.

David
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index fb72bb6cfe..6161df2790 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -1325,7 +1325,7 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
}
appendStringInfoChar(, '}');
 
-   PG_RETURN_TEXT_P(cstring_to_text(dst.data));
+   PG_RETURN_TEXT_P(cstring_to_text_with_len(dst.data, dst.len));
 }
 
 PG_FUNCTION_INFO_V1(hstore_to_json);
@@ -1370,7 +1370,7 @@ hstore_to_json(PG_FUNCTION_ARGS)
}
appendStringInfoChar(, '}');
 
-   PG_RETURN_TEXT_P(cstring_to_text(dst.data));
+   PG_RETURN_TEXT_P(cstring_to_text_with_len(dst.data, dst.len));
 }
 
 PG_FUNCTION_INFO_V1(hstore_to_jsonb);
diff --git a/src/backend/access/brin/brin_minmax_multi.c 
b/src/backend/access/brin/brin_minmax_multi.c
index a581659fe2..ea326a250b 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -3063,7 +3063,7 @@ brin_minmax_multi_summary_out(PG_FUNCTION_ARGS)
 
appendStringInfo(, "%s ... %s", a, b);
 
-   c = cstring_to_text(str.data);
+   c = cstring_to_text_with_len(str.data, str.len);
 
astate_values = accumArrayResult(astate_values,

 PointerGetDatum(c),
@@ -3095,15 +3095,9 @@ brin_minmax_multi_summary_out(PG_FUNCTION_ARGS)
{
Datum   a;
text   *b;
-   StringInfoData str;
-
-   initStringInfo();
 
a = FunctionCall1(, 
ranges_deserialized->values[idx++]);
-
-   appendStringInfoString(, DatumGetCString(a));
-
-   b = cstring_to_text(str.data);
+   b = cstring_to_text(DatumGetCString(a));
 
astate_values = accumArrayResult(astate_values,

 PointerGetDatum(b),
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 91ba49a14b..78abbb8196 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1062,7 +1062,7 @@ copy_table(Relation rel)
appendStringInfoString(, 
quote_identifier(lrel.attnames[i]));
}
 
-   appendStringInfo(, ") TO STDOUT");
+   appendStringInfoString(, ") TO STDOUT");
}
else
{
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 081dfa2450..a2bdde0459 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -96,7 +96,7 @@ anytime_typmodout(bool istz, int32 typmod)
if (typmod >= 0)
return psprintf("(%d)%s", (int) typmod, tz);
else
-   return psprintf("%s", tz);
+   return pstrdup(tz);
 }
 
 
diff --git a/src/backend/utils/adt/ruleutils.c 
b/src/backend/utils/adt/ruleutils.c
index 6594a9aac7..2b7b1b0c0f 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1453,7 +1453,7 @@ pg_get_indexdef_worker(Oid indexrelid, int colno,
appendStringInfoChar(, ')');
 
if (idxrec->indnullsnotdistinct)
-   appendStringInfo(, " NULLS NOT DISTINCT");
+   appendStringInfoString(, " NULLS NOT DISTINCT");
 
/*
  

Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread Ranier Vilela
Em seg., 5 de set. de 2022 às 07:15, David Rowley 
escreveu:

> On Sat, 3 Sept 2022 at 00:37, Ranier Vilela  wrote:
> >> But +1 to fix this and other issues even if they would never crash.
>
> Yeah, I don't think any of this coding would lead to a crash, but it's
> pretty weird coding and we should fix it.
>
> > 1. Once that ranges->nranges is invariant, avoid the loop if
> ranges->nranges <= 0.
> > This matches the current behavior.
> >
> > 2. Once that ranges->nsorted is invariant, avoid the loop if
> ranges->nsorted <= 0.
> > This matches the current behavior.
> >
> > 3. Remove the invariant cxt from ranges->nsorted loop.
> >
> > 4. Avoid possible overflows when using int to store length strings.
> >
> > 5. Avoid possible out-of-bounds when ranges->nranges == 0.
> >
> > 6. Avoid overhead when using unnecessary StringInfoData to convert Datum
> a to Text b.
>
> I've ripped out #4 and #6 for now. I think we should do #6 in master
> only, probably as part of a wider cleanup of StringInfo misusages.
>
> I also spent some time trying to ensure I understand this code
> correctly.  I was unable to work out what the nsorted field was from
> just the comments. I needed to look at the code to figure it out, so I
> think the comments for that field need to be improved.  A few of the
> others were not that clear either.  I hope I've improved them in the
> attached.
>
> I was also a bit confused at various other comments. e.g:
>
> /*
> * Is the value greater than the maxval? If yes, we'll recurse to the
> * right side of range array.
> */
>
The second comment in the v3 patch, must be:
/*
* Is the value greater than the maxval? If yes, we'll recurse
* to the right side of the range array.
*/

I think this is copy-and-paste thinko with the word "minval".


>
> I don't see any sort of recursion going on here. All I see are
> skipping of values that are out of bounds of the lower bound of the
> lowest range, and above the upper bound of the highest range.
>
I think this kind recursion, because the loop is restarted
with:
start = (midpoint + 1);
continue;


>
> I propose to backpatch the attached into v14 tomorrow morning (about
> 12 hours from now).
>
> The only other weird coding I found was in brin_range_deserialize:
>
> for (i = 0; (i < nvalues) && (!typbyval); i++)
>
> I imagine most compilers would optimize that so that the typbyval
> check is done before the first loop and not done on every loop, but I
> don't think that coding practice helps the human readers out much. I
> left that one alone, for now.
>
Yeah, I prefer write:
if (!typbyval)
{
for (i = 0; (i < nvalues); i++)
}

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-05 Thread David Rowley
On Sat, 3 Sept 2022 at 00:37, Ranier Vilela  wrote:
>> But +1 to fix this and other issues even if they would never crash.

Yeah, I don't think any of this coding would lead to a crash, but it's
pretty weird coding and we should fix it.

> 1. Once that ranges->nranges is invariant, avoid the loop if ranges->nranges 
> <= 0.
> This matches the current behavior.
>
> 2. Once that ranges->nsorted is invariant, avoid the loop if ranges->nsorted 
> <= 0.
> This matches the current behavior.
>
> 3. Remove the invariant cxt from ranges->nsorted loop.
>
> 4. Avoid possible overflows when using int to store length strings.
>
> 5. Avoid possible out-of-bounds when ranges->nranges == 0.
>
> 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to 
> Text b.

I've ripped out #4 and #6 for now. I think we should do #6 in master
only, probably as part of a wider cleanup of StringInfo misusages.

I also spent some time trying to ensure I understand this code
correctly.  I was unable to work out what the nsorted field was from
just the comments. I needed to look at the code to figure it out, so I
think the comments for that field need to be improved.  A few of the
others were not that clear either.  I hope I've improved them in the
attached.

I was also a bit confused at various other comments. e.g:

/*
* Is the value greater than the maxval? If yes, we'll recurse to the
* right side of range array.
*/

I don't see any sort of recursion going on here. All I see are
skipping of values that are out of bounds of the lower bound of the
lowest range, and above the upper bound of the highest range.

I propose to backpatch the attached into v14 tomorrow morning (about
12 hours from now).

The only other weird coding I found was in brin_range_deserialize:

for (i = 0; (i < nvalues) && (!typbyval); i++)

I imagine most compilers would optimize that so that the typbyval
check is done before the first loop and not done on every loop, but I
don't think that coding practice helps the human readers out much. I
left that one alone, for now.

David
diff --git a/src/backend/access/brin/brin_minmax_multi.c 
b/src/backend/access/brin/brin_minmax_multi.c
index a581659fe2..99ef97206a 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -149,13 +149,16 @@ typedef struct MinMaxMultiOptions
  * single-point ranges. That is, we have (2*nranges + nvalues) boundary
  * values in the array.
  *
- * +-+---+
- * | ranges (sorted pairs of values) | sorted values (single points) |
- * +-+---+
+ * +-+--+
+ * | ranges (2 * nranges of) | single point values (nvalues of) |
+ * +-+--+
  *
  * This allows us to quickly add new values, and store outliers without
  * making the other ranges very wide.
  *
+ * 'nsorted' denotes how many of 'nvalues' in the values[] array are sorted.
+ * When nsorted == nvalues, all single point values are sorted.
+ *
  * We never store more than maxvalues values (as set by values_per_range
  * reloption). If needed we merge some of the ranges.
  *
@@ -173,10 +176,10 @@ typedef struct Ranges
FmgrInfo   *cmp;
 
/* (2*nranges + nvalues) <= maxvalues */
-   int nranges;/* number of ranges in 
the array (stored) */
-   int nsorted;/* number of sorted 
values (ranges + points) */
-   int nvalues;/* number of values in 
the data array (all) */
-   int maxvalues;  /* maximum number of 
values (reloption) */
+   int nranges;/* number of ranges in 
the values[] array */
+   int nsorted;/* number of nvalues 
which are sorted */
+   int nvalues;/* number of point 
values in values[] */
+   int maxvalues;  /* number of elements 
in the values[] array */
 
/*
 * We simply add the values into a large buffer, without any expensive
@@ -318,102 +321,99 @@ AssertCheckRanges(Ranges *ranges, FmgrInfo *cmpFn, Oid 
colloid)
 * Check that none of the values are not covered by ranges (both sorted
 * and unsorted)
 */
-   for (i = 0; i < ranges->nvalues; i++)
+   if (ranges->nranges > 0)
{
-   Datum   compar;
-   int start,
-   end;
-   Datum   minvalue,
-   maxvalue;
-
-   Datum   value = ranges->values[2 * ranges->nranges + i];
-
-   if (ranges->nranges == 0)
-   break;
-
-   

Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-02 Thread Ranier Vilela
Em sex., 2 de set. de 2022 às 09:01, Justin Pryzby 
escreveu:

> On Mon, Aug 29, 2022 at 10:06:55AM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela 
> wrote in
> > > At function has_matching_range, if variable ranges->nranges == 0,
> > > we exit quickly with a result equal to false.
> > >
> > > This means that nranges can be zero.
> > > It occurs then that it is possible then to occur an array out of
> bonds, in
> > > the initialization of the variable maxvalue.
> > > So if nranges is equal to zero, there is no need to initialize
> minvalue and
> > > maxvalue.
> > >
> > > The patch tries to fix it, avoiding possible errors by using maxvalue.
> >
> > However it seems that nranges will never be zero, still the fix looks
> > good to me since it is generally allowed to be zero. I don't find a
> > similar mistake related to Range.nranges.
>
> Actually, the nranges==0 branch is hit during regression tests:
>
> https://coverage.postgresql.org/src/backend/access/brin/brin_minmax_multi.c.gcov.html
>
> I'm not sure, but I *suspect* that compilers usually check
>   ranges->nranges==0
> before reading ranges->values[2 * ranges->nranges - 1];
>
> Especially since it's a static function.
>
> Even if they didn't (say, under -O0), values[-1] would probably point to
> a palloc header, which would be enough to "not crash" before returning
> one line later.
>
> But +1 to fix this and other issues even if they would never crash.
>
Thanks Justin.

Based on comments by David, I made a new patch.
To simplify I've included the 0001 in the 0002 patch.

Summary:
1. Once that ranges->nranges is invariant, avoid the loop if
ranges->nranges <= 0.
This matches the current behavior.

2. Once that ranges->nsorted is invariant, avoid the loop if
ranges->nsorted <= 0.
This matches the current behavior.

3. Remove the invariant cxt from ranges->nsorted loop.

4. Avoid possible overflows when using int to store length strings.

5. Avoid possible out-of-bounds when ranges->nranges == 0.

6. Avoid overhead when using unnecessary StringInfoData to convert Datum a
to Text b.

Attached is 0002.

regards,
Ranier Vilela


0002-avoid-small-issues-brin_minmax_multi.patch
Description: Binary data


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-02 Thread Justin Pryzby
On Mon, Aug 29, 2022 at 10:06:55AM +0900, Kyotaro Horiguchi wrote:
> At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela  wrote 
> in 
> > At function has_matching_range, if variable ranges->nranges == 0,
> > we exit quickly with a result equal to false.
> > 
> > This means that nranges can be zero.
> > It occurs then that it is possible then to occur an array out of bonds, in
> > the initialization of the variable maxvalue.
> > So if nranges is equal to zero, there is no need to initialize minvalue and
> > maxvalue.
> > 
> > The patch tries to fix it, avoiding possible errors by using maxvalue.
> 
> However it seems that nranges will never be zero, still the fix looks
> good to me since it is generally allowed to be zero. I don't find a
> similar mistake related to Range.nranges.

Actually, the nranges==0 branch is hit during regression tests:
https://coverage.postgresql.org/src/backend/access/brin/brin_minmax_multi.c.gcov.html

I'm not sure, but I *suspect* that compilers usually check
  ranges->nranges==0
before reading ranges->values[2 * ranges->nranges - 1];

Especially since it's a static function.

Even if they didn't (say, under -O0), values[-1] would probably point to
a palloc header, which would be enough to "not crash" before returning
one line later.

But +1 to fix this and other issues even if they would never crash.

-- 
Justin




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-01 Thread David Rowley
On Fri, 2 Sept 2022 at 12:55, Ranier Vilela  wrote:
> Why not?
> astate_values = accumArrayResult(astate_values,
>   PointerGetDatum(a),
>   false,
>  TEXTOID,
>  CurrentMemoryContext);
>
> Is it possible to avoid cstring_to_text conversion?

Note the element_type is being passed to accumArrayResult() as
TEXTOID, so we should be passing a text type, not a cstring type. The
conversion to text is required.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-01 Thread Ranier Vilela
Em qui., 1 de set. de 2022 às 21:27, David Rowley 
escreveu:

> On Sat, 27 Aug 2022 at 01:29, Ranier Vilela  wrote:
> > At function has_matching_range, if variable ranges->nranges == 0,
> > we exit quickly with a result equal to false.
> >
> > This means that nranges can be zero.
> > It occurs then that it is possible then to occur an array out of bonds,
> in the initialization of the variable maxvalue.
> > So if nranges is equal to zero, there is no need to initialize minvalue
> and maxvalue.
>
> I think there's more strange coding in the same file that might need
> addressed, for example, AssertCheckRanges() does:
>
> if (ranges->nranges == 0)
> break;
>
> from within the first for() loop.  Why can't that check be outside of
> the loop. Nothing seems to make any changes to that field from within
> the loop.
>
> Also, in the final loop of the same function there's:
>
> if (ranges->nsorted == 0)
> break;
>
> It's not very obvious to me why we don't only run that loop when
> ranges->nsorted > 0.  Also, isn't it an array overrun to access:
>
> Datum value = ranges->values[2 * ranges->nranges + i];
>
> If there's only 1 range stored in the array, then there should be 2
> elements, but that code will try to access the 3rd element with
> ranges->values[2].
>
Yeah, it seems to me that both nranges and nsorted are invariant there,
so we can safely avoid loops.


>
> This is not so critical, but I'll note it down anyway.  The following
> looks a bit suboptimal in brin_minmax_multi_summary_out():
>
> StringInfoData str;
>
> initStringInfo();
>
> a = FunctionCall1(, ranges_deserialized->values[idx++]);
>
> appendStringInfoString(, DatumGetCString(a));
>
> b = cstring_to_text(str.data);
>
> Why do we need a StringInfoData there?  Why not just do:
>
> b = cstring_to_text(DatumGetCString(a)); ?
>
> That requires less memcpy()s and pallocs().
>
I agree that StringInfoData is not needed there.
Is it strange to convert char * to only store a temporary str.data.

Why not?
astate_values = accumArrayResult(astate_values,
  PointerGetDatum(a),
  false,
 TEXTOID,
 CurrentMemoryContext);

Is it possible to avoid cstring_to_text conversion?

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-01 Thread David Rowley
On Sat, 27 Aug 2022 at 01:29, Ranier Vilela  wrote:
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
>
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in 
> the initialization of the variable maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and 
> maxvalue.

I think there's more strange coding in the same file that might need
addressed, for example, AssertCheckRanges() does:

if (ranges->nranges == 0)
break;

from within the first for() loop.  Why can't that check be outside of
the loop. Nothing seems to make any changes to that field from within
the loop.

Also, in the final loop of the same function there's:

if (ranges->nsorted == 0)
break;

It's not very obvious to me why we don't only run that loop when
ranges->nsorted > 0.  Also, isn't it an array overrun to access:

Datum value = ranges->values[2 * ranges->nranges + i];

If there's only 1 range stored in the array, then there should be 2
elements, but that code will try to access the 3rd element with
ranges->values[2].

This is not so critical, but I'll note it down anyway.  The following
looks a bit suboptimal in brin_minmax_multi_summary_out():

StringInfoData str;

initStringInfo();

a = FunctionCall1(, ranges_deserialized->values[idx++]);

appendStringInfoString(, DatumGetCString(a));

b = cstring_to_text(str.data);

Why do we need a StringInfoData there?  Why not just do:

b = cstring_to_text(DatumGetCString(a)); ?

That requires less memcpy()s and pallocs().

I've included Tomas just in case I've misunderstood the nrange stuff.

David




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-08-29 Thread Ranier Vilela
Em dom., 28 de ago. de 2022 às 22:06, Kyotaro Horiguchi <
horikyota@gmail.com> escreveu:

> At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela 
> wrote in
> > At function has_matching_range, if variable ranges->nranges == 0,
> > we exit quickly with a result equal to false.
> >
> > This means that nranges can be zero.
> > It occurs then that it is possible then to occur an array out of bonds,
> in
> > the initialization of the variable maxvalue.
> > So if nranges is equal to zero, there is no need to initialize minvalue
> and
> > maxvalue.
> >
> > The patch tries to fix it, avoiding possible errors by using maxvalue.
>
> However it seems that nranges will never be zero, still the fix looks
> good to me since it is generally allowed to be zero. I don't find a
> similar mistake related to Range.nranges.
>
Thanks  Kyotaro for taking a look at this.

regards,
Ranier Vilela


Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-08-28 Thread Kyotaro Horiguchi
At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela  wrote 
in 
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
> 
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in
> the initialization of the variable maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and
> maxvalue.
> 
> The patch tries to fix it, avoiding possible errors by using maxvalue.

However it seems that nranges will never be zero, still the fix looks
good to me since it is generally allowed to be zero. I don't find a
similar mistake related to Range.nranges.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center