from:"Ranier Vilela"

Re: Optimize WindowAgg's use of tuplestores

2024-07-11 Thread Ranier Vilela

Em qui., 11 de jul. de 2024 às 09:09, David Rowley 
escreveu:

> On Wed, 10 Jul 2024 at 02:42, Ashutosh Bapat
>  wrote:
> > Observations
> > 1. The numbers corresponding to 10 and 100 partitions are higher when
> > patched. That might be just noise. I don't see any reason why it would
> > impact negatively when there are a small number of partitions. The
> > lower partition cases also have a higher number of rows per partition,
> > so is the difference between MemoryContextDelete() vs
> > MemoryContextReset() making any difference here. May be worth
> > verifying those cases carefully. Otherwise upto 1000 partitions, it
> > doesn't show any differences.
>
> I think this might just be noise as a result of rearranging code. In
> terms of C code, I don't see any reason for it to be slower.  If you
> look at GenerationDelete() (as what is getting called from
> MemoryContextDelete()), it just calls GenerationReset(). So resetting
> is going to always be less work than deleting the context, especially
> given we don't need to create the context again when we reset it.
>
> I wrote the attached script to see if I can also see the slowdown and
> I do see the patched code come out slightly slower (within noise
> levels) in lower partition counts.
>
> To get my compiler to produce code in a more optimal order for the
> common case, I added unlikely() to the "if (winstate->all_first)"
> condition.  This is only evaluated on the first time after a rescan,
> so putting that code at the end of the function makes more sense.  The
> attached v2 patch has it this way.

Not that it's going to make any difference,
but since you're going to touch this code, why not?

Function *begin_partition*:
1. You can reduce the scope for variable *outerPlan*,
it is not needed anywhere else.
+ /*
+ * If this is the very first partition, we need to fetch the first input
+ * row to store in first_part_slot.
+ */
+ if (TupIsNull(winstate->first_part_slot))
+ {
+ PlanState *outerPlan = outerPlanState(winstate);
+ TupleTableSlot *outerslot = ExecProcNode(outerPlan);

2. There is once reference to variable numFuncs
You can reduce the scope.

+ /* reset mark and seek positions for each real window function */
+ for (int i = 0; i < winstate->numfuncs; i++)

Function *prepare_tuplestore. *
1. There is once reference to variable numFuncs
You can reduce the scope.
  /* create mark and read pointers for each real window function */
- for (i = 0; i < numfuncs; i++)
+ for (int i = 0; i < winstate->numfuncs; i++)

2. You can securely initialize the default value for variable
*winstate->grouptail_ptr*
in *else* part.

  if ((frameOptions & (FRAMEOPTION_EXCLUDE_GROUP |
  FRAMEOPTION_EXCLUDE_TIES)) &&
  node->ordNumCols != 0)
  winstate->grouptail_ptr =
  tuplestore_alloc_read_pointer(winstate->buffer, 0);
  }
+ else
+ winstate->grouptail_ptr = -1;

best regards,
Ranier Vilela

Re: Additional minor pg_dump cleanups

2024-07-04 Thread Ranier Vilela

Em qui., 4 de jul. de 2024 às 05:18, Daniel Gustafsson 
escreveu:

> > On 3 Jul 2024, at 13:29, Ranier Vilela  wrote:
>
> > With the function *getPublications* I think it would be good to free up
> the allocated memory?
> >
> >  }
> > + pg_free(pubinfo);
> > +cleanup:
> >   PQclear(res);
>
> Since the pubinfo is recorded in the DumpableObject and is responsible for
> keeping track of which publications to dump, it would be quite incorrect to
> free it here.
>
> > With the function *getExtensions* I think it would be good to return
> NULL in case ntups = 0?
> > Otherwise we may end up with an uninitialized variable.
> >
> > - ExtensionInfo *extinfo;
> > + ExtensionInfo *extinfo = NULL;
>
> I guess that won't hurt, though any code inspecting extinfo when
> numExtensions
> is returned as zero is flat-out wrong.  It may however silence a static
> analyzer so there is that.
>
> > Funny, the function *getExtensionMembership* does not use the parameter
> ExtensionInfo extinfo.
> > getExtensions does not have another caller, Is it really necessary?
>
> Yes, see processExtensionTables().
>
I saw, thanks.

LGTM.

best regards,
Ranier Vilela

Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-03 Thread Ranier Vilela

Em qua., 3 de jul. de 2024 às 08:18, Dean Rasheed 
escreveu:

> On Tue, 2 Jul 2024 at 21:10, Joel Jacobson  wrote:
> >
> > I found the bug in the case 3 code,
> > and it turns out the same type of bug also exists in the case 2 code:
> >
> > case 2:
> > newdig = (int) var1digits[1] *
> var2digits[res_ndigits - 4];
> >
> > The problem here is that res_ndigits could become less than 4,
>
> Yes. It can't be less than 3 though (per an earlier test), so the case
> 2 code was correct.
>
> I've been hacking on this a bit and trying to tidy it up. Firstly, I
> moved it to a separate function, because it was starting to look messy
> having so much extra code in mul_var(). Then I added a bunch more
> comments to explain what's going on, and the limits of the various
> variables. Note that most of the boundary checks are actually
> unnecessary -- in particular all the ones in or after the main loop,
> provided you pull out the first 2 result digits from the main loop in
> the 3-digit case. That does seem to work very well, but...
>
> I wasn't entirely happy with how messy that code is getting, so I
> tried a different approach. Similar to div_var_int(), I tried writing
> a mul_var_int() function instead. This can be used for 1 and 2 digit
> factors, and we could add a similar mul_var_int64() function on
> platforms with 128-bit integers. The code looks quite a lot neater, so
> it's probably less likely to contain bugs (though I have just written
> it in a hurry,so it might still have bugs). In testing, it seemed to
> give a decent speedup, but perhaps a little less than before. But
> that's to be balanced against having more maintainable code, and also
> a function that might be useful elsewhere in numeric.c.
>
> Anyway, here are both patches for comparison. I'll stop hacking for a
> while and let you see what you make of these.
>
I liked v5-add-mul_var_int.patch better.

I think that *var_digits can be const too.
+ const NumericDigit *var_digits = var->digits;

Typo In the comments:
- by procssing
+ by processing

best regards,
Ranier Vilela

Re: Additional minor pg_dump cleanups

2024-07-03 Thread Ranier Vilela

Em qua., 3 de jul. de 2024 às 04:37, Daniel Gustafsson 
escreveu:

> Re-reading Nathans recent 8213df9effaf I noticed a few more small things
> which
> can be cleaned up.  In two of the get functions we lack a
> fast-path for
> when no tuples are found which leads to pg_malloc(0) calls.  Another thing
> is
> that we in one place reset the PQExpBuffer immediately after creating it
> which
> isn't required.
>
0001 Looks good to me.

0002:
With the function *getPublications* I think it would be good to free up the
allocated memory?

 }
+ pg_free(pubinfo);
+cleanup:
  PQclear(res);

With the function *getExtensions* I think it would be good to return NULL
in case ntups = 0?
Otherwise we may end up with an uninitialized variable.

- ExtensionInfo *extinfo;
+ ExtensionInfo *extinfo = NULL;

Funny, the function *getExtensionMembership* does not use the parameter
ExtensionInfo extinfo.
getExtensions does not have another caller, Is it really necessary?

best regards,
Ranier Vilela

Assorted style changes with a tiny improvement

2024-07-02 Thread Ranier Vilela

Hi.

This is a series of patches to change styles, in assorted sources.
IMO, this improves a tiny bit and is worth trying.

1. Avoid dereference iss_SortSupport if it has nulls.
2. Avoid dereference plan_node_id if no dsm area.
3. Avoid dereference spill partitions if zero ntuples.
4. Avoid possible useless palloc call with zero size.
5. Avoid redundant variable initialization in foreign.
6. Check the cheap test first in ExecMain.
7. Check the cheap test first in pathkeys.
8. Delay possible expensive bms_is_empty call in sub_select.
9. Reduce some scope in execPartition.
10. Reduce some scope for TupleDescAttr array dereference.
11. Remove useless duplicate test in ruleutils.
This is already checked at line 4566.

12. Remove useless duplicate test in string_utils.
This is already checked at line 982.

best regards,
Ranier Vilela


avoid-dereference-iss_SortSupport-array-if-has-nulls.patch
Description: Binary data


avoid-dereference-plan_node_id-if-no-dsm-area.patch
Description: Binary data


avoid-dereference-spill-partitions-if-zero-ntuples.patch
Description: Binary data


avoid-possible-useless-palloc-call-with-zero-size-execGrouping.patch
Description: Binary data


avoid-redundant-variable-initialization-foreign.patch
Description: Binary data


check-cheap-test-first-execMain.patch
Description: Binary data


check-cheap-test-first-pathkeys.patch
Description: Binary data


delay-possible-expensive-bms_is_empty_call.patch
Description: Binary data


reduce-some-scope-execPartition.patch
Description: Binary data


reduce-some-scope-for-tupleDescAttr.patch
Description: Binary data


remove-useless-duplicate-test-ruleutils.patch
Description: Binary data


remove-useless-duplicate-test-string_utils.patch
Description: Binary data

Re: plenty code is confused about function level static

2024-07-02 Thread Ranier Vilela

Em qui., 18 de abr. de 2024 às 14:43, Ranier Vilela 
escreveu:

>
>
> Em qui., 18 de abr. de 2024 às 14:16, Andres Freund 
> escreveu:
>
>> Hi,
>>
>> On 2024-04-18 09:07:43 -0300, Ranier Vilela wrote:
>> >  On 18/04/2024 00:39, Andres Freund wrote:
>> > >There are lots of places that could benefit from adding 'static
>> > >const'.
>> >
>> > I found a few more places.
>>
>> Good catches.
>>
>>
>> > Patch 004
>> >
>> > The opposite would also help, adding static.
>> > In these places, I believe it is safe to add static,
>> > allowing the compiler to transform into read-only, definitively.
>>
>> I don't think this would even compile?
>
> Compile, at least with msvc 2022.
> Pass ninja test.
>
>
> E.g. LockTagTypeNames, pg_wchar_table
>> are declared in a header and used across translation units.
>>
> Sad.
> There should be a way to export a read-only (static const) variable.
> Better remove these.
>
> v1-0005 attached.
>
Now with v18 open, any plans to forward this?

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-02 Thread Ranier Vilela

Em ter., 2 de jul. de 2024 às 06:44, Daniel Gustafsson 
escreveu:

> > On 2 Jul 2024, at 02:33, Michael Paquier  wrote:
> >
> > On Mon, Jul 01, 2024 at 09:19:59PM +0200, Daniel Gustafsson wrote:
> >>> The bit I don't understand about this discussion is what will happen
> >>> with users that currently have exactly 1024 chars in backup names
> today.
> >>> With this change, we'll be truncating their names to 1023 chars
> instead.
> >>> Why would they feel that such change is welcome?
> >>
> >> That's precisely what I was getting at.  Maybe it makes sense to
> change, maybe
> >> not, but that's not for this patch to decide as that's a different
> discussion
> >> from using safe string copying API's.
> >
> > Yep.  Agreed to keep backward-compatibility here, even if I suspect
> > there is close to nobody relying on backup label names of this size.
>
> I suspect so too, and it might be a good project for someone to go over
> such
> buffers to see if there is reason grow or contract.  Either way, pushed the
> strlcpy part.
>
Thanks Daniel.

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela

Em seg., 1 de jul. de 2024 às 16:20, Daniel Gustafsson 
escreveu:

> > On 1 Jul 2024, at 21:15, Alvaro Herrera  wrote:
> > On 2024-Jul-01, Ranier Vilela wrote:
>
> >>> -   charname[MAXPGPATH + 1];
> >>> +   charname[MAXPGPATH];/* backup label name */
> >>>
> >>> With the introduced use of strlcpy, why do we need to change this
> field?
> >>>
> >> The part about being the only reference in the entire code that uses
> >> MAXPGPATH + 1.
> >
> > The bit I don't understand about this discussion is what will happen
> > with users that currently have exactly 1024 chars in backup names today.
> > With this change, we'll be truncating their names to 1023 chars instead.
> > Why would they feel that such change is welcome?
>
> That's precisely what I was getting at.  Maybe it makes sense to change,
> maybe
> not, but that's not for this patch to decide as that's a different
> discussion
> from using safe string copying API's.
>
Ok, this is not material for the proposal and can be considered unrelated.

We only have to replace it with strlcpy.

v7 attached.

best regards,
Ranier Vilela


v7-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela

Em seg., 1 de jul. de 2024 às 16:15, Alvaro Herrera 
escreveu:

> On 2024-Jul-01, Ranier Vilela wrote:
>
> > > -   charname[MAXPGPATH + 1];
> > > +   charname[MAXPGPATH];/* backup label name */
> > >
> > > With the introduced use of strlcpy, why do we need to change this
> field?
> > >
> > The part about being the only reference in the entire code that uses
> > MAXPGPATH + 1.
>
> The bit I don't understand about this discussion is what will happen
> with users that currently have exactly 1024 chars in backup names today.
> With this change, we'll be truncating their names to 1023 chars instead.
> Why would they feel that such change is welcome?
>
Yes of course, I understand.
This can be a problem for users.

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela

Em seg., 1 de jul. de 2024 às 14:35, Ranier Vilela 
escreveu:

> Em seg., 1 de jul. de 2024 às 06:20, Daniel Gustafsson 
> escreveu:
>
>> > On 27 Jun 2024, at 13:50, Ranier Vilela  wrote:
>>
>> > Now with file patch really attached.
>>
>> -   if (strlen(backupidstr) > MAXPGPATH)
>> +   if (strlcpy(state->name, backupidstr, sizeof(state->name)) >=
>> sizeof(state->name))
>> ereport(ERROR,
>>
>> Stylistic nit perhaps, I would keep the strlen check here and just
>> replace the
>> memcpy with strlcpy.  Using strlen in the error message check makes the
>> code
>> more readable.
>>
> This is not performance-critical code, so I see no problem using strlen,
> for the sake of readability.
>
>
>>
>> -   charname[MAXPGPATH + 1];
>> +   charname[MAXPGPATH];/* backup label name */
>>
>> With the introduced use of strlcpy, why do we need to change this field?
>>
> The part about being the only reference in the entire code that uses
> MAXPGPATH + 1.
> MAXPGPATH is defined as 1024, so MAXPGPATH +1 is 1025.
> I think this hurts the calculation of the array index,
> preventing power two optimization.
>
> Another argument is that all other paths have a 1023 size limit,
> I don't see why the backup label would have to be different.
>
> New version patch attached.
>
Sorry for v5, I forgot to update the patch and it was an error.

best regards,
Ranier Vilela


v6-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela

Em seg., 1 de jul. de 2024 às 06:20, Daniel Gustafsson 
escreveu:

> > On 27 Jun 2024, at 13:50, Ranier Vilela  wrote:
>
> > Now with file patch really attached.
>
> -   if (strlen(backupidstr) > MAXPGPATH)
> +   if (strlcpy(state->name, backupidstr, sizeof(state->name)) >=
> sizeof(state->name))
> ereport(ERROR,
>
> Stylistic nit perhaps, I would keep the strlen check here and just replace
> the
> memcpy with strlcpy.  Using strlen in the error message check makes the
> code
> more readable.
>
This is not performance-critical code, so I see no problem using strlen,
for the sake of readability.


>
> -   charname[MAXPGPATH + 1];
> +   charname[MAXPGPATH];/* backup label name */
>
> With the introduced use of strlcpy, why do we need to change this field?
>
The part about being the only reference in the entire code that uses
MAXPGPATH + 1.
MAXPGPATH is defined as 1024, so MAXPGPATH +1 is 1025.
I think this hurts the calculation of the array index,
preventing power two optimization.

Another argument is that all other paths have a 1023 size limit,
I don't see why the backup label would have to be different.

New version patch attached.

best regards,
Ranier Vilela
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d36272ab4f..bfb500aa54 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8742,9 +8742,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("backup label too long (max %d bytes)",
-		MAXPGPATH)));
+		MAXPGPATH - 1)));
 
-	memcpy(state->name, backupidstr, strlen(backupidstr));
+	(void) strlcpy(state->name, backupidstr, MAXPGPATH))
 
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index c30d4a5991..a52345850e 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -21,8 +21,7 @@
 typedef struct BackupState
 {
 	/* Fields saved at backup start */
-	/* Backup label name one extra byte for null-termination */
-	char		name[MAXPGPATH + 1];
+	char		name[MAXPGPATH];/* backup label name */
 	XLogRecPtr	startpoint;		/* backup start WAL location */
 	TimeLineID	starttli;		/* backup start TLI */
 	XLogRecPtr	checkpointloc;	/* last checkpoint location */

Re: JIT causes core dump during error recovery

2024-06-27 Thread Ranier Vilela

Em qui., 27 de jun. de 2024 às 13:18, Tom Lane  escreveu:

> I wrote:
> > So delaying removal of the jit-created code segment until transaction
> > cleanup wouldn't be enough to prevent this crash, if I'm reading
> > things right.  The extra-pstrdup solution may be the only viable one.
>
> > I could use confirmation from someone who knows the JIT code about
> > when jit-created code is unloaded.  It also remains very unclear
> > why there is no crash if we don't force both jit_optimize_above_cost
> > and jit_inline_above_cost to small values.
>
> I found where the unload happens: ResOwnerReleaseJitContext, which
> is executed during the resource owner BEFORE_LOCKS phase.  (Which
> seems like a pretty dubious choice from here; why can't we leave
> it till the less time-critical phase after we've let go of locks?)
> But anyway, we definitely do drop this stuff during xact cleanup.
>
> Also, it seems that the reason that both jit_optimize_above_cost
> and jit_inline_above_cost must be small is that otherwise int4div
> is simply called from the JIT-generated code, not inlined into it.
> This gives me very considerable fear about how well that behavior
> has been tested: if throwing an error from inlined code doesn't
> work, and we hadn't noticed that, how much can it really have been
> exercised?  I have also got an itchy feeling that we have code that
> will be broken by this behavior of "if it happens to get inlined
> then string constants aren't so constant".
>
> In any case, I found that adding some copying logic to CopyErrorData()
> is enough to solve this problem, since the SPI infrastructure applies
> that before executing xact cleanup.

In this case, I think that these fields, in struct definition struct
ErrorData (src/include/utils/elog.h)
should be changed too?
from const char * to char*

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-27 Thread Ranier Vilela

Em qui., 27 de jun. de 2024 às 08:48, Ranier Vilela 
escreveu:

> Em qui., 27 de jun. de 2024 às 01:01, Yugo NAGATA 
> escreveu:
>
>> On Mon, 24 Jun 2024 08:25:36 -0300
>> Ranier Vilela  wrote:
>>
>> > Em dom., 23 de jun. de 2024 às 23:56, Richard Guo <
>> guofengli...@gmail.com>
>> > escreveu:
>> >
>> > > On Mon, Jun 24, 2024 at 7:51 AM Ranier Vilela 
>> wrote:
>> > > > In src/include/access/xlogbackup.h, the field *name*
>> > > > has one byte extra to store null-termination.
>> > > >
>> > > > But, in the function *do_pg_backup_start*,
>> > > > I think that is a mistake in the line (8736):
>> > > >
>> > > > memcpy(state->name, backupidstr, strlen(backupidstr));
>> > > >
>> > > > memcpy with strlen does not copy the whole string.
>> > > > strlen returns the exact length of the string, without
>> > > > the null-termination.
>> > >
>> > > I noticed that the two callers of do_pg_backup_start both allocate
>> > > BackupState with palloc0.  Can we rely on this to ensure that the
>> > > BackupState.name is initialized with null-termination?
>> > >
>> > I do not think so.
>> > It seems to me the best solution is to use Michael's suggestion,
>> strlcpy +
>> > sizeof.
>> >
>> > Currently we have this:
>> > memcpy(state->name, "longlongpathexample1",
>> > strlen("longlongpathexample1"));
>> > printf("%s\n", state->name);
>> > longlongpathexample1
>> >
>> > Next random call:
>> > memcpy(state->name, "longpathexample2", strlen("longpathexample2"));
>> > printf("%s\n", state->name);
>> > longpathexample2ple1
>>
>> In the current uses, BackupState is freed (by pfree or
>> MemoryContextDelete)
>> after each use of BackupState, so the memory space is not reused as your
>> scenario above, and there would not harms even if the null-termination is
>> omitted.
>>
>> However, I wonder it is better to use strlcpy without assuming such the
>> good
>> manner of callers.
>>
> Thanks for your inputs.
>
> strlcpy is used across all the sources, so this style is better and safe.
>
> Here v4, attached, with MAXPGPATH -1, according to your suggestion.
>
Now with file patch really attached.

best regards,
Ranier Vilela


v4-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-27 Thread Ranier Vilela

Em qui., 27 de jun. de 2024 às 01:01, Yugo NAGATA 
escreveu:

> On Mon, 24 Jun 2024 08:25:36 -0300
> Ranier Vilela  wrote:
>
> > Em dom., 23 de jun. de 2024 às 23:56, Richard Guo <
> guofengli...@gmail.com>
> > escreveu:
> >
> > > On Mon, Jun 24, 2024 at 7:51 AM Ranier Vilela 
> wrote:
> > > > In src/include/access/xlogbackup.h, the field *name*
> > > > has one byte extra to store null-termination.
> > > >
> > > > But, in the function *do_pg_backup_start*,
> > > > I think that is a mistake in the line (8736):
> > > >
> > > > memcpy(state->name, backupidstr, strlen(backupidstr));
> > > >
> > > > memcpy with strlen does not copy the whole string.
> > > > strlen returns the exact length of the string, without
> > > > the null-termination.
> > >
> > > I noticed that the two callers of do_pg_backup_start both allocate
> > > BackupState with palloc0.  Can we rely on this to ensure that the
> > > BackupState.name is initialized with null-termination?
> > >
> > I do not think so.
> > It seems to me the best solution is to use Michael's suggestion, strlcpy
> +
> > sizeof.
> >
> > Currently we have this:
> > memcpy(state->name, "longlongpathexample1",
> > strlen("longlongpathexample1"));
> > printf("%s\n", state->name);
> > longlongpathexample1
> >
> > Next random call:
> > memcpy(state->name, "longpathexample2", strlen("longpathexample2"));
> > printf("%s\n", state->name);
> > longpathexample2ple1
>
> In the current uses, BackupState is freed (by pfree or MemoryContextDelete)
> after each use of BackupState, so the memory space is not reused as your
> scenario above, and there would not harms even if the null-termination is
> omitted.
>
> However, I wonder it is better to use strlcpy without assuming such the
> good
> manner of callers.
>
Thanks for your inputs.

strlcpy is used across all the sources, so this style is better and safe.

Here v4, attached, with MAXPGPATH -1, according to your suggestion.

>From the linux man page:
https://linux.die.net/man/3/strlcpy

" The *strlcpy*() function copies up to *size* - 1 characters from the
NUL-terminated string *src* to *dst*, NUL-terminating the result. "

best regards,
Ranier Vilela

Re: JIT causes core dump during error recovery

2024-06-26 Thread Ranier Vilela

Em qua., 26 de jun. de 2024 às 15:09, Tom Lane  escreveu:

> I initially ran into this while trying to reproduce the recent
> reports of trouble with LLVM 14 on ARM.  However, it also reproduces
> with LLVM 17 on x86_64, and I see no reason to think it's at all
> arch-specific.  I also reproduced it in back branches (only tried
> v14, but it's definitely not new in HEAD).
>
> To reproduce:
>
> 1. Build with --with-llvm
>
> 2. Create a config file containing
>
> $ cat $HOME/tmp/temp_config
> # enable jit at max
> jit_above_cost = 1
> jit_inline_above_cost = 1
> jit_optimize_above_cost = 1
>
> and do
> export TEMP_CONFIG=$HOME/tmp/temp_config
>
> 3. cd to .../src/pl/plpgsql/src/, and do "make check".
>
> It gets a SIGSEGV in plpgsql_transaction.sql's
> cursor_fail_during_commit test.  The stack trace looks like
>
> (gdb) bt
> #0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
> #1  0x00735c58 in pq_sendstring (buf=0x7ffd80f8eeb0,
> str=0x7f77cffdf000  0x7f77cffdf000>)
> at pqformat.c:197
> #2  0x009ca09c in err_sendstring (buf=0x7ffd80f8eeb0,
> str=0x7f77cffdf000  0x7f77cffdf000>)
> at elog.c:3449
> #3  0x009ca4ba in send_message_to_frontend (edata=0xf786a0
> )
> at elog.c:3568
> #4  0x009c73a3 in EmitErrorReport () at elog.c:1715
> #5  0x008987e7 in PostgresMain (dbname=,
> username=0x29fdb00 "postgres") at postgres.c:4378
> #6  0x00893c5d in BackendMain (startup_data=,
> startup_data_len=) at backend_startup.c:105
>
> The errordata structure it's trying to print out contains
>
> (gdb) p *edata
> $1 = {elevel = 21, output_to_server = true, output_to_client = true,
>   hide_stmt = false, hide_ctx = false,
>   filename = 0x7f77cffdf000  0x7f77cffdf000>, lineno = 843,
>   funcname = 0x7f77cffdf033  0x7f77cffdf033>, domain = 0xbd3baa "postgres-17",
>   context_domain = 0x7f77c3343320 "plpgsql-17", sqlerrcode = 33816706,
>   message = 0x29fdc20 "division by zero", detail = 0x0, detail_log = 0x0,
>   hint = 0x0,
>   context = 0x29fdc50 "PL/pgSQL function cursor_fail_during_commit() line
> 6 at COMMIT", backtrace = 0x0,
>   message_id = 0x7f77cffdf022  0x7f77cffdf022>, schema_name = 0x0, table_name = 0x0, column_name = 0x0,
>   datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos =
> 0,
>   internalquery = 0x0, saved_errno = 2, assoc_context = 0x29fdb20}
>
> lineno = 843 matches the expected error location in int4_div().
>
Did you mean *int4div*, right?
Since there is no reference to int4_div in *.c or *.h

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-24 Thread Ranier Vilela

Em seg., 24 de jun. de 2024 às 00:27, Yugo NAGATA 
escreveu:

> On Sun, 23 Jun 2024 22:34:03 -0300
> Ranier Vilela  wrote:
>
> > Em dom., 23 de jun. de 2024 às 22:14, Ranier Vilela  >
> > escreveu:
> >
> > > Em dom., 23 de jun. de 2024 às 22:05, Ranier Vilela <
> ranier...@gmail.com>
> > > escreveu:
> > >
> > >> Em dom., 23 de jun. de 2024 às 21:54, Michael Paquier <
> > >> mich...@paquier.xyz> escreveu:
> > >>
> > >>> On Sun, Jun 23, 2024 at 09:34:45PM -0300, Ranier Vilela wrote:
> > >>> > It's not critical code, so I think it's ok to use strlen, even
> because
> > >>> the
> > >>> > result of strlen will already be available using modern compilers.
> > >>> >
> > >>> > So, I think it's ok to use memcpy with strlen + 1.
> > >>>
> > >>> It seems to me that there is a pretty good argument to just use
> > >>> strlcpy() for the same reason as the one you cite: this is not a
> > >>> performance-critical code, and that's just safer.
> > >>>
> > >> Yeah, I'm fine with strlcpy. I'm not against it.
> > >>
> > > Perhaps, like the v2?
> > >
> > > Either v1 or v2, to me, looks good.
> > >
> > Thinking about, does not make sense the field size MAXPGPATH + 1.
> > all other similar fields are just MAXPGPATH.
> >
> > If we copy MAXPGPATH + 1, it will also be wrong.
> > So it is necessary to adjust logbackup.h as well.
>
> I am not sure whether we need to change the size of the field,
> but if change it, I wonder it is better to modify the following
> message from MAXPGPATH to MAXPGPATH -1.
>
>      errmsg("backup label too long (max %d
> bytes)",
> MAXPGPATH)));
>
Or perhaps, is it better to show the too long label?

errmsg("backup label too long (%s)",
backupidstr)));

best regards,
Ranier Vilela

>
> >
> > So, I think that v3 is ok to fix.
> >
> > best regards,
> > Ranier Vilela
> >
> > >
> > > best regards,
> > > Ranier Vilela
> > >
> > >>
>
>
> --
> Yugo NAGATA 
>

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-24 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 23:56, Richard Guo 
escreveu:

> On Mon, Jun 24, 2024 at 7:51 AM Ranier Vilela  wrote:
> > In src/include/access/xlogbackup.h, the field *name*
> > has one byte extra to store null-termination.
> >
> > But, in the function *do_pg_backup_start*,
> > I think that is a mistake in the line (8736):
> >
> > memcpy(state->name, backupidstr, strlen(backupidstr));
> >
> > memcpy with strlen does not copy the whole string.
> > strlen returns the exact length of the string, without
> > the null-termination.
>
> I noticed that the two callers of do_pg_backup_start both allocate
> BackupState with palloc0.  Can we rely on this to ensure that the
> BackupState.name is initialized with null-termination?
>
I do not think so.
It seems to me the best solution is to use Michael's suggestion, strlcpy +
sizeof.

Currently we have this:
memcpy(state->name, "longlongpathexample1",
strlen("longlongpathexample1"));
printf("%s\n", state->name);
longlongpathexample1

Next random call:
memcpy(state->name, "longpathexample2", strlen("longpathexample2"));
printf("%s\n", state->name);
longpathexample2ple1

It's not a good idea to use memcpy with strlen.

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 22:14, Ranier Vilela 
escreveu:

> Em dom., 23 de jun. de 2024 às 22:05, Ranier Vilela 
> escreveu:
>
>> Em dom., 23 de jun. de 2024 às 21:54, Michael Paquier <
>> mich...@paquier.xyz> escreveu:
>>
>>> On Sun, Jun 23, 2024 at 09:34:45PM -0300, Ranier Vilela wrote:
>>> > It's not critical code, so I think it's ok to use strlen, even because
>>> the
>>> > result of strlen will already be available using modern compilers.
>>> >
>>> > So, I think it's ok to use memcpy with strlen + 1.
>>>
>>> It seems to me that there is a pretty good argument to just use
>>> strlcpy() for the same reason as the one you cite: this is not a
>>> performance-critical code, and that's just safer.
>>>
>> Yeah, I'm fine with strlcpy. I'm not against it.
>>
> Perhaps, like the v2?
>
> Either v1 or v2, to me, looks good.
>
Thinking about, does not make sense the field size MAXPGPATH + 1.
all other similar fields are just MAXPGPATH.

If we copy MAXPGPATH + 1, it will also be wrong.
So it is necessary to adjust logbackup.h as well.

So, I think that v3 is ok to fix.

best regards,
Ranier Vilela

>
> best regards,
> Ranier Vilela
>
>>


v3-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 22:05, Ranier Vilela 
escreveu:

> Em dom., 23 de jun. de 2024 às 21:54, Michael Paquier 
> escreveu:
>
>> On Sun, Jun 23, 2024 at 09:34:45PM -0300, Ranier Vilela wrote:
>> > It's not critical code, so I think it's ok to use strlen, even because
>> the
>> > result of strlen will already be available using modern compilers.
>> >
>> > So, I think it's ok to use memcpy with strlen + 1.
>>
>> It seems to me that there is a pretty good argument to just use
>> strlcpy() for the same reason as the one you cite: this is not a
>> performance-critical code, and that's just safer.
>>
> Yeah, I'm fine with strlcpy. I'm not against it.
>
Perhaps, like the v2?

Either v1 or v2, to me, looks good.

best regards,
Ranier Vilela

>


v2-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 21:54, Michael Paquier 
escreveu:

> On Sun, Jun 23, 2024 at 09:34:45PM -0300, Ranier Vilela wrote:
> > It's not critical code, so I think it's ok to use strlen, even because
> the
> > result of strlen will already be available using modern compilers.
> >
> > So, I think it's ok to use memcpy with strlen + 1.
>
> It seems to me that there is a pretty good argument to just use
> strlcpy() for the same reason as the one you cite: this is not a
> performance-critical code, and that's just safer.
>
Yeah, I'm fine with strlcpy. I'm not against it.

New version, attached.

best regards,
Ranier Vilela


v1-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 21:24, Michael Paquier 
escreveu:

> On Sun, Jun 23, 2024 at 09:08:47PM -0300, Fabrízio de Royes Mello wrote:
> > Doesn’t “sizeof” solve the problem? It take in account the
> null-termination
> > character.
>
> The size of BackupState->name is fixed with MAXPGPATH + 1, so it would
> be a better practice to use strlcpy() with sizeof(name) anyway?
>
It's not critical code, so I think it's ok to use strlen, even because the
result of strlen will already be available using modern compilers.

So, I think it's ok to use memcpy with strlen + 1.

best regards,
Ranier Vilela

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Em dom., 23 de jun. de 2024 às 21:08, Fabrízio de Royes Mello <
fabriziome...@gmail.com> escreveu:

>
> On Sun, 23 Jun 2024 at 20:51 Ranier Vilela  wrote:
>
>> Hi.
>>
>> In src/include/access/xlogbackup.h, the field *name*
>> has one byte extra to store null-termination.
>>
>> But, in the function *do_pg_backup_start*,
>> I think that is a mistake in the line (8736):
>>
>> memcpy(state->name, backupidstr, strlen(backupidstr));
>>
>> memcpy with strlen does not copy the whole string.
>> strlen returns the exact length of the string, without
>> the null-termination.
>>
>> So, I think this can result in errors,
>> like in the function *build_backup_content*
>> (src/backend/access/transam/xlogbackup.c)
>> Where *appendStringInfo* expects a string with null-termination.
>>
>> appendStringInfo(result, "LABEL: %s\n", state->name);
>>
>> To fix, copy strlen size plus one byte, to include the null-termination.
>>
>
>>
> Doesn’t “sizeof” solve the problem? It take in account the
> null-termination character.

sizeof is is preferable when dealing with constants such as:
memcpy(name, "string test1", sizeof( "string test1");

Using sizeof in this case will always copy MAXPGPATH + 1.
Modern compilers will optimize strlen,
copying fewer bytes.

best regards,
Ranier Vilela

Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-06-23 Thread Ranier Vilela

Hi.

In src/include/access/xlogbackup.h, the field *name*
has one byte extra to store null-termination.

But, in the function *do_pg_backup_start*,
I think that is a mistake in the line (8736):

memcpy(state->name, backupidstr, strlen(backupidstr));

memcpy with strlen does not copy the whole string.
strlen returns the exact length of the string, without
the null-termination.

So, I think this can result in errors,
like in the function *build_backup_content*
(src/backend/access/transam/xlogbackup.c)
Where *appendStringInfo* expects a string with null-termination.

appendStringInfo(result, "LABEL: %s\n", state->name);

To fix, copy strlen size plus one byte, to include the null-termination.

Trivial patch attached.

best regards,
Ranier Vilela


avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data

Re: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm

2024-06-20 Thread Ranier Vilela

Em qui., 20 de jun. de 2024 às 08:54, Tomas Vondra <
tomas.von...@enterprisedb.com> escreveu:

>
>
> On 6/20/24 13:32, Ranier Vilela wrote:
> > Em qui., 20 de jun. de 2024 às 07:28, Tomas Vondra <
> > tomas.von...@enterprisedb.com> escreveu:
> >
> >> Hi,
> >>
> >> While running valgrind on 32-bit ARM (rpi5 with debian), I got this
> >> really strange report:
> >>
> >>
> >> ==25520== Use of uninitialised value of size 4
> >> ==25520==at 0x94A550: wrapper_handler (pqsignal.c:108)
> >> ==25520==by 0x4D7826F: ??? (sigrestorer.S:64)
> >> ==25520==  Uninitialised value was created by a heap allocation
> >> ==25520==at 0x8FB780: palloc (mcxt.c:1340)
> >> ==25520==by 0x913067: tuplestore_begin_common (tuplestore.c:289)
> >> ==25520==by 0x91310B: tuplestore_begin_heap (tuplestore.c:331)
> >> ==25520==by 0x3EA717: ExecMaterial (nodeMaterial.c:64)
> >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> >> ==25520==by 0x3EF73F: ExecProcNode (executor.h:274)
> >> ==25520==by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703)
> >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> >> ==25520==by 0x3C47DB: ExecProcNode (executor.h:274)
> >> ==25520==by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561)
> >> ==25520==by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364)
> >> ==25520==by 0x3C7E07: ExecAgg (nodeAgg.c:2179)
> >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> >> ==25520==by 0x3A5EC3: ExecProcNode (executor.h:274)
> >> ==25520==by 0x3A8FBF: ExecutePlan (execMain.c:1646)
> >> ==25520==by 0x3A6677: standard_ExecutorRun (execMain.c:363)
> >> ==25520==by 0x3A644B: ExecutorRun (execMain.c:304)
> >> ==25520==by 0x6976D3: PortalRunSelect (pquery.c:924)
> >> ==25520==by 0x6972F7: PortalRun (pquery.c:768)
> >> ==25520==by 0x68FA1F: exec_simple_query (postgres.c:1274)
> >> ==25520==
> >> {
> >>
> >>Memcheck:Value4
> >>fun:wrapper_handler
> >>obj:/usr/lib/arm-linux-gnueabihf/libc.so.6
> >> }
> >> **25520** Valgrind detected 1 error(s) during execution of "select
> >> count(*) from
> >> **25520**   (select * from tenk1 x order by x.thousand, x.twothousand,
> >> x.fivethous) x
> >> **25520**   left join
> >> **25520**   (select * from tenk1 y order by y.unique2) y
> >> **25520**   on x.thousand = y.unique2 and x.twothousand = y.hundred and
> >> x.fivethous = y.unique2;"
> >>
> >>
> >> I'm mostly used to weird valgrind stuff on this platform, but it's
> >> usually about libarmmmem and (possibly) thinking it might access
> >> undefined stuff when calculating checksums etc.
> >>
> >> This seems somewhat different, so I wonder if it's something real?
> >
> > It seems like a false positive to me.
> >
> > According to valgrind's documentation:
> > https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value
> >
> > " This can lead to false positive errors, as the shared memory can be
> > initialised via a first mapping, and accessed via another mapping. The
> > access via this other mapping will have its own V bits, which have not
> been
> > changed when the memory was initialised via the first mapping. The bypass
> > for these false positives is to use Memcheck's client requests
> > VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform
> > Memcheck about what your program does (or what another process does) to
> > these shared memory mappings. "
> >
>
> But that's about shared memory, and the report has nothing to do with
> shared memory AFAICS.
>
You can try once:
Selecting --expensive-definedness-checks=yes causes Memcheck to use the
most accurate analysis possible. This minimises false error rates but can
cause up to 30% performance degradation.

I did a search through my reports and none refer to this particular source.

best regards,
Ranier Vilela

Re: suspicious valgrind reports about radixtree/tidstore on arm64

2024-06-20 Thread Ranier Vilela

Em qua., 19 de jun. de 2024 às 20:52, Tom Lane  escreveu:

> I wrote:
> > I hypothesize that the reason we're not seeing equivalent failures
> > on x86_64 is one of
>
> > 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> > the contents of the SIMD registers are only partially defined.
>
> > 2. x86_64 valgrind is smarter than aarch64, and is able to see
> > that the "mask off invalid entries" step removes all the
> > potentially-uninitialized bits.
>
> Hah: it's the second case.  If I patch radixtree.h as attached,
> then x86_64 valgrind complains about
>
> ==00:00:00:32.759 247596== Conditional jump or move depends on
> uninitialised value(s)
> ==00:00:00:32.759 247596==at 0x52F668: local_ts_node_16_search_eq
> (radixtree.h:1018)
>
> showing that it knows that the result of vector8_highbit_mask is
> only partly defined.

I wouldn't be surprised if *RT_NODE_16_GET_INSERTPOS*
(src/include/lib/radixtree.h),
does not suffer from the same problem?
Even with Assert trying to protect.

Does the fix not apply here too?

best regards,
Ranier Vilela

Re: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm

2024-06-20 Thread Ranier Vilela

Em qui., 20 de jun. de 2024 às 07:28, Tomas Vondra <
tomas.von...@enterprisedb.com> escreveu:

> Hi,
>
> While running valgrind on 32-bit ARM (rpi5 with debian), I got this
> really strange report:
>
>
> ==25520== Use of uninitialised value of size 4
> ==25520==at 0x94A550: wrapper_handler (pqsignal.c:108)
> ==25520==by 0x4D7826F: ??? (sigrestorer.S:64)
> ==25520==  Uninitialised value was created by a heap allocation
> ==25520==at 0x8FB780: palloc (mcxt.c:1340)
> ==25520==by 0x913067: tuplestore_begin_common (tuplestore.c:289)
> ==25520==by 0x91310B: tuplestore_begin_heap (tuplestore.c:331)
> ==25520==by 0x3EA717: ExecMaterial (nodeMaterial.c:64)
> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> ==25520==by 0x3EF73F: ExecProcNode (executor.h:274)
> ==25520==by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703)
> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> ==25520==by 0x3C47DB: ExecProcNode (executor.h:274)
> ==25520==by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561)
> ==25520==by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364)
> ==25520==by 0x3C7E07: ExecAgg (nodeAgg.c:2179)
> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
> ==25520==by 0x3A5EC3: ExecProcNode (executor.h:274)
> ==25520==by 0x3A8FBF: ExecutePlan (execMain.c:1646)
> ==25520==by 0x3A6677: standard_ExecutorRun (execMain.c:363)
> ==25520==by 0x3A644B: ExecutorRun (execMain.c:304)
> ==25520==by 0x6976D3: PortalRunSelect (pquery.c:924)
> ==25520==by 0x6972F7: PortalRun (pquery.c:768)
> ==25520==by 0x68FA1F: exec_simple_query (postgres.c:1274)
> ==25520==
> {
>
>Memcheck:Value4
>fun:wrapper_handler
>obj:/usr/lib/arm-linux-gnueabihf/libc.so.6
> }
> **25520** Valgrind detected 1 error(s) during execution of "select
> count(*) from
> **25520**   (select * from tenk1 x order by x.thousand, x.twothousand,
> x.fivethous) x
> **25520**   left join
> **25520**   (select * from tenk1 y order by y.unique2) y
> **25520**   on x.thousand = y.unique2 and x.twothousand = y.hundred and
> x.fivethous = y.unique2;"
>
>
> I'm mostly used to weird valgrind stuff on this platform, but it's
> usually about libarmmmem and (possibly) thinking it might access
> undefined stuff when calculating checksums etc.
>
> This seems somewhat different, so I wonder if it's something real?

It seems like a false positive to me.

According to valgrind's documentation:
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value

" This can lead to false positive errors, as the shared memory can be
initialised via a first mapping, and accessed via another mapping. The
access via this other mapping will have its own V bits, which have not been
changed when the memory was initialised via the first mapping. The bypass
for these false positives is to use Memcheck's client requests
VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform
Memcheck about what your program does (or what another process does) to
these shared memory mappings. "

best regards,
Ranier Vilela

Re: Add pg_get_acl() function get the ACL for a database object

2024-06-19 Thread Ranier Vilela

Em qua., 19 de jun. de 2024 às 10:28, Ranier Vilela 
escreveu:

> Em qua., 19 de jun. de 2024 às 10:26, Joel Jacobson 
> escreveu:
>
>> Hi Ranier,
>>
>> Thanks for looking at this.
>>
>> I've double-checked the patch I sent, and it works fine.
>>
>> I think I know the cause of your problem:
>>
>> Since this is a catalog change, you need to run `make clean`, to ensure
>> the catalog is rebuilt,
>> followed by the usual `make && make install`.
>>
>> You also need to run `initdb` to create a new database cluster, with the
>> new catalog version.
>>
>> Let me know if you need more specific instructions.
>>
> Sorry, sorry but I'm on Windows -> meson.
>
> Double checked with:
> ninja clean
> ninja
> ninja install
>
Sorry for the noise, now pg_get_acl is shown in the regress test.

Regarding the patch, could it be written in the following style?

Datum
pg_get_acl(PG_FUNCTION_ARGS)
{
Oid classId = PG_GETARG_OID(0);
Oid objectId = PG_GETARG_OID(1);
Oid catalogId;
AttrNumber Anum_oid;
AttrNumber Anum_acl;

/* for "pinned" items in pg_depend, return null */
if (!OidIsValid(classId) && !OidIsValid(objectId))
PG_RETURN_NULL();

catalogId = (classId == LargeObjectRelationId) ?
LargeObjectMetadataRelationId : classId;
Anum_oid = get_object_attnum_oid(catalogId);
Anum_acl = get_object_attnum_acl(catalogId);

if (Anum_acl != InvalidAttrNumber)
{
Relation rel;
HeapTuple tup;
Datum datum;
bool isnull;

rel = table_open(catalogId, AccessShareLock);

tup = get_catalog_object_by_oid(rel, Anum_oid, objectId);
if (!HeapTupleIsValid(tup))
elog(ERROR, "cache lookup failed for object %u of catalog \"%s\"",
objectId, RelationGetRelationName(rel));

datum = heap_getattr(tup, Anum_acl, RelationGetDescr(rel), );

table_close(rel, AccessShareLock);

if (!isnull)
PG_RETURN_DATUM(datum);
}

PG_RETURN_NULL();
}

best regards,
Ranier Vilela

Re: Add pg_get_acl() function get the ACL for a database object

2024-06-19 Thread Ranier Vilela

Em qua., 19 de jun. de 2024 às 10:26, Joel Jacobson 
escreveu:

> Hi Ranier,
>
> Thanks for looking at this.
>
> I've double-checked the patch I sent, and it works fine.
>
> I think I know the cause of your problem:
>
> Since this is a catalog change, you need to run `make clean`, to ensure
> the catalog is rebuilt,
> followed by the usual `make && make install`.
>
> You also need to run `initdb` to create a new database cluster, with the
> new catalog version.
>
> Let me know if you need more specific instructions.
>
Sorry, sorry but I'm on Windows -> meson.

Double checked with:
ninja clean
ninja
ninja install

best regards,
Ranier Vilela

Re: Add pg_get_acl() function get the ACL for a database object

2024-06-19 Thread Ranier Vilela

Em qua., 19 de jun. de 2024 às 08:35, Joel Jacobson 
escreveu:

> Hello hackers,
>
> Currently, obtaining the Access Control List (ACL) for a database object
> requires querying specific pg_catalog tables directly, where the user
> needs to know the name of the ACL column for the object.
>
> Consider:
>
> ```
> CREATE USER test_user;
> CREATE USER test_owner;
> CREATE SCHEMA test_schema AUTHORIZATION test_owner;
> SET ROLE TO test_owner;
> CREATE TABLE test_schema.test_table ();
> GRANT SELECT ON TABLE test_schema.test_table TO test_user;
> ```
>
> To get the ACL we can do:
>
> ```
> SELECT relacl FROM pg_class WHERE oid =
> 'test_schema.test_table'::regclass::oid;
>
>  relacl
> -
>  {test_owner=arwdDxtm/test_owner,test_user=r/test_owner}
> ```
>
> Attached patch adds a new SQL-callable functoin `pg_get_acl()`, so we can
> do:
>
> ```
> SELECT pg_get_acl('pg_class'::regclass,
> 'test_schema.test_table'::regclass::oid);
>pg_get_acl
> -
>  {test_owner=arwdDxtm/test_owner,test_user=r/test_owner}
> ```
>
> The original idea for this function came from Alvaro Herrera,
> in this related discussion:
> https://postgr.es/m/261def6b-226e-4238-b7eb-ff240cb9c...@www.fastmail.com
>
> On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote:
> > On 2021-Mar-25, Joel Jacobson wrote:
> >
> >> pg_shdepend doesn't contain the aclitem info though,
> >> so it won't work for pg_permissions if we want to expose
> >> privilege_type, is_grantable and grantor.
> >
> > Ah, of course -- the only way to obtain the acl columns is by going
> > through the catalogs individually, so it won't be possible.  I think
> > this could be fixed with some very simple, quick function pg_get_acl()
> > that takes a catalog OID and object OID and returns the ACL; then
> > use aclexplode() to obtain all those details.
>
> The pg_get_acl() function has been implemented by following
> the guidance from Alvaro in the related dicussion:
>
> On Fri, Mar 26, 2021, at 13:43, Alvaro Herrera wrote:
> > AFAICS the way to do it is like AlterObjectOwner_internal obtains data
> > -- first do get_catalog_object_by_oid (gives you the HeapTuple that
> > represents the object), then
> > heap_getattr( ..., get_object_attnum_acl(), ..), and there you have the
> > ACL which you can "explode" (or maybe just return as-is).
> >
> > AFAICS if you do this, it's just one cache lookups per object, or
> > one indexscan for the cases with no by-OID syscache.  It should be much
> > cheaper than the UNION ALL query.  And you use pg_shdepend to guide
> > this, so you only do it for the objects that you already know are
> > interesting.
>
> Many thanks Alvaro for the very helpful instructions.
>
> This function would then allow users to e.g. create a view to show the
> privileges
> for all database objects, like the pg_privileges system view suggested in
> the
> related discussion.
>
> Tests and docs are added.
>
Hi,
For some reason, the function pg_get_acl, does not exist in generated
fmgrtab.c

So, when install postgres, the function does not work.

postgres=# SELECT pg_get_acl('pg_class'::regclass, 'atest2'::regclass::oid);
ERROR:  function pg_get_acl(regclass, oid) does not exist
LINE 1: SELECT pg_get_acl('pg_class'::regclass, 'atest2'::regclass::...
   ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.

best regards,
Ranier Vilela

Re: replace strtok()

2024-06-18 Thread Ranier Vilela

Em ter., 18 de jun. de 2024 às 04:18, Peter Eisentraut 
escreveu:

> Under the topic of getting rid of thread-unsafe functions in the backend
> [0], here is a patch series to deal with strtok().
>
> Of course, strtok() is famously not thread-safe and can be replaced by
> strtok_r().  But it also has the wrong semantics in some cases, because
> it considers adjacent delimiters to be one delimiter.  So if you parse
>
>  SCRAM-SHA-256$:$:
>
> with strtok(), then
>
>  SCRAM-SHA-256$$::$$::
>
> parses just the same.  In many cases, this is arguably wrong and could
> hide mistakes.
>
> So I'm suggesting to use strsep() in those places.  strsep() is
> nonstandard but widely available.
>
> There are a few places where strtok() has the right semantics, such as
> parsing tokens separated by whitespace.  For those, I'm using strtok_r().
>
> A reviewer job here would be to check whether I made that distinction
> correctly in each case.
>
> On the portability side, I'm including a port/ replacement for strsep()
> and some workaround to get strtok_r() for Windows.  I have included
> these here as separate patches for clarity.
>
+1 For making the code thread-safe.
But I would like to see more const char * where this is possible.

For example, in pg_locale.c
IMO, the token variable can be const char *.

At least strchr expects a const char * as the first parameter.

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any testing.

best regards,
Ranier Vilela
/* strsep.h
 *
 *  Provides the 4.4BSD strsep(3) function for those that don't have it.
 *
 *  Copyright 2011 Michael Thomas Greer
 *  Distributed under the Boost Software License, Version 1.0.
 *  ( See accompanying file LICENSE_1_0.txt or copy at
 *   http://www.boost.org/LICENSE_1_0.txt )
 *
 *  Including this file modifies the std namespace in C++.
 *
 *  Don't include this file if your compiler provides the strsep function in 
.
 *  Make sure your build process tests for this and behaves accordingly!
 *
 */

#include 

char *
strsep(char **stringp, const char *delim)
{
char *result;

if ((stringp == NULL) || (*stringp == NULL))
return NULL;

result = *stringp;

while(**stringp && !(strchr(delim, **stringp)))
++*stringp;

if (**stringp)
*(*stringp)++ = '\0';
else
*stringp = NULL;

return result;
}

Re: Improve the granularity of PQsocketPoll's timeout parameter?

2024-06-13 Thread Ranier Vilela

Em qui., 13 de jun. de 2024 às 12:25, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > I'm unsure if the documentation matches the code?
> > " connect_timeout #
> > <
> https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT
> >
>
> > Maximum time to wait while connecting, in seconds (write as a decimal
> > integer, e.g., 10). Zero, negative, or not specified means wait
> > indefinitely. The minimum allowed timeout is 2 seconds, therefore a value
> > of 1 is interpreted as 2. This timeout applies separately to each host
> name
> > or IP address. For example, if you specify two hosts and connect_timeout
> is
> > 5, each host will time out if no connection is made within 5 seconds, so
> > the total time spent waiting for a connection might be up to 10 seconds.
> > "
> > The comments says that timeout = 0, means *Timeout is immediate (no
> > blocking)*
>
> > Does the word "indefinitely" mean infinite?
> > If yes, connect_timeout = -1, mean infinite?
>
> The sentence about "minimum allowed timeout is 2 seconds" has to go
> away, but the rest of that seems fine.
>
> But now that you mention it, we could drop the vestigial
>
> >> if (timeout < 0)
> >> timeout = 0;
>
> as well, because the rest of the function only applies the timeout
> when "timeout > 0".  Otherwise end_time (nee finish_time) stays at -1.
>
I think that's OK Tom.

+1 for push.

best regards,
Ranier Vilela

Re: Improve the granularity of PQsocketPoll's timeout parameter?

2024-06-13 Thread Ranier Vilela

Em qua., 12 de jun. de 2024 às 16:45, Tom Lane  escreveu:

> I wrote:
> > v3 attached uses pg_usec_time_t, and fixes one brown-paper-bag
> > bug the cfbot noticed in v2.
>
> Oh, I just remembered that there's a different bit of
> pqConnectDBComplete that we could simplify now:
>
> if (timeout > 0)
> {
> /*
>  * Rounding could cause connection to fail unexpectedly
> quickly;
>  * to prevent possibly waiting hardly-at-all, insist on at
> least
>  * two seconds.
>  */
> if (timeout < 2)
> timeout = 2;
> }
> else/* negative means 0 */
> timeout = 0;
>
> With this infrastructure, there's no longer any need to discriminate
> against timeout == 1 second, so we might as well reduce this to
>
> if (timeout < 0)
> timeout = 0;
>
> as it's done elsewhere.
>
I'm unsure if the documentation matches the code?
" connect_timeout #
<https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT>

Maximum time to wait while connecting, in seconds (write as a decimal
integer, e.g., 10). Zero, negative, or not specified means wait
indefinitely. The minimum allowed timeout is 2 seconds, therefore a value
of 1 is interpreted as 2. This timeout applies separately to each host name
or IP address. For example, if you specify two hosts and connect_timeout is
5, each host will time out if no connection is made within 5 seconds, so
the total time spent waiting for a connection might be up to 10 seconds.
"
The comments says that timeout = 0, means *Timeout is immediate (no
blocking)*

Does the word "indefinitely" mean infinite?
If yes, connect_timeout = -1, mean infinite?

best regards,
Ranier Vilela

Re: Improve the granularity of PQsocketPoll's timeout parameter?

2024-06-12 Thread Ranier Vilela

Em qua., 12 de jun. de 2024 às 14:53, Tom Lane  escreveu:

> Robert Haas  writes:
> > I agree this is not great. I guess I didn't think about it very hard
> > because, after all, we were just exposing an API that we'd already
> > been using internally. But I think it's reasonable to adjust the API
> > to allow for better resolution, as you propose. A second is a very
> > long amount of time, and it's entirely reasonable for someone to want
> > better granularity.
>
> Here's a v2 responding to some of the comments.
>
> * People pushed back against not using "int64", but the difficulty
> with that is that we'd have to #include c.h or at least pg_config.h
> in libpq-fe.h, and that would be a totally disastrous invasion of
> application namespace.  However, I'd forgotten that libpq-fe.h
> does include postgres_ext.h, and there's just enough configure
> infrastructure behind that to allow defining pg_int64, which indeed
> libpq-fe.h is already relying on.  So we can use that.
>
> * I decided to invent a typedef
>
> typedef pg_int64 PGusec_time_t;
>
Perhaps pg_timeusec?
I think it combines with PQgetCurrentTimeUSec


>
> instead of writing "pg_int64" explicitly everywhere.  This is perhaps
> not as useful as it was when I was thinking the definition would be
> "long long int", but it still seems to add some readability.  In my
> eyes anyway ... anyone think differently?
>
> * I also undid changes like s/end_time/end_time_us/.  I'd done that
> mostly to ensure I looked at/fixed every reference to those variables,
> but on reflection I don't think it's doing anything for readability.
>
end_time seems much better to me.


>
> * I took Ranier's suggestion to make fast paths for end_time == 0.
> I'm not sure this will make any visible performance difference, but
> it's simple and shouldn't hurt.  We do have some code paths that use
> that behavior.
>
Thanks.


>
> * Ranier also suggested using clock_gettime instead of gettimeofday,
> but I see no reason to do that.  libpq already relies on gettimeofday,
> but not on clock_gettime, and anyway post-beta1 isn't a great time to
> start experimenting with portability-relevant changes.
>
I agree.
For v18, it would be a case of thinking about not using it anymore
gettimeofday, as it appears to be deprecated.

best regards,
Ranier Vilela

Re: Columnar format export in Postgres

2024-06-12 Thread Ranier Vilela

Em qua., 12 de jun. de 2024 às 13:56, Sushrut Shivaswamy <
sushrut.shivasw...@gmail.com> escreveu:

> Hey Postgres team,
>
> I have been working on adding support for columnar format export to
> Postgres to speed up analytics queries.
> I've created an extension that achieves this functionality here
> <https://github.com/sushrut141/pg_analytica>.
>
> I"m looking to improve the performance of this extension to enable drop-in
> analytics support for Postgres. Some immediate improvements I have in mind
> are:
>  - Reduce memory consumption when exporting table data to columnar format
>  - Create a native planner / execution hook that can read columnar data
> with vectorised operations.
>
> It would be very helpful if you could take a look and suggest improvements
> to the extension.
> Hopefully, this extension can be shipped by default with postgres at some
> point in the future.
>
If you want to have any hope, the license must be BSD.
GPL is incompatible.

best regards,
Ranier Vilela

Re: Improve the granularity of PQsocketPoll's timeout parameter?

2024-06-11 Thread Ranier Vilela

Em seg., 10 de jun. de 2024 às 18:39, Tom Lane  escreveu:

> In [1] Dominique Devienne complained that PQsocketPoll would be
> far more useful to him if it had better-than-one-second timeout
> resolution.  I initially pushed back on that on the grounds that
> post-beta1 is a bit late to be redefining public APIs.  Which it is,
> but if we don't fix it now then we'll be stuck supporting that API
> indefinitely.  And it's not like one-second resolution is great
> for our internal usage either --- for example, I see that psql
> is now doing
>
> end_time = time(NULL) + 1;
> rc = PQsocketPoll(sock, forRead, !forRead, end_time);
>
> which claims to be waiting one second, but actually it's waiting
> somewhere between 0 and 1 second.  So I thought I'd look into
> whether we can still change it without too much pain, and I think
> we can.
>
> The $64 question is how to represent the end_time if not as time_t.
> The only alternative POSIX offers AFAIK is gettimeofday's "struct
> timeval", which is painful to compute with and I don't think it's
> native on Windows.  What I suggest is that we use int64 microseconds
> since the epoch, which is the same idea as the backend's TimestampTz
> except I think we'd better use the Unix epoch not 2000-01-01.
> Then converting code is just a matter of changing variable types
> and adding some zeroes to constants.
>
> The next question is how to spell "int64" in libpq-fe.h.  As a
> client-exposed header, the portability constraints on it are pretty
> stringent, so even in 2024 I'm loath to make it depend on ;
> and certainly depending on our internal int64 typedef won't do.
> What I did in the attached is to write "long long int", which is
> required to be at least 64 bits by C99.  Other opinions are possible
> of course.
>
> Lastly, we need a way to get current time in this form.  My first
> draft of the attached patch had the callers calling gettimeofday
> and doing arithmetic from that, but it seems a lot better to provide
> a function that just parallels time(2).
>
> BTW, I think this removes the need for libpq-fe.h to #include ,
> but I didn't remove that because it seems likely that some callers are
> indirectly relying on it to be present.  Removing it wouldn't gain
> very much anyway.
>
> Thoughts?
>
Regarding your patch:

1. I think can remove *int64* in comments:
+ * The timeout is specified by end_time_us, which is the number of
+ * microseconds since the Unix epoch (that is, time_t times 1 million).
+ * Timeout is infinite if end_time is -1.  Timeout is immediate (no
blocking)
+ * if end_time is 0 (or indeed, any time before now).

+ * The timeout is specified by end_time_us, which is the number of
+ * microseconds since the Unix epoch (that is, time_t times 1 million).

2. I think it's worth testing whether end_time_ns equals zero,
which can avoid a call to PQgetCurrentTimeNSec()

@@ -1103,14 +1113,16 @@ PQsocketPoll(int sock, int forRead, int forWrite,
time_t end_time)
  input_fd.events |= POLLOUT;

  /* Compute appropriate timeout interval */
- if (end_time == ((time_t) -1))
+ if (end_time_ns == -1)
  timeout_ms = -1;
+ else if (end_time_ns == 0)
+ timeout_ms = 0;

3. I think it's worth testing whether end_time_ns equals zero,
which can avoid a call to PQgetCurrentTimeNSec()

@@ -1138,17 +1150,29 @@ PQsocketPoll(int sock, int forRead, int forWrite,
time_t end_time)
  FD_SET(sock, _mask);

  /* Compute appropriate timeout interval */
- if (end_time == ((time_t) -1))
+ if (end_time_ns == -1)
  ptr_timeout = NULL;
+ else if (end_time_ns == 0)
+ {
+ timeout.tv_sec = 0;
+ timeout.tv_usec = 0;
+
+ ptr_timeout = 
+ }

best regards,
Ranier Vilela

Re: Improve the granularity of PQsocketPoll's timeout parameter?

2024-06-11 Thread Ranier Vilela

Em seg., 10 de jun. de 2024 às 18:39, Tom Lane  escreveu:

> In [1] Dominique Devienne complained that PQsocketPoll would be
> far more useful to him if it had better-than-one-second timeout
> resolution.  I initially pushed back on that on the grounds that
> post-beta1 is a bit late to be redefining public APIs.  Which it is,
> but if we don't fix it now then we'll be stuck supporting that API
> indefinitely.  And it's not like one-second resolution is great
> for our internal usage either --- for example, I see that psql
> is now doing
>
> end_time = time(NULL) + 1;
> rc = PQsocketPoll(sock, forRead, !forRead, end_time);
>
> which claims to be waiting one second, but actually it's waiting
> somewhere between 0 and 1 second.  So I thought I'd look into
> whether we can still change it without too much pain, and I think
> we can.
>
> The $64 question is how to represent the end_time if not as time_t.
> The only alternative POSIX offers AFAIK is gettimeofday's "struct
> timeval", which is painful to compute with and I don't think it's
> native on Windows.  What I suggest is that we use int64 microseconds
> since the epoch, which is the same idea as the backend's TimestampTz
> except I think we'd better use the Unix epoch not 2000-01-01.
> Then converting code is just a matter of changing variable types
> and adding some zeroes to constants.
>
> The next question is how to spell "int64" in libpq-fe.h.  As a
> client-exposed header, the portability constraints on it are pretty
> stringent, so even in 2024 I'm loath to make it depend on ;
> and certainly depending on our internal int64 typedef won't do.
> What I did in the attached is to write "long long int", which is
> required to be at least 64 bits by C99.  Other opinions are possible
> of course.
>
> Lastly, we need a way to get current time in this form.  My first
> draft of the attached patch had the callers calling gettimeofday
> and doing arithmetic from that, but it seems a lot better to provide
> a function that just parallels time(2).
>
> BTW, I think this removes the need for libpq-fe.h to #include ,
> but I didn't remove that because it seems likely that some callers are
> indirectly relying on it to be present.  Removing it wouldn't gain
> very much anyway.
>
> Thoughts?
>
Hi Tom.

Why not use uint64?
I think it's available in (fe-misc.c)

IMO, gettimeofday It also seems to me that it is deprecated.

Can I suggest a version using *clock_gettime*,
which I made based on versions available on the web?

/*
 * PQgetCurrentTimeUSec: get current time with nanosecond precision
 *
 * This provides a platform-independent way of producing a reference
 * value for PQsocketPoll's timeout parameter.
 */

uint64
PQgetCurrentTimeUSec(void)
{
#ifdef __MACH__
struct timespec ts;
clock_serv_t cclock;
mach_timespec_t mts;

host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, );
clock_get_time(cclock, );
mach_port_deallocate(mach_task_self(), cclock);
ts.tv_sec = mts.tv_sec;
ts.tv_nsec = mts.tv_nsec;
#eldef _WIN32_
struct timespec ts { long tv_sec; long tv_nsec; };
__int64 wintime;

GetSystemTimeAsFileTime((FILETIME*) );
wintime   -= 1164447360i64; // 1jan1601 to 1jan1970
ts.tv_sec  = wintime / 1000i64; // seconds
ts.tv_nsec = wintime % 1000i64 * 100;  // nano-seconds
#else
struct timespec ts;

clock_gettime(CLOCK_MONOTONIC, );
#endif

return (ts.tv_sec * 10L) + ts.tv_nsec;
}

best regards,
Ranier Vilela

Re: list_free in addRangeTableEntryForJoin

2024-06-10 Thread Ranier Vilela

Em seg., 10 de jun. de 2024 às 10:45, Ilia Evdokimov <
ilya.evdoki...@tantorlabs.com> escreveu:

>  >Now you need to analyze whether the memory in question is not managed
> by a Context
> I've already analyzed. Let's explain details:
>
>
> 1. analyze.c
> 1718: List* targetnames;
> 1815: targetnames = NIL;
> 1848: targetnames = lappend(targetnames, makeString(colName));
> 1871: addRangeTableEntryForJoin(...);
> => list_free(targetnames);
>
> 2. parse_clause.c
> 1163: List* res_colnames;
> 1289: res_colnames = NIL;
> 1324: foreach(col, res_colnames);
> 1396: res_colnames = lappend(res_colnames, lfirst(ucol));
> 1520, 1525: extractRemainingColumns(...);
> |
>  290: *res_colnames = lappend(*res_colnames, lfirst(lc));
> 1543: addRangeTableEntryForJoin(...);
> => list_free(res_colnames);
>
>
> As you can see, there are no other pointers working with this block of
> memory,

You can see this line?
sortnscolumns = (ParseNamespaceColumn *)
palloc0

All allocations in the backend occur at Context memory managers.
Resource leak can occur mainly with TopTransactionContext.


> and all operations above are either read-only or append nodes to
> the lists. If I am mistaken, please correct me.
> Furthermore, I will call `list_free` immediately after
> `addRangeTableEntryForJoin()`. The new patch is attached.
>
This style is not recommended.
You prevent the use of colnames after calling addRangeTableEntryForJoin.

Better free at the end of the function, like 0002.

Tip 0001, 0002, 0003 numerations are to different patchs.
v1, v2, v3 are new versions of the same patch.

best regards,
Ranier Vilela

Re: list_free in addRangeTableEntryForJoin

2024-06-10 Thread Ranier Vilela

Em seg., 10 de jun. de 2024 às 09:11, Ilia Evdokimov <
ilya.evdoki...@tantorlabs.com> escreveu:

>  >But callers of addRangeTableEntryForJoin(), expects to handle a list
> or NIL, if we free the memory
> I've thoroughly reviewed all callers of the
> `addRangeTableEntryForJoin()` function and confirmed that the list is
> not used after this function is called. Since
> `addRangeTableEntryForJoin()` is the last function to use this list, it
> would be more efficient to free the `List` at the point of its declaration.
>
> I'll attach new patch where I free the lists.
>
Seems like a better style.

Now you need to analyze whether the memory in question is not managed by a
Context and released in a block,
which would make this release useless.

best regards,
Ranier Vilela

Re: list_free in addRangeTableEntryForJoin

2024-06-10 Thread Ranier Vilela

Em seg., 10 de jun. de 2024 às 07:35, Ilia Evdokimov <
ilya.evdoki...@tantorlabs.com> escreveu:

> Hi Hackers
>
> I have identified a potential memory leak in the
> `addRangeTableEntryForJoin()` function. The second parameter of
> `addRangeTableEntryForJoin()`, `colnames`, is a `List*` that is
> concatenated with another `List*`, `eref->colnames`, using
> `list_concat()`. We need to pass only the last `numaliases` elements of
> the list, for which we use `list_copy_tail`. This function creates a
> copy of the `colnames` list, resulting in `colnames` pointing to the
> current list that will not be freed. Consequently, a new list is already
> concatenated.
>
> To address this issue, I have invoked `list_free(colnames)` afterwards.
> If anyone is aware of where the list is being freed or has any
> suggestions for improvement, I would greatly appreciate your input.
>
list_copy_tail actually makes a new copy.
But callers of addRangeTableEntryForJoin,
expects to handle a list or NIL, if we free the memory,
Shouldn't I set the variable *colnames* to NIL, too?

best regards,
Ranier Vilela

Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

2024-06-05 Thread Ranier Vilela

Em ter., 4 de jun. de 2024 às 16:39, Nathan Bossart <
nathandboss...@gmail.com> escreveu:

> I noticed that the "Restoring database schemas in the new cluster" part of
> pg_upgrade can take a while if you have many databases, so I experimented
> with a couple different settings to see if there are any easy ways to speed
> it up.  The FILE_COPY strategy for CREATE DATABASE helped quite
> significantly on my laptop.  For ~3k empty databases, this step went from
> ~100 seconds to ~30 seconds with the attached patch.  I see commit ad43a41
> made a similar change for initdb, so there might even be an argument for
> back-patching this to v15 (where STRATEGY was introduced).  One thing I
> still need to verify is that this doesn't harm anything when there are lots
> of objects in the databases, i.e., more WAL generated during many
> concurrent CREATE-DATABASE-induced checkpoints.
>
> Thoughts?
>
Why not use it too, if not binary_upgrade?

else
{
appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0
STRATEGY = FILE_COPY",
 qdatname);
}

It seems to me that it also improves in any use.

best regards,
Ranier Vilela

Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Ranier Vilela

Em qua., 5 de jun. de 2024 às 02:04, Michael Paquier 
escreveu:

> On Wed, Jun 05, 2024 at 01:12:41PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 27 May 2024 11:31:24 -0300, Ranier Vilela 
> wrote in
> >> The function *plpgsql_inline_handler* can use uninitialized
> >> variable retval, if PG_TRY fails.
> >> Fix like function*plpgsql_call_handler* wich declare retval as
> >> volatile and initialize to (Datum 0).
>
> You forgot to read elog.h, explaining under which circumstances
> variables related to TRY/CATCH block should be marked as volatile.
> There is a big "Note:" paragraph.
>
> It is not the first time that this is mentioned on this list: but
> sending a report without looking at the reason why a change is
> justified makes everybody waste time.  That's not productive.
>
Of course, this is very bad when it happens.


>
> > If PG_TRY fails, retval is not actually accessed, so no real issue
> > exists. Commit 7292fd8f1c changed plpgsql_call_handler() to the
> > current form, but as stated in its commit message, it did not fix a
> > real issue and was solely to silence compiler.
>
> This complain was from lapwing, that uses a version of gcc which
> produces a lot of noise with incorrect issues.  It is one of the only
> 32b buildfarm members, so it still has a lot of value.
>
I posted the report, because of an uninitialized variable warning.
Which is one of the most problematic situations, when it *actually exists*.


> > I believe we do not need to modify plpgsql_inline_handler() unless
> > compiler actually issues a false warning for it.
>
> If we were to do something, that would be to remove the volatile from
> plpgsql_call_handler() at the end once we don't have in the buildfarm
> compilers that complain about it, because there is no reason to use a
> volatile in this case.  :)
>
I don't see any motivation, since there are no reports.

best regards,
Ranier Vilela

Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Ranier Vilela

Em qua., 5 de jun. de 2024 às 01:12, Kyotaro Horiguchi <
horikyota@gmail.com> escreveu:

> At Mon, 27 May 2024 11:31:24 -0300, Ranier Vilela 
> wrote in
> > Hi.
> >
> > The function *plpgsql_inline_handler* can use uninitialized
> > variable retval, if PG_TRY fails.
> > Fix like function*plpgsql_call_handler* wich declare retval as
> > volatile and initialize to (Datum 0).
>
> If PG_TRY fails, retval is not actually accessed, so no real issue
> exists.

You say it for this call
PG_RE_THROW();


> Commit 7292fd8f1c changed plpgsql_call_handler() to the
> current form, but as stated in its commit message, it did not fix a
> real issue and was solely to silence compiler.
>

> I believe we do not need to modify plpgsql_inline_handler() unless
> compiler actually issues a false warning for it.
>
Yeah, there is a warning, but not from the compiler.

best regards,
Ranier Vilela

Re: Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)

2024-06-02 Thread Ranier Vilela

Em qua., 29 de mai. de 2024 às 22:41, Long Song 
escreveu:

>
> Hi Ranier,
>
>
>
> > IMO, I think that pg_rewind can have a security issue,
> > if two files are exactly the same, they are considered different.
> > Because use of structs with padding values is unspecified.
> Logically you are right. But I don't understand what scenario
> would require memcmp to compare ControlFileData.
> In general, we read ControlFileData from a pg_control file
> and then use members of ControlFileData directly.
> So the two ControlFileData are not directly compared by byte.
>
Actually in pg_rewind there is a comparison using memcmp.


>
> > Fix by explicitly initializing with memset to avoid this.
> And, even if there are scenarios that use memcmp comparisons,
> your modifications are not complete.
> There are three calls to the digestControlFile in the main()
> of pg_rewind.c, and as your said(if right), these should do
> memory initialization every time.
>
In fact, initializing structures with memset does not solve anything.
Once the entire structure is populated again by a call to memcpy shortly
thereafter.
My concern now is that when the structure is saved to disk,
what are the padding fields like?

But enough noise.
Thanks for taking a look.

best regards,
Ranier Vilela

Re: Fix possible dereference null pointer (PQprint)

2024-06-02 Thread Ranier Vilela

Em sex., 31 de mai. de 2024 às 05:03, Daniel Gustafsson 
escreveu:

> > On 27 May 2024, at 16:52, Ranier Vilela  wrote:
>
> > In the function *PQprint*, the variable po->fieldName can be NULL.
>
> Yes.
>
> > See the checks a few lines up.
>
> Indeed, let's check it.
>
> for (numFieldName = 0;
>  po->fieldName && po->fieldName[numFieldName];
>  numFieldName++)
> ;
> for (j = 0; j < nFields; j++)
> {
> int len;
> const char *s = (j < numFieldName && po->fieldName[j][0]) ?
> po->fieldName[j] : PQfname(res, j);
>
> If po->fieldName is NULL then numFieldName won't be incremented and will
> remain
> zero.  In the check you reference we check (j < numFieldName) which will
> check
> the j in the range 0..nFields for being less than zero.  The code thus does
> seem quite correct to me.
>
You are completely correct. My bad.

Thank you Daniel.

best regards,
Ranier Vilela

Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)

2024-05-28 Thread Ranier Vilela

Hi.

The function *perform_rewind* has an odd undefined behavior.
The function memcmp/ <https://cplusplus.com/reference/cstring/memcmp/>,
compares bytes to bytes.

IMO, I think that pg_rewind can have a security issue,
if two files are exactly the same, they are considered different.
Because use of structs with padding values is unspecified.

Fix by explicitly initializing with memset to avoid this.

best regards,
Ranier Vilela


avoid-undefined-compares-two-structs-with-padding-bytes-pg_rewind.patch
Description: Binary data

Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Ranier Vilela

Em seg., 27 de mai. de 2024 às 14:42, Daniel Gustafsson 
escreveu:

> > On 27 May 2024, at 18:47, Jelte Fennema-Nio  wrote:
>
> > So overall I agree pqReleaseConnHost and release_conn_addrinfo can be
> > improved for easier safe usage in the future, but I don't think those
> > improvements should be grouped into the same commit with an actual
> > bugfix.
>
> I have pushed the fix for the calloc check for now.
>
Thanks Daniel.

best regards,
Ranier Vilela

Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Ranier Vilela

Em seg., 27 de mai. de 2024 às 13:47, Jelte Fennema-Nio 
escreveu:

> On Mon, 27 May 2024 at 18:16, Ranier Vilela  wrote:
> > Is it mandatory to call PQcancelFinish in case PQcancelCreate fails?
>
>
> Yes, see the following line in the docs:
>
> Note that when PQcancelCreate returns a non-null pointer, you must
> call PQcancelFinish when you are finished with it, in order to dispose
> of the structure and any associated memory blocks. This must be done
> even if the cancel request failed or was abandoned.
>
> Source:
> https://www.postgresql.org/docs/17/libpq-cancel.html#LIBPQ-PQCANCELCREATE
>
> > IMO, I would expect problems with users.
>
> This pattern is taken from regular connection creation and is really
> common in libpq code so I don't expect issues. And even if there were
> an issue, your v1 patch would not be nearly enough. Because you're
> only freeing the connhost and addr field now in case of OOM. But there
> are many more fields that would need to be freed.
>
> >> And that function will free any
> >> partially initialized PGconn correctly. This is also how
> >> pqConnectOptions2 works.
> >
> > Well, I think that function *pqReleaseConnHost*, is incomplete.
> > 1. Don't free connhost field;
>
> ehm... it does free that afaict? It only doesn't set it to NULL. Which
> indeed would be good to do, but not doing so doesn't cause any issues
> with it's current 2 usages afaict.
>
> > 2. Don't free addr field;
>
> That's what release_conn_addrinfo is for.
>
> > 3. Leave nconnhost and naddr, non-zero.
>
> I agree that it would be good to do that pqReleaseConnHost and
> release_conn_addrinfo. But also here, I don't see any issues caused by
> not doing that currently.
>
> So overall I agree pqReleaseConnHost and release_conn_addrinfo can be
> improved for easier safe usage in the future, but I don't think those
> improvements should be grouped into the same commit with an actual
> bugfix.
>
Thanks for detailed comments.
I agreed to apply the trivial fix.

Would you like to take charge of pqReleaseConnHost and
release_conn_addrinfo improvements?

best regards,
Ranier Vilela

Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Ranier Vilela

Em seg., 27 de mai. de 2024 às 12:40, Jelte Fennema-Nio 
escreveu:

> On Mon, 27 May 2024 at 16:06, Ranier Vilela  wrote:
> > Em seg., 27 de mai. de 2024 às 10:23, Daniel Gustafsson 
> escreveu:
> >> > On 27 May 2024, at 14:25, Ranier Vilela  wrote:
> >> > I think that commit 61461a3, left some oversight.
> >> > The function *PQcancelCreate* fails in check,
> >> > return of *calloc* function.
> >> >
> >> > Trivial fix is attached.
> >>
> >> Agreed, this looks like a copy/paste from the calloc calls a few lines
> up.
> >
> > Yeah.
>
> Agreed, this was indeed a copy paste mistake
>
>
> >> > But, IMO, I think that has more problems.
> >> > If any allocation fails, all allocations must be cleared.
> >> > Or is the current behavior acceptable?
> >>
> >> Since this is frontend library code I think we should free all the
> allocations
> >> in case of OOM.
> >
> > Agreed.
> >
> > With v1 patch, it is handled.
>
> I much prefer the original trivial patch to the v1. Even in case of
> OOM users are expected to call PQcancelFinish on a non-NULL result,
> which in turn calls freePGConn.

Is it mandatory to call PQcancelFinish in case PQcancelCreate fails?
IMO, I would expect problems with users.

And that function will free any
> partially initialized PGconn correctly. This is also how
> pqConnectOptions2 works.
>
Well, I think that function *pqReleaseConnHost*, is incomplete.
1. Don't free connhost field;
2. Don't free addr field;
3. Leave nconnhost and naddr, non-zero.

So trust in *pqReleaseConnHost *, to clean properly, It's insecure.

best regards,
Ranier Vilela

Fix possible dereference null pointer (PQprint)

2024-05-27 Thread Ranier Vilela

Hi.

In the function *PQprint*, the variable po->fieldName
can be NULL.
See the checks a few lines up.

for (numFieldName = 0;
po->fieldName && po->fieldName[numFieldName];
numFieldName++)

So, I think that must be checked, when used,
in the loop below.

best regards,
Ranier Vilela


fix-possible-dereference-null-pointer-PQprint.patch
Description: Binary data

Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-05-27 Thread Ranier Vilela

Hi.

The function *plpgsql_inline_handler* can use uninitialized
variable retval, if PG_TRY fails.
Fix like function*plpgsql_call_handler* wich declare retval as
volatile and initialize to (Datum 0).

best regards,
Ranier Vilela


fix-use-uninitialized-retval-variable-pl_handler.patch
Description: Binary data

Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Ranier Vilela

Hi Daniel,

Em seg., 27 de mai. de 2024 às 10:23, Daniel Gustafsson 
escreveu:

> > On 27 May 2024, at 14:25, Ranier Vilela  wrote:
>
> > I think that commit 61461a3, left some oversight.
> > The function *PQcancelCreate* fails in check,
> > return of *calloc* function.
> >
> > Trivial fix is attached.
>
> Agreed, this looks like a copy/paste from the calloc calls a few lines up.
>
Yeah.


>
> > But, IMO, I think that has more problems.
> > If any allocation fails, all allocations must be cleared.
> > Or is the current behavior acceptable?
>
> Since this is frontend library code I think we should free all the
> allocations
> in case of OOM.
>
Agreed.

With v1 patch, it is handled.

best regards,
Ranier Vilela


v1-fix-calloc-check-oom-PQcancelCreate.patch
Description: Binary data

Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Ranier Vilela

Hi.

I think that commit 61461a3
<https://github.com/postgres/postgres/commit/61461a300c1cb5d53955ecd792ad0ce75a104736>,
left some oversight.
The function *PQcancelCreate* fails in check,
return of *calloc* function.

Trivial fix is attached.

But, IMO, I think that has more problems.
If any allocation fails, all allocations must be cleared.
Or is the current behavior acceptable?

best regards,
Ranier Vilela


fix-calloc-check-oom-PQcancelCreate.patch
Description: Binary data

Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-24 Thread Ranier Vilela

Em sex., 24 de mai. de 2024 às 08:48, Ashutosh Bapat <
ashutosh.bapat@gmail.com> escreveu:

>
>
> On Fri, May 24, 2024 at 12:16 PM Michael Paquier 
> wrote:
>
>> On Fri, May 24, 2024 at 11:58:51AM +0530, Ashutosh Bapat wrote:
>> > If we are looking for avoiding a segfault and get a message which helps
>> > debugging, using get_attname and get_attnum might be better options.
>> > get_attname throws an error. get_attnum doesn't throw an error and
>> returns
>> > InvalidAttnum which won't return any valid identity sequence, and thus
>> > return a NIL sequence list which is handled in that function already.
>> Using
>> > these two functions will avoid the clutter as well as segfault. If
>> that's
>> > acceptable, I will provide a patch.
>>
>> Yeah, you could do that with these two routines as well.  The result
>> would be the same in terms of runtime validity checks.
>>
>
> PFA patch using those two routines.
>
The function *get_attname* palloc the result name (pstrdup).
Isn't it necessary to free the memory here (pfree)?

best regards,
Ranier Vilela

Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-23 Thread Ranier Vilela

Em qui., 23 de mai. de 2024 às 06:27, Ashutosh Bapat <
ashutosh.bapat@gmail.com> escreveu:

>
>
> On Thu, May 23, 2024 at 5:52 AM Michael Paquier 
> wrote:
>
>> On Wed, May 22, 2024 at 03:28:48PM -0300, Ranier Vilela wrote:
>> > 1. Another concern is the function *get_partition_ancestors*,
>> > which may return NIL, which may affect *llast_oid*, which does not
>> handle
>> > NIL entries.
>>
>> Hm?  We already know in the code path that the relation we are dealing
>> with when calling get_partition_ancestors() *is* a partition thanks to
>> the check on relispartition, no?  In this case, calling
>> get_partition_ancestors() is valid and there should be a top-most
>> parent in any case all the time.  So I don't get the point of checking
>> get_partition_ancestors() for NIL-ness just for the sake of assuming
>> that it would be possible.
>>
>
> +1.
>
>
>>
>> > 2. Is checking *relispartition* enough?
>> > There a function *check_rel_can_be_partition*
>> > (src/backend/utils/adt/partitionfuncs.c),
>> > which performs a much more robust check, would it be worth using it?
>> >
>> > With the v2 attached, 1 is handled, but, in this case,
>> > will it be the most correct?
>>
>> Saying that, your point about the result of SearchSysCacheAttName not
>> checked if it is a valid tuple is right.  We paint errors in these
>> cases even if they should not happen as that's useful when it comes to
>> debugging, at least.
>>
>
> I think an Assert would do instead of whole ereport().
>
IMO, Assert there is no better solution here.


> The callers have already resolved attribute name to attribute number.
> Hence the attribute *should* exist in both partition as well as topmost
> partitioned table.
>
>relid = llast_oid(ancestors);
> +
>   ptup = SearchSysCacheAttName(relid, attname);
> + if (!HeapTupleIsValid(ptup))
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_COLUMN),
> + errmsg("column \"%s\" of relation \"%s\" does not exist",
> + attname, RelationGetRelationName(rel;
>
> We changed the relid from OID of partition to that of topmost partitioned
> table but didn't change rel; which still points to partition relation. We
> have to invoke relation_open() with new relid, in order to use rel in the
> error message. I don't think all that is worth it, unless we find a
> scenario when SearchSysCacheAttName() returns NULL.
>
All calls to functions like SearchSysCacheAttName, in the whole codebase,
checks if returns are valid.
It must be for a very strong reason, such a style.

So, v3, implements it this way.

best regards,
Ranier Vilela


v3-fix-catalog-cache-search-check.patch
Description: Binary data

Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-23 Thread Ranier Vilela

Hi Micheal,

Em qua., 22 de mai. de 2024 às 21:21, Michael Paquier 
escreveu:

> On Wed, May 22, 2024 at 03:28:48PM -0300, Ranier Vilela wrote:
> > 1. Another concern is the function *get_partition_ancestors*,
> > which may return NIL, which may affect *llast_oid*, which does not handle
> > NIL entries.
>
> Hm?  We already know in the code path that the relation we are dealing
> with when calling get_partition_ancestors() *is* a partition thanks to
> the check on relispartition, no?  In this case, calling
> get_partition_ancestors() is valid and there should be a top-most
> parent in any case all the time.  So I don't get the point of checking
> get_partition_ancestors() for NIL-ness just for the sake of assuming
> that it would be possible.
>
I don't have strong feelings about this.
But analyzing the function, *pg_partition_root*
(src/backend/utils/adt/partitionfuncs.c),
we see that checking whether it is a partition is done by
check_rel_can_be_partition.
And it doesn't trust get_partition_ancestors, checking
if the return is NIL.

>
> > 2. Is checking *relispartition* enough?
> > There a function *check_rel_can_be_partition*
> > (src/backend/utils/adt/partitionfuncs.c),
> > which performs a much more robust check, would it be worth using it?
> >
> > With the v2 attached, 1 is handled, but, in this case,
> > will it be the most correct?
>
> Saying that, your point about the result of SearchSysCacheAttName not
> checked if it is a valid tuple is right.  We paint errors in these
> cases even if they should not happen as that's useful when it comes to
> debugging, at least.
>
Thanks.

best regards,
Ranier Vilela

> --
> Michael
>

Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela

Em qua., 22 de mai. de 2024 às 13:09, Ranier Vilela 
escreveu:

> Em qua., 22 de mai. de 2024 às 11:44, Ranier Vilela 
> escreveu:
>
>> Hi.
>>
>> Per Coverity.
>>
>> 2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of
>> 21 times).
>> 3. var_assigned: Assigning: ptup = NULL return value from
>> SearchSysCacheAttName.
>>  964ptup = SearchSysCacheAttName(relid, attname);
>> CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
>> 4. dereference: Dereferencing ptup, which is known to be NULL.
>>
>> The functions SearchSysCacheAttNum and SearchSysCacheAttName,
>> need to have the result checked.
>>
>> The commit 5091995
>> <https://github.com/postgres/postgres/commit/509199587df73f06eda898ae13284292f4ae573a>,
>> left an oversight.
>>
>> Fixed by the patch attached, a change of style, unfortunately, was
>> necessary.
>>
> v1 Attached, fix wrong column variable name in error report.
>
1. Another concern is the function *get_partition_ancestors*,
which may return NIL, which may affect *llast_oid*, which does not handle
NIL entries.

2. Is checking *relispartition* enough?
There a function *check_rel_can_be_partition*
(src/backend/utils/adt/partitionfuncs.c),
which performs a much more robust check, would it be worth using it?

With the v2 attached, 1 is handled, but, in this case,
will it be the most correct?

best regards,
Ranier Vilela


v2-fix-catalog-cache-search-check.patch
Description: Binary data

Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela

Em qua., 22 de mai. de 2024 às 11:44, Ranier Vilela 
escreveu:

> Hi.
>
> Per Coverity.
>
> 2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of
> 21 times).
> 3. var_assigned: Assigning: ptup = NULL return value from
> SearchSysCacheAttName.
>  964ptup = SearchSysCacheAttName(relid, attname);
> CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
> 4. dereference: Dereferencing ptup, which is known to be NULL.
>
> The functions SearchSysCacheAttNum and SearchSysCacheAttName,
> need to have the result checked.
>
> The commit 5091995
> <https://github.com/postgres/postgres/commit/509199587df73f06eda898ae13284292f4ae573a>,
> left an oversight.
>
> Fixed by the patch attached, a change of style, unfortunately, was
> necessary.
>
v1 Attached, fix wrong column variable name in error report.

best regards,
Ranier Vilela


v1-fix-catalog-cache-seach-check.patch
Description: Binary data

Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela

Hi.

Per Coverity.

2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of 21
times).
3. var_assigned: Assigning: ptup = NULL return value from
SearchSysCacheAttName.
 964ptup = SearchSysCacheAttName(relid, attname);
CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
4. dereference: Dereferencing ptup, which is known to be NULL.

The functions SearchSysCacheAttNum and SearchSysCacheAttName,
need to have the result checked.

The commit 5091995
<https://github.com/postgres/postgres/commit/509199587df73f06eda898ae13284292f4ae573a>,
left an oversight.

Fixed by the patch attached, a change of style, unfortunately, was
necessary.

best regards,
Ranier Vilela


fix-catalog-cache-search-check.patch
Description: Binary data

Re: Convert node test compile-time settings into run-time parameters

2024-05-21 Thread Ranier Vilela

Em ter., 21 de mai. de 2024 às 09:25, Peter Eisentraut 
escreveu:

> On 20.05.24 15:59, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> This patch converts the compile-time settings
> >>   COPY_PARSE_PLAN_TREES
> >>   WRITE_READ_PARSE_PLAN_TREES
> >>   RAW_EXPRESSION_COVERAGE_TEST
> >
> >> into run-time parameters
> >
> >>   debug_copy_parse_plan_trees
> >>   debug_write_read_parse_plan_trees
> >>   debug_raw_expression_coverage_test
> >
> > I'm kind of down on this.  It seems like forcing a bunch of
> > useless-in-production debug support into the standard build.
> > What of this would be of any use to any non-developer?
>
> We have a bunch of other debug_* settings that are available in
> production builds, such as
>
> debug_print_parse
> debug_print_rewritten
> debug_print_plan
> debug_pretty_print
> debug_discard_caches
> debug_io_direct
> debug_parallel_query
> debug_logical_replication_streaming
>
If some of this is useful for non-developer users,
it shouldn't be called debug, or in this category.


> Maybe we could hide all of them behind some #ifdef DEBUG_OPTIONS, but in
> any case, I don't think the ones being proposed here are substantially
> different from those existing ones that they would require a separate
> treatment.
>
> My goal is to make these facilities easier to use, avoiding hand-editing
> pg_config_manual.h and having to recompile.
>
Although there are some developer users.
I believe that anything that is not useful for common users and is not used
for production
should not be compiled at runtime.

best regards,
Ranier Vilela

Re: CREATE DATABASE with filesystem cloning

2024-05-21 Thread Ranier Vilela

Em ter., 21 de mai. de 2024 às 05:18, Nazir Bilal Yavuz 
escreveu:

> Hi,
>
> On Thu, 16 May 2024 at 17:47, Robert Haas  wrote:
> >
> > On Thu, May 16, 2024 at 9:43 AM Nazir Bilal Yavuz 
> wrote:
> > > Actually, the documentation for the file_copy_method was mentioning
> > > the things it controls; I converted it to an itemized list now. Also,
> > > changed the comment to: 'Further uses of this function requires
> > > updates to the list that GUC controls in its documentation.'. v7 is
> > > attached.
> >
> > I think the comments need some wordsmithing.
>
> I changed it to 'Uses of this function must be documented in the list
> of places affected by this GUC.', I am open to any suggestions.
>
> > I don't see why this parameter should be PGC_POSTMASTER.
>
> I changed it to 'PGC_USERSET' now. My initial point was the database
> or tablespace to be copied with the same method. I thought copying
> some portion of the database with the copy and rest with the clone
> could cause potential problems. After a bit of searching, I could not
> find any problems related to that.
>
> v8 is attached.
>
Hi,
I did some research on the subject and despite being an improvement,
I believe that some terminologies should be changed in this patch.
Although the new function is called *clone_file*, I'm not sure if it really
is "clone".
Why MacOS added an API called clonefile [1] and Linux exists
another called FICLONE.[2]
So perhaps it should be treated here as a copy and not a clone?
Leaving it open, is the possibility of implementing a true clone api?
Thoughts?

best regards,
Ranier Vilela

[1] clonefile <https://www.unix.com/man-page/mojave/2/clonefile/>
[2] ioctl_ficlonerange
<https://www.unix.com/man-page/linux/2/ioctl_ficlonerange/>


>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Re: Convert node test compile-time settings into run-time parameters

2024-05-20 Thread Ranier Vilela

Em seg., 20 de mai. de 2024 às 04:28, Peter Eisentraut 
escreveu:

> This patch converts the compile-time settings
>
>  COPY_PARSE_PLAN_TREES
>  WRITE_READ_PARSE_PLAN_TREES
>  RAW_EXPRESSION_COVERAGE_TEST
>
> into run-time parameters
>
>  debug_copy_parse_plan_trees
>  debug_write_read_parse_plan_trees
>  debug_raw_expression_coverage_test
>
> They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS.
>
> The effect is the same, but now you don't need to recompile in order to
> use these checks.
>
> The compile-time symbols are kept for build farm compatibility, but they
> now just determine the default value of the run-time settings.
>
> Possible concerns:
>
> - Performance?  Looking for example at pg_parse_query() and its
> siblings, they also check for other debugging settings like
> log_parser_stats in the main code path, so it doesn't seem to be a concern.
>
> - Access control?  I have these settings as PGC_USERSET for now. Maybe
> they should be PGC_SUSET?
>
> Another thought:  Do we really need three separate settings?
>
What is the use for production use?

best regards,
Ranier Vilela

Re: Sort functions with specialized comparators

2024-05-18 Thread Ranier Vilela

Em sáb., 18 de mai. de 2024 às 15:52, Andrey M. Borodin <
x4...@yandex-team.ru> escreveu:

> Hi!
>
> In a thread about sorting comparators[0] Andres noted that we have
> infrastructure to help compiler optimize sorting. PFA attached PoC
> implementation. I've checked that it indeed works on the benchmark from
> that thread.
>
> postgres=# CREATE TABLE arrays_to_sort AS
>SELECT array_shuffle(a) arr
>FROM
>(SELECT ARRAY(SELECT generate_series(1, 100)) a),
>generate_series(1, 10);
>
> postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
> Time: 990.199 ms
> postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
> Time: 696.156 ms
>
> The benefit seems to be on the order of magnitude with 30% speedup.
>
> There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid
> etc. But this sorting routines never show up in perf top or something like
> that.
>
> Seems like in most cases we do not spend much time in sorting. But
> specialization does not cost us much too, only some CPU cycles of a
> compiler. I think we can further improve speedup by converting inline
> comparator to value extractor: more compilers will see what is actually
> going on. But I have no proofs for this reasoning.
>
> What do you think?
>
Makes sense.

Regarding the patch.
You could change the style to:

+sort_int32_asc_cmp(const int32 *a, const int32 *b)
+sort_int32_desc_cmp(const int32 *a, const int32 *b)

We must use const in all parameters that can be const.

best regards,
Ranier Vilela

Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-14 Thread Ranier Vilela

Em ter., 14 de mai. de 2024 às 07:21, Daniel Gustafsson 
escreveu:

> > On 14 May 2024, at 08:03, Michael Paquier  wrote:
> >
> > On Mon, May 13, 2024 at 08:06:57PM +0200, Daniel Gustafsson wrote:
> >>> Any chance we'll have these fixes in v17?
> >>
> >> Nice timing, I was actually rebasing them today to get them committed.
> >
> > Looks sensible seen from here, as these paths could use a LOG or rely
> > on a memory context permanent to the backend causing junk to be
> > accumulated.  It's not that much, still that would accumulate.
>
> Pushed.
>
Thanks Daniel.

best regards,
Ranier Vilela

Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-05-13 Thread Ranier Vilela

Em qua., 10 de abr. de 2024 às 15:33, Daniel Gustafsson 
escreveu:

> On 10 Apr 2024, at 20:31, Ranier Vilela  wrote:
>
> Em ter., 2 de abr. de 2024 às 15:31, Daniel Gustafsson 
> escreveu:
>
>> > On 2 Apr 2024, at 20:13, Ranier Vilela  wrote:
>>
>> > Fix by freeing the pointer, like pclose_check (src/common/exec.c)
>> similar case.
>>
>> Off the cuff, seems reasonable when loglevel is LOG.
>>
>
> Per Coverity.
>
> Another case of resource leak, when loglevel is LOG.
> In the function shell_archive_file (src/backend/archive/shell_archive.c)
> The pointer *xlogarchcmd*  is not freed.
>
>
> Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
> cause unnecessary rebasing.  Will take a look over the next few days unless
> beaten to it.
>
Any chance we'll have these fixes in v17?

best regards,
Ranier Vilela

Re: Fix out-of-bounds in the function GetCommandTagName

2024-05-13 Thread Ranier Vilela

Em seg., 13 de mai. de 2024 às 14:38, Tom Lane  escreveu:

> David Rowley  writes:
> > I've added a CF entry under your name for this:
> > https://commitfest.postgresql.org/48/4927/
>
> > If it was code new to PG17 I'd be inclined to go ahead with it now,
> > but it does not seem to align with making the release mode stable.
> > I'd bet others will feel differently about that.  Delaying seems a
> > better default choice at least.
>
> The security team's Coverity instance has started to show this
> complaint now too.  So I'm going to go ahead and push this change
> in HEAD.  It's probably unwise to change it in stable branches,
> since there's at least a small chance some external code is using
> COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does.
> But we aren't anywhere near declaring v17's API stable, so
> I'd rather fix the issue than dismiss it in HEAD.
>
Thanks for the commit, Tom.

best regards,
Ranier Vilela

Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela

Em qua., 8 de mai. de 2024 às 10:06, Nazir Bilal Yavuz 
escreveu:

> Hi,
>
> On Wed, 8 May 2024 at 15:23, Ranier Vilela  wrote:
> >
> > Em qua., 8 de mai. de 2024 às 08:42, Nazir Bilal Yavuz <
> byavu...@gmail.com> escreveu:
> >>
> >> Hi,
> >>
> >> On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
> >> >
> >> >
> >> > Em qua., 8 de mai. de 2024 às 04:37, Nazir Bilal Yavuz <
> byavu...@gmail.com> escreveu:
> >> >>
> >> >> Hi Ranier,
> >> >>
> >> >> Thanks for looking into this!
> >> >>
> >> >> I am not sure why but your reply does not show up in the thread, so I
> >> >> copied your reply and answered it in the thread for visibility.
> >> >>
> >> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela 
> wrote:
> >> >> >
> >> >> > I know it's coming from copy-and-paste, but
> >> >> > I believe the flags could be:
> >> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> >> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >> >> >
> >> >> > The flags:
> >> >> > O_WRONLY | O_TRUNC
> >> >> >
> >> >> > Allow the OS to make some optimizations, if you deem it possible.
> >> >> >
> >> >> > The flag O_RDWR indicates that the file can be read, which is not
> true in this case.
> >> >> > The destination file will just be written.
> >> >>
> >> >> You may be right about the O_WRONLY flag but I am not sure about the
> >> >> O_TRUNC flag.
> >> >>
> >> >> O_TRUNC from the linux man page [1]: If the file already exists and
> is
> >> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
> >> >> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO
> or
> >> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> >> >> effect of O_TRUNC is unspecified.
> >> >
> >> > O_TRUNC is usually used in conjunction with O_WRONLY.
> >> > See the example at:
> >> > open.html
> >> >
> >> > O_TRUNC signals the OS to forget the current contents of the file, if
> it happens to exist.
> >> > In other words, there will be no seeks, only and exclusively writes.
> >>
> >> You are right; the O_TRUNC flag truncates the file, if it happens to
> >> exist. But it should not exist in the first place because the code at
> >> [2] should check this before the OpenTransientFile() call. Even if we
> >> assume somehow the check at [2] does not work, I do not think the
> >> correct operation is to truncate the contents of the existing file.
> >
> > I don't know if there is a communication problem here.
> > But what I'm trying to suggest refers to the destination file,
> > which doesn't matter if it exists or not?
>
> I do not think there is a communication problem. Actually it matters
> because the destination file should not exist, there is a code [2]
> which already checks and confirms that it does not exist.
>
I got it.


>
> >
> > If the destination file does not exist, O_TRUNC is ignored.
> > If the destination file exists, O_TRUNC truncates the current contents
> of the file.
> > I don't see why you think it's a problem to truncate the current content
> if the destination file exists.
> > Isn't he going to be replaced anyway?
>
> 'If the destination file does not exist' means the code at [2] works
> well and we do not need the O_TRUNC flag.
>
True, the O_TRUNC is ignored in this case.


>
> 'If the destination file already exists' means the code at [2] is
> broken somehow and there is a high chance that we could truncate
> something that we do not want to. For example, there is a foo db and
> we want to create bar db, Postgres chose the foo db's location as the
> destination of the bar db (which should not happen but let's assume
> somehow checks at [2] failed), then we could wrongly truncate the foo
> db's contents.
>
Of course, truncating the wrong file would be pretty bad.


>
> Hence, if Postgres works successfully I think the O_TRUNC flag is
> unnecessary but if Postgres does not work successfully, the O_TRUNC
> flag could cause harm.
>
The disaster will happen anyway, but of course we can help in some way.
Even without truncating, the wrong file will be destroyed anyway.


> >
> > Unless we want to preserve the current content (destination file), in
> case the copy/clone fails?
>
> Like I said above, Postgres should not choose the existing file as the
> destination file.
>
> Also, we have O_CREAT | O_EXCL flags together, from the link [3] you
> shared before: If O_CREAT and O_EXCL are set, open() shall fail if the
> file exists. So, overwriting the already existing file is already
> prevented.
>
That said, I agree that not using O_TRUNC helps in some way.

best regards,
Ranier Vilela

Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela

Em qua., 8 de mai. de 2024 às 08:42, Nazir Bilal Yavuz 
escreveu:

> Hi,
>
> On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
> >
> >
> > Em qua., 8 de mai. de 2024 às 04:37, Nazir Bilal Yavuz <
> byavu...@gmail.com> escreveu:
> >>
> >> Hi Ranier,
> >>
> >> Thanks for looking into this!
> >>
> >> I am not sure why but your reply does not show up in the thread, so I
> >> copied your reply and answered it in the thread for visibility.
> >>
> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
> >> >
> >> > I know it's coming from copy-and-paste, but
> >> > I believe the flags could be:
> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >> >
> >> > The flags:
> >> > O_WRONLY | O_TRUNC
> >> >
> >> > Allow the OS to make some optimizations, if you deem it possible.
> >> >
> >> > The flag O_RDWR indicates that the file can be read, which is not
> true in this case.
> >> > The destination file will just be written.
> >>
> >> You may be right about the O_WRONLY flag but I am not sure about the
> >> O_TRUNC flag.
> >>
> >> O_TRUNC from the linux man page [1]: If the file already exists and is
> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
> >> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> >> effect of O_TRUNC is unspecified.
> >
> > O_TRUNC is usually used in conjunction with O_WRONLY.
> > See the example at:
> > open.html
> >
> > O_TRUNC signals the OS to forget the current contents of the file, if it
> happens to exist.
> > In other words, there will be no seeks, only and exclusively writes.
>
> You are right; the O_TRUNC flag truncates the file, if it happens to
> exist. But it should not exist in the first place because the code at
> [2] should check this before the OpenTransientFile() call. Even if we
> assume somehow the check at [2] does not work, I do not think the
> correct operation is to truncate the contents of the existing file.
>
I don't know if there is a communication problem here.
But what I'm trying to suggest refers to the destination file,
which doesn't matter if it exists or not?

If the destination file does not exist, O_TRUNC is ignored.
If the destination file exists, O_TRUNC truncates the current contents of
the file.
I don't see why you think it's a problem to truncate the current content if
the destination file exists.
Isn't he going to be replaced anyway?

Unless we want to preserve the current content (destination file), in case
the copy/clone fails?

best regards,
Ranier Vilela

Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela

Em qua., 8 de mai. de 2024 às 04:37, Nazir Bilal Yavuz 
escreveu:

> Hi Ranier,
>
> Thanks for looking into this!
>
> I am not sure why but your reply does not show up in the thread, so I
> copied your reply and answered it in the thread for visibility.
>
> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
> >
> > I know it's coming from copy-and-paste, but
> > I believe the flags could be:
> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >
> > The flags:
> > O_WRONLY | O_TRUNC
> >
> > Allow the OS to make some optimizations, if you deem it possible.
> >
> > The flag O_RDWR indicates that the file can be read, which is not true
> in this case.
> > The destination file will just be written.
>
> You may be right about the O_WRONLY flag but I am not sure about the
> O_TRUNC flag.
>
> O_TRUNC from the linux man page [1]: If the file already exists and is
> a regular file and the access mode allows writing (i.e., is O_RDWR or
> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> effect of O_TRUNC is unspecified.
>
O_TRUNC is usually used in conjunction with O_WRONLY.
See the example at:
open.html
<https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html>

O_TRUNC signals the OS to forget the current contents of the file, if it
happens to exist.
In other words, there will be no seeks, only and exclusively writes.


> But it should be already checked if the destination directory already
> exists in dbcommands.c file in createdb() function [2], so this should
> not happen.
>
I'm not sure what you're referring to here.

best regards,
Ranier Vilela

Re: CREATE DATABASE with filesystem cloning

2024-05-07 Thread Ranier Vilela

Hi,

Nazir Bilal Yavuz  wrote:

>Any kind of feedback would be appreciated.

I know it's coming from copy-and-paste, but
I believe the flags could be:
- dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+ dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC | O_EXCL |
PG_BINARY);

The flags:
O_WRONLY | O_TRUNC

Allow the OS to make some optimizations, if you deem it possible.

The flag O_RDWR indicates that the file can be read, which is not true in
this case.
The destination file will just be written.

best regards,
Ranier Vilela

Re: Direct SSL connection and ALPN loose ends

2024-04-29 Thread Ranier Vilela

Em seg., 29 de abr. de 2024 às 15:36, Heikki Linnakangas 
escreveu:

> On 29/04/2024 21:06, Ranier Vilela wrote:
> > Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas
> > mailto:hlinn...@iki.fi>> escreveu:
> >
> > On 29/04/2024 20:10, Ranier Vilela wrote:
> >  > Hi,
> >  >
> >  > With TLS 1.3 and others there is possibly a security flaw using
> > ALPN [1].
> >  >
> >  > It seems to me that the ALPN protocol can be bypassed if the
> > client does
> >  > not correctly inform the ClientHello header.
> >  >
> >  > So, the suggestion is to check the ClientHello header in the
> > server and
> >  > terminate the TLS handshake early.
> >
> > Sounds to me like it's working as designed. ALPN in general is
> > optional;
> > if the client doesn't request it, then you proceed without it. We do
> > require ALPN for direct SSL connections though. We can, because
> direct
> > SSL connections is a new feature in Postgres. But we cannot require
> it
> > for the connections negotiated with SSLRequest, or we break
> > compatibility with old clients that don't use ALPN.
> >
> > Ok.
> > But what if I have a server configured for TLS 1.3 and that requires
> > ALPN to allow access?
> > What about a client configured without ALPN requiring connection?
>
> Sorry, I don't understand the questions. What about them?
>
Sorry, I'll try to be clearer.
The way it is designed, can we impose TLS 1.3 and ALPN to allow access to a
public server?

And if on the other side we have a client, configured without ALPN,
when requesting access, the server will refuse?

best regards,
Ranier Vilela

Re: Direct SSL connection and ALPN loose ends

2024-04-29 Thread Ranier Vilela

Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas 
escreveu:

> On 29/04/2024 20:10, Ranier Vilela wrote:
> > Hi,
> >
> > With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
> >
> > It seems to me that the ALPN protocol can be bypassed if the client does
> > not correctly inform the ClientHello header.
> >
> > So, the suggestion is to check the ClientHello header in the server and
> > terminate the TLS handshake early.
>
> Sounds to me like it's working as designed. ALPN in general is optional;
> if the client doesn't request it, then you proceed without it. We do
> require ALPN for direct SSL connections though. We can, because direct
> SSL connections is a new feature in Postgres. But we cannot require it
> for the connections negotiated with SSLRequest, or we break
> compatibility with old clients that don't use ALPN.
>
Ok.
But what if I have a server configured for TLS 1.3 and that requires ALPN
to allow access?
What about a client configured without ALPN requiring connection?


>
> There is a check in direct SSL mode that ALPN was used
> (ProcessSSLStartup in backend_startup.c):
>
> > if (!port->alpn_used)
> > {
> > ereport(COMMERROR,
> > (errcode(ERRCODE_PROTOCOL_VIOLATION),
> >  errmsg("received direct SSL connection
> request without ALPN protocol negotiation extension")));
> > goto reject;
> > }
>
> That happens immediately after the SSL connection has been established.
>
> Hmm. I guess it would be better to abort the connection earlier, without
> completing the TLS handshake. Otherwise the client might send the first
> message in wrong protocol to the PostgreSQL server. That's not a
> security issue for the PostgreSQL server: the server disconnects without
> reading the message. And I don't see any way for an ALPACA attack when
> the server ignores the client's message. Nevertheless, from the point of
> view of keeping the attack surface as small as possible, aborting
> earlier seems better.
>
So the ClientHello callback is the correct way to determine the end.

best regards,
Ranier Vilela

re: Direct SSL connection and ALPN loose ends

2024-04-29 Thread Ranier Vilela

Hi,

With TLS 1.3 and others there is possibly a security flaw using ALPN [1].

It seems to me that the ALPN protocol can be bypassed if the client does
not correctly inform the ClientHello header.

So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.

Patch attached.

best regards,
Ranier Vilela

[1] terminate-tlsv1-3-handshake-if-alpn-is-missing
<https://stackoverflow.com/questions/77271498/terminate-tlsv1-3-handshake-if-alpn-is-missing>


terminate-tls-handshake-if-no-alpn.patch
Description: Binary data

Possible data race on Windows (src/bin/pg_dump/parallel.c)

2024-04-29 Thread Ranier Vilela

Hi,

Per Coverity.

CID 1542943: (#1 of 1): Data race condition (MISSING_LOCK)
3. missing_lock: Accessing slot->AH without holding lock signal_info_lock.
Elsewhere, ParallelSlot.AH is written to with signal_info_lock held 1 out
of 1 times (1 of these accesses strongly imply that it is necessary).

The function DisconnectDatabase effectively writes the ParallelSlot.AH.
So the call in the function archive_close_connection:

if (slot->AH)
DisconnectDatabase(&(slot->AH->public));

It should also be protected on Windows, correct?

Patch attached.

best regards,
Ranier Vilela


fix-possible-data-race-windows-parallel.patch
Description: Binary data

Re: plenty code is confused about function level static

2024-04-18 Thread Ranier Vilela

Em qui., 18 de abr. de 2024 às 14:16, Andres Freund 
escreveu:

> Hi,
>
> On 2024-04-18 09:07:43 -0300, Ranier Vilela wrote:
> >  On 18/04/2024 00:39, Andres Freund wrote:
> > >There are lots of places that could benefit from adding 'static
> > >const'.
> >
> > I found a few more places.
>
> Good catches.
>
>
> > Patch 004
> >
> > The opposite would also help, adding static.
> > In these places, I believe it is safe to add static,
> > allowing the compiler to transform into read-only, definitively.
>
> I don't think this would even compile?

Compile, at least with msvc 2022.
Pass ninja test.


E.g. LockTagTypeNames, pg_wchar_table
> are declared in a header and used across translation units.
>
Sad.
There should be a way to export a read-only (static const) variable.
Better remove these.

v1-0005 attached.

best regards,
Ranier Vilela


v1-0005-const-convert-static-const.patch
Description: Binary data

Re: plenty code is confused about function level static

2024-04-18 Thread Ranier Vilela

 On 18/04/2024 00:39, Andres Freund wrote:

>We have a fair amount of code that uses non-constant function level static
>variables for read-only data. Which makes little sense - it prevents the
>compiler from understanding

>a) that the data is read only and can thus be put into a segment that's
shared
>between all invocations of the program
>b) the data will be the same on every invocation, and thus from optimizing
>based on that.

>The most common example of this is that all our binaries use
>static struct option long_options[] = { ... };
>which prevents long_options from being put into read-only memory.

+1 static const allows the compiler to make additional optimizations.

>There are lots of places that could benefit from adding 'static
>const'.

I found a few more places.

Patch 004

The opposite would also help, adding static.
In these places, I believe it is safe to add static,
allowing the compiler to transform into read-only, definitively.

Patch 005

best regards,

Ranier Vilela


0004-static-const-convert-plpython.patch
Description: Binary data


0005-const-convert-static-const.patch
Description: Binary data

Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-15 Thread Ranier Vilela

Em dom., 14 de abr. de 2024 às 21:29, Michael Paquier 
escreveu:

> On Sat, Apr 13, 2024 at 10:40:35AM -0300, Ranier Vilela wrote:
> > This sounds a little confusing to me.
> > Is the project policy to *tolerate* dereferencing NULL pointers?
> > If this is the case, no problem, using Assert would serve the purpose of
> > protecting against these errors well.
>
> In most cases, it does not matter to have an assertion for a code path
> that's going to crash a few lines later.  The result is the same: the
> code will crash and inform about the failure.
>
> > rbtxn_get_toptxn already calls rbtxn_is_subtx,
> > which adds a little unnecessary overhead.
> > I made a v1 of your patch, modifying this, please ignore it if you don't
> > agree.
>
> c55040ccd017 has been added in v14~, so this is material for 18~ at
> this stage.  If you want to move on with these changes, I'd suggest to
> add a CF entry.
>
I think it's worth it, because it's a case of a possible bug, but very
unlikely,
and Heikki's suggestions improve the code.

best regards,
Ranier Vilela

Re: Fix out-of-bounds in the function GetCommandTagName

2024-04-15 Thread Ranier Vilela

Em dom., 14 de abr. de 2024 às 23:12, David Rowley 
escreveu:

> On Mon, 15 Apr 2024 at 12:12, Ranier Vilela  wrote:
> >
> > Em dom., 14 de abr. de 2024 às 20:38, David Rowley 
> escreveu:
> >> You seem to have forgotten to attach it, but my comments above were
> >> written with the assumption that the patch is what I've attached here.
> >
> > Yes, I actually forgot.
> >
> > +1 for your patch.
>
> I've added a CF entry under your name for this:
> https://commitfest.postgresql.org/48/4927/

Thank you.


>
>
> If it was code new to PG17 I'd be inclined to go ahead with it now,
> but it does not seem to align with making the release mode stable.t
> I'd bet others will feel differently about that.  Delaying seems a
> better default choice at least.
>
I agree. Although I initially thought it was a live bug, that's actually
not the case.
In fact, this is a refactoring.

best regards,
Ranier Vilela

Re: Fix out-of-bounds in the function GetCommandTagName

2024-04-14 Thread Ranier Vilela

Em dom., 14 de abr. de 2024 às 20:38, David Rowley 
escreveu:

> On Mon, 15 Apr 2024 at 11:17, Ranier Vilela  wrote:
> > Coverity has reported some out-of-bounds bugs
> > related to the GetCommandTagName function.
> >
> > The size of the array is defined by COMMAND_TAG_NEXTTAG enum,
> > whose value currently corresponds to 193.
> > Since enum items are evaluated starting at zero, by default.
>
> I think the change makes sense. I don't see any good reason to define
> COMMAND_TAG_NEXTTAG or force the compiler's hand when it comes to
> sizing that array.
>
> Clearly, Coverity does not understand that we'll never call any of
> those GetCommandTag* functions with COMMAND_TAG_NEXTTAG.
>
I think that Coverity understood it this way because when
including COMMAND_TAG_NEXTTAG, in the enum definition,
led to 193 items, and the last item in the array is currently 192.


> > Patch attached.
>
> You seem to have forgotten to attach it, but my comments above were
> written with the assumption that the patch is what I've attached here.
>
Yes, I actually forgot.

+1 for your patch.

best regards,
Ranier Vilela

Fix out-of-bounds in the function GetCommandTagName

2024-04-14 Thread Ranier Vilela

Hi,

Per Coverity.

Coverity has reported some out-of-bounds bugs
related to the GetCommandTagName function.

CID 1542964: (#1 of 1): Out-of-bounds access (OVERRUN)
7. overrun-call: Overrunning callee's array of size 193 by passing argument
commandtag (which evaluates to 193) in call to GetCommandTagName.[

It turns out that the root of the problem is found in the declaration of
the tag_behavior array, which is found in src/backend/tcop/cmdtag.c.

The size of the array is defined by COMMAND_TAG_NEXTTAG enum,
whose value currently corresponds to 193.
Since enum items are evaluated starting at zero, by default.

It turns out that the final size of the array, 193, limits the number of
items to 192, which excludes the last TAG
PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false)

Fixed leaving it up to the compiler to determine the final size of the
array.

Patch attached.

best regards,
Ranier Vilela

Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-13 Thread Ranier Vilela

Em sáb., 13 de abr. de 2024 às 04:16, Heikki Linnakangas 
escreveu:

> On 11/04/2024 16:37, Ranier Vilela wrote:
> > Em qui., 11 de abr. de 2024 às 09:54, Heikki Linnakangas
> > mailto:hlinn...@iki.fi>> escreveu:
> >
> > On 11/04/2024 15:03, Ranier Vilela wrote:
> >  > Em qua., 10 de abr. de 2024 às 18:28, Heikki Linnakangas
> >  > mailto:hlinn...@iki.fi> <mailto:hlinn...@iki.fi
> > <mailto:hlinn...@iki.fi>>> escreveu:
> >  >
> >  > On 10/04/2024 21:07, Ranier Vilela wrote:
> >  >  > Hi,
> >  >  >
> >  >  > Per Coverity.
> >  >  >
> >  >  > The function ReorderBufferTXNByXid,
> >  >  > can return NULL when the parameter *create* is false.
> >  >  >
> >  >  > In the functions ReorderBufferSetBaseSnapshot
> >  >  > and ReorderBufferXidHasBaseSnapshot,
> >  >  > the second call to ReorderBufferTXNByXid,
> >  >  > pass false to *create* argument.
> >  >  >
> >  >  > In the function ReorderBufferSetBaseSnapshot,
> >  >  > fixed passing true as argument to always return
> >  >  > a valid ReorderBufferTXN pointer.
> >  >  >
> >  >  > In the function ReorderBufferXidHasBaseSnapshot,
> >  >  > fixed by checking if the pointer is NULL.
> >  >
> >  > If it's a "known subxid", the top-level XID should already
> > have its
> >  > ReorderBufferTXN entry, so ReorderBufferTXN() should never
> > return NULL.
> >  >
> >  > There are several conditions to not return NULL,
> >  > I think trusting never is insecure.
> >
> > Well, you could make it an elog(ERROR, ..) instead. But the point is
> > that it should not happen, and if it does for some reason, that's
> very
> > suprpising and there is a bug somewhere. In that case, we should
> *not*
> > just blindly create it and proceed as if everything was OK.
> >
> > Thanks for the clarification.
> > I will then suggest improving robustness,
> > but avoiding hiding any possible errors that may occur.
>
> I don't much like adding extra runtime checks for "can't happen"
> scenarios either. Assertions would be more clear, but in this case the
> code would just segfault trying to dereference the NULL pointer, which
> is fine for a "can't happen" scenario.
>
This sounds a little confusing to me.
Is the project policy to *tolerate* dereferencing NULL pointers?
If this is the case, no problem, using Assert would serve the purpose of
protecting against these errors well.


> Looking closer, when we identify an XID as a subtransaction, we:
> - assign toplevel_xid,
> - set RBTXN_IS_SUBXACT, and
> - assign toptxn pointer.
>
> ISTM the 'toplevel_xid' and RBTXN_IS_SUBXACT are redundant.
> 'toplevel_xid' is only used in those two calls that do
> ReorderBufferTXNByXid(rb, txn->toplevel_xid,...), and you could replace
> those by following the 'toptxn' pointer directly. And RBTXN_IS_SUBXACT
> is redundant with (toptxn != NULL). So here's a patch to remove
> 'toplevel_xid' and RBTXN_IS_SUBXACT altogether.
>
+ if (rbtxn_is_subtxn(txn))
+ txn = rbtxn_get_toptxn(txn);

rbtxn_get_toptxn already calls rbtxn_is_subtx,
which adds a little unnecessary overhead.
I made a v1 of your patch, modifying this, please ignore it if you don't
agree.

best regards,
Ranier Vilela


v1-0001-Remove-redundant-field-and-flag-from-ReorderBufferTX.patch
Description: Binary data

Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-11 Thread Ranier Vilela

Em qui., 11 de abr. de 2024 às 09:54, Heikki Linnakangas 
escreveu:

> On 11/04/2024 15:03, Ranier Vilela wrote:
> > Em qua., 10 de abr. de 2024 às 18:28, Heikki Linnakangas
> > mailto:hlinn...@iki.fi>> escreveu:
> >
> > On 10/04/2024 21:07, Ranier Vilela wrote:
> >  > Hi,
> >  >
> >  > Per Coverity.
> >  >
> >  > The function ReorderBufferTXNByXid,
> >  > can return NULL when the parameter *create* is false.
> >  >
> >  > In the functions ReorderBufferSetBaseSnapshot
> >  > and ReorderBufferXidHasBaseSnapshot,
> >  > the second call to ReorderBufferTXNByXid,
> >  > pass false to *create* argument.
> >  >
> >  > In the function ReorderBufferSetBaseSnapshot,
> >  > fixed passing true as argument to always return
> >  > a valid ReorderBufferTXN pointer.
> >  >
> >  > In the function ReorderBufferXidHasBaseSnapshot,
> >  > fixed by checking if the pointer is NULL.
> >
> > If it's a "known subxid", the top-level XID should already have its
> > ReorderBufferTXN entry, so ReorderBufferTXN() should never return
> NULL.
> >
> > There are several conditions to not return NULL,
> > I think trusting never is insecure.
>
> Well, you could make it an elog(ERROR, ..) instead. But the point is
> that it should not happen, and if it does for some reason, that's very
> suprpising and there is a bug somewhere. In that case, we should *not*
> just blindly create it and proceed as if everything was OK.
>
Thanks for the clarification.
I will then suggest improving robustness,
but avoiding hiding any possible errors that may occur.

v1 patch attached.

best regards,
Ranier Vilela


v1-fix-possible-dereference-null-pointer-reorderbuffer.patch
Description: Binary data

Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-11 Thread Ranier Vilela

Em qua., 10 de abr. de 2024 às 18:28, Heikki Linnakangas 
escreveu:

> On 10/04/2024 21:07, Ranier Vilela wrote:
> > Hi,
> >
> > Per Coverity.
> >
> > The function ReorderBufferTXNByXid,
> > can return NULL when the parameter *create* is false.
> >
> > In the functions ReorderBufferSetBaseSnapshot
> > and ReorderBufferXidHasBaseSnapshot,
> > the second call to ReorderBufferTXNByXid,
> > pass false to *create* argument.
> >
> > In the function ReorderBufferSetBaseSnapshot,
> > fixed passing true as argument to always return
> > a valid ReorderBufferTXN pointer.
> >
> > In the function ReorderBufferXidHasBaseSnapshot,
> > fixed by checking if the pointer is NULL.
>
> If it's a "known subxid", the top-level XID should already have its
> ReorderBufferTXN entry, so ReorderBufferTXN() should never return NULL.
>
There are several conditions to not return NULL,
I think trusting never is insecure.

It's not surprising if Coverity doesn't understand that, but setting the
> 'create' flag doesn't seem like the right fix.

ReorderBufferSetBaseSnapshot always expects that *txn* exists.
If a second call fails, the only solution is to create a new one, no?


> If we add "Assert(txn !=
> NULL)", does that silence it?
>
I think no.
I always compile it as a release to send to Coverity.

best regards,
Ranier Vilela

Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-04-10 Thread Ranier Vilela

Em ter., 2 de abr. de 2024 às 15:31, Daniel Gustafsson 
escreveu:

> > On 2 Apr 2024, at 20:13, Ranier Vilela  wrote:
>
> > Fix by freeing the pointer, like pclose_check (src/common/exec.c)
> similar case.
>
> Off the cuff, seems reasonable when loglevel is LOG.
>

Per Coverity.

Another case of resource leak, when loglevel is LOG.
In the function shell_archive_file (src/backend/archive/shell_archive.c)
The pointer *xlogarchcmd*  is not freed.

best regards,
Ranier Vilela

fix-resource-leak-shell_archive.patch
Description: Binary data

Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-10 Thread Ranier Vilela

Hi,

Per Coverity.

The function ReorderBufferTXNByXid,
can return NULL when the parameter *create* is false.

In the functions ReorderBufferSetBaseSnapshot
and ReorderBufferXidHasBaseSnapshot,
the second call to ReorderBufferTXNByXid,
pass false to *create* argument.

In the function ReorderBufferSetBaseSnapshot,
fixed passing true as argument to always return
a valid ReorderBufferTXN pointer.

In the function ReorderBufferXidHasBaseSnapshot,
fixed by checking if the pointer is NULL.

best regards,
Ranier Vilela


fix-possible-dereference-null-pointer-reorderbuffer.patch
Description: Binary data

Re: Flushing large data immediately in pqcomm

2024-04-09 Thread Ranier Vilela

Em seg., 8 de abr. de 2024 às 09:27, Jelte Fennema-Nio 
escreveu:

> On Sun, 7 Apr 2024 at 11:34, David Rowley  wrote:
> > That seems to require modifying the following function signatures:
> > secure_write(), be_tls_write(), be_gssapi_write().  That's not an area
> > I'm familiar with, however.
>
> Attached is a new patchset where 0003 does exactly that. The only
> place where we need to cast to non-const is for GSS, but that seems
> fine (commit message has more details).
>
+1.
Looks ok to me.
The GSS pointer *ptr, is already cast to char * where it is needed,
so the code is already correct.

best regards,
Ranier Vilela

Re: Flushing large data immediately in pqcomm

2024-04-08 Thread Ranier Vilela

Em seg., 8 de abr. de 2024 às 07:42, Jelte Fennema-Nio 
escreveu:

> On Sun, 7 Apr 2024 at 14:41, David Rowley  wrote:
> > Looking at the code in socket_putmessage_noblock(), I don't understand
> > why it's ok for PqSendBufferSize to be int but "required" must be
> > size_t.  There's a line that does "PqSendBufferSize = required;". It
> > kinda looks like they both should be size_t.  Am I missing something
> > that you've thought about?
>
>
> You and Ranier are totally right (I missed this assignment). Attached is
> v8.
>
+1
LGTM.

best regards,
Ranier Vilela

Re: Security lessons from liblzma

2024-04-07 Thread Ranier Vilela

Hi,

Sorry, without any connection with the technical part of the thread.
But I couldn't help but record this, and congratulate Andres Freund, for
the excellent work.
It's not every day that a big press, in Brazil or around the world,
publishes something related to technology people.

Today I came across a publication in Brazil's most widely circulated
newspaper, calling him a hero for preventing the attack.

programador-conserta-falha-no-linux-evita-ciberataque-e-vira-heroi-da-internet.shtml
<https://www1.folha.uol.com.br/tec/2024/04/programador-conserta-falha-no-linux-evita-ciberataque-e-vira-heroi-da-internet.shtml>

Congratulations Andres Freund, nice work.

best regards,
Ranier Vilela

Re: Flushing large data immediately in pqcomm

2024-04-07 Thread Ranier Vilela

Em sáb., 6 de abr. de 2024 às 22:39, Andres Freund 
escreveu:

> Hi,
>
> On 2024-04-07 00:45:31 +0200, Jelte Fennema-Nio wrote:
> > On Sat, 6 Apr 2024 at 22:21, Andres Freund  wrote:
> > > The small regression for small results is still kinda visible, I
> haven't yet
> > > tested the patch downthread.
> >
> > Thanks a lot for the faster test script, I'm also impatient. I still
> > saw the small regression with David his patch. Here's a v6 where I
> > think it is now gone. I added inline to internal_put_bytes too. I
> > think that helped especially because for two calls to
> > internal_put_bytes len is a constant (1 and 4) that is smaller than
> > PqSendBufferSize. So for those calls the compiler can now statically
> > eliminate the new codepath because "len >= PqSendBufferSize" is known
> > to be false at compile time.
>
> Nice.
>
>
> > Also I incorporated all of Ranier his comments.
>
> Changing the global vars to size_t seems mildly bogus to me. All it's
> achieving is to use slightly more memory. It also just seems unrelated to
> the
> change.
>
I don't agree with this thought.
Actually size_t uses 4 bytes of memory than int, right.
But mixing up int and size_t is a sure way to write non-portable code.
And the compilers will start showing messages such as " signed/unsigned
mismatch".

The global vars PqSendPointer and PqSendStart were changed in the v5 patch,
so for the sake of style and consistency, I understand that it is better
not to mix the types.

The compiler will promote PqSendBufferSize to size_t in all comparisons.

And finally the correct type to deal with char * variables is size_t.

Best regards,
Ranier Vilela


> Greetings,
>
> Andres Freund
>

Re: Flushing large data immediately in pqcomm

2024-04-06 Thread Ranier Vilela

Hi,

On Fri, 5 Apr 2024 at 03:28, Melih Mutlu
 wrote:
>Right. It was a mistake, forgot to remove that. Fixed it in v5.

If you don't mind, I have some suggestions for patch v5.

1.  Shouldn't PqSendBufferSize be of type size_t?
There are several comparisons with other size_t variables.
static size_t PqSendBufferSize; /* Size send buffer */

I think this would prevent possible overflows.

2. If PqSendBufferSize is changed to size_t, in the function
socket_putmessage_noblock, the variable which name is *required*, should
be changed to size_t as well.

static void
socket_putmessage_noblock(char msgtype, const char *s, size_t len)
{
int res PG_USED_FOR_ASSERTS_ONLY;
size_t required;

3. In the internal_putbytes function, the *amout* variable could
have the scope safely reduced.

else
{
size_t amount;

amount = PqSendBufferSize - PqSendPointer;

4. In the function internal_flush_buffer, the variables named
*bufptr* and *bufend* could be const char * type, like:

static int
internal_flush_buffer(const char *s, size_t *start, size_t *end)
{
static int last_reported_send_errno = 0;

const char *bufptr = s + *start;
const char *bufend = s + *end;

best regards,
Ranier Vilela

Re: Fix out-of-bounds in the function PQescapeinternal (src/interfaces/libpq/fe-exec.c)

2024-04-04 Thread Ranier Vilela

Em qua., 3 de abr. de 2024 às 08:36, Ranier Vilela 
escreveu:

>
> Em ter., 2 de abr. de 2024 às 18:13, Tom Lane 
> escreveu:
>
>> Ranier Vilela  writes:
>> > While I working in [1], Coverity reported some errors:
>> > src/bin/pg_basebackup/pg_createsubscriber.c
>> > CID 1542690: (#1 of 2): Out-of-bounds access (OVERRUN)
>> > alloc_strlen: Allocating insufficient memory for the terminating null of
>> > the string. [Note: The source code implementation of the function has
>> been
>> > overridden by a builtin model.]
>> > CID 1542690: (#2 of 2): Out-of-bounds access (OVERRUN)
>> > alloc_strlen: Allocating insufficient memory for the terminating null of
>> > the string. [Note: The source code implementation of the function has
>> been
>> > overridden by a builtin model.]
>>
>> Yeah, we saw that in the community run too.  I'm tempted to call it
>> an AI hallucination.  The "Note" seems to mean that they're not
>> actually analyzing our code but some made-up substitute.
>>
> Yeah, the message is a little confusing.
> It seems to me that it is a replacement of the malloc function with its
> own, with some type of security cookie,
> like /GS (Buffer Security Check)
> <https://learn.microsoft.com/en-us/cpp/build/reference/gs-buffer-security-check?view=msvc-170>
>
>
>> > The source of errors is the function PQescapeInternal.
>> > The slow path has bugs when num_quotes or num_backslashes are greater
>> than
>> > zero.
>> > For each num_quotes or num_backslahes we need to allocate two more.
>>
>> Nonsense.  The quote or backslash is already counted in input_len,
>> so we need to add just one more.
>>
> Right, the quote or backslash is counted in input_len.
> In the test I did, the string had 10 quotes, so
> input_len had at least 10 bytes for quotes.
> But we write 20 quotes, in the slow path.
>
Sorry, some kind of brain damage.
I ran the test with a debugger, and step by step, the defect does not occur
in the section I indicated.
Only the exact bytes counted from quote_char and num_backslashes are
actually written.


>
> if (*s == quote_char || (!as_ident && *s == '\\'))
> *rp++ = *s;
> *rp++ = *s;
>
> |0| = quote_char
> |1| = quote_char
> |2| = quote_char
> |3| = quote_char
> |4| = quote_char
> |5| = quote_char
> |6| = quote_char
> |7| = quote_char
> |8| = quote_char
> |9| = quote_char
> |10| = quote_char <--memory overwrite
> |11| = quote_char
> |12| = quote_char
> |13| = quote_char
> |14| = quote_char
> |15| = quote_char
> |16| = quote_char
> |17| = quote_char
> |18| = quote_char
> |19| = quote_char
>
>
>
>> If there were anything wrong here, I'm quite sure our testing under
>> e.g. valgrind would have caught it years ago.  However, just to be
>> sure, I tried adding an Assert that the allocated space is filled
>> exactly, as attached.  It gets through check-world just fine.
>>
> It seems to me that only the fast path is being validated by the assert.
>
The assert works fine.

The only catch is Coverity will continue to report the error.
But in this case, the error does not exist and the warning is wrong.

I will remove the patch.

best regards,
Ranier Vilela

Re: Fix out-of-bounds in the function PQescapeinternal (src/interfaces/libpq/fe-exec.c)

2024-04-03 Thread Ranier Vilela

Em ter., 2 de abr. de 2024 às 18:13, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > While I working in [1], Coverity reported some errors:
> > src/bin/pg_basebackup/pg_createsubscriber.c
> > CID 1542690: (#1 of 2): Out-of-bounds access (OVERRUN)
> > alloc_strlen: Allocating insufficient memory for the terminating null of
> > the string. [Note: The source code implementation of the function has
> been
> > overridden by a builtin model.]
> > CID 1542690: (#2 of 2): Out-of-bounds access (OVERRUN)
> > alloc_strlen: Allocating insufficient memory for the terminating null of
> > the string. [Note: The source code implementation of the function has
> been
> > overridden by a builtin model.]
>
> Yeah, we saw that in the community run too.  I'm tempted to call it
> an AI hallucination.  The "Note" seems to mean that they're not
> actually analyzing our code but some made-up substitute.
>
Yeah, the message is a little confusing.
It seems to me that it is a replacement of the malloc function with its
own, with some type of security cookie,
like /GS (Buffer Security Check)
<https://learn.microsoft.com/en-us/cpp/build/reference/gs-buffer-security-check?view=msvc-170>


> > The source of errors is the function PQescapeInternal.
> > The slow path has bugs when num_quotes or num_backslashes are greater
> than
> > zero.
> > For each num_quotes or num_backslahes we need to allocate two more.
>
> Nonsense.  The quote or backslash is already counted in input_len,
> so we need to add just one more.
>
Right, the quote or backslash is counted in input_len.
In the test I did, the string had 10 quotes, so
input_len had at least 10 bytes for quotes.
But we write 20 quotes, in the slow path.

if (*s == quote_char || (!as_ident && *s == '\\'))
*rp++ = *s;
*rp++ = *s;

|0| = quote_char
|1| = quote_char
|2| = quote_char
|3| = quote_char
|4| = quote_char
|5| = quote_char
|6| = quote_char
|7| = quote_char
|8| = quote_char
|9| = quote_char
|10| = quote_char <--memory overwrite
|11| = quote_char
|12| = quote_char
|13| = quote_char
|14| = quote_char
|15| = quote_char
|16| = quote_char
|17| = quote_char
|18| = quote_char
|19| = quote_char



> If there were anything wrong here, I'm quite sure our testing under
> e.g. valgrind would have caught it years ago.  However, just to be
> sure, I tried adding an Assert that the allocated space is filled
> exactly, as attached.  It gets through check-world just fine.
>
It seems to me that only the fast path is being validated by the assert.

if (num_quotes == 0 && (num_backslashes == 0 || as_ident))
{
memcpy(rp, str, input_len);
rp += input_len;
}

best regards,
Ranier Vilela

Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-04-02 Thread Ranier Vilela

Hi,

Per Coverity.

Coverity reported a resource leak at the function
run_ssl_passphrase_command.
7. alloc_fn: Storage is returned from allocation function
wait_result_to_str.["show details"]

8. noescape: Assuming resource wait_result_to_str(pclose_rc) is not freed
or pointed-to as ellipsis argument to errdetail_internal.

CID 1533043: (#1 of 1): Resource leak (RESOURCE_LEAK)
9. leaked_storage: Failing to save or free storage allocated by
wait_result_to_str(pclose_rc) leaks it.

I think that Coverity is right.

Fix by freeing the pointer, like pclose_check (src/common/exec.c) similar
case.

Patch attached.

best regards,


fix-resource-leak-be-secure-common.patch
Description: Binary data

Re: Fix some resources leaks (src/bin/pg_basebackup/pg_createsubscriber.c)

2024-04-01 Thread Ranier Vilela

Em seg., 1 de abr. de 2024 às 14:52, Tom Lane  escreveu:

> "Euler Taveira"  writes:
> > On Wed, Mar 27, 2024, at 8:50 PM, Ranier Vilela wrote:
> >> Coverity has some reports in the new code
> >> pg_createsubscriber.c
> >> I think that Coverity is right.
>
> > It depends on your "right" definition. If your program execution is
> ephemeral
> > and the leak is just before exiting, do you think it's worth it?
>
> I agree with Ranier, actually.  The functions in question don't
> exit() themselves, but return control to a caller that might or
> might not choose to exit.  I don't think they should be assuming
> that an exit will happen promptly, even if that's true today.
>
> The community Coverity run found a few additional leaks of the same
> kind in that file.  I pushed a merged fix for all of them.
>
Thanks Tom, for the commit.

best regards,
Ranier Vilela

Fix out-of-bounds in the function PQescapeinternal (src/interfaces/libpq/fe-exec.c)

2024-03-30 Thread Ranier Vilela

Hi,

While I working in [1], Coverity reported some errors:

src/bin/pg_basebackup/pg_createsubscriber.c
CID 1542690: (#1 of 2): Out-of-bounds access (OVERRUN)
alloc_strlen: Allocating insufficient memory for the terminating null of
the string. [Note: The source code implementation of the function has been
overridden by a builtin model.]
CID 1542690: (#2 of 2): Out-of-bounds access (OVERRUN)
alloc_strlen: Allocating insufficient memory for the terminating null of
the string. [Note: The source code implementation of the function has been
overridden by a builtin model.]

I think that is right.

The source of errors is the function PQescapeInternal.
The slow path has bugs when num_quotes or num_backslashes are greater than
zero.
For each num_quotes or num_backslahes we need to allocate two more.

Code were out-of-bounds it happens:
for (s = str; s - str < input_len; ++s)
{
if (*s == quote_char || (!as_ident && *s == '\\'))
{
*rp++ = *s;
*rp++ = *s;
}

Patch attached.

Best regards,
Ranier Vilela

[1] Re: Fix some resources leaks
(src/bin/pg_basebackup/pg_createsubscriber.c)
<https://www.postgresql.org/message-id/CAEudQAqQHGrhmY3%2BrgdqJLM-76sozLm__0_NSJetuQHsa%2Bd41Q%40mail.gmail.com>


fix-out-of-bouds-libpqinternal.patch
Description: Binary data

Re: Fix some resources leaks (src/bin/pg_basebackup/pg_createsubscriber.c)

2024-03-28 Thread Ranier Vilela

Em qua., 27 de mar. de 2024 às 23:08, Euler Taveira 
escreveu:

> On Wed, Mar 27, 2024, at 8:50 PM, Ranier Vilela wrote:
>
> Coverity has some reports in the new code
> pg_createsubscriber.c
> I think that Coverity is right.
>
>
> It depends on your "right" definition.
>
I do not think so.

If your program execution is ephemeral
> and the leak is just before exiting, do you think it's worth it?
>
I think it's worth it.
Even an ephemeral execution can allocate resources, for example, and
mainly, in Windows,
and block these resources for other programs, and on a highly loaded server
with very few free resources,
releasing resources as quickly as possible makes a difference.


> 1.
> CID 1542682: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable buf going out of scope leaks the storage it
> points to.
>
>
> It will exit in the next instruction.
>
Yes, by exit() call function.


>
> 2.
> CID 1542704: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable conn going out of scope leaks the storage it
> points to.
>
>
> The connect_database function whose exit_on_error is false is used in 2
> routines:
>
Calling connect_database with false, per example:
conn = connect_database(dbinfo[i].pubconninfo, false);

If the connection status != CONNECTION_OK, the function returns NULL,
but does not free connection struct, memory and data.

In the loop with thousands of "number of specified databases",
It would quickly end up in problems, especially on Windows.


> * cleanup_objects_atexit: that's about to exit;
> * drop_primary_replication_slot: that will execute a few routines before
> exiting.
>
> 3.
> CID 1542691: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable str going out of scope leaks the storage it
> points to.
>
>
> It will exit in the next instruction.
>
Yes, by exit() call function.

About exit() function:

deallocating-memory-when-using-exit1-c
<https://stackoverflow.com/questions/7414051/deallocating-memory-when-using-exit1-c>
"exit *does not* call the destructors of any stack based objects so if
those objects have allocated any memory internally then yes that memory
will be leaked. "

reference/cstdlib/exit/ <https://cplusplus.com/reference/cstdlib/exit/>
"Note that objects with automatic storage are not destroyed by calling exit
(C++)."

I think that relying on the exit function for cleaning is extremely
fragile, especially on Windows.


> Having said that, applying this patch is just a matter of style.
>
IMO, a better and more correct style.

best regards,
Ranier Vilela

Fix some resources leaks (src/bin/pg_basebackup/pg_createsubscriber.c)

2024-03-27 Thread Ranier Vilela

Hi,

Per Coverity.

Coverity has some reports in the new code
pg_createsubscriber.c
I think that Coverity is right.

1.
CID 1542682: (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable buf going out of scope leaks the storage it points
to.

2.
CID 1542704: (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable conn going out of scope leaks the storage it
points to.

3.
CID 1542691: (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable str going out of scope leaks the storage it points
to.

Patch attached.

best regards,
Ranier Vilela


fix-some-issues-pg_createsubscriber.patch
Description: Binary data

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-27 Thread Ranier Vilela

Em qua., 27 de mar. de 2024 às 14:35, Nathan Bossart <
nathandboss...@gmail.com> escreveu:

> On Wed, Mar 27, 2024 at 01:47:38PM -0300, Ranier Vilela wrote:
> > Em qua., 27 de mar. de 2024 às 13:41, Nathan Bossart <
> > nathandboss...@gmail.com> escreveu:
> >> On Wed, Mar 27, 2024 at 01:21:23PM -0300, Ranier Vilela wrote:
> >> > I think that left an oversight in a commit d365ae7
> >> > <
> >>
> https://github.com/postgres/postgres/commit/d365ae705409f5d9c81da4b668f59c3598feb512
> >> >
> >> > If the admin_role is a NULL pointer, so, can be dereferenced
> >> > in the main loop of the function roles_is_member_of and
> >> > worst, IMO, can be destroying aleatory memory?
> >> >
> >> > First, is a better shortcut test to check if admin_role is NOT NULL.
> >> > Second, !OidIsValid(*admin_role), It doesn't seem necessary anymore.
> >> >
> >> > Or am I losing something?
> >>
> >> If admin_role is NULL, then admin_of is expected to be set to
> InvalidOid.
> >> See the assertion at the top of the function.  AFAICT the code that
> >> dereferences admin_role short-circuits if admin_of is invalid.
> >>
> > These conditions seem a little fragile and confusing to me.
> > When a simple test, it protects the pointer and avoids a series of tests,
> > which are unnecessary if the pointer is invalid.
>
> Maybe.  But that doesn't seem like an oversight in commit d365ae7.
>
Sorry for exceeding.

>
> -   if (otherid == admin_of && form->admin_option &&
> -   OidIsValid(admin_of) &&
> !OidIsValid(*admin_role))
> +   if (admin_role != NULL && otherid == admin_of &&
> form->admin_option &&
> +   OidIsValid(admin_of))
> *admin_role = memberid;
>
> I'm not following why it's safe to remove the !OidIsValid(*admin_role)
> check here.  We don't want to overwrite a previously-set value of
> *admin_role, as per the comment above roles_is_member_of():
>
>  * If admin_of is not InvalidOid, this function sets *admin_role, either
>  * to the OID of the first role in the result list that directly possesses
>  * ADMIN OPTION on the role corresponding to admin_of, or to InvalidOid if
>  * there is no such role.
>
Ok. If admin_role is NOT NULL, so *admin_role is InvalidOid, by
initialization
in the head of function.

I think that a cheap test *admin_role == InvalidOid, is enough?
What do you think?

v1 attached.

best regards,
Ranier Vilela


v1-avoid-dereference-a-null-pointer-acl.patch
Description: Binary data

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-27 Thread Ranier Vilela

Em qua., 27 de mar. de 2024 às 13:41, Nathan Bossart <
nathandboss...@gmail.com> escreveu:

> On Wed, Mar 27, 2024 at 01:21:23PM -0300, Ranier Vilela wrote:
> > Nathan Bossart  writes:
> >>Committed with that change. Thanks for the guidance on this one.
> >
> > I think that left an oversight in a commit d365ae7
> > <
> https://github.com/postgres/postgres/commit/d365ae705409f5d9c81da4b668f59c3598feb512
> >
> > If the admin_role is a NULL pointer, so, can be dereferenced
> > in the main loop of the function roles_is_member_of and
> > worst, IMO, can be destroying aleatory memory?
> >
> > First, is a better shortcut test to check if admin_role is NOT NULL.
> > Second, !OidIsValid(*admin_role), It doesn't seem necessary anymore.
> >
> > Or am I losing something?
>
> If admin_role is NULL, then admin_of is expected to be set to InvalidOid.
> See the assertion at the top of the function.  AFAICT the code that
> dereferences admin_role short-circuits if admin_of is invalid.
>
These conditions seem a little fragile and confusing to me.
When a simple test, it protects the pointer and avoids a series of tests,
which are unnecessary if the pointer is invalid.

best regards,
Ranier Vilela

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-27 Thread Ranier Vilela

Hi,

Nathan Bossart  writes:
>Committed with that change. Thanks for the guidance on this one.

I think that left an oversight in a commit d365ae7
<https://github.com/postgres/postgres/commit/d365ae705409f5d9c81da4b668f59c3598feb512>
If the admin_role is a NULL pointer, so, can be dereferenced
in the main loop of the function roles_is_member_of and
worst, IMO, can be destroying aleatory memory?

First, is a better shortcut test to check if admin_role is NOT NULL.
Second, !OidIsValid(*admin_role), It doesn't seem necessary anymore.

Or am I losing something?

best regards,
Ranier Vilela


avoid-dereference-a-null-pointer-acl.patch
Description: Binary data

Re: Avoid is possible a expensive function call (src/backend/utils/adt/varlena.c)

2024-03-05 Thread Ranier Vilela

Em seg., 4 de mar. de 2024 às 20:28, Tom Lane  escreveu:

> Michael Paquier  writes:
> > On Mon, Mar 04, 2024 at 03:08:03PM -0300, Ranier Vilela wrote:
> >> I can't see any user validation at the function
> pg_newlocale_from_collation.
>
> > Matthias is right, look closer.  I can see more than one check,
> > especially note the one related to the collation version mismatch that
> > should not be silently ignored.
>
> The fast path through that code doesn't include any checks, true,
> but the point is that finding the entry proves that we made those
> checks previously.  I can't agree with making those semantics
> squishy in order to save a few cycles in the exact-equality case.
>
Robustness is a fair point.

best regards,
Ranier Vilela

1 2 3 4 5 6 7 8 9 >

1 - 100 of 864 matches

Mail list logo