Re: Fix example in partitioning documentation

2019-09-24 Thread Amit Langote
On Wed, Sep 25, 2019 at 1:47 PM Michael Paquier  wrote:
>
> On Tue, Sep 24, 2019 at 11:36:40AM +0900, Amit Langote wrote:
> > Sure, thank you.
>
> And done with f5daf7f, back-patched down to 12.

Thanks again. :)

Regards,
Amit




Re: allow online change primary_conninfo

2019-09-24 Thread Kyotaro Horiguchi
Hello.

At Sat, 21 Sep 2019 13:06:25 +0300, Sergei Kornilov  wrote in 
<41171569060...@myt5-b646bde4b8f3.qloud-c.yandex.net>
> Hello
> 
> Thank you for the review! Could you please also check the v4 version? v5 
> implements the design suggested by Kyotaro Horiguchi-san, while v4 has a 
> different design. Which one do you prefer? Or are both wrong?
> 
> > I can't parse that comment. What does "skipping to starting" mean? I
> > assume it's just about avoiding wal_retrieve_retry_interval, but I think
> > the comment ought to be rephrased.
> 
> The design idea is to rewrite the current state from working XLOG_FROM_STREAM to 
> failed XLOG_FROM_ARCHIVE (without actually trying that method on this iteration) 
> and immediately go to the next iteration to advance the state machine. The next 
> iteration after a failed archive recovery is the walreceiver, so the walreceiver 
> will be stopped just before these lines and started on the next iteration. The 
> walreceiver will be restarted; we do not call restore_command

Sorry, that's my bad. I meant "If the wal receiver is requested to
restart, change the state to XLOG_FROM_STREAM immediately, skipping
the next XLOG_FROM_ARCHIVE state.".

> > Also, should we really check this before scanning for new timelines?
> 
> Hmm, on the next day... No, this is not really necessary.

There is no problem when the timeline has not changed at the time of an
explicit restart of the wal receiver.  But if that happens just after the
standby has received a new history file, we suffer an extra restart of the
wal receiver.  It seems better to check for that in this case.

> > Why is it the right thing to change to XLOG_FROM_ARCHIVE when we're just
> > restarting walreceiver? The server might unnecessarily get stuck in
> > archive based recovery for a long time this way? It seems to me that
> > we'd need to actually trigger RequestXLogStreaming() in this case.
> 
> I hope I clarified this in design idea description above.

My suggestion was just to pretend that the next XLOG_FROM_STREAM
failed, but the outcome actually looks weird, as Andres commented.

I think that comes from the structure of the state machine. WAL
receiver is started not at the beginning of XLOG_FROM_STREAM
state but at the end of XLOG_FROM_ARCHIVE. So we need to switch
once to XLOG_FROM_ARCHIVE in order to start wal receiver.
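
For illustration, here is a minimal, self-contained sketch (not the actual
WaitForWALToBecomeAvailable() code; names and flow are simplified) of the loop
shape described above, where the receiver can only be (re)started by passing
through the archive state once:

#include <stdbool.h>
#include <stdio.h>

typedef enum { XLOG_FROM_ARCHIVE, XLOG_FROM_STREAM } WalSource;

int
main(void)
{
	WalSource	source = XLOG_FROM_STREAM;
	bool		restart_requested = true;	/* e.g. primary_conninfo changed */

	for (int i = 0; i < 4; i++)
	{
		/* pretend streaming "failed" so the state machine advances */
		if (source == XLOG_FROM_STREAM && restart_requested)
		{
			source = XLOG_FROM_ARCHIVE;
			restart_requested = false;
		}

		if (source == XLOG_FROM_ARCHIVE)
		{
			printf("try restore_command, then start wal receiver\n");
			source = XLOG_FROM_STREAM;	/* receiver is started at this point */
		}
		else
			printf("read WAL from the running wal receiver\n");
	}
	return 0;
}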

So, I'd like to propose to move the stuff to the second switch().
(See the attached incomplete patch.) This is rather similar to
Sergei's previous proposal, but the structure of the state
machine is kept.

Some other comments are below.

In ProcessStartupSigHup, conninfo and slotname don't need to be
retained until the end of the function.

The log message in the function seems to be too detailed.  On the
other hand, if we change primary_conninfo to '' (stop) or vice
versa (start), the message (restart) looks strange.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c69eb6dd7..dba0042db6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11755,12 +11755,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	for (;;)
 	{
 		int			oldSource = currentSource;
+		bool		start_wal_receiver = false;
 
 		/*
-		 * First check if we failed to read from the current source, and
+		 * First check if we failed to read from the current source or if
+		 * we want to restart wal receiver, and
 		 * advance the state machine if so. The failure to read might've
 		 * happened outside this function, e.g when a CRC check fails on a
 		 * record, or within this loop.
+		 * Restart wal receiver if explicitly requested, too.
 		 */
 		if (lastSourceFailed)
 		{
@@ -11788,53 +11791,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	if (!StandbyMode)
 		return false;
 
-	/*
-	 * If primary_conninfo is set, launch walreceiver to try
-	 * to stream the missing WAL.
-	 *
-	 * If fetching_ckpt is true, RecPtr points to the initial
-	 * checkpoint location. In that case, we use RedoStartLSN
-	 * as the streaming start position instead of RecPtr, so
-	 * that when we later jump backwards to start redo at
-	 * RedoStartLSN, we will have the logs streamed already.
-	 */
-	if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
-	{
-		XLogRecPtr	ptr;
-		TimeLineID	tli;
-
-		if (fetching_ckpt)
-		{
-			ptr = RedoStartLSN;
-			tli = ControlFile->checkPointCopy.ThisTimeLineID;
-		}
-		else
-		{
-			ptr = RecPtr;
-
-			/*
-			 * Use the record begin position to determine the
-			 * TLI, rather than the position we're reading.
-			 */
-			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
-
-			if (curFileTLI > 0 && tli < curFileTLI)
-elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-	 (uint32) (tliRecPtr >> 32),
-	 (uint32) 

Re: Hypothetical indexes using BRIN broken since pg10

2019-09-24 Thread Julien Rouhaud
On Tue, Sep 24, 2019 at 11:53 PM Alvaro Herrera
 wrote:
>
> I think the danger is what happens if a version of your plugin that was
> compiled with the older definition runs in a Postgres which has been
> recompiled with the new code.  This has happened to me with previous
> unnoticed ABI breaks, and it has resulted in crashes in production
> systems.  It's not a nice situation to be in.

Indeed.

> If the break is benign, i.e. "nothing happens", then it's possibly a
> worthwhile change to consider.  I suppose the only way to know is to
> write patches for both sides and try it out.

IIUC, if something like Heikki's patch is applied on the older branches, the
problem will be magically fixed from the extension's point of view, so that
should be safe (an extension would only need to detect the minor version to
give users a more useful error message), and all the alternatives are too
intrusive to be backpatched.




Re: Cache lookup errors with functions manipulation object addresses

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 03:25:08PM +0900, Kyotaro Horiguchi wrote:
> In 0003, an empty string and NULL are not distinguishable in psql
> text output. It'd be better if the regression test checked that the
> result is actually NULL using "IS NULL" or "\pset null
> ''" or something like that.

For a patch whose purpose is to check for NULL, I failed to consider
that...  Thanks!  I have just added "\pset null NULL" because that's
less bulky than the other option, and all the results properly show
NULL, without any empty strings around.  I am not sending a new patch
set just for that change though, and the most recent version is here:
https://github.com/michaelpq/postgres/tree/objaddr_nulls
--
Michael




Re: Fix example in partitioning documentation

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 11:36:40AM +0900, Amit Langote wrote:
> Sure, thank you.

And done with f5daf7f, back-patched down to 12.
--
Michael




Re: dropdb --force

2019-09-24 Thread vignesh C
On Sat, Sep 21, 2019 at 10:09 PM Pavel Stehule  wrote:
> fixed
>
> Thank you for check. I am sending updated patch
>
Thanks for addressing all the comments.
A couple of suggestions:
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ ( option [, ...] ) ] [ IF EXISTS ] name
+
+where option can be one of:
+
+    FORCE

+drop_option_list:
+ drop_option
+ {
+ $$ = list_make1((Node *) $1);
+ }
+ | drop_option_list ',' drop_option
+ {
+ $$ = lappend($1, (Node *) $3);
+ }
+ ;
+
+drop_option: FORCE
+ {
+ $$ = makeDefElem("force", NULL, @1);
+ }
+ ;

Currently only the force option is supported, as in: drop database (force) dbname
Should we have a list or a single element for this?
I'm not sure if we have any plans to add more options in this area.

If possible we can add an error message like 'ERROR:  unrecognized
drop database option "force1"' if an invalid option is given.
That would be in line with VACUUM's error message, whose syntax is
similar to the current feature (see the sketch after the snippet below).
We could throw the error from here:
  case T_DropdbStmt:
  {
  DropdbStmt *stmt = (DropdbStmt *) parsetree;
+ bool force = false;
+ ListCell   *lc;
+
+ foreach(lc, stmt->options)
+ {
+ DefElem*item = (DefElem *) lfirst(lc);
+
+ if (strcmp(item->defname, "force") == 0)
+ force = true;
+ }
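
For example, a hedged sketch of that check (the message wording, the errcode,
and the use of the surrounding function's pstate are assumptions, loosely
modeled on VACUUM's "unrecognized option" error):

	foreach(lc, stmt->options)
	{
		DefElem    *item = (DefElem *) lfirst(lc);

		if (strcmp(item->defname, "force") == 0)
			force = true;
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("unrecognized DROP DATABASE option \"%s\"",
							item->defname),
					 parser_errposition(pstate, item->location)));
	}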
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: range test for hash index?

2019-09-24 Thread Amit Kapila
On Wed, Sep 18, 2019 at 9:30 AM Amit Kapila  wrote:
>
> On Mon, Sep 16, 2019 at 11:24 PM Paul A Jungwirth
>  wrote:
> >
> > On Mon, Sep 16, 2019 at 5:28 AM Amit Kapila  wrote:
> > > I don't see this function on the master branch.  Is this function name
> > > correct?  Are you looking at some different branch?
> >
> > Sorry about that! You're right, I was on my multirange branch. But I
> > see the same thing on latest master (but calling hash_range instead of
> > hash_range_internal).
> >
>
> No problem, attached is a patch with a proposed commit message.  I
> will wait for a few days to see if Heikki/Jeff or anyone else responds
> back, otherwise will commit and backpatch this early next week.
>

Today, while I was trying to backpatch, I realized that hash indexes
were not WAL-logged before 10 and they give warning "WARNING:  hash
indexes are not WAL-logged and their use is discouraged".  However,
this test has nothing to do with the durability of hash-indexes, so I
think we can safely backpatch, but still, I thought it is better to
check if anybody thinks that is not a good idea.   In back-branches,
we are already using hash-index in regression tests in some cases like
enum.sql, macaddr.sql, etc., so adding for one more genuine case
should be fine.  OTOH, we can back-patch till 10, but the drawback is
the tests will be inconsistent across branches.  Does anyone think it
is not a good idea to backpatch this till 9.4?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 02:48:16PM +0900, Kyotaro Horiguchi wrote:
> Of course I found no *explicit* ones. But I found one
> ereport(DEBUG1 in register_dirty_segment. So it will work at
> least for the case where fsync request queue is full.

Exactly.  I have not checked the patch in detail, but I think that
we should not rely on the assumption that no code paths in this area
check for interrupts via CHECK_FOR_INTERRUPTS(), as smgrtruncate() does much
more than just the physical segment truncation.
--
Michael




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 12:38:30PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-24, Nikolay Shaplov wrote:
>> In a message of Tuesday, September 24, 2019, 9:25:54 MSK, Alvaro
>> Herrera wrote:
>>> 0003 looks useful, thanks for completing it.  I think it would be a good
>>> idea to test invalid values for each type of reloption too (passing
>>> floating point to integers, strings to floating point, and so on).
>> 
>> We already do it in reloption regression tests.
>> 
>> My idea was to test here only the things that can't be tested in regression 
>> tests, or in real indexes like bloom.
> 
> I suppose that makes sense.  But of course when I push enum reloptions
> I will have to add such a test, since bloom does not have one.

Good point.  We rely now on the GUC parsing for reloptions, so having
cross-checks about what patterns are allowed or not is a good idea for
all reloption types.  I have added all that, and committed the
module.  The amount of noise generated by the string validator routine
was a bit annoying, so I have silenced those messages where they don't really
matter (basically everything except the initial creation).
--
Michael




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 04:49:11PM +0300, Nikolay Shaplov wrote:
> And then we came to the GUC variables. Because if we have five reloptions
> and we print them all each time we change something, there would be
> quite a huge output.

Well, that depends on how you design your tests.  The first versions
of the patch overdid it, and those GUCs have IMO little place for a
module aimed as well to be a minimized referential template focused on
testing some portions of the backend code.

> It is ok when everything goes well. Comparing with 'expected' is cheap.
> But if something goes wrong, then it would be very difficult to find
> the proper place in this output to deal with it.
> So I created GUCs so we can get only one output in a row, not a whole
> bunch.

I am still not convinced that this is worth the complication.  Your
point is that you want to make *on-demand* and *user-visible* the set
of options stored in rd_options after filling in the relation options
using the static table used in the AM.

One way to do that could be to have a simple wrapper function which
could be called at SQL level to do those checks, or you could issue a
NOTICE with all the data filled in amoptions() or even ambuild(),
though the former makes the most sense as we fill in the options
there.
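
One rough illustration of that first idea -- a SQL-callable wrapper -- might
look like the following (all names other than the core PostgreSQL APIs are
hypothetical, and DummyIndexOptions is assumed to come from the module's
header):

#include "postgres.h"

#include "access/genam.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "storage/lockdefs.h"
#include "utils/builtins.h"
#include "utils/rel.h"

PG_FUNCTION_INFO_V1(dummy_index_show_reloptions);

Datum
dummy_index_show_reloptions(PG_FUNCTION_ARGS)
{
	Oid			indexoid = PG_GETARG_OID(0);
	Relation	rel = index_open(indexoid, AccessShareLock);
	DummyIndexOptions *opts = (DummyIndexOptions *) rel->rd_options;
	StringInfoData buf;

	initStringInfo(&buf);
	if (opts == NULL)
		appendStringInfoString(&buf, "no reloptions set");
	else
		appendStringInfo(&buf, "option_int=%d option_real=%f option_bool=%s",
						 opts->option_int, opts->option_real,
						 opts->option_bool ? "true" : "false");

	index_close(rel, AccessShareLock);
	PG_RETURN_TEXT_P(cstring_to_text(buf.data));
}

A regression test could then SELECT this function after each ALTER INDEX ...
SET, getting one line of output on demand rather than a NOTICE on every
amoptions() call.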

One thing that I think would add value to the module would be to show how
a custom string option can be properly parsed when making some
decisions in the AM.  Now we store an offset in the static table, and
one needs to do a small dance with it to fetch the actual option
value.

This can be guessed easily as for example gist has a string option
with "buffering", but we could document that better in the dummy
template, say like that:
@@ -206,6 +210,15 @@ dioptions(Datum reloptions, bool validate)
 	fillRelOptions((void *) rdopts, sizeof(DummyIndexOptions), options,
 				   numoptions, validate, di_relopt_tab, lengthof(di_relopt_tab));

+   option_string_val = (char *) rdopts + rdopts->option_string_val_offset;
+   option_string_null = (char *) rdopts + rdopts->option_string_null_offset;
+   ereport(NOTICE,
+   (errmsg("table option_int %d, option_real %f, option_bool %s, "
+   "option_string_val %s, option_option_null %s",
+   rdopts->option_int, rdopts->option_real,
+   rdopts->option_bool ? "true" : "false",
+   option_string_val ? option_string_val : "NULL",
+   option_string_null ? option_string_null : "NULL")));

The patch I have in my hands now is already doing a lot, so I am
discarding that part for now.  And we can easily improve it
incrementally.

(One extra thing which is also annoying with the current interface is
that we don't actually pass down the option name within the validator
function for string options like GUCs, so you cannot know on which
option you work on if a module generates logs, I'll send an extra
patch for that on a separate thread.)
--
Michael




Re: dropdb --force

2019-09-24 Thread Amit Kapila
On Tue, Sep 24, 2019 at 6:22 PM Amit Kapila  wrote:
> On Sat, Sep 21, 2019 at 10:09 PM Pavel Stehule  
> wrote:
> >
> > Thank you for check. I am sending updated patch
> >
>
> Alvaro has up thread suggested some alternative syntax [1] for this
> patch, but I don't see any good argument to not go with what he has
> proposed.  In other words, why we should prefer the current syntax as
> in the patch over what Alvaro has proposed?
>
> IIUC,  the current syntax implemented by the patch is:
> Drop Database [(options)] [If Exists] name
> Alvaro suggested using options at the end (and use optional keyword
> WITH) based on what other Drop commands does.  I see some merits to
> that idea which are (a) if tomorrow we want to introduce new options
> like CASCADE, RESTRICT then it will be better to have all the options
> at the end as we have for other Drop commands, (b) It will resemble
> more with Create Database syntax.
>
> Now, I think the current syntax is also not bad, and we already do
> something like that for other commands like VACUUM where options are
> provided before the object name, but I think in this case putting them at the
> end is more appealing unless there are some arguments against that.
>
> One other minor comment:
> +
> +  This will also fail, if the connections do not terminate in 5 seconds.
> + 
>
> Is there any implementation in the patch for the above note?
>

One more point I would like to add here is that I think it is worth
considering splitting this patch by keeping the changes in the dropdb
utility as a separate patch.  Even though the code is not much, I
think it can be a separate patch atop the main patch which
contains the core server changes.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-24 Thread Paul Guo
>
> On 2019-Sep-20, Paul Guo wrote:
>
> > The patch series failed on Windows CI. I modified the Windows build file
> to
> > fix that. See attached for the v7 version.
>
> Thanks.
>
> Question about 0003.  If we specify --skip-clean-shutdown and the cluster
> was not cleanly shut down, shouldn't we error out instead of trying to
> press on?


pg_rewind would error out in this case, see sanityChecks().
Users are expected to clean up themselves if they use this argument.


Re: Index Skip Scan

2019-09-24 Thread Kyotaro Horiguchi
At Tue, 24 Sep 2019 09:06:27 -0300, Alvaro Herrera  
wrote in <20190924120627.GA12454@alvherre.pgsql>
> On 2019-Sep-24, Kyotaro Horiguchi wrote:
> 
> > Sorry, it's not a boolean but a tristate value. From the definition
> > (Back, NoMove, Forward) = (-1, 0, 1), (dir1 == -dir2) would work if
> > NoMovement did not exist. Since that is not guaranteed,
> > 
> > (dir1 != 0 && dir1 == -dir2) ?
> 
> Maybe just add ScanDirectionIsOpposite(dir1, dir2) with that
> definition? :-)

Yeah, sounds good to establish it as a part of ScanDirection's
definition.
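
A minimal sketch of such a macro (the name is from Alvaro's suggestion, the
body is just the formula quoted above, assuming the ScanDirection values from
src/include/access/sdir.h):

/*
 * ScanDirectionIsOpposite
 *		True iff the two directions are forward/backward opposites.
 *		NoMovement (0) is not considered the opposite of anything.
 */
#define ScanDirectionIsOpposite(a, b) \
	((a) != NoMovementScanDirection && (a) == -(b))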

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Memory Accounting

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 02:05:51PM +0200, Tomas Vondra wrote:
> The way I see it we can do either eager or lazy accounting. Eager might
> work better for aggregates with many contexts, but it does increase the
> overhead for the "regular" aggregates with just one or two contexts.
> Considering how rare those many-context aggregates are (I'm not aware of
> any such aggregate at the moment), it seems reasonable to pick the lazy
> accounting.

Okay.
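
For illustration, a rough self-contained sketch of the lazy approach mentioned
above (the struct and function names are invented, not the actual
MemoryContext API): each context only maintains its own byte count, and the
total is summed over the context tree only when it is actually requested.

#include <stdbool.h>
#include <stddef.h>

typedef struct Ctx
{
	long		mem_allocated;	/* bytes allocated in this context alone */
	struct Ctx *firstchild;
	struct Ctx *nextsibling;
} Ctx;

/* Sum the accounting lazily, walking the tree only on demand. */
static long
ctx_mem_allocated(const Ctx *ctx, bool recurse)
{
	long		total = ctx->mem_allocated;

	if (recurse)
	{
		for (const Ctx *child = ctx->firstchild;
			 child != NULL;
			 child = child->nextsibling)
			total += ctx_mem_allocated(child, true);
	}

	return total;
}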

> So I think the approach Jeff ended up with sensible - certainly as a
> first step. We may improve it in the future, of course, once we have
> more practical experience.
> 
> Barring objections, I do plan to get this committed by the end of this
> CF (i.e. sometime later this week).

Sounds good to me.  Though I have not looked at the patch in details,
the arguments are sensible.  Thanks for confirming.
--
Michael




RE: [PATCH] Speedup truncates of relation forks

2019-09-24 Thread Jamison, Kirk
On Tuesday, September 24, 2019 5:41 PM (GMT+9), Fujii Masao wrote:
> On Thu, Sep 19, 2019 at 9:42 AM Jamison, Kirk 
> wrote:
> >
> > On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:
> > > On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk
> > > 
> > > wrote:
> > > >
> > > > On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:
> > > > > On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > On 2019-Sep-13, Fujii Masao wrote:
> > > > > >
> > > > > > > On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
> > > > > > > 
> > > > > wrote:
> > > > > >
> > > > > > > > > Please add a preliminary patch that removes the function.
> > > > > > > > > Dead code is good, as long as it is gone.  We can get it
> > > > > > > > > pushed ahead of
> > > > > the rest of this.
> > > > > > > >
> > > > > > > > Alright. I've attached a separate patch removing the
> > > smgrdounlinkfork.
> > > > > > >
> > > > > > > Per the past discussion, some people want to keep this "dead"
> > > > > > > function for some reasons. So, in my opinion, it's better to
> > > > > > > just enclose the function with #if NOT_USED and #endif, to
> > > > > > > keep the function itself as it is, and then to start new
> > > > > > > discussion on hackers about the removal of that separatedly from
> this patch.
> > > > > >
> > > > > > I searched for anybody requesting to keep the function.  I
> > > > > > couldn't find anything.  Tom said in 2012:
> > > > > > https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.
> > > > > > pa.u
> > > > > > s
> > > > >
> > > > > Yes. And I found Andres.
> > > > > https://www.postgresql.org/message-id/20180621174129.hogefyopje4
> > > > > xazn
> > > > > u@al
> > > > > ap3.anarazel.de
> > > > >
> > > > > > > As committed, the smgrdounlinkfork case is actually dead
> > > > > > > code; it's never called from anywhere.  I left it in place
> > > > > > > just in case we want it someday.
> > > > > >
> > > > > > but if no use has appeared in 7 years, I say it's time to kill it.
> > > > >
> > > > > +1
> > > >
> > > > The consensus is we remove it, right?
> > > > Re-attaching the patch that removes the deadcode: smgrdounlinkfork().
> > > >
> > > > ---
> > > > I've also fixed Fujii-san's comments below in the latest attached
> > > > speedup
> > > truncate rel patch (v8).
> > >
> > > Thanks for updating the patch!
> > >
> > > + block = visibilitymap_truncate_prepare(rel, 0); if
> > > + (BlockNumberIsValid(block))
> > >   {
> > > - xl_smgr_truncate xlrec;
> > > + fork = VISIBILITYMAP_FORKNUM;
> > > + smgrtruncate(rel->rd_smgr, &fork, 1, &block);
> > > +
> > > + if (RelationNeedsWAL(rel))
> > > + {
> > > + xl_smgr_truncate xlrec;
> > >
> > > I don't think this fix is right. Originally, WAL is generated even
> > > in the case where visibilitymap_truncate_prepare() returns
> > > InvalidBlockNumber. But the patch unexpectedly changed the logic so
> > > that WAL is not generated in that case.
> > >
> > > + if (fsm)
> > > + FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
> > > + InvalidBlockNumber);
> > >
> > > This code means that FreeSpaceMapVacuumRange() is called if FSM
> > > exists even if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
> > > This seems not right. Originally, FreeSpaceMapVacuumRange() is not
> > > called in the case where InvalidBlockNumber is returned.
> > >
> > > So I updated the patch based on yours and fixed the above issues.
> > > Attached. Could you review this one? If there is no issue in that,
> > > I'm thinking to commit that.
> >
> > Oops. Thanks for catching that, correcting my fix, and revising some
> > descriptions.
> > I also noticed you reordered the truncation of forks, by which the main
> > fork will be truncated first instead of the FSM. I'm not sure if the order
> > matters now given that we're truncating the forks simultaneously, so I'm
> > ok with that change.
> 
> I changed that order so that DropRelFileNodeBuffers() can scan shared_buffers
> more efficiently. Usually the number of buffers for the MAIN fork is larger than
> for the other forks in shared_buffers, so it's better to compare the MAIN fork
> first for performance during the full scan of shared_buffers.
> 
> > Just one minor comment:
> > + * Return the number of blocks of new FSM after it's truncated.
> >
> > "after it's truncated" is quite confusing.
> > How about, "as a result of previous truncation" or just end the sentence
> after new FSM?
> 
> Thanks for the comment!
> I adopted the latter and committed the patch. Thanks!

Thank you very much, Fujii-san, for taking the time to review
as well as for committing this patch!

Regards,
Kirk Jamison


Re: DROP SUBSCRIPTION with no slot

2019-09-24 Thread Jeff Janes
On Tue, Sep 24, 2019 at 6:42 PM Ziga  wrote:

> This also seems to be a problem for the somewhat fringe case of subscriptions
> created with the connect=false option.
> They cannot be dropped in an obvious way without knowing the ALTER
> SUBSCRIPTION trick.
>
> For example:
>
> contrib_regression=# create subscription test_sub connection
> 'dbname=contrib_regression' publication test_pub with ( connect=false );
>
>
> WARNING:  tables were not subscribed, you will have to run ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
> CREATE SUBSCRIPTION
>
> contrib_regression=# drop subscription test_sub; -- fails
> ERROR:  could not drop the replication slot "test_sub" on publisher
> DETAIL:  The error was: ERROR:  replication slot "test_sub" does not exist
>
> contrib_regression=# alter subscription test_sub set ( slot_name=none );
> ALTER SUBSCRIPTION
>
> contrib_regression=# drop subscription test_sub; -- succeeds
> DROP SUBSCRIPTION
>
>
> Note that the publication was never refreshed.
> It seems that the first DROP should succeed in the above case.
> Or at least some hint should be given how to fix this.
>

There is no HINT in the error message itself, but there is in the
documentation, see note at end of
https://www.postgresql.org/docs/current/sql-dropsubscription.html.  I agree
with you that the DROP should just work in this case, even more so than in
my case.  But if we go with the argument that doing that is too error
prone, then do we want to include a HINT on how to be error prone more
conveniently?

Cheers,

Jeff


Re: DROP SUBSCRIPTION with no slot

2019-09-24 Thread Jeff Janes
On Tue, Sep 24, 2019 at 5:25 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-09-24 16:31, Jeff Janes wrote:
> > I recently had to cut loose (pg_drop_replication_slot) a logical replica
> > that couldn't keep up and so was threatening to bring down the master.
> >
> > In mopping up on the replica side, I couldn't just drop the
> > subscription, because it couldn't drop the nonexistent slot on the
> > master and so refused to work.  So I had to do a silly little dance
> > where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
> > (slot_name = NONE), then drop it.
> >
> > Wanting to clean up after itself is admirable, but if there is nothing
> > to clean up, why should that be an error condition?
>
> The alternatives seem quite error prone to me.  Better be explicit.
>

If you can connect to the master, and see that the slot already fails to
exist, what is error prone about that?

If someone goes to the effort of setting up a different master, configures
it to receive replica connections, and alters the subscription CONNECTION
parameter on the replica to point to that poisoned master, will an error on
the DROP SUBSCRIPTION really force them to see the error of their ways, or
will they just succeed at explicitly doing the silly dance and finalize the
process of shooting themselves in the foot via a roundabout mechanism?
(And at the point the CONNECTION is changed, he is in the same boat even if
he doesn't try to drop the sub--either way the master has a dangling
slot).  I'm in favor of protecting a fool from his foolishness, except when
it is annoying to the rest of us (Well, annoying to me, I guess. I don't
know yet about the rest of us).

Cheers,

Jeff


Re: FETCH FIRST clause WITH TIES option

2019-09-24 Thread Tom Lane
Alvaro Herrera  writes:
> create table w (a point);

> # select * from w order by a fetch first 2 rows with ties;
> ERROR:  could not identify an ordering operator for type point
> LINE 1: select * from w order by a fetch first 2 rows with ties;
>   ^
> HINT:  Use an explicit ordering operator or modify the query.

> I'm not sure that the HINT is useful here.

That's not new to this patch, HEAD does the same:

regression=# create table w (a point);
CREATE TABLE
regression=# select * from w order by a ;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a ;
 ^
HINT:  Use an explicit ordering operator or modify the query.

It is a meaningful hint IMO, since (in theory) you could add
something like "USING <<" to the ORDER BY to specify a
particular ordering operator.  The fact that no suitable
operator is actually available in core doesn't seem like
a reason not to give the hint.

regards, tom lane




Re: Bug in GiST paring heap comparator

2019-09-24 Thread Alexander Korotkov
On Wed, Sep 25, 2019 at 1:22 AM Nikita Glukhov  wrote:
> Attached another one-line patch that fixes incorrect number of distances used
> in pairingheap_SpGistSearchItem_cmp():
>
> -for (i = 0; i < so->numberOfOrderBys; i++)
> +for (i = 0; i < so->numberOfNonNullOrderBys; i++)
>
>
> This change was present in my original commit, but it seems to have been
> missed after the rebase.  Sorry.

Pushed, thanks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: allocation limit for encoding conversion

2019-09-24 Thread Andres Freund
Hi, 

On September 24, 2019 3:09:28 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On 2019-09-24 16:19:41 -0400, Tom Lane wrote:
>>> Andres Freund  writes:
>>>> It's probably too big a hammer for this specific case, but I think at
>>>> some point we ought to stop using fixed size allocations for this kind
>>>> of work. Instead we should use something roughly like our StringInfo,
>>>> except that when exceeding the current size limit, the overflowing data
>>>> is stored in a separate allocation. And only once we actually need the
>>>> data in a consecutive form, we allocate memory that's large enough to
>>>> store the all the separate allocations in their entirety.
>>>
>>> That sounds pretty messy :-(.
>>
>> I don't think it's too bad - except for now allowing the .data member of
>> such a 'chunked' StringInfo to be directly accessible, it'd be just
>> about the same interface as the current stringinfo.
>
> I dunno.  What you're describing would be a whole lotta work, and it'd
> break a user-visible API, and no amount of finagling is going to prevent
> it from making conversions somewhat slower, and the cases where it matters
> to not preallocate a surely-large-enough buffer are really few and far
> between.  I have to think that we have better ways to spend our time.

It'd not just avoid the overallocation, but also avoid the strlen and memcpy 
afterwards at the callsites, as well as the separate allocation. So I'd bet 
it'd be almost always a win.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: DROP SUBSCRIPTION with no slot

2019-09-24 Thread Ziga
This also seems to be a problem for the somewhat fringe case of subscriptions
created with the connect=false option.
They cannot be dropped in an obvious way without knowing the ALTER
SUBSCRIPTION trick.

For example:

contrib_regression=# create subscription test_sub connection 
'dbname=contrib_regression' publication test_pub with ( connect=false );


WARNING:  tables were not subscribed, you will have to run ALTER SUBSCRIPTION 
... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- fails
ERROR:  could not drop the replication slot "test_sub" on publisher
DETAIL:  The error was: ERROR:  replication slot "test_sub" does not exist

contrib_regression=# alter subscription test_sub set ( slot_name=none );
ALTER SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- succeeds
DROP SUBSCRIPTION


Note that the publication was never refreshed.
It seems that the first DROP should succeed in the above case. 
Or at least some hint should be given how to fix this.
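
For instance, a hedged sketch of such a hint (the variable names are
assumptions about the surrounding DropSubscription() code; the message and
detail mirror the output shown above):

	ereport(ERROR,
			(errmsg("could not drop the replication slot \"%s\" on publisher",
					slotname),
			 errdetail("The error was: %s", res->err),
			 errhint("Use ALTER SUBSCRIPTION ... SET (slot_name = NONE) "
					 "to disassociate the subscription from the slot.")));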




> On 24 Sep 2019, at 23:25, Peter Eisentraut  
> wrote:
> 
> On 2019-09-24 16:31, Jeff Janes wrote:
>> I recently had to cut loose (pg_drop_replication_slot) a logical replica
>> that couldn't keep up and so was threatening to bring down the master.
>> 
>> In mopping up on the replica side, I couldn't just drop the
>> subscription, because it couldn't drop the nonexistent slot on the
>> master and so refused to work.  So I had to do a silly little dance
>> where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
>> (slot_name = NONE), then drop it.
>> 
>> Wanting to clean up after itself is admirable, but if there is nothing
>> to clean up, why should that be an error condition?
> 
> The alternatives seem quite error prone to me.  Better be explicit.
> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 



Re: Bug in GiST paring heap comparator

2019-09-24 Thread Nikita Glukhov

On 20.09.2019 0:15, Nikita Glukhov wrote:

On 19.09.2019 22:14, Alexander Korotkov wrote:

Pushed.

The attached patch fixes a premature xs_orderbynulls[] assignment.  The old value
of the NULL flag, not the new one, should be checked before pfree()ing the old value.


Attached another one-line patch that fixes incorrect number of distances used
in pairingheap_SpGistSearchItem_cmp():

-for (i = 0; i < so->numberOfOrderBys; i++)
+for (i = 0; i < so->numberOfNonNullOrderBys; i++)


This change was present in my original commit, but it seems to have been
missed after the rebase.  Sorry.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 2e58a94fd2f973257d1c8a3d9cd0b907c719a859 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 24 Sep 2019 01:33:03 +0300
Subject: [PATCH] Fix number of distances in pairingheap_SpGistSearchItem_cmp()

---
 src/backend/access/spgist/spgscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index cfba470..6e940fd 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -56,7 +56,7 @@ pairingheap_SpGistSearchItem_cmp(const pairingheap_node *a,
 	else
 	{
 		/* Order according to distance comparison */
-		for (i = 0; i < so->numberOfOrderBys; i++)
+		for (i = 0; i < so->numberOfNonNullOrderBys; i++)
 		{
 			if (isnan(sa->distances[i]) && isnan(sb->distances[i]))
 continue;		/* NaN == NaN */
-- 
2.7.4



Re: allocation limit for encoding conversion

2019-09-24 Thread Tom Lane
Andres Freund  writes:
> On 2019-09-24 16:19:41 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> It's probably too big a hammer for this specific case, but I think at
>>> some point we ought to stop using fixed size allocations for this kind
>>> of work. Instead we should use something roughly like our StringInfo,
>>> except that when exceeding the current size limit, the overflowing data
>>> is stored in a separate allocation. And only once we actually need the
>>> data in a consecutive form, we allocate memory that's large enough to
>>> store the all the separate allocations in their entirety.

>> That sounds pretty messy :-(.

> I don't think it's too bad - except for now allowing the .data member of
> such a 'chunked' StringInfo to be directly accessible, it'd be just
> about the same interface as the current stringinfo.

I dunno.  What you're describing would be a whole lotta work, and it'd
break a user-visible API, and no amount of finagling is going to prevent
it from making conversions somewhat slower, and the cases where it matters
to not preallocate a surely-large-enough buffer are really few and far
between.  I have to think that we have better ways to spend our time.

regards, tom lane




Re: [bug fix??] Fishy code in tts_cirtual_copyslot()

2019-09-24 Thread Andres Freund
Hi,

On 2019-09-22 14:24:36 -0400, Tom Lane wrote:
> "Tsunakawa, Takayuki"  writes:
> > In the following code in execTuples.c, shouldn' srcdesc point to the source 
> > slot's tuple descriptor?  The attached fix passes make check.  What kind of 
> > failure could this cause?
> 
> Yeah, sure looks like a typo to me too.

Indeed, thanks for catching and pushing.


> I temporarily changed the Assert to be "==" rather than "<=", and
> it still passed check-world, so evidently we are not testing any
> cases where the descriptors are of different lengths.  This explains
> the lack of symptoms.

I have a hard time seeing cases where it'd be a good idea to copy slots
of a smaller natts into a slot with larger natts. So i'm not too
surprised.


> It's still a bug though, so pushed.

Indeed.

Greetings,

Andres Freund




Re: Hypothetical indexes using BRIN broken since pg10

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Julien Rouhaud wrote:

> On Mon, Sep 9, 2019 at 5:03 PM Tom Lane  wrote:

> > Whether we should bother back-patching a less capable stopgap fix
> > is unclear to me.  Yeah, it's a bug that an index adviser can't
> > try a hypothetical BRIN index; but given that nobody noticed till
> > now, it doesn't seem like there's much field demand for it.
> > And I'm not sure that extension authors would want to deal with
> > testing minor-release versions to see if the fix is in, so
> > even if there were a back-patch, it might go unused.
> 
> FWIW I maintain such an extension and testing for minor release
> version is definitely not a problem.

I think the danger is what happens if a version of your plugin that was
compiled with the older definition runs in a Postgres which has been
recompiled with the new code.  This has happened to me with previous
unnoticed ABI breaks, and it has resulted in crashes in production
systems.  It's not a nice situation to be in.

If the break is benign, i.e. "nothing happens", then it's possibly a
worthwhile change to consider.  I suppose the only way to know is to
write patches for both sides and try it out.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Peter Eisentraut
On 2019-09-24 09:18, Victor Wagner wrote:
> The problem is that some code in src/backend/libpq/be-secure-openssl.c
> assumes that if the preprocessor symbols TLS1_1_VERSION and TLS1_2_VERSION
> are defined in the openssl headers, the corresponding versions of TLS are
> supported by the library.
> 
> It is not so. Here is an excerpt from the tls1.h header file from openssl
> 0.9.8j:
> 
> #define TLS1_VERSION0x0301
> #define TLS1_1_VERSION  0x0302
> #define TLS1_2_VERSION  0x0303
> /* TLS 1.1 and 1.2 are not supported by this version of OpenSSL, so
>  * TLS_MAX_VERSION indicates TLS 1.0 regardless of the above
>  * definitions. (s23_clnt.c and s23_srvr.c have an OPENSSL_assert()
>  * check that would catch the error if TLS_MAX_VERSION was too low.)
>  */
> #define TLS_MAX_VERSION TLS1_VERSION

That's not actually what this file looks like in the upstream release.
It looks like the packagers must have patched in the protocol codes for
TLS 1.1 and 1.2 themselves.  Then they should also add the corresponding
SSL_OP_NO_* flags.  AFAICT, these pairs of symbols are always added
together in upstream commits.
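
An illustrative sketch of that kind of guard (not the actual
be-secure-openssl.c code): only treat a protocol version as available when
both the protocol code and its SSL_OP_NO_* counterpart exist in the headers.

#include <stdbool.h>
#include <openssl/ssl.h>

static bool
headers_support_tls_1_2(void)
{
#if defined(TLS1_2_VERSION) && defined(SSL_OP_NO_TLSv1_2)
	return true;
#else
	return false;
#endif
}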

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: allocation limit for encoding conversion

2019-09-24 Thread Andres Freund
On 2019-09-24 16:19:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-08-16 17:31:49 -0400, Tom Lane wrote:
> >> I fear that allowing pg_do_encoding_conversion to return strings longer
> >> than 1GB is just going to create failure cases somewhere else.
> >>
> >> However, it's certainly true that 4x growth is a pretty unlikely worst
> >> case.  Maybe we could do something like
> >> 1. If string is short (say up to a few megabytes), continue to do it
> >> like now.  This avoids adding overhead for typical cases.
> >> 2. Otherwise, run some lobotomized form of encoding conversion that
> >> just computes the space required (as an int64, I guess) without saving
> >> the result anywhere.
> >> 3. If space required > 1GB, fail.
> >> 4. Otherwise, allocate just the space required, and convert.
>
> > It's probably too big a hammer for this specific case, but I think at
> > some point we ought to stop using fixed size allocations for this kind
> > of work. Instead we should use something roughly like our StringInfo,
> > except that when exceeding the current size limit, the overflowing data
> > is stored in a separate allocation. And only once we actually need the
> > data in a consecutive form, we allocate memory that's large enough to
> > store the all the separate allocations in their entirety.
>
> That sounds pretty messy :-(.

I don't think it's too bad - except for now allowing the .data member of
such a 'chunked' StringInfo to be directly accessible, it'd be just
about the same interface as the current stringinfo. Combined perhaps
with a helper or two for "de-chunking" directly into another stringinfo,
network buffer etc, to avoid the unnecessary allocation of a buffer of
the overall size when the result is just going to be memcpy'd elsewhere.

Obviously a good first step would just to pass a StringInfo to the
encoding routines. That'd solve the need for pessimistic overallocation,
because the buffer can be enlarged. And by sizing the initial allocation
to the input string (or at least only a small factor above), we'd not
usually need (many) reallocations.

That'd also remove the need for unnecessary strlen/memcpy done in many
encoding conversion callsites, like e.g.:

p = pg_server_to_client(str, slen);
if (p != str)   /* actual conversion has been 
done? */
{
slen = strlen(p);
appendBinaryStringInfo(buf, p, slen);
pfree(p);
}

which do show up in profiles.


> I spent some time looking at what I proposed above, and concluded that
> it's probably impractical.  In the first place, we'd have to change
> the API spec for encoding conversion functions.  Now maybe that would
> not be a huge deal, because there likely aren't very many people outside
> the core code who are defining their own conversion functions, but it's
> still a negative.  More importantly, unless we wanted to duplicate
> large swaths of code, we'd end up inserting changes about like this
> into the inner loops of encoding conversions:
>
> - *dest++ = code;
> + if (dest)
> + *dest++ = code;
> + outcount++;
> which seems like it'd be bad for performance.

One thing this made me wonder is if we shouldn't check the size of the
output string explicitly, rather than relying on overallocation. The
only reason we need an allocation bigger than MaxAllocSize here is that
we don't pass the output buffer size to the conversion routines. If
those routines instead checked whether the output buffer size is
exceeded, and returned with the number of converted input bytes *and*
the position in the output buffer, we wouldn't have to overallocate
quite so much.

But I suspect using a StringInfo, as suggested above, would be better.

To avoid making the tight innermost loop more expensive, I'd perhaps
code it roughly like:



/*
 * Initially size output buffer to the likely required length, to
 * avoid unnecessary reallocations while growing.
 */
enlargeStringInfo(output, input_len * ESTIMATED_GROWTH_FACTOR);

/*
 * Process input in chunks, to reduce overhead of maintaining output buffer
 * for each processed input char. Increasing the buffer size too much will
 * lead to memory being wasted due to the necessary over-allocation.
 */
#define CHUNK_SIZE 128
remaining_bytes = input_len;
while (remaining_bytes > 0)
{
local_len = Min(remaining_bytes, CHUNK_SIZE);

/* ensure we have output buffer space for this chunk */
enlargeStringInfo(output, MAX_CONVERSION_GROWTH * local_len);

/* growing the stringinfo may invalidate previous dest */
dest = output->data + output->len;

while (local_len > 0)
{
/* current conversion logic, barely any slower */
}

remaining_bytes -= local_len;
output->len = dest - output->data;
}

Assert(remaining_bytes == 

Release 11 of PostgreSQL Buildfarm client

2019-09-24 Thread Andrew Dunstan


Release 11 of the PostgreSQL Buildfarm client is now available


Apart from some bug fixes, there are the following new features:


. Allow a list of branches as positional arguments to run_branches.pl
  This overrides what is found in the config file. The list can't include
  metabranches like ALL, nor can it contain regexes.
. improve diagnostic capture for git and fetching branches of interest
. unify config.log and configure.log
. add siginfo to gdb output
. improve test coverage
  - run check for test modules marked NO_INSTALLCHECK
  - run TAP tests for test modules that have them
  - run TAP tests for contrib modules that have them
. explicitly use "trust" with initdb


Download from
 or




Enjoy


cheers


andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: DROP SUBSCRIPTION with no slot

2019-09-24 Thread Peter Eisentraut
On 2019-09-24 16:31, Jeff Janes wrote:
> I recently had to cut loose (pg_drop_replication_slot) a logical replica
> that couldn't keep up and so was threatening to bring down the master.
> 
> In mopping up on the replica side, I couldn't just drop the
> subscription, because it couldn't drop the nonexistent slot on the
> master and so refused to work.  So I had to do a silly little dance
> where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
> (slot_name = NONE), then drop it.
> 
> Wanting to clean up after itself is admirable, but if there is nothing
> to clean up, why should that be an error condition?

The alternatives seem quite error prone to me.  Better be explicit.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Improving on MAX_CONVERSION_GROWTH

2019-09-24 Thread Tom Lane
Thinking about the nearby thread[1] about overrunning MaxAllocSize
during encoding conversion, it struck me that another thing
we could usefully do to improve that situation is to be smarter
about what's the growth factor --- the existing one-size-fits-all
choice of MAX_CONVERSION_GROWTH = 4 is leaving a lot on the table.

In particular, it seems like we could usefully frame things as
having a constant max growth factor associated with each target
encoding, stored as a new field in pg_wchar_table[].  By definition,
the max growth factor cannot be more than the maximum character
length in the target encoding.  So this approach immediately gives
us a growth factor of 1 with any single-byte output encoding,
and even many of the multibyte encodings would have max growth 2
or 3 without having to think any harder than that.
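
For illustration only (pg_wchar_table does not currently have such a field,
and the numbers below are the assumptions discussed here, not verified
values), the idea amounts to something like:

#include <stddef.h>

typedef struct
{
	const char *name;
	int			maxmblen;		/* max bytes per character in this encoding */
	int			maxgrowth;		/* assumed worst-case output growth factor */
} ConversionGrowth;

static const ConversionGrowth growth_table[] = {
	{"LATIN1", 1, 1},			/* any single-byte target never grows */
	{"EUC_JP", 3, 3},
	{"UTF8", 4, 2},				/* assuming no 1-byte char maps to 3+ bytes */
};

/* Size an output buffer for converting srclen bytes into *dest. */
static size_t
conversion_buffer_size(const ConversionGrowth *dest, size_t srclen)
{
	return srclen * (size_t) dest->maxgrowth + 1;	/* +1 for trailing NUL */
}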

But we can do better, I think, recognizing that all the supported
encodings are ASCII extensions.  The only possible way to expend
4 output bytes per input byte is if there is some 1-byte character
that translates to a 4-byte character, and I think this is not the
case for converting any of our encodings to UTF8.  If you need at
least a 2-byte character to produce a 3-byte or 4-byte UTF8 character,
then UTF8 has max growth 2.  I'm not quite sure if that's true
for every source encoding, but I'm pretty certain it couldn't be
worse than 3.

It might be worth getting a bit more complex and having a 2-D
array indexed by both source and destination encodings to determine
the max growth factor.  I haven't run tests to empirically verify
what the max growth factors actually are.

A fly in this ointment is: could a custom encoding conversion
function violate our conclusions about what's the max growth
factor?  Maybe it would be worth treating the growth factor
as a property of a particular conversion (i.e., add a column
to pg_conversion) rather than making it a hard-wired property.

In any case, it seems likely that we could end up with a
multiplier of 1, 2, or 3 rather than 4 in just about every
case of practical interest.  That sure seems like a win
when converting long strings.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/20190816181418.GA898@alvherre.pgsql




Re: JSONPATH documentation

2019-09-24 Thread Peter Eisentraut
On 2019-09-23 00:03, Tom Lane wrote:
> While we're whining about this, I find it very off-putting that
> the jsonpath stuff was inserted in the JSON functions section
> ahead of the actual JSON functions.  I think it should have
> gone after them, because it feels like a barely-related interjection
> as it stands.  Maybe there's even a case that it should be
> its own , after the "functions-json" section.

I'd just switch the sect2's around.

That would be similar to how the documentation of regular expressions is
laid out: functions first, then details of the contained mini-language.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: abort-time portal cleanup

2019-09-24 Thread Andres Freund
Hi,

On 2019-09-12 16:42:46 -0400, Robert Haas wrote:
> The trouble starts with the header comment for AtAbort_Portals, which
> states that "At this point we run the cleanup hook if present, but we
> can't release the portal's memory until the cleanup call." At the time
> this logic was introduced in commit
> de28dc9a04c4df5d711815b7a518501b43535a26 (2003-05-02),
> AtAbort_Portals() affected all non-held portals without caring whether
> they were active or not, and, UserAbortTransactionBlock() called
> AbortTransaction() directly, so typing "ROLLBACK;" would cause
> AtAbort_Portals() to be reached from within PortalRun().  Even if
> PortalRun() managed to return without crashing, the caller would next
> try to call PortalDrop() on what was now an invalid pointer. However,
> commit 8f9f1986034a2273e09ad10671e10d1adda21d1f (2004-09-16) changed
> things so that UserAbortEndTransaction() just sets things up so that
> the subsequent call to CommitTransactionCommand() would call
> AbortTransaction() instead of trying to do it right away, and that
> doesn't happen until after we're done with the portal.  As far as I
> can see, that change made this comment mostly false, but the comment
> has nevertheless managed to survive for another ~15 years. I think we
> can, and in fact should, just drop the portal right here.

Nice digging.


> As far as actually making that work, there are a few wrinkles. The
> first is that we might be in the middle of FATAL. In that case, unlike
> the ROLLBACK case, a call to PortalRun() is still on the stack, but
> we'll exit the process rather than returning, so the fact that we've
> created a dangling pointer for the caller won't matter. However, as
> shown by commit ad9a274778d2d88c46b90309212b92ee7fdf9afe (2018-02-01)
> and the report that led up to it at
> https://www.postgresql.org/message-id/20180128034531.h6o4w3727ifof3jy%40alap3.anarazel.de
> it's not a good idea to try to clean up the portal in that case,
> because we might've already started shutting down critical systems.
> It seems not only risky but also unnecessary: our process-local state
> is about to go away, and the executor shouldn't need to clean up any
> shared memory state that won't also get cleaned up by some other
> mechanism. So, it seems to me that if we reach AtAbort_Portals()
> during FATAL processing, we should either (1) do nothing at all and
> just return or (2) forget about all the existing portals without
> cleaning them up and then return.  The second option seems a little
> safer to me, because it guarantees that if we somehow reach code that
> might try to look up a portal later, it won't find anything. But I
> think it's arguable.

Hm. Doing that cleanup requires digging through all the portals etc. I'd
rather rely on less state being correct than more during FATAL
processing.


> The second wrinkle is that there might be an active portal.  Apart
> from the FATAL case already mentioned, I think the only way this can
> happen is some C code that calls purposefully calls AbortTransaction()
> in the middle of executing a command.  It can't be an ERROR, because
> then the portal would be marked as failed before we get here, and it
> can't be an explicit ROLLBACK, because as noted above, that case was
> changed 15 years ago.  It's got to be some other case where C code
> calls AbortTransaction() voluntarily in the middle of a statement. For
> over a decade, there were no cases at all of this type, but the code
> in this function catered to hypothetical cases by marking the portal
> failed. By 2016, Noah had figured out that this was bogus, and that
> any future cases would likely require different handling, but Tom and
> I shouted him down:

Hm. But wouldn't doing so run into a ton of problems anyway? I mean we
need to do a ton of checks and special case hangups to make
CommitTransactionCommand();StartTransactionCommand(); work for VACUUM,
CIC, ...

The cases where one can use AbortTransaction() (via
AbortCurrentTransaction() presumably) are ones where either there's no
surrounding code relying on the transaction (e.g. autovacuum,
postgres.c), or where special care has been taken with portals
(e.g. _SPI_rollback()).  We didn't have the pin mechanism back then, so
I think even if we accept your/Tom's reasoning from back then (I don't
really), it's outdated now that the pin mechanism exists.

I'd be happy if we added some defenses against such bogus cases being
introduced (i.e. erroring out if we encounter an active portal during
abort processing).

> Stepping back a bit, stored procedures are a good example of a command
> that uses multiple transactions internally. We have a few others, such
> as VACUUM, but at present, that case only commits transactions
> internally; it does not roll them back internally.  If it did, it
> would want the same thing that procedures want, namely, to leave the
> active portal alone. It doesn't quite work to leave the active portal
> completely alone, because the portal 

Re: allocation limit for encoding conversion

2019-09-24 Thread Tom Lane
Andres Freund  writes:
> On 2019-08-16 17:31:49 -0400, Tom Lane wrote:
>> I fear that allowing pg_do_encoding_conversion to return strings longer
>> than 1GB is just going to create failure cases somewhere else.
>> 
>> However, it's certainly true that 4x growth is a pretty unlikely worst
>> case.  Maybe we could do something like
>> 1. If string is short (say up to a few megabytes), continue to do it
>> like now.  This avoids adding overhead for typical cases.
>> 2. Otherwise, run some lobotomized form of encoding conversion that
>> just computes the space required (as an int64, I guess) without saving
>> the result anywhere.
>> 3. If space required > 1GB, fail.
>> 4. Otherwise, allocate just the space required, and convert.

> It's probably too big a hammer for this specific case, but I think at
> some point we ought to stop using fixed size allocations for this kind
> of work. Instead we should use something roughly like our StringInfo,
> except that when exceeding the current size limit, the overflowing data
> is stored in a separate allocation. And only once we actually need the
> data in a consecutive form, we allocate memory that's large enough to
> store the all the separate allocations in their entirety.

That sounds pretty messy :-(.

I spent some time looking at what I proposed above, and concluded that
it's probably impractical.  In the first place, we'd have to change
the API spec for encoding conversion functions.  Now maybe that would
not be a huge deal, because there likely aren't very many people outside
the core code who are defining their own conversion functions, but it's
still a negative.  More importantly, unless we wanted to duplicate
large swaths of code, we'd end up inserting changes about like this
into the inner loops of encoding conversions:

-   *dest++ = code;
+   if (dest)
+   *dest++ = code;
+   outcount++;

which seems like it'd be bad for performance.

So I now think that Alvaro's got basically the right idea, except
that I'm still afraid to allow strings larger than MaxAllocSize
to run around loose in the backend.  So in addition to the change
he suggested, we need a final check on strlen(result) not being
too large.  We can avoid doing a useless strlen() if the input len
is small, though.

It then occurred to me that we could also repalloc the output buffer
down to just the required size, which is pointless if it's small
but not if we can give back several hundred MB.  This is conveniently
mergeable with the check to see whether we need to check strlen or not.

... or at least, that's what I thought we could do.  Testing showed
me that AllocSetRealloc never actually gives back any space, even
when it's just acting as a frontend for a direct malloc.  However,
we can fix that for little more than the price of swapping the order
of the is-it-a-decrease and is-it-a-large-chunk stanzas, as in the
0002 patch below.
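
For illustration, a rough sketch of the final check and give-back described
above (this is only a sketch; the threshold and error wording are assumed to
mirror the existing check, not taken from the committed patch):

/* Only bother when the input was large enough that the result could
 * conceivably exceed MaxAllocSize, then shrink the oversized allocation
 * down to what was really used. */
if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH))
{
	Size		resultlen = strlen((char *) result);

	if (resultlen >= MaxAllocSize)
		ereport(ERROR,
				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
				 errmsg("out of memory"),
				 errdetail("String of %d bytes is too long for encoding conversion.",
						   len)));

	/* give back the excess space reserved for the worst case */
	result = (unsigned char *) repalloc(result, resultlen + 1);
}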

I also put back the missing overflow check --- although that's unreachable
in a 64-bit machine, it's not at all in 32-bit.  The patch is still
useful in 32-bit though, since it still doubles the size of string
we can cope with.

I think this is committable, though surely another pair of eyeballs
on it wouldn't hurt.  Also, is it worth having a different error
message for the case where the output does exceed MaxAllocSize?

regards, tom lane

diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index bec54bb..6b08b77 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -349,16 +349,24 @@ pg_do_encoding_conversion(unsigned char *src, int len,
 		pg_encoding_to_char(dest_encoding))));
 
 	/*
-	 * Allocate space for conversion result, being wary of integer overflow
+	 * Allocate space for conversion result, being wary of integer overflow.
+	 *
+	 * len * MAX_CONVERSION_GROWTH is typically a vast overestimate of the
+	 * required space, so it might exceed MaxAllocSize even though the result
+	 * would actually fit.  We do not want to hand back a result string that
+	 * exceeds MaxAllocSize, because callers might not cope gracefully --- but
+	 * if we just allocate more than that, and don't use it, that's fine.
 	 */
-	if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH))
+	if ((Size) len >= (MaxAllocHugeSize / (Size) MAX_CONVERSION_GROWTH))
 		ereport(ERROR,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("out of memory"),
  errdetail("String of %d bytes is too long for encoding conversion.",
 		   len)));
 
-	result = palloc(len * MAX_CONVERSION_GROWTH + 1);
+	result = (unsigned char *)
+		MemoryContextAllocHuge(CurrentMemoryContext,
+			   (Size) len * MAX_CONVERSION_GROWTH + 1);
 
 	OidFunctionCall5(proc,
 	 Int32GetDatum(src_encoding),
@@ -366,6 +374,26 @@ pg_do_encoding_conversion(unsigned char *src, int len,
 	 CStringGetDatum(src),
 	 CStringGetDatum(result),
 	 

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-20, Paul Guo wrote:

> The patch series failed on Windows CI. I modified the Windows build file to
> fix that. See attached for the v7 version.

Thanks.

Question about 0003.  If we specify --skip-clean-shutdown and the cluter
was not cleanly shut down, shouldn't we error out instead of trying to
press on?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Attempt to consolidate reading of XLOG page

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Antonin Houska wrote:

> Alvaro Herrera  wrote:

> > If you don't have any strong dislikes for these changes, I'll push this
> > part and let you rebase the remains on top.
> 
> No objections here.

oK, pushed.  Please rebase the other parts.

I made one small adjustment: in read_local_xlog_page() there was one
*readTLI output parameter that was being changed to a local variable
plus later assigment to the output struct member; I changed the code to
continue to assign directly to the output variable instead.  There was
an error case in which the TLI was not assigned to; I suppose this
doesn't really change things (we don't examine the TLI in that case, do
we?), but it seemed dangerous to leave like that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-09-24 Thread Peter Geoghegan
On Mon, Sep 23, 2019 at 5:13 PM Peter Geoghegan  wrote:
> I attach version 17.

I attach a patch that applies on top of v17. It adds support for
deduplication within unique indexes. Actually, this is a snippet of
code that appeared in my prototype from August 5 (we need very little
extra code for this now). Unique index support kind of looked like a
bad idea at the time, but things have changed a lot since then.

I benchmarked this overnight using a custom pgbench-based test that
used a Zipfian distribution, with a single-row SELECT and an UPDATE of
pgbench_accounts per xact. I used regular/logged tables this time
around, since WAL-logging is now fairly efficient. I added a separate
low cardinality index on pgbench_accounts(abalance). A low cardinality
index is the most interesting case for this patch, obviously, but it
also serves to prevent all HOT updates, increasing bloat for both
indexes. We want a realistic case that creates a lot of index bloat.

This wasn't a rigorous enough benchmark to present here in full, but
the results were very encouraging. With reasonable client counts for
the underlying hardware, we seem to have a small increase in TPS, and
a small decrease in latency. There is a regression with 128 clients,
when contention is ridiculously high (this is my home server, which
only has 4 cores). More importantly:

* The low cardinality index is almost 3x smaller with the patch -- no
surprises there.

* The read latency is where latency goes up, if it goes up at all.
Whatever it is that might increase latency, it doesn't look like it's
deduplication itself. Yeah, deduplication is expensive, but it's not
nearly as expensive as a page split. (I'm talking about the immediate
cost, not the bigger picture, though the bigger picture matters even
more.)

* The growth in primary key size over time is the thing I find really
interesting. The patch seems to really control the number of pages
splits over many hours with lots of non-HOT updates. I think that a
timeline of days or weeks could be really interesting.

I am now about 75% convinced that adding deduplication to unique
indexes is a good idea, at least as an option that is disabled by
default. We're already doing well here, even though there has been no
work on optimizing deduplication in unique indexes. Further
optimizations seem quite possible, though. I'm mostly thinking about
optimizing non-HOT updates by teaching nbtree some basic things about
versioning with unique indexes.

For example, we could remember "recently dead" duplicates of the value
we are about to insert (as part of an UPDATE statement) from within
_bt_check_unique(). Then, when it looks like a page split may be
necessary, we can hint to  _bt_dedup_one_page(): "please just
deduplicate the group of duplicates starting from this offset, which
are duplicates of this new item I am inserting -- do not create a
posting list that I will have to split, though". We already cache the
binary search bounds established within _bt_check_unique() in
insertstate, so perhaps we could reuse that to make this work. The
goal here is that the old/recently dead versions end up together
in their own posting list (or maybe two posting lists), whereas our
new/most current tuple is on its own. There is a very good chance that
our transaction will commit, leaving somebody else to set the LP_DEAD
bit on the posting list that contains those old versions. In short,
we'd be making deduplication and opportunistic garbage collection
cooperate closely.

This can work both ways -- maybe we should also teach
_bt_vacuum_one_page() to cooperate with _bt_dedup_one_page(). That is,
if we add the mechanism I just described in the last paragraph, maybe
_bt_dedup_one_page() marks the posting list that is likely to get its
LP_DEAD bit set soon with a new hint bit -- the LP_REDIRECT bit. Here,
LP_REDIRECT means "somebody is probably going to set the LP_DEAD bit
on this posting list tuple very soon". That way, if nobody actually
does set the LP_DEAD bit, _bt_vacuum_one_page() still has options. If
it goes to the heap and finds the latest version, and that that
version is visible to any possible MVCC snapshot, that means that it's
safe to kill all the other versions, even without the LP_DEAD bit set
-- this is a unique index. So, it often gets to kill lots of extra
garbage that it wouldn't get to kill, preventing page splits. The cost
is pretty low: the risk that the single heap page check will be a
wasted effort. (Of course, we still have to visit the heap pages of
things that we go on to kill, to get the XIDs to generate recovery
conflicts -- the important point is that we only need to visit one
heap page in _bt_vacuum_one_page(), to *decide* if it's possible to do
all this -- cases that don't benefit at all also don't pay very much.)

I don't think that we need to do either of these two other things to
justify committing the patch with unique index support. But, teaching
nbtree a little bit about versioning 

Re: Memory Accounting

2019-09-24 Thread Melanie Plageman
On Thu, Sep 19, 2019 at 11:00 AM Jeff Davis  wrote:

> On Wed, 2019-09-18 at 13:50 -0700, Soumyadeep Chakraborty wrote:
> > Hi Jeff,
>
> Hi Soumyadeep and Melanie,
>
> Thank you for the review!
>
> > max_stack_depth   max level   lazy (ms)   eager (ms)   (eager/lazy)
> > 2MB               82          302.715     427.554      1.4123978
> > 3MB               3474        567.829     896.143      1.578191674
> > 7.67MB            8694        2657.972    4903.063     1.844663149
>
> Thank you for collecting data on this. Were you able to find any
> regression when compared to no memory accounting at all?
>
>
We didn't spend much time comparing performance with and without
memory accounting, as it seems like this was discussed extensively in
the previous thread.


> It looks like you agree with the approach and the results. Did you find
> any other issues with the patch?
>

We didn't observe any other problems with the patch and agree with the
approach. It is a good start.


>
> I am also including Robert in this thread. He had some concerns the
> last time around due to a small regression on POWER.
>

I think it would be helpful if we could repeat the performance tests
Robert did on that machine with the current patch (unless this version
of the patch is exactly the same as the ones he tested previously).

Thanks,
Soumyadeep & Melanie


Re: The flinfo->fn_extra question, from me this time.

2019-09-24 Thread Robert Haas
On Sun, Jul 21, 2019 at 5:55 PM Tom Lane  wrote:
> The FROM case could be improved perhaps, if somebody wanted to put
> time into it.  You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but in principle you could return
> rows immediately while stashing them aside in a tuplestore.

But you could skip it if you could prove that no rescans or backward
fetches are possible for a particular node, something that we also
want for Gather, as discussed not long ago.
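
For illustration, the information is already available to a node at startup
time; the test could look something like this (a sketch, not a concrete
proposal):

/* In the node's ExecInit function: the eflags passed down by the caller say
 * whether rescans, backward fetches or mark/restore may happen.  If none of
 * them can, rows could be returned immediately instead of being stashed in
 * a tuplestore first. */
bool		need_tuplestore =
	(eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) != 0;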

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Take skip header out of a loop in COPY FROM

2019-09-24 Thread Tom Lane
Heikki Linnakangas  writes:
> On 22/08/2019 12:54, Adam Lee wrote:
>> My next thought is to call unlikely() here, but we don't have it...

> We do, actually, since commit aa3ca5e3dd in v10.

> Not sure it's worth the trouble here. Optimizing COPY in general would 
> be good, even small speedups there are helpful because everyone uses 
> COPY, but without some evidence I don't believe particular branch is 
> even measurable.

I concur that there's no reason to think that this if-test has a
measurable performance cost.  We're about to do CopyReadLine which
certainly has way more than one branch's worth of processing in it.

If we want to get involved with sprinkling unlikely() calls into
copy.c, the inner per-character or per-field loops would be the
place to look for wins IMO.

I'm going to mark this CF entry as Returned With Feedback.

regards, tom lane




Re: WIP/PoC for parallel backup

2019-09-24 Thread Robert Haas
On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman  wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans 
> PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This 
> commands will be send by each worker and that worker will be getting the said 
> files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.

> I have done a basic proof of concept (POC), which is also attached. I would 
> appreciate some input on this. So far, I am simply dividing the list equally 
> and assigning the files to worker processes. I intend to fine-tune this by 
> taking file sizes into consideration. Further, to add tar format support, I am 
> considering that each worker process processes all files belonging to a 
> tablespace in its list (i.e. creates and copies a tar file) before it 
> processes the next tablespace. As a result, this will create tar files that 
> are disjoint with respect to tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file.  That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even.  If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
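
As an entirely illustrative sketch of that pull model (the names and the use
of threads here are assumptions, not taken from the POC):

#include <pthread.h>

/* Shared work queue: each worker claims the next unfetched file. */
typedef struct FileQueue
{
	char	  **files;			/* file names returned by the server */
	int			nfiles;
	int			next;			/* index of the next unclaimed file */
	pthread_mutex_t lock;
} FileQueue;

static const char *
next_file(FileQueue *q)
{
	const char *file = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->next < q->nfiles)
		file = q->files[q->next++];
	pthread_mutex_unlock(&q->lock);

	return file;				/* NULL means no work left */
}

/* Each worker then loops:
 *     while ((f = next_file(queue)) != NULL)
 *         fetch_file(conn, f);
 * so faster connections simply end up claiming more files. */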

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones.  Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace 
> t2 has 10. Ignoring all other factors for the sake of this example, each 
> worker process will get a group of 4 files of t1 and 2 files of t2. Each 
> process will create 2 tar files, one for t1 containing 4 files and another 
> for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers.  Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: log message in proto.c

2019-09-24 Thread Robert Haas
On Tue, Sep 24, 2019 at 5:41 AM Fujii Masao  wrote:
> src/backend/replication/logical/proto.c
> action = pq_getmsgbyte(in);
> if (action != 'N')
> elog(ERROR, "expected new tuple but got %d",
> action);
>
> "%d" in the above message should be "%c" because the type of
> the variable "action" is char? There are other log messages that
> "%c" is used for such variable, in proto.c. Seems the above is
> only message that "%d" is used for such variable.

The potential problem with using %c to print characters is that the
character might be a null byte or something else that ends up making
the log file invalid under the relevant encoding.

However, if the goal of using %d is to protect against such problems,
it has to be done consistently.
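
As a hypothetical illustration of that hazard (the values are made up, not
from proto.c):

char		action = pq_getmsgbyte(in);

if (action != 'N')
{
	/* With %c, a stray byte such as 0x00 or 0x80 is copied into the log
	 * verbatim and can leave an invalidly encoded or truncated line:
	 *     elog(ERROR, "expected new tuple but got %c", action);
	 * With %d, the byte is always rendered as a printable number: */
	elog(ERROR, "expected new tuple but got %d", action);
}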

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Tom Lane
Victor Wagner  writes:
> I'm attaching patch which uses solution mentioned above.
> It seems that the check for SSL_OP_NO_TLSvX_Y is redundant if 
> we are checking for TLS_MAX_VERSION.

One thing I'm wondering is if it's safe to assume that TLS_MAX_VERSION
will be defined whenever these other symbols are.  Looking in an
0.9.8x install tree, that doesn't seem to define any of them; while
in 1.0.1e I see

./tls1.h:#define TLS1_1_VERSION 0x0302
./tls1.h:#define TLS1_2_VERSION 0x0303
./tls1.h:#define TLS_MAX_VERSION    TLS1_2_VERSION

So the patch seems okay for these two versions, but I have no data about
intermediate OpenSSL versions.

BTW, the spacing in this patch seems rather random.

regards, tom lane




Re: Unwanted expression simplification in PG12b2

2019-09-24 Thread Robert Haas
On Mon, Sep 23, 2019 at 9:20 PM Finnerty, Jim  wrote:
> If the function was moved to the FROM clause where it would be executed as a 
> lateral cross join instead of a target list expression, how would this affect 
> the cost-based positioning of the Gather?

I think you'd end up turning what is now a Seq Scan into a Nested Loop
with a Seq Scan on one side. I think the computation of the target
list would be done by the Function Scan or Result node on the other
side of the Nested Loop, and couldn't move anywhere else. The planner
would consider putting the Gather either on top of the Nested Loop or
on top of the Seq Scan, and the former would probably win. So I think
this would give the desired behavior, but I haven't thought about it
very hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: dropdb --force

2019-09-24 Thread vignesh C
On Tue, Sep 24, 2019 at 6:22 PM Amit Kapila  wrote:
>
> On Sat, Sep 21, 2019 at 10:09 PM Pavel Stehule  
> wrote:
> >
> > Thank you for check. I am sending updated patch
> >
>
Session termination in case of drop database is solved.
Some typos:
+ /*
+ * Similar code to pg_terminate_backend, but we check rigts first
+ * here, and only when we have all necessary rights we start to
+ * terminate other clients. In this case we should not to raise
+ * some warnings - like "PID %d is not a PostgreSQL server process",
+ * because for this purpose - already finished session is not
+ * problem.
+ */
"rigts" should be "rights/privilege"
"should not to raise" could be "should not raise"

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: FETCH FIRST clause WITH TIES option

2019-09-24 Thread Alvaro Herrera
Hmm,

create table w (a point);

# select * from w order by a fetch first 2 rows with ties;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a fetch first 2 rows with ties;
  ^
HINT:  Use an explicit ordering operator or modify the query.

I'm not sure that the HINT is useful here.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Nikolay Shaplov wrote:

> In a message of Tuesday, 24 September 2019 9:25:54 MSK, Alvaro Herrera wrote:
> 
> > 0003 looks useful, thanks for completing it.  I think it would be a good
> > idea to test invalid values for each type of reloption too (passing
> > floating point to integers, strings to floating point, and so on).
> 
> We already do it in reloption regression tests.
> 
> My idea was to test here only the things that can't be tested in regression 
> tests, on in real indexes like bloom.

I suppose that makes sense.  But of course when I push enum reloptions
I will have to add such a test, since bloom does not have one.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Attempt to consolidate reading of XLOG page

2019-09-24 Thread Antonin Houska
Alvaro Herrera  wrote:

> I spent a couple of hours on this patchset today.  I merged 0001 and
> 0002, and decided the result was still messier than I would have liked,
> so I played with it a bit more -- see attached.  I think this is
> committable, but I'm afraid it'll cause quite a few conflicts with the
> rest of your series.
> 
> I had two gripes, which I feel solved with my changes:
> 
> 1. I didn't like that "dir" and "wal segment size" were part of the
> "currently open segment" supporting struct.  It seemed that those two
> were slightly higher-level, since they apply to every segment that's
> going to be opened, not just the current one.

ok

> My first thought was to put those as members of XLogReaderState, but
> that doesn't work because the physical walsender.c code does not use
> xlogreader at all, even though it is reading WAL.

I don't remember clearly but I think that this was the reason I tried to move
"wal_segment_size" away from XLogReaderState.

 
> Separately from those two API-wise points, there was one bug which meant
> that with your 0002+0003 the recovery tests did not pass -- code
> placement bug.  I suppose the bug disappears with later patches in your
> series, which probably is why you didn't notice.  This is the fix for that:
> 
> -   XLogRead(cur_page, state->seg.size, state->seg.tli, targetPagePtr,
> -   state->seg.tli = pageTLI;
> +   state->seg.ws_tli = pageTLI;
> +   XLogRead(cur_page, state->segcxt.ws_segsize, state->seg.ws_tli, 
> targetPagePtr,
>  XLOG_BLCKSZ);
> 

Yes, it seems so - the following parts ensure that XLogRead() adjusts the
timeline itself. I only checked that the each part of the series keeps the
source tree compilable. Thanks for fixing.

> ... Also, yes, I renamed all the struct members.
>
> 
> If you don't have any strong dislikes for these changes, I'll push this
> part and let you rebase the remains on top.

No objections here.

> 2. Not a fan of the InvalidTimeLineID stuff offhand.  Maybe it's okay ...
> not convinced yet either way.

Well, it seems that the individual callbacks only use this constant in
Assert() statements. I'll consider if we really need it. The argument value
should not determine whether the callback derives the TLI or not.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Tom Lane
Alvaro Herrera  writes:
> ... I wonder if we should really continue to support
> OpenSSL 0.9.8.

Fair question, but post-rc1 is no time to be moving that goalpost
for the v12 branch.

> Anyway I suppose it's not impossible that third parties are still
> maintaining their 1.0.0 branch,

Another data point on that is that Red Hat is still supporting
1.0.1e in RHEL6.  I don't think we should assume that just because
OpenSSL upstream has dropped support for a branch, it no longer
exists in the wild.

Having said that, if it makes our lives noticeably easier to
drop support for 0.9.8 in HEAD, I won't stand in the way.

(We should survey the buildfarm and see what the older critters
are running, perhaps.)

regards, tom lane




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Nikolay Shaplov
In a message of Tuesday, 24 September 2019 9:25:54 MSK, Alvaro Herrera wrote:

> 0003 looks useful, thanks for completing it.  I think it would be a good
> idea to test invalid values for each type of reloption too (passing
> floating point to integers, strings to floating point, and so on).

We already do it in reloption regression tests.

My idea was to test here only the things that can't be tested in regression 
tests, on in real indexes like bloom.






Re: Optimze usage of immutable functions as relation

2019-09-24 Thread Tom Lane
rmrodrig...@carto.com writes:
> This commit is breaking some Postgis tests with custom types.

Hm, yeah, the code fails to consider the possibility that the function
returns a composite type.

For the moment I'm just going to make it punt if the function result
class isn't TYPEFUNC_SCALAR.  In principle, if we have a composite
result, we could disassemble it into per-column constant values, but
I'm not sure it'd be worth the work.

regards, tom lane




Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Victor Wagner wrote:

> Dear hackers,
> 
> PostgreSQL 12 documentation states, that minimum required version of
> OpenSSL is 0.9.8. However, I was unable to сompile current
> PGPRO_12_STABLE with OpenSSL 0.9.8j (from SLES 11sp4).

(Nice branch name.)  I wonder if we should really continue to support
OpenSSL 0.9.8.  That branch was abandoned by the OpenSSL dev group in
2015 ... and I wouldn't want to assume that there are no security
problems fixed in the meantime.  Why shouldn't we drop support for that
going forward, raising our minimum required OpenSSL version to be at
least something in the 1.0 branch?

(I'm not entirely sure about minor version numbers in OpenSSL -- it
seems 1.0.2 is still being maintained, but 1.0.0 itself was also
abandoned in 2016, as was 1.0.1.  As far as I understand they use the
alphabetical sequence *after* the three-part version number in the way
we use minor number; so 1.0.1u (2016) is the last there, and 1.0.2t is a
recent one in the maintained branch.

Along the same lines, 0.9.8j was released in Jan 2009.  The last in
0.9.8 was 0.9.8zi in December 2015.)

Anyway I suppose it's not impossible that third parties are still
maintaining their 1.0.0 branch, but I doubt anyone cares for 0.9.8 with
Postgres 12 ... particularly since SUSE themselves suggest not to use
the packaged OpenSSL for their stuff but rather stick to NSS.  That
said, in 2014 (!!) SUSE released OpenSSL 1.0.1 separately, for use with
SLES 11:
https://www.suse.com/c/introducing-the-suse-linux-enterprise-11-security-module/
Who would use the already obsolete SLES 11 (general support ended in
March 2019, though extended support ends in 2022) with Postgres 12?
That seems insane.

All that being said, I don't oppose to this patch, since it seems a
quick way to get out of the immediate trouble.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




DROP SUBSCRIPTION with no slot

2019-09-24 Thread Jeff Janes
I recently had to cut loose (pg_drop_replication_slot) a logical replica
that couldn't keep up and so was threatening to bring down the master.

In mopping up on the replica side, I couldn't just drop the subscription,
because it couldn't drop the nonexistent slot on the master and so refused
to work.  So I had to do a silly little dance where I first disable the
subscription, then ALTER SUBSCRIPTION ... SET (slot_name = NONE), then drop
it.

Wanting to clean up after itself is admirable, but if there is nothing to
clean up, why should that be an error condition?  Should this be an item on
https://wiki.postgresql.org/wiki/Todo (to whatever extent that is still
used).

Cheers,

Jeff


Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Nikolay Shaplov
On Fri, 20 Sep 2019 20:58:27 +0900
Michael Paquier  wrote:


Sorry I am really slow to answer... I hope it is not too late.

> The GUCs are also basically not necessary, as you can just replace the
> various WARNING calls (please don't call elog on anything which can be
> reached by the user!) by lookups at reloptions in pg_class.  Once this
> is removed, the whole code gets more simple, and there is no point in
> having neither a separate header nor a different set of files and the
> size of the whole module gets really reduced.

Reading options from pg_class is not a good idea. We already do this in
reloption regression test. Here the thing is almost same...

My point of testing was to read these values from bytea right from
inside of the module. This is not exactly the same value as in pg_class.
It _should_ be the same. But nobody promised it _is_ the same. That is
why I was reading it right from the reloptions in-memory bytea, the same way
real access methods do it.

And then we came to the GUC variables. Because if we have five reloptions
and we print them all each time we change something, there would be
quite a huge output.
It is ok when everything goes well. Comparing with 'expected' is cheap.
But if something goes wrong, then it would be very difficult to find
the proper place in this output to deal with it.
So I created GUCs so we can get only one output in a row, not a whole
bunch.

These are my points.







Re: pgbench - allow to create partitioned tables

2019-09-24 Thread Fabien COELHO


Hello Amit,


[...]
If you agree with this, then why haven't you changed below check in patch:

+ if (partition_method != PART_NONE)
+ printf("partition method: %s\npartitions: %d\n",
+PARTITION_METHOD[partition_method], partitions);

This is exactly the thing bothering me.  It won't be easy for others
to understand why this check for partitioning information is different
from other checks.


As I tried to explain with an example, using "partitions > 0" does not 
work in this case because you can have a partitioned table with zero 
partitions attached while benchmarking, but this cannot happen while 
creating.


For you or me, it might be okay as we have discussed this case, but it 
won't be apparent to others.  This doesn't buy us much, so it is better 
to keep this code consistent with other places that check for 
partitions.


Attached uses "partition_method != PART_NONE" consistently, plus an assert 
on "partitions > 0" for checking and for triggering the default method at 
the end of option processing.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..2d93f6fbb2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,25 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/*
+ * Number of "pgbench_accounts" partitions, found or to create.
+ * When creating, 0 is the default and means no partitioning.
+ * When running, this is the actual number of partitions.
+ */
+static int	partitions = 0;
+
+/* partitioning strategy for "pgbench_accounts" */
+typedef enum
+{
+	PART_NONE,		/* no partitioning */
+	PART_RANGE,	/* range partitioning */
+	PART_HASH		/* hash partitioning */
+}
+			partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = {"none", "range", "hash"};
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +636,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3623,88 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	/* as default is 100, it could be removed in this case */
+	snprintf(opts + strlen(opts), len - strlen(opts),
+			 " with (fillfactor=%d)", fillfactor);
+}
+
+/*
+ * Create "pgbench_accounts" partitions if needed.
+ *
+ * This is the larger table of pgbench default tpc-b like schema
+ * with a known size, so that it can be partitioned by range.
+ */
+static void
+createPartitions(PGconn *con)
+{
+	char		ff[64];
+
+	ff[0] = '\0';
+
+	/*
+	 * Per ddlinfo in initCreateTables below, fillfactor is needed on
+	 * table pgbench_accounts.
+	 */
+	append_fillfactor(ff, sizeof(ff));
+
+	/* we must have to create some partitions */
+	Assert(partitions > 0);
+
+	fprintf(stderr, "creating %d partitions...\n", partitions);
+
+	for (int p = 1; p <= partitions; p++)
+	{
+		char		query[256];
+
+		if (partition_method == PART_RANGE)
+		{
+			int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+			char		minvalue[32],
+		maxvalue[32];
+
+			/*
+			 * For RANGE, we use open-ended partitions at the beginning and
+			 * end to allow any valid value for the primary key.
+			 * Although the actual minimum and maximum values can be derived
+			 * from the scale, it is more generic and the performance is better.
+			 */
+			if (p == 1)
+sprintf(minvalue, "minvalue");
+			else
+

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-24 Thread Alexey Kondratov

On 20.09.2019 19:38, Alvaro Herrera wrote:

On 2019-Sep-19, Robert Haas wrote:


So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?

Hmm, my own reading of that was to add tablespace changing abilities to
ALTER TABLE *in addition* to this patch, not instead of it.


That was my understanding too.

On 20.09.2019 11:26, Jose Luis Tallon wrote:

On 20/9/19 4:06, Michael Paquier wrote:

Personally, I don't find this idea very attractive as ALTER TABLE is
already complicated enough with all the subqueries we already support
in the command, all the logic we need to maintain to make combinations
of those subqueries in a minimum number of steps, and also the number
of bugs we have seen because of the amount of complication present.


Yes, but please keep the other options: As it is, cluster, vacuum full 
and reindex already rewrite the table in full; Being able to write the 
result to a different tablespace than the original object was stored 
in enables a whole world of very interesting possibilities 
including a quick way out of a "so little disk space available that 
vacuum won't work properly" situation --- which I'm sure MANY users 
will appreciate, including me 


Yes, sure, that was my main motivation. The first message in the thread 
contains a patch, which adds SET TABLESPACE support to all of CLUSTER, 
VACUUM FULL and REINDEX. However, an idea came up to integrate 
CLUSTER/VACUUM FULL with ALTER TABLE and do their work + all the ALTER 
TABLE stuff in a single table rewrite. I've dug a little bit into this 
and ended up with some architectural questions and concerns [1]. So I 
decided to start with a simple REINDEX patch.


Anyway, I've followed Michael's advice and split the last patch into two:

1) Adds all the main functionality, but with simplified 'REINDEX INDEX [ 
CONCURRENTLY ] ... [ TABLESPACE ... ]' grammar;


2) Adds a more sophisticated syntax with '[ SET TABLESPACE ... [ NOWAIT 
] ]'.


Patch 1 contains all the docs and tests and may be applied/committed 
separately or together with 2, which is fully optional.


Recent merge conflicts and the reindex_index validation order are also 
fixed in the attached version.


[1] 
https://www.postgresql.org/message-id/6b2a5c4de19f111ef24b63428033bb67%40postgrespro.ru



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 4f06996f1e86dee389cb0f901cb83dba77c2abd8 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 24 Sep 2019 12:29:57 +0300
Subject: [PATCH v3 1/2] Allow REINDEX and REINDEX CONCURRENTLY to change
 TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml | 23 ++
 src/backend/catalog/index.c   | 99 ---
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 34 +---
 src/backend/commands/tablecmds.c  | 59 --
 src/backend/parser/gram.y | 21 +++--
 src/backend/tcop/utility.c| 16 +++-
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/commands/tablecmds.h  |  2 +
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 31 +++
 src/test/regress/output/tablespace.source | 41 ++
 13 files changed, 279 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a..96c9363ad9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
  
 
@@ -165,6 +166,28 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Can be used only with REINDEX INDEX and
+  REINDEX TABLE, since the system indexes are not
+  movable, but SCHEMA, DATABASE or
+  SYSTEM very likely will has one. 
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 098732cc4a..b2fed5dc75 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1239,7 +1239,8 @@ index_create(Relation heapRelation,
  * on.  This is called during concurrent reindex processing.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
+			   Oid 

Re: dropdb --force

2019-09-24 Thread Amit Kapila
On Sat, Sep 21, 2019 at 10:09 PM Pavel Stehule  wrote:
>
> Thank you for check. I am sending updated patch
>

Alvaro has up thread suggested some alternative syntax [1] for this
patch, but I don't see any good argument to not go with what he has
proposed.  In other words, why we should prefer the current syntax as
in the patch over what Alvaro has proposed?

IIUC,  the current syntax implemented by the patch is:
Drop Database [(options)] [If Exists] name
Alvaro suggested using options at the end (and use optional keyword
WITH) based on what other Drop commands does.  I see some merits to
that idea which are (a) if tomorrow we want to introduce new options
like CASCADE, RESTRICT then it will be better to have all the options
at the end as we have for other Drop commands, (b) It will resemble
more with Create Database syntax.

Now, I think the current syntax is also not bad and we already do
something like that for other commands like Vacuum where options are
provided before object_name, but I think in this case putting at the
end is more appealing unless there are some arguments against that.

One other minor comment:
+
+  This will also fail, if the connections do not terminate in 5 seconds.
+ 

Is there any implementation in the patch for the above note?

[1] - 
https://www.postgresql.org/message-id/20190903164633.GA16408%40alvherre.pgsql

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Michael Paquier wrote:

> I looked at that over the last couple of days, and done as attached.
> Well, the actual module is in 0003.  I have added more comments to
> document the basic AM calls so as it can easier be used as a template
> for some other work, and tweaked a couple of things.  0001 and 0002
> are just the patches from the other thread to address the issues with
> the lock mode of custom reloptions.

0003 looks useful, thanks for completing it.  I think it would be a good
idea to test invalid values for each type of reloption too (passing
floating point to integers, strings to floating point, and so on).
If you can get this pushed, I'll push the enum reloptions on top.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Index Skip Scan

2019-09-24 Thread Alvaro Herrera
On 2019-Sep-24, Kyotaro Horiguchi wrote:

> Sorry, it's not a boolean. A tristate value. From the definition
> (Back, NoMove, Forward) = (-1, 0, 1), (dir1 == -dir2) if
> NoMovement did not exist. If it is not guranteed,
> 
> (dir1 != 0 && dir1 == -dir2) ?

Maybe just add ScanDirectionIsOpposite(dir1, dir2) with that
definition? :-)
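
For illustration, one possible definition in the style of the existing macros
in access/sdir.h (a sketch, not part of any posted patch):

#define ScanDirectionIsOpposite(dir1, dir2) \
	((dir1) != NoMovementScanDirection && (dir1) == -(dir2))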

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Memory Accounting

2019-09-24 Thread Tomas Vondra

On Tue, Sep 24, 2019 at 02:21:40PM +0900, Michael Paquier wrote:

On Wed, Jul 24, 2019 at 11:52:28PM +0200, Tomas Vondra wrote:

I think Heikki was asking about places with a lot of sub-contexts, which is a
completely different issue. It used to be the case that some aggregates
created a separate context for each group - like array_agg. That would
make Jeff's approach to accounting rather inefficient, because checking
how much memory is used would be very expensive (having to loop over a
large number of contexts).


The patch has been marked as ready for committer for a week or so, but
it seems to me that this comment has not been addressed, no?  Are we
sure that we want this method if it proves to be inefficient when
there are many sub-contexts and shouldn't we at least test such
scenarios with a worst-case, customly-made, function?


I don't think so.

Aggregates creating many memory contexts (context for each group) was
discussed extensively in the thread about v11 [1] in 2015. And back then
the conclusion was that that's a pretty awful pattern anyway, as it uses
much more memory (no cross-context freelists), and has various other
issues. In a way, those aggregates are wrong and should be fixed just
like we fixed array_agg/string_agg (even without the memory accounting).

The way I see it we can do either eager or lazy accounting. Eager might
work better for aggregates with many contexts, but it does increase the
overhead for the "regular" aggregates with just one or two contexts.
Considering how rare those many-context aggregates are (I'm not aware of
any such aggregate at the moment), it seems reasonable to pick the lazy
accounting.

(Note: Another factor affecting the lazy vs. eager efficiency is the
number of palloc/pfree calls vs. calls to determine amount of mem used,
but that's mostly orthogonal and we can ignore it here).
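
For illustration, a minimal sketch of what the lazy approach amounts to,
assuming each context carries a mem_allocated counter maintained by its own
alloc/free paths (the field name follows the patch; the function body is only
a sketch):

static Size
MemoryContextMemAllocated(MemoryContext context, bool recurse)
{
	Size		total = context->mem_allocated;

	if (recurse)
	{
		MemoryContext child;

		/* walk the child contexts only when the caller asks for it */
		for (child = context->firstchild; child != NULL; child = child->nextchild)
			total += MemoryContextMemAllocated(child, true);
	}
	return total;
}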

So I think the approach Jeff ended up with is sensible - certainly as a
first step. We may improve it in the future, of course, once we have
more practical experience.

Barring objections, I do plan to get this committed by the end of this
CF (i.e. sometime later this week).

[1] 
https://www.postgresql.org/message-id/1434311039.4369.39.camel%40jeff-desktop

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Excessive disk usage in WindowAgg

2019-09-24 Thread Andrew Gierth
This one just came up on IRC:

create table tltest(a integer, b text, c text, d text);
insert into tltest
  select i, repeat('foo',100), repeat('foo',100), repeat('foo',100)
from generate_series(1,100000) i;
set log_temp_files=0;
set client_min_messages=log;

select count(a+c) from (select a, count(*) over () as c from tltest s1) s;
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp82513.3", size 9260

Using 92MB of disk for one integer seems excessive; the reason is clear
from the explain:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16250.00..16250.01 rows=1 width=8) (actual 
time=1236.260..1236.260 rows=1 loops=1)
   Output: count((tltest.a + (count(*) OVER (?
   ->  WindowAgg  (cost=0.00..14750.00 rows=100000 width=12) (actual 
time=1193.846..1231.216 rows=100000 loops=1)
 Output: tltest.a, count(*) OVER (?)
 ->  Seq Scan on public.tltest  (cost=0.00..13500.00 rows=100000 
width=4) (actual time=0.006..14.361 rows=100000 loops=1)
   Output: tltest.a, tltest.b, tltest.c, tltest.d

so the whole width of the table is being stored in the tuplestore used
by the windowagg.

In create_windowagg_plan, we have:

/*
 * WindowAgg can project, so no need to be terribly picky about child
 * tlist, but we do need grouping columns to be available
 */
subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);

Obviously we _do_ need to be more picky about this; it seems clear that
using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
Opinions?
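
Concretely, that would mean something like this in create_windowagg_plan()
(an untested sketch of the suggestion, not a posted patch):

subplan = create_plan_recurse(root, best_path->subpath,
							  CP_LABEL_TLIST | CP_SMALL_TLIST);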

-- 
Andrew (irc:RhodiumToad)




Re: Optimze usage of immutable functions as relation

2019-09-24 Thread rmrodriguez
Hi,

This commit is breaking some Postgis tests with custom types.

Here is a minimal repro (Postgis not required)

```
-- test custom types
create type t_custom_type AS (
valid bool,
reason varchar,
location varchar
);

create or replace function f_immutable_custom_type(i integer)
returns t_custom_type as
$$ declare oCustom t_custom_type;
begin
select into oCustom true as valid, 'OK' as reason, NULL as location;
return oCustom;
end; $$ language plpgsql immutable;

select valid, reason, location from f_immutable_custom_type(3);

drop function f_immutable_custom_type;
drop type t_custom_type;
```

Expected (PG12):
```
valid | reason | location
---++--
 t | OK |
(1 row)
```

Instead with master/HEAD (eb57bd9c1d) we are getting:
```
ERROR:  could not find attribute 2 in subquery targetlist
```

Reverting 8613eda50488c27d848f8e8caa493c9d8e1b5271 fixes it,
but I haven't looked into the reason behind the bug.

Regards
-- 
Raúl Marín Rodríguez
carto.com




Re: abort-time portal cleanup

2019-09-24 Thread Dilip Kumar
On Fri, Sep 13, 2019 at 2:13 AM Robert Haas  wrote:
>
/*
* Otherwise, do nothing to cursors held over from a previous
* transaction.
*/
if (portal->createSubid == InvalidSubTransactionId)
continue;

/*
* Do nothing to auto-held cursors.  This is similar to the case of a
* cursor from a previous transaction, but it could also be that the
* cursor was auto-held in this transaction, so it wants to live on.
*/
if (portal->autoHeld)
continue;

I have one doubt: why do we need the second check?  Because before
setting portal->autoHeld to true we always call HoldPortal, wherein we
set portal->createSubid to InvalidSubTransactionId.  So it seems to me
that the second condition will never be reached.  Am I missing something?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Victor Wagner
On Tue, 24 Sep 2019 18:49:17 +0900
Michael Paquier  wrote:

> On Tue, Sep 24, 2019 at 10:18:59AM +0300, Victor Wagner wrote:
> > PostgreSQL 12 documentation states, that minimum required version of
> > OpenSSL is 0.9.8. However, I was unable to сompile current
> > PGPRO_12_STABLE with OpenSSL 0.9.8j (from SLES 11sp4).
> 
> I can reproduce that with REL_12_STABLE and the top of
> OpenSSL_0_9_8-stable from OpenSSL's git.
> 
> > Replacing all 
> > 
> > #ifdef TLS1_1_VERSION
> > 
> > with
> > 
> > #if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
> > 
> > and analogue for TLS1_2_VERSION fixes the problem.
> 
> That sounds like a plan.  
[skip] 
> > ...
> > (line 1290). In this case check for TLS1_1_VERSION <=
> > TLS_MAX_VERSION seems to be more self-explanatory, than check for
> > somewhat unrelated symbol SSL_OP_NO_TLSv1_1
> 
> That sounds right.  Victor, would you like to write a patch?

I'm attaching patch which uses solution mentioned above.
It seems that the check for SSL_OP_NO_TLSvX_Y is redundant if 
we are checking for TLS_MAX_VERSION.
-- 
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index c97c811..e24d7de 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1287,19 +1287,19 @@ ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel)
 		case PG_TLS1_VERSION:
 			return TLS1_VERSION;
 		case PG_TLS1_1_VERSION:
-#ifdef TLS1_1_VERSION
+#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
 			return TLS1_1_VERSION;
 #else
 			break;
 #endif
 		case PG_TLS1_2_VERSION:
-#ifdef TLS1_2_VERSION
+#if defined(TLS1_2_VERSION) &&  TLS1_2_VERSION <= TLS_MAX_VERSION
 			return TLS1_2_VERSION;
 #else
 			break;
 #endif
 		case PG_TLS1_3_VERSION:
-#ifdef TLS1_3_VERSION
+#if defined(TLS1_3_VERSION)  &&  TLS1_3_VERSION <= TLS_MAX_VERSION
 			return TLS1_3_VERSION;
 #else
 			break;
@@ -1335,11 +1335,11 @@ SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version)
 
 	if (version > TLS1_VERSION)
 		ssl_options |= SSL_OP_NO_TLSv1;
-#ifdef TLS1_1_VERSION
+#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
 	if (version > TLS1_1_VERSION)
 		ssl_options |= SSL_OP_NO_TLSv1_1;
 #endif
-#ifdef TLS1_2_VERSION
+#if defined(TLS1_2_VERSION) && TLS1_2_VERSION <= TLS_MAX_VERSION
 	if (version > TLS1_2_VERSION)
 		ssl_options |= SSL_OP_NO_TLSv1_2;
 #endif
@@ -1356,11 +1356,11 @@ SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version)
 
 	AssertArg(version != 0);
 
-#ifdef TLS1_1_VERSION
+#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
 	if (version < TLS1_1_VERSION)
 		ssl_options |= SSL_OP_NO_TLSv1_1;
 #endif
-#ifdef TLS1_2_VERSION
+#if defined(TLS1_2_VERSION) && TLS1_2_VERSION <= TLS_MAX_VERSION
 	if (version < TLS1_2_VERSION)
 		ssl_options |= SSL_OP_NO_TLSv1_2;
 #endif




Re: Option to dump foreign data in pg_dump

2019-09-24 Thread Luis Carril
On Fri, Sep 20, 2019 at 6:20 PM Luis Carril  wrote:
Hello,
   thanks for the comments!

* + if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)

filter condition is not implemented completely yet, so the logic only works on 
foreign tables, so I think it's better to handle it separately

Note that there is another if condition that actually applies the 
filtercondition if provided; also, for a foreign table we need to do a COPY 
SELECT instead of a COPY TO

but we can't supply a where clause in pg_dump yet, so filtercondition is always 
NULL and the logic becomes true only for a foreign table.

* I don’t understand the need for changing the SELECT query. We can use the same 
SELECT query syntax for both regular tables and foreign tables

To which query do you refer? In the patch there are three queries: one retrieves 
foreign servers, another is the SELECT in the COPY that now applies in case 
of a filter condition or a foreign table, and a third that retrieves the oid of 
a given foreign server.


SELECT on COPY

regards
Surafel
If we have a non-foreign table and filtercond is NULL, then we can do a `COPY 
table columns TO stdout`.
But if the table is foreign, the `COPY foreign-table columns TO stdout` is not 
supported by Postgres, so we have to do a `COPY (SELECT columns FROM 
foreign-table) TO stdout`

Now if, in either case, the filtercond is non-NULL, i.e. we have a WHERE clause, 
then for both non-foreign and foreign tables we have to do a:
`COPY (SELECT columns FROM table) TO stdout`

So the COPY of a foreign table has to be done using the sub-SELECT just as a 
non-foreign table with filtercond, not like a non-foreign table without 
filtercond.
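
As a purely illustrative sketch of that choice as pg_dump might make it (the
variable names below are made up, not the patch's):

/* decide between plain COPY TO and COPY (SELECT ...) TO */
if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
	appendPQExpBuffer(q, "COPY (SELECT %s FROM %s %s) TO stdout;",
					  column_list, qualified_name,
					  tdinfo->filtercond ? tdinfo->filtercond : "");
else
	appendPQExpBuffer(q, "COPY %s %s TO stdout;",
					  qualified_name, column_list);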

Cheers

Luis M Carril



Re: PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Michael Paquier
On Tue, Sep 24, 2019 at 10:18:59AM +0300, Victor Wagner wrote:
> PostgreSQL 12 documentation states, that minimum required version of
> OpenSSL is 0.9.8. However, I was unable to сompile current
> PGPRO_12_STABLE with OpenSSL 0.9.8j (from SLES 11sp4).

I can reproduce that with REL_12_STABLE and the top of
OpenSSL_0_9_8-stable from OpenSSL's git.

> It is not so. Here is exempt from tls1.h header file from the openssl
> 0.9.8j
> 
> #define TLS1_VERSION0x0301
> #define TLS1_1_VERSION  0x0302
> #define TLS1_2_VERSION  0x0303
> /* TLS 1.1 and 1.2 are not supported by this version of OpenSSL, so
>  * TLS_MAX_VERSION indicates TLS 1.0 regardless of the above
>  * definitions. (s23_clnt.c and s23_srvr.c have an OPENSSL_assert()
>  * check that would catch the error if TLS_MAX_VERSION was too low.)
>  */
> #define TLS_MAX_VERSION TLS1_VERSION

Indeed, we rely currently on a false assumption that the version is
supported if the object is defined.  That's clearly wrong.

> Replacing all 
> 
> #ifdef TLS1_1_VERSION
> 
> with
> 
> #if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
> 
> and analogue for TLS1_2_VERSION fixes the problem.

That sounds like a plan.  

> Really, problem is that symbol SSL_OP_NO_TLSv1_1 (and 1_2 accordingly)
> might be undefined even if TLS1_1_VERSION defined. 
> 
> Replacing 
> 
> #ifdef TLS1_1_VERSION
> 
> with 
> 
> #ifdef SSL_OP_NO_TLSv1_1

Hmm.  Wouldn't it be better to check if the maximum version of TLS is
supported and if SSL_OP_NO_TLSv1_1 is defined (same for 1.2)?
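
For instance, something along these lines (sketch only):

#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION && \
	defined(SSL_OP_NO_TLSv1_1)
	if (version > TLS1_1_VERSION)
		ssl_options |= SSL_OP_NO_TLSv1_1;
#endif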

> But there is third (first from start of file) one.
> ...
> case PG_TLS1_1_VERSION:
> #ifdef TLS1_1_VERSION
> return TLS1_1_VERSION;
> #else
> break;
> #endif
> ...
> (line 1290). In this case check for TLS1_1_VERSION <= TLS_MAX_VERSION
> seems to be more self-explanatory, than check for somewhat unrelated 
> symbol SSL_OP_NO_TLSv1_1

That sounds right.  Victor, would you like to write a patch?
--
Michael




log message in proto.c

2019-09-24 Thread Fujii Masao
Hi,

src/backend/replication/logical/proto.c
	action = pq_getmsgbyte(in);
	if (action != 'N')
		elog(ERROR, "expected new tuple but got %d",
			 action);

"%d" in the above message should be "%c" because the type of
the variable "action" is char? There are other log messages that
"%c" is used for such variable, in proto.c. Seems the above is
only message that "%d" is used for such variable.

Regards,

-- 
Fujii Masao


improve-elog-message-in-proto.patch
Description: Binary data


Re: abort-time portal cleanup

2019-09-24 Thread Amit Kapila
On Fri, Sep 13, 2019 at 2:13 AM Robert Haas  wrote:
>
> I've been studying At{Sub,}{Abort,Cleanup}_Portals() for the last few
> days and have come to the conclusion that the code is not entirely up
> to our usual standards. I believe that a good deal of the reason for
> this is attributable to the poor quality of the code comments in this
> area, although there are perhaps some other contributing factors as
> well, such as bullheadedness on my part and perhaps others.
>
> The trouble starts with the header comment for AtAbort_Portals, which
> states that "At this point we run the cleanup hook if present, but we
> can't release the portal's memory until the cleanup call." At the time
> this logic was introduced in commit
> de28dc9a04c4df5d711815b7a518501b43535a26 (2003-05-02),
> AtAbort_Portals() affected all non-held portals without caring whether
> they were active or not, and, UserAbortTransactionBlock() called
> AbortTransaction() directly, so typing "ROLLBACK;" would cause
> AtAbort_Portals() to be reached from within PortalRun().  Even if
> PortalRun() managed to return without crashing, the caller would next
> try to call PortalDrop() on what was now an invalid pointer. However,
> commit 8f9f1986034a2273e09ad10671e10d1adda21d1f (2004-09-16) changed
> things so that UserAbortEndTransaction() just sets things up so that
> the subsequent call to CommitTransactionCommand() would call
> AbortTransaction() instead of trying to do it right away, and that
> doesn't happen until after we're done with the portal.  As far as I
> can see, that change made this comment mostly false, but the comment
> has nevertheless managed to survive for another ~15 years. I think we
> can, and in fact should, just drop the portal right here.
>
> As far as actually making that work, there are a few wrinkles. The
> first is that we might be in the middle of FATAL. In that case, unlike
> the ROLLBACK case, a call to PortalRun() is still on the stack, but
> we'll exit the process rather than returning, so the fact that we've
> created a dangling pointer for the caller won't matter. However, as
> shown by commit ad9a274778d2d88c46b90309212b92ee7fdf9afe (2018-02-01)
> and the report that led up to it at
> https://www.postgresql.org/message-id/20180128034531.h6o4w3727ifof3jy%40alap3.anarazel.de
> it's not a good idea to try to clean up the portal in that case,
> because we might've already started shutting down critical systems.
> It seems not only risky but also unnecessary: our process-local state
> is about to go away, and the executor shouldn't need to clean up any
> shared memory state that won't also get cleaned up by some other
> mechanism. So, it seems to me that if we reach AtAbort_Portals()
> during FATAL processing, we should either (1) do nothing at all and
> just return or (2) forget about all the existing portals without
> cleaning them up and then return.  The second option seems a little
> safer to me, because it guarantees that if we somehow reach code that
> might try to look up a portal later, it won't find anything. But I
> think it's arguable.
>

I agree with your position on this.

>
> Attached is a patch that (1) removes At(Sub)Cleanup_Portals() entirely
> in favor of dropping portals on the spot in At(Sub)Abort_Portals(),
> (2) overhauls the comments in this area, and (3) makes
> AtSubAbort_Portals() consistent with AtAbort_Portals().

The overall idea seems good to me, but I have a few comments on the changes.

1.
@@ -2756,7 +2756,6 @@ CleanupTransaction(void)
  /*
  * do abort cleanup processing
  */
- AtCleanup_Portals(); /* now safe to release portal memory */
  AtEOXact_Snapshot(false, true); /* and release the transaction's snapshots */

  CurrentResourceOwner = NULL; /* and resource owner */
@@ -5032,8 +5031,6 @@ CleanupSubTransaction(void)
  elog(WARNING, "CleanupSubTransaction while in %s state",
  TransStateAsString(s->state));

- AtSubCleanup_Portals(s->subTransactionId);
-

After this cleanup, I think we don't need At(Sub)Abort_Portals in
AbortOutOfAnyTransaction() for the states TBLOCK_(SUB)ABORT and
friends. This is because AbortTransaction itself would have zapped the
portal.

2. You seem to have forgotten to remove AtCleanup_Portals() from portal.h.

3.
  /*
- * If it was created in the current transaction, we can't do normal
- * shutdown on a READY portal either; it might refer to objects
- * created in the failed transaction.  See comments in
- * AtSubAbort_Portals.
- */
- if (portal->status == PORTAL_READY)
- MarkPortalFailed(portal);
-

Why is it safe to remove this check?  Commit 7981c342 explains why we
need that check.  I don't see any explanation in the email or the patch
which justifies this code removal.  Is it because you removed
PortalCleanup?  If so, isn't that still called from PortalDrop?

4.
-AtCleanup_Portals(void)
-{
- HASH_SEQ_STATUS status;
- PortalHashEnt *hentry;
-
- hash_seq_init(&status, PortalHashTable);
-
- while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
- {
- 

Re: Index Skip Scan

2019-09-24 Thread Kyotaro Horiguchi
At Tue, 24 Sep 2019 17:35:47 +0900 (Tokyo Standard Time), Kyotaro Horiguchi 
 wrote in 
<20190924.173547.226622711.horikyota@gmail.com>
> At Sun, 22 Sep 2019 23:02:04 -0300, Alvaro Herrera  
> wrote in <20190923020204.GA2781@alvherre.pgsql>
> > On 2019-Sep-22, Dmitry Dolgov wrote:
> > 
> > > > I think multiplying two ScanDirections to watch for a negative result is
> > > > pretty ugly:
> > > 
> > > Probably, but the only alternative I see to check if directions are 
> > > opposite is
> > > to check that directions come in pairs (back, forth), (forth, back). Is 
> > > there
> > > an easier way?
> > 
> > Maybe use the ^ operator?
> 
> It's not a logical operator but a bitwise arithmetic operator,
> which cannot be used unless the operands are guaranteed to be 0 or 1
> (as integers).  In a kind-of-standard but hacky way, "(!a != !b)"
> works as desired since ! is a logical operator.
> 
> Couldn't we use (a && !b) || (!a && b)? The compiler will optimize it
> well.

Sorry, it's not a boolean but a tristate value. Given the definition
(Backward, NoMovement, Forward) = (-1, 0, 1), (dir1 == -dir2) would be
enough if NoMovement did not exist. If that is not guaranteed,

(dir1 != 0 && dir1 == -dir2) ?
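
FWIW, if we want to spell the intent out, a small helper would do
(just a sketch; the function name is made up):

static inline bool
ScanDirectionsAreOpposite(ScanDirection a, ScanDirection b)
{
	/* BackwardScanDirection = -1, NoMovementScanDirection = 0, ForwardScanDirection = 1 */
	return a != NoMovementScanDirection && a == -b;
}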


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Speedup truncates of relation forks

2019-09-24 Thread Fujii Masao
On Thu, Sep 19, 2019 at 9:42 AM Jamison, Kirk  wrote:
>
> On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:
> > On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk 
> > wrote:
> > >
> > > On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:
> > > > On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
> > > > 
> > > > wrote:
> > > > >
> > > > > On 2019-Sep-13, Fujii Masao wrote:
> > > > >
> > > > > > On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
> > > > > > 
> > > > wrote:
> > > > >
> > > > > > > > Please add a preliminary patch that removes the function.
> > > > > > > > Dead code is good, as long as it is gone.  We can get it
> > > > > > > > pushed ahead of
> > > > the rest of this.
> > > > > > >
> > > > > > > Alright. I've attached a separate patch removing the
> > smgrdounlinkfork.
> > > > > >
> > > > > > Per the past discussion, some people want to keep this "dead"
> > > > > > function for some reasons. So, in my opinion, it's better to
> > > > > > just enclose the function with #if NOT_USED and #endif, to keep
> > > > > > the function itself as it is, and then to start new discussion
> > > > > > on hackers about the removal of that separatedly from this patch.
> > > > >
> > > > > I searched for anybody requesting to keep the function.  I
> > > > > couldn't find anything.  Tom said in 2012:
> > > > > https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us
> > > >
> > > > Yes. And I found Andres.
> > > > https://www.postgresql.org/message-id/20180621174129.hogefyopje4xaznu@alap3.anarazel.de
> > > >
> > > > > > As committed, the smgrdounlinkfork case is actually dead code;
> > > > > > it's never called from anywhere.  I left it in place just in
> > > > > > case we want it someday.
> > > > >
> > > > > but if no use has appeared in 7 years, I say it's time to kill it.
> > > >
> > > > +1
> > >
> > > The consensus is we remove it, right?
> > > Re-attaching the patch that removes the deadcode: smgrdounlinkfork().
> > >
> > > ---
> > > I've also fixed Fujii-san's comments below in the latest attached speedup
> > truncate rel patch (v8).
> >
> > Thanks for updating the patch!
> >
> > + block = visibilitymap_truncate_prepare(rel, 0);
> > + if (BlockNumberIsValid(block))
> >   {
> > - xl_smgr_truncate xlrec;
> > + fork = VISIBILITYMAP_FORKNUM;
> > + smgrtruncate(rel->rd_smgr, &fork, 1, &block);
> > +
> > + if (RelationNeedsWAL(rel))
> > + {
> > + xl_smgr_truncate xlrec;
> >
> > I don't think this fix is right. Originally, WAL is generated even in the
> > case where visibilitymap_truncate_prepare() returns InvalidBlockNumber. But
> > the patch unexpectedly changed the logic so that WAL is not generated in 
> > that
> > case.
> >
> > + if (fsm)
> > + FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
> > + InvalidBlockNumber);
> >
> > This code means that FreeSpaceMapVacuumRange() is called if FSM exists even
> > if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
> > This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
> > in the case where InvalidBlockNumber is returned.
> >
> > So I updated the patch based on yours and fixed the above issues.
> > Attached. Could you review this one? If there is no issue in that, I'm 
> > thinking
> > to commit that.
>
> Oops. Thanks for catching and correcting my fix, and for revising some of
> the descriptions. I also noticed you reordered the truncation of forks, so
> that the main fork will be truncated first instead of the FSM. I'm not sure
> whether the order matters now, given that we're truncating the forks
> simultaneously, so I'm OK with that change.

I changed that order so that DropRelFileNodeBuffers() can scan shared_buffers
more efficiently. Usually the MAIN fork has more buffers in shared_buffers
than the other forks, so it's better to compare the MAIN fork first during
the full scan of shared_buffers.
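
So the caller now builds the arrays with the MAIN fork first, roughly
like this (a simplified sketch, not the committed code; the variable
names are illustrative):

	ForkNumber	forks[MAX_FORKNUM];
	BlockNumber	blocks[MAX_FORKNUM];
	int			nforks = 0;

	/* MAIN fork first: it usually owns most of the shared buffers */
	forks[nforks] = MAIN_FORKNUM;
	blocks[nforks] = new_main_nblocks;
	nforks++;

	/* ... then FSM and visibility map, if they need truncation ... */

	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);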

> Just one minor comment:
> + * Return the number of blocks of new FSM after it's truncated.
>
> "after it's truncated" is quite confusing.
> How about, "as a result of previous truncation" or just end the sentence 
> after new FSM?

Thanks for the comment!
I adopted the latter and committed the patch. Thanks!

Regards,

-- 
Fujii Masao




Re: Index Skip Scan

2019-09-24 Thread Kyotaro Horiguchi
At Sun, 22 Sep 2019 23:02:04 -0300, Alvaro Herrera  
wrote in <20190923020204.GA2781@alvherre.pgsql>
> On 2019-Sep-22, Dmitry Dolgov wrote:
> 
> > > I think multiplying two ScanDirections to watch for a negative result is
> > > pretty ugly:
> > 
> > Probably, but the only alternative I see to check if directions are 
> > opposite is
> > to check that directions come in pairs (back, forth), (forth, back). Is 
> > there
> > an easier way?
> 
> Maybe use the ^ operator?

It's not a logical operator but a bitwise arithmetic operator,
which cannot be used unless the operands are guaranteed to be 0 or 1
(as integers).  In a kind-of-standard but hacky way, "(!a != !b)"
works as desired since ! is a logical operator.

Couldn't we use (a && !b) || (!a && b)? The compiler will optimize it
well.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Sort policies and triggers by table name in pg_dump.

2019-09-24 Thread Benjie Gillam
> Could you provide a simple example of schema (tables with some
> policies and triggers), with the difference this generates for
> pg_dump, which shows your point?

Certainly; I've attached a bash script that reproduces the issue
and the diff that it produces. Here's the important part:

CREATE TRIGGER a BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE qux();
CREATE POLICY a ON foo FOR SELECT USING (true);

CREATE TRIGGER a BEFORE INSERT ON bar
FOR EACH ROW EXECUTE PROCEDURE qux();
CREATE POLICY a ON bar FOR SELECT USING (true);

Here we create two identically named triggers and two identically
named policies on tables foo and bar. If we instead ran these
statements in a different order (or if the object IDs were to wrap),
the order of the pg_dump output would be different even though the
databases are identical apart from object IDs. The attached
patch eliminates this difference.

> Your patch has two warnings because you are trying to map a policy
> info pointer to a trigger info pointer:

Ah, thank you for the pointer (aha); I've attached an updated patch
that addresses this copy/paste issue.
From 42c6a84f0e8608b8161ff320628f541f330ac2d0 Mon Sep 17 00:00:00 2001
From: Benjie Gillam 
Date: Mon, 23 Sep 2019 21:18:24 +0100
Subject: [PATCH] Sort policies and triggers by table name in pg_dump.
To: pgsql-hack...@postgresql.org

---
 src/bin/pg_dump/pg_dump_sort.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 31fc06a255..08ab9c6b95 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -207,6 +207,28 @@ DOTypeNameCompare(const void *p1, const void *p2)
 		if (cmpval != 0)
 			return cmpval;
 	}
+	else if (obj1->objType == DO_POLICY)
+	{
+		PolicyInfo *pobj1 = *(PolicyInfo *const *) p1;
+		PolicyInfo *pobj2 = *(PolicyInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(pobj1->poltable->dobj.name,
+		pobj2->poltable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
+	else if (obj1->objType == DO_TRIGGER)
+	{
+		TriggerInfo *tobj1 = *(TriggerInfo *const *) p1;
+		TriggerInfo *tobj2 = *(TriggerInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(tobj1->tgtable->dobj.name,
+		tobj2->tgtable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
 
 	/* Usually shouldn't get here, but if we do, sort by OID */
 	return oidcmp(obj1->catId.oid, obj2->catId.oid);
-- 
2.17.1



reproduce.sh
Description: application/shellscript
CREATE FUNCTION
CREATE TABLE
CREATE TABLE
CREATE TRIGGER
CREATE POLICY
CREATE TRIGGER
CREATE POLICY
CREATE FUNCTION
CREATE TABLE
CREATE TABLE
CREATE TRIGGER
CREATE POLICY
CREATE TRIGGER
CREATE POLICY
--- /dev/fd/60	2019-09-24 08:33:03.574700824 +0100
+++ /dev/fd/45	2019-09-24 08:33:03.574700824 +0100
@@ -70,31 +70,31 @@
 
 
 --
--- Name: foo a; Type: TRIGGER; Schema: public; Owner: benjie
+-- Name: bar a; Type: TRIGGER; Schema: public; Owner: benjie
 --
 
-CREATE TRIGGER a BEFORE INSERT ON public.foo FOR EACH ROW EXECUTE PROCEDURE public.qux();
+CREATE TRIGGER a BEFORE INSERT ON public.bar FOR EACH ROW EXECUTE PROCEDURE public.qux();
 
 
 --
--- Name: bar a; Type: TRIGGER; Schema: public; Owner: benjie
+-- Name: foo a; Type: TRIGGER; Schema: public; Owner: benjie
 --
 
-CREATE TRIGGER a BEFORE INSERT ON public.bar FOR EACH ROW EXECUTE PROCEDURE public.qux();
+CREATE TRIGGER a BEFORE INSERT ON public.foo FOR EACH ROW EXECUTE PROCEDURE public.qux();
 
 
 --
--- Name: foo a; Type: POLICY; Schema: public; Owner: benjie
+-- Name: bar a; Type: POLICY; Schema: public; Owner: benjie
 --
 
-CREATE POLICY a ON public.foo FOR SELECT USING (true);
+CREATE POLICY a ON public.bar FOR SELECT USING (true);
 
 
 --
--- Name: bar a; Type: POLICY; Schema: public; Owner: benjie
+-- Name: foo a; Type: POLICY; Schema: public; Owner: benjie
 --
 
-CREATE POLICY a ON public.bar FOR SELECT USING (true);
+CREATE POLICY a ON public.foo FOR SELECT USING (true);
 
 
 --


Re: Hypothetical indexes using BRIN broken since pg10

2019-09-24 Thread Julien Rouhaud
On Mon, Sep 9, 2019 at 5:03 PM Tom Lane  wrote:
>
> Alvaro Herrera from 2ndQuadrant  writes:
> > On 2019-Sep-02, Tom Lane wrote:
> >> The right answer IMO is basically for the brinGetStats call to go
> >> away from brincostestimate and instead happen during plancat.c's
> >> building of the IndexOptInfo.  In the case of a hypothetical index,
> >> it'd fall to the get_relation_info_hook to fill in suitable fake
> >> data.
>
> > So I'm not clear on what the suggested strategy is, here.  Do we want
> > that design change to occur in the bugfix that would be backpatched, or
> > do we want the backbranches to use the patch as posted and then we apply
> > the above design on master only?
>
> The API change I'm proposing is surely not back-patchable.
>
> Whether we should bother back-patching a less capable stopgap fix
> is unclear to me.  Yeah, it's a bug that an index adviser can't
> try a hypothetical BRIN index; but given that nobody noticed till
> now, it doesn't seem like there's much field demand for it.
> And I'm not sure that extension authors would want to deal with
> testing minor-release versions to see if the fix is in, so
> even if there were a back-patch, it might go unused.

FWIW I maintain such an extension and testing for minor release
versions is definitely not a problem.
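
In practice that is just a PG_VERSION_NUM check, for instance something
like this sketch (the minor version number here is made up):

#if PG_VERSION_NUM >= 120001	/* whichever minor release ships the fix */
	/* rely on the backpatched brincostestimate() behaviour */
#else
	/* keep providing fake BRIN stats from the extension side */
#endif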




PostgreSQL12 and older versions of OpenSSL

2019-09-24 Thread Victor Wagner
Dear hackers,

PostgreSQL 12 documentation states that the minimum required version of
OpenSSL is 0.9.8. However, I was unable to compile the current
PGPRO_12_STABLE with OpenSSL 0.9.8j (from SLES 11sp4).


-fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include  -D_GNU_SOURCE 
-I/usr/include/libxml2   -c -o be-secure-openssl.o be-secure-openssl.c
be-secure-openssl.c: In function ‘SSL_CTX_set_min_proto_version’:
be-secure-openssl.c:1340: error: ‘SSL_OP_NO_TLSv1_1’ undeclared (first use in 
this function)
be-secure-openssl.c:1340: error: (Each undeclared identifier is reported only 
once
be-secure-openssl.c:1340: error: for each function it appears in.)
be-secure-openssl.c:1344: error: ‘SSL_OP_NO_TLSv1_2’ undeclared (first use in 
this function)
be-secure-openssl.c: In function ‘SSL_CTX_set_max_proto_version’:
be-secure-openssl.c:1361: error: ‘SSL_OP_NO_TLSv1_1’ undeclared (first use in 
this function)
be-secure-openssl.c:1365: error: ‘SSL_OP_NO_TLSv1_2’ undeclared (first use in 
this function)
make: *** [be-secure-openssl.o] Error 1


The problem is that some code in src/backend/libpq/be-secure-openssl.c
assumes that if the preprocessor symbols TLS1_1_VERSION and TLS1_2_VERSION
are defined in the OpenSSL headers, the corresponding versions of TLS are
supported by the library.

It is not so. Here is an excerpt from the tls1.h header file from
OpenSSL 0.9.8j

#define TLS1_VERSION    0x0301
#define TLS1_1_VERSION  0x0302
#define TLS1_2_VERSION  0x0303
/* TLS 1.1 and 1.2 are not supported by this version of OpenSSL, so
 * TLS_MAX_VERSION indicates TLS 1.0 regardless of the above
 * definitions. (s23_clnt.c and s23_srvr.c have an OPENSSL_assert()
 * check that would catch the error if TLS_MAX_VERSION was too low.)
 */
#define TLS_MAX_VERSION TLS1_VERSION

Replacing all 

#ifdef TLS1_1_VERSION

with

#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION

and doing the same for TLS1_2_VERSION fixes the problem.

Really, the problem is that the symbol SSL_OP_NO_TLSv1_1 (and 1_2 accordingly)
might be undefined even if TLS1_1_VERSION is defined. 

Replacing 

#ifdef TLS1_1_VERSION

with 

#ifdef SSL_OP_NO_TLSv1_1 

seems to be the correct solution for two of the three #ifdef TLS1_1_VERSION
statements in be-secure-openssl.c, because this symbol is used inside the
#ifdef block.

But there is a third one (the first from the start of the file).
...
case PG_TLS1_1_VERSION:
#ifdef TLS1_1_VERSION
return TLS1_1_VERSION;
#else
break;
#endif
...
(line 1290). In this case a check for TLS1_1_VERSION <= TLS_MAX_VERSION
seems to be more self-explanatory than a check for the somewhat unrelated
symbol SSL_OP_NO_TLSv1_1.
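
I.e. roughly (a sketch of what I have in mind):

	case PG_TLS1_1_VERSION:
#if defined(TLS1_1_VERSION) && TLS1_1_VERSION <= TLS_MAX_VERSION
		return TLS1_1_VERSION;
#else
		break;
#endif

and the same for the TLS 1.2 case.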
 

-- 





Re: pg_upgrade check fails on Solaris 10

2019-09-24 Thread Marina Polyakova

On 2019-09-23 19:41, Alvaro Herrera wrote:
> On 2019-Sep-23, Marina Polyakova wrote:
>> The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487)
>> has the same issue - according to the release table [2] it is still
>> supported, isn't it?...
> 
> Yeah, but pg_upgrade is in contrib/ in 9.4, so nowhere as good as from
> 9.5 onwards; and it's going to die in a couple of months anyway, so I'm
> not thrilled about fixing this there.
> 
> If you *need* to have this fixed in 9.4, we can do that, but do you?


No, we don't. I just noticed :-)

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Cache lookup errors with functions manipulation object addresses

2019-09-24 Thread Kyotaro Horiguchi
At Tue, 24 Sep 2019 08:58:50 +0900, Michael Paquier  wrote 
in <20190923235850.ga2...@paquier.xyz>
> On Mon, Sep 23, 2019 at 09:15:24PM +0200, Dmitry Dolgov wrote:
> Please feel free to use the updated versions attached.  These can
> apply on top of HEAD at 30d1379.

In 0003, an empty string and NULL are not distinguishable in psql
text output. It'd be better if the regression test checked that
the return value is actually NULL, using "is NULL" or "\pset null
''" or something like that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center