Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write

2019-01-22 Thread leif
Hi
I have reported a bug via the PostgreSQL bug report form, but haven't got any 
response so far.
This might not be a bug, but a feature not implemented yet.
I have created a suggestion for a small addition to StartupXLOG() in xlog.c to 
solve the issue.

Any suggestions?

--
Leif Gunnar Erlandsen

 Forwarded message ---
From: l...@lako.no
To: "Jeff Janes" <jeff.ja...@gmail.com>, pgsql-b...@lists.postgresql.org
Sent: 12 January 2019 at 08:40
Subject: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens 
database for read/write
"Jeff Janes" <jeff.ja...@gmail.com> wrote on 11 January 2019 at 15:03:
 On Fri, Jan 11, 2019 at 4:08 AM PG Bug reporting form <nore...@postgresql.org> wrote: 
PostgreSQL should have paused recovery also on the first scenario. Then I
could have added missing wal and continued with restore. 
At the least, I think we should log an explicit WARNING if the WAL stream ends 
before the specified PIT is reached. 
  The documentation for recovery.conf states that with recovery_target_time set,
recovery_target_action defaults to pause.
Even if the stop point is not reached, pause should be activated.
After checking the source, this might be solved with a small addition in 
StartupXLOG() in xlog.c.
Someone with more experience with the source code should check this out. 
if (reachedStopPoint)
{
	if (!reachedConsistency)
		ereport(FATAL,
				(errmsg("requested recovery stop point is before consistent recovery point")));

	/*
	 * This is the last point where we can restart recovery with a
	 * new recovery target, if we shutdown and begin again. After
	 * this, Resource Managers may choose to do permanent
	 * corrective actions at end of recovery.
	 */
	switch (recoveryTargetAction)
	{
		case RECOVERY_TARGET_ACTION_SHUTDOWN:

			/*
			 * exit with special return code to request shutdown
			 * of postmaster. Log messages issued from
			 * postmaster.
			 */
			proc_exit(3);

		case RECOVERY_TARGET_ACTION_PAUSE:
			SetRecoveryPause(true);
			recoveryPausesHere();

			/* drop into promote */

		case RECOVERY_TARGET_ACTION_PROMOTE:
			break;
	}
}
/* Add these lines */
else if (recoveryTarget == RECOVERY_TARGET_TIME)
{
	/*
	 * Stop point not reached but next WAL could not be read.
	 * Some explanation and warning should be logged.
	 */
	switch (recoveryTargetAction)
	{
		case RECOVERY_TARGET_ACTION_PAUSE:
			SetRecoveryPause(true);
			recoveryPausesHere();
			break;
	}
}
/* until here */
--

Leif


Re: pg_dump multi VALUES INSERT

2019-01-22 Thread David Rowley
On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera  wrote:
> Is it possible to avoid the special case for 0 columns by using the
> UNION ALL syntax I showed?

It would be possible, but my thoughts are that we're moving away from
the SQL standard by doing so.

Looking at the standard I see:

<row value expression list> ::=
  <table row value expression> [ { <comma> <table row value expression> }... ]

so it appears that multirow VALUES clauses are allowed.

INSERT INTO ... DEFAULT VALUES; is standard too, but looking at
SELECT, neither the target list nor the FROM clause is optional.

You could maybe argue that 0-column tables are not standard anyway.
Going by DROP COLUMN I see "4) C shall be a column of T and C shall
not be the only column of T.". Are we the only database to break that?

I think that since pg_dump --inserts is meant for importing data into
other databases, we should likely keep it as standard as possible.

Another argument against is that we've only supported empty SELECT
clauses since 9.4, so it may not help anyone who mysteriously wanted
to import data into an old version. Maybe that's a corner case, but
I'm sure 0 column tables are too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



postgres on a non-journaling filesystem

2019-01-22 Thread maayan mordehai
hello,

I'm Maayan, and I'm on a DBA team that uses PostgreSQL.
I saw in the documentation on WAL:
https://www.postgresql.org/docs/10/wal-intro.html
in the tip box that it's better not to use a journaling filesystem, and I
wanted to ask how that works.
Can't we get corruption that we can't recover from?
I mean, what if postgres is in the middle of a write to a WAL file and there is a
crash, and it didn't finish?
I'm assuming it will detect this when we start postgres and report that
it was rolled back, am I right?
And how does it work at the data level? If some of an 8k block is written
but not all of it, and then there is a crash, how does postgres deal with it?

Thanks in advance


Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 [IDENTIFICATION]

 Andres> I think we should just rip them out. It's useless noise.

+1

-- 
Andrew (irc:RhodiumToad)



RE: Typo: pgbench.c

2019-01-22 Thread Moon, Insung
> -Original Message-
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Wednesday, January 23, 2019 3:01 PM
> To: Moon, Insung
> Cc: 'Pg Hackers'
> Subject: Re: Typo: pgbench.c
> 
> On Wed, Jan 23, 2019 at 02:18:49PM +0900, Moon, Insung wrote:
> > Attached fix it.
> 
> Thanks, pushed.  Please note that the indentation was not correct after 
> applying your patch.

Thank you. 
I'll read your advice and the PostgreSQL manual and 
be careful when posting future patches.

Regards.
Moon.

> --
> Michael





Re: Typo: pgbench.c

2019-01-22 Thread Michael Paquier
On Wed, Jan 23, 2019 at 02:18:49PM +0900, Moon, Insung wrote:
> Attached fix it.

Thanks, pushed.  Please note that the indentation was not correct
after applying your patch.
--
Michael


signature.asc
Description: PGP signature


Typo: pgbench.c

2019-01-22 Thread Moon, Insung
Dear hackers.

I found a minor typo in the comment pgbench.c.

-* always allocate one more in order to accomodate the NULL terminator
+* always allocate one more in order to accommodate the NULL terminator

Attached fix it.

Regards.
Moon.

Moon, insung
NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center




typo-pgbench.patch
Description: Binary data


Re: Strange query behaviour

2019-01-22 Thread Tom Lane
Isaac Morland  writes:
> What is confusing me is why the planner can't convert "[entire row] IS
> NULL" into a test for existence of the matching row (assuming there is at
> least one NOT NULL column).

The reasons why the planner does very little with row-level IS [NOT] NULL
conditions are (1) so few people use them that it doesn't really seem
worth expending cycles or development effort on such cases, and (2) the
SQL spec is vague enough about the semantics of these predicates that
we've never been entirely sure whether we implement them correctly.  Thus
it didn't seem worth expending a lot of effort developing deduction logic
that might turn out to be completely wrong.

I suspect (1) is not unrelated to (2) ...

The semantic vaguenesses are also twofold:

(a) It's not quite clear whether the spec intends to draw a distinction
between a composite value that is in itself NULL and one that is a tuple
of all NULL fields.  There is certainly a physical difference, but it
looks like the IS [NOT] NULL predicates are designed not to be able to
tell the difference.

(b) It's not at all clear whether these predicates are meant to be
recursive for nested composite types.  Depending on how you read it,
it could be that a composite field that is NULL satisfies an IS NULL
predicate on the parent row, but a composite field that is ROW(NULL,
NULL, ...) does not.  That is in fact how we implement it, but it
sure seems weird given (a).

So personally, I've got zero confidence in these predicates and don't
especially wish to sink development effort into something that
critically depends on having the right semantics for them.  The shortage
of field demand for doing better doesn't help.

Circling back to your interest in using a "row IS NULL" predicate to
conclude that a left join is actually an antijoin, these questions are
really critical, because if the join column is itself composite, the
deduction would hold *only* if our theory that "row IS NULL" is
non-recursive is correct.  If our theory is wrong, and the spec
intends that ROW(NULL, ROW(NULL, NULL), NULL) IS NULL should be TRUE,
then it wouldn't be correct to draw the inference that the join has
to be an antijoin.  And that is also connected to the fact that
record_cmp considers ROW(NULL, NULL) to be equal to ROW(NULL, NULL),
which is (or at least seems to be) wrong per spec, but tough: we have
to have a total order for composite values, or they wouldn't be
indexable or sortable.
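
For concreteness, a toy comparator sketch (illustration only, nothing to do
with record_cmp's actual implementation): a sort comparator must give a
definite answer even for null vs. null, and treating nulls as equal (and
sorting them last) is one way to keep the ordering total.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    bool isnull;
    int  value;
} NullableInt;

static int
nullable_cmp(const void *pa, const void *pb)
{
    const NullableInt *a = pa;
    const NullableInt *b = pb;

    if (a->isnull && b->isnull)
        return 0;       /* nulls deemed equal, keeping the order total */
    if (a->isnull)
        return 1;       /* nulls sort after non-nulls */
    if (b->isnull)
        return -1;
    return (a->value > b->value) - (a->value < b->value);
}

int
main(void)
{
    NullableInt vals[] = {{false, 3}, {true, 0}, {false, 1}, {true, 0}};

    qsort(vals, 4, sizeof(NullableInt), nullable_cmp);
    for (int i = 0; i < 4; i++)
    {
        if (vals[i].isnull)
            printf("NULL ");
        else
            printf("%d ", vals[i].value);
    }
    printf("\n");       /* prints: 1 3 NULL NULL */
    return 0;
}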

If your head's not hurting yet, you just need to think harder
about these issues.  It's all a mess, and IMO we're best off
not adding any more dependencies on these semantics than we
have to.

regards, tom lane



Re: jsonpath

2019-01-22 Thread Alexander Korotkov
On Tue, Jan 22, 2019 at 12:13 PM Oleg Bartunov  wrote:
>
> On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov
>  wrote:
> >
> > The next revision is attached.
> >
> > Nikita made code and documentation improvements, renamed functions
> > from "jsonpath_" prefix to "jsonb_path_" prefix.  He also renamed
> > jsonpath_predicate() to jsonb_path_match() (that looks better for me
> > too).
> >
> > I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and
> > changed that behavior to always wrap result into array.
>
> agree with new names
>
> so it mimic the behaviour of JSON_QUERY with UNCONDITIONAL WRAPPER option

Good, thank you for feedback!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Tom Lane
Michael Paquier  writes:
> I am not sure if anybody uses them for anything automatically, still I
> find myself from time to time looking at them to remember on which
> path the file is located when opened in emacs.  So I still like having
> those references, perhaps I am just a minority.

FWIW, I like having them too, though I agree it's not super important.

regards, tom lane



Re: Undo worker and transaction rollback

2019-01-22 Thread Amit Kapila
On Fri, Jan 11, 2019 at 9:39 AM Dilip Kumar  wrote:
>
> On Mon, Dec 31, 2018 at 11:04 AM Dilip Kumar  wrote:
>>
>> On Tue, Dec 4, 2018 at 3:13 PM Dilip Kumar  wrote:
>>
>> After the latest change in undo interface patch[1], this patch need to
>> be rebased.  Attaching the rebased version of the patch.
>>
>> [1]https://www.postgresql.org/message-id/CAFiTN-tfquvm6tnWFaJNYYz-vSY%3D%2BSQTMv%2BFv1jPozTrW4mtKw%40mail.gmail.com
>>
>
> I have rebased this patch on top of latest version of undolog and 
> undo-interface patch set [1]
>

I have started reviewing your patch, and below are some initial
comments.  To be clear to everyone, there are some parts of this patch
for which I have also written code, and I am also involved in the
high-level design of this work, but I have never done the detailed
review of this patch, which I am planning to do now.  I think it will
be really good if some other hackers also review this patch, especially
the interaction with the transaction machinery and how the error rollbacks
work.  I have a few comments below in that regard as well, and I have
requested adding more comments to explain that part of the patch so
that it will be easier to understand how this works.

A few comments:

1.
+void
+undoaction_desc(StringInfo buf, XLogReaderState *record)
+{
+ char*rec = XLogRecGetData(record);
+ uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+ if (info == XLOG_UNDO_PAGE)
+ {
+ uint8 *flags = (uint8 *) rec;
+
+ appendStringInfo(buf, "page_contains_tpd_slot: %c ",
+ (*flags & XLU_PAGE_CONTAINS_TPD_SLOT) ? 'T' : 'F');
+ appendStringInfo(buf, "is_page_initialized: %c ",
+ (*flags & XLU_INIT_PAGE) ? 'T' : 'F');
+ if (*flags & XLU_PAGE_CONTAINS_TPD_SLOT)

Do we need handling of TPD in this patch?  This is mainly tied to
zheap, so, I think we can exclude it from this patch.

2.
const char *
+undoaction_identify(uint8 info)
+{
+ const char *id = NULL;
+
+ switch (info & ~XLR_INFO_MASK)
+ {
+ case XLOG_UNDO_APPLY_PROGRESS:
+ id = "UNDO APPLY PROGRESS";
+ break;
+ }
+
+ return id;
+}

Don't we need to handle the case of XLOG_UNDO_PAGE in this function?

3.
@@ -1489,6 +1504,34 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
{
..
+ /*
+ * Perform undo actions, if there are undologs for this transaction.
+ * We need to perform undo actions while we are still in transaction.
+ * Never push rollbacks of temp tables to undo worker.
+ */
+ for (i = 0; i < UndoPersistenceLevels; i++)
+ {
+ if (end_urec_ptr[i] != InvalidUndoRecPtr && !isCommit)
+ {
+ bool result = false;
+ uint64 rollback_size = 0;
+
+ if (i != UNDO_TEMP)
+ rollback_size = end_urec_ptr[i] - start_urec_ptr[i];
+
+ if (rollback_size >= rollback_overflow_size * 1024 * 1024)
+ result = PushRollbackReq(end_urec_ptr[i], start_urec_ptr[i], InvalidOid);

The comment and code don't seem to be completely in sync.  It seems
for rollback_overflow_size as 0, we will push the undo for temp tables
as well. I think you should have a stronger check here.

4.
+ /*
+ * We need the locations of start and end undo record pointers when rollbacks
+ * are to be performed for prepared transactions using zheap relations.
+ */
+ UndoRecPtr start_urec_ptr[UndoPersistenceLevels];
+ UndoRecPtr end_urec_ptr[UndoPersistenceLevels];
 } TwoPhaseFileHeader;


I think you can expand this comment a bit by explaining why
you need to store these in the file.  It seems it is because you want
to roll back prepared transactions after recovery.

5.
@@ -284,10 +284,21 @@ SetTransactionIdLimit(TransactionId
oldest_datfrozenxid, Oid oldest_datoid)
  TransactionId xidStopLimit;
  TransactionId xidWrapLimit;
  TransactionId curXid;
+ TransactionId oldestXidHavingUndo;

  Assert(TransactionIdIsNormal(oldest_datfrozenxid));

  /*
+ * To determine the last safe xid that can be allocated, we need to
+ * consider oldestXidHavingUndo.  The oldestXidHavingUndo will be only
+ * valid for zheap storage engine, so it won't impact any other storage
+ * engine.
+ */
+ oldestXidHavingUndo = pg_atomic_read_u32(>oldestXidHavingUndo);
+ if (TransactionIdIsValid(oldestXidHavingUndo))
+ oldest_datfrozenxid = Min(oldest_datfrozenxid, oldestXidHavingUndo);
+

Here, I think we also need to mention in the comments why we need to
care about oldestXidHavingUndo.

6.
+/* Location in undo log from where to start applying the undo actions. */
+static UndoRecPtr UndoActionStartPtr[UndoPersistenceLevels] =
+
{InvalidUndoRecPtr,
+
 InvalidUndoRecPtr,
+
 InvalidUndoRecPtr};
+
+/* Location in undo log up to which undo actions need to be applied. */
+static UndoRecPtr UndoActionEndPtr[UndoPersistenceLevels] =
+
{InvalidUndoRecPtr,
+
 InvalidUndoRecPtr,
+
 InvalidUndoRecPtr};
+
+/* Do we need to perform any undo actions? */
+static bool PerformUndoActions = false;

Normally, we track to and from undo locations for each transaction
state.  I think these are used for error rollbacks which have
different handling.  If so, can we write a few comments here to
explain 

Re: A few new options for vacuumdb

2019-01-22 Thread Michael Paquier
On Tue, Jan 22, 2019 at 11:21:32PM +, Bossart, Nathan wrote:
> On 1/21/19, 10:08 PM, "Michael Paquier"  wrote:
> > On Mon, Jan 21, 2019 at 10:09:09PM +, Bossart, Nathan wrote:
> >> Here's a new patch set that should address the feedback in this
> >> thread.  The changes in this version include:
> >> 
> >>  - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
> >>documentation.  My suggestion is to keep it short and simple like
> >>--full, --freeze, --skip-locked, --verbose, and --analyze.  The
> >>DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
> >>documentation, and IMO it is reasonably obvious that such vacuumdb
> >>options correspond to the VACUUM options.
> >
> > Okay, committed this one.
> 
> Thanks.
> 
> > Regarding the search_path business, there is actually similar business
> > in expand_table_name_patterns() for pg_dump if you look close by, so
> > as users calling --table don't have to specify the schema, and just
> > stick with patterns.
> 
> I see.  The "WHERE c.relkind..." clause looks a bit different than
> what I have, so I've altered vacuumdb's catalog query to match.  This
> was changed in expand_table_name_patterns() as part of 582edc369cd for
> a security fix, so we should probably do something similar here.
> 
> > +   /*
> > +* Prepare the list of tables to process by querying the catalogs.
> > +*
> > +* Since we execute the constructed query with the default search_path
> > +* (which could be unsafe), it is important to fully qualify everything.
> > +*/
> > It is not only important, but also absolutely mandatory, so let's make
> > the comment insisting harder on that point.  From this point of view,
> > the query that you are building is visibly correct.
> 
> I've updated the comment as suggested.
> 
> > +   appendStringLiteralConn(_query, just_table, conn);
> > +   appendPQExpBuffer(_query, "::pg_catalog.regclass\n");
> > +
> > +   pg_free(just_table);
> > +
> > +   cell = cell->next;
> > +   if (cell == NULL)
> > +   appendPQExpBuffer(_query, " )\n");
> > Hm.  It seems to me that you are duplicating what
> > processSQLNamePattern() does, so we ought to use it.  And it is
> > possible to control the generation of the different WHERE clauses with
> > a single query based on the number of elements in the list.  Perhaps I
> > am missing something?  It looks unnecessary to export
> > split_table_columns_spec() as well.
> 
> I'm sorry, I don't quite follow this.  AFAICT processSQLNamePattern()
> is for generating WHERE clauses based on a wildcard-pattern string for
> psql and pg_dump.  This portion of the patch is generating a WHERE
> clause to filter only for the tables listed via --table, and I don't
> think the --table option currently accepts patterns.  Since you can
> also provide a list of columns with the --table option, I am using
> split_table_columns_spec() to retrieve only the table name for the OID
> lookup.

Bah, of course you are right.  vacuumdb does not support patterns but
I thought it was able to handle that.  With column names that would be
sort of funny anyway.

>> Or we could just use "ANALYZE \.;/", perhaps patching it first.
>> Perhaps my regexp here is incorrect, but I just mean to check for an
>> ANALYZE query which begins by "ANALYZE " and finishes with ";",
>> without caring about what is in-between.  This would make the tests
>> more portable.
> 
> Sure, I've split this out into a separate patch.

Thanks, I have committed this part to ease the follow-up work.

> I think it's already pretty unlikely that any matching relations would
> be found, but I've updated these values to be within the range where
> any matching database has already stopped accepting commands to
> prevent wraparound.

Thanks.

I have been looking at 0002, and I found a problem with the way
ANALYZE queries are generated.  For example take this table:
CREATE TABLE aa1 (a int);

Then if I try to run ANALYZE with vacuumdb it just works:
$ vacuumdb -z --table 'aa1(b)'
vacuumdb: vacuuming database "ioltas"

Note that this fails with HEAD, but passes with your patch.  The
problem is that the generated query loses the list of columns when
processing them through split_table_columns_spec(), so what gets
generated is this:
VACUUM (ANALYZE) public.aa1;

So the result is actually incorrect because all the columns get
processed.

This patch moves the check for the existence of the relation into the
catalog query; perhaps we would want the same for columns, for
consistency?  Or not.  That's a bit harder for sure, though not
impossible, but it would mean duplicating a bunch of logic that the
backend already does by itself, so we could live without it I think.

Attached is a first patch to add more tests in this area, which passes
on HEAD but fails with your patch.  It looks like a good idea to tweak
the tests first as there is no coverage yet for the queries generated
with complete and partial column 

Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Michael Paquier
On Tue, Jan 22, 2019 at 05:49:46PM -0800, Andres Freund wrote:
> On 2019-01-23 14:43:15 +1300, Thomas Munro wrote:
>> The function name comments are similar, though less consistent so I'm
>> too lazy to write a script to find one that is actually wrong (with
>> which to trigger Andres's let's-delete-them-all response :-D).

I am not sure if anybody uses them for anything automatically, still I
find myself from time to time looking at them to remember on which
path the file is located when opened in emacs.  So I still like having
those references, perhaps I am just a minority.  Being in the minority
is usually a cool thing, still if you wish ripping all these out it's
not like I'll cry for that, so please feel free to do as you see fit.

> I wish function comment styles were more consistent, but there's *SO*
> many styles, that I think it's hard to nicely automate it. And it's much
> more likely to cause conflicts than removing IDENTIFICATION. So...

Yes, I am usually more annoyed by the inconsistency of the function
upper blocks than IDENTIFICATION...  So I just try to stick with
keeping any new code consistent with the surroundings.  Making
back-patching harder than it is now is not really appealing, so I'd be
-1 for doing any consistency work.  Patching six branches for the same
patch is already a lot of work.
--
Michael


signature.asc
Description: PGP signature


RE: Log a sample of transactions

2019-01-22 Thread Kuroda, Hayato
Dear Adrien,

> Hum, I am not an english native speaker, I choosed "rate" because it is 
> already used for auto_explain.sample_rate and for log_statement_sample_rate

If the community agrees with those names, renaming the parameters is not needed.
Consistency is the most important thing in a DBMS. :-)

I read your source code and I think it is generally good.
Here are some minor suggestions, but you don't have to follow them strictly:

 config.sgml 
You must document the behavior when users change the parameter during a 
transaction.
(I am not sure about this, though...)
 
 postgres.c 
I give you three comments.

> /* flag for logging statements in this transaction */
I think "a" or the plural form should be used instead of "this"

* xact_is_sampled is left at the end of a transaction.
Should the parameter be set to false at the lowest layer of the transaction 
system?
I understand it is unnecessary for the functionality, but it would have more symmetry.

* check_log_duration should be used only when postgres checks the duration.
But I'm not sure whether a new function such as check_is_sampled is needed, because 
the processing in the new function would be almost the same as check_log_duration.


Best Regards,
Hayato Kuroda
Fujitsu LIMITED


Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Amit Langote
Hi Thomas,

On 2019/01/23 9:37, Thomas Munro wrote:
> On Wed, Jan 23, 2019 at 1:16 PM Amit Langote
>  wrote:
>> On 2019/01/23 4:51, Andres Freund wrote:
>>> On 2019-01-22 13:43:32 +0900, Amit Langote wrote:
 Attached find a patch to fix $subject.
>>>
>>> Thanks, pushed to master and 11.
>>
>> Thank you.
> 
> It's not only the ending that's wrong.  Here are some more source
> files whose IDENTIFICATION heading doesn't exactly match their path:
> 
> $ git grep -A 1 IDENTIFICATION | grep -v IDENTIFICATION | grep -v --
> -- | sed 's/-[^a-z][^a-z]*/ /' | awk '{ if ($1 != $2) print; }'
> doc/src/sgml/lobj.sgml src/test/examples/testlo.c
> src/backend/catalog/pg_publication.c pg_publication.c
> src/backend/commands/publicationcmds.c publicationcmds.c
> src/backend/commands/subscriptioncmds.c subscriptioncmds.c
> src/backend/jit/llvm/llvmjit_inline.cpp
> src/backend/lib/llvmjit/llvmjit_inline.cpp
> src/backend/jit/llvm/llvmjit_wrap.cpp src/backend/lib/llvm/llvmjit_wrap.cpp
> src/backend/optimizer/util/appendinfo.c 
> src/backend/optimizer/path/appendinfo.c
> src/backend/optimizer/util/inherit.c src/backend/optimizer/path/inherit.c

(I introduced this one.)

> src/backend/replication/logical/logicalfuncs.c
> src/backend/replication/logicalfuncs.c
> src/backend/replication/logical/reorderbuffer.c
> src/backend/replication/reorderbuffer.c
> src/backend/replication/logical/snapbuild.c 
> src/backend/replication/snapbuild.c
> src/backend/replication/pgoutput/Makefile src/backend/replication/pgoutput
> src/backend/utils/adt/version.c
> src/include/replication/pgoutput.h pgoutput.h

Oops.  In fact, my colleague just pointed that out to me about the files I
sent a patch upthread for.

> This could be really confusing for erm, future people reading a dot
> matrix print-out of the source code?

Heh.  Actually, I'm not sure which existing tools these IDENTIFICATION
lines are for, but maybe someone somewhere relies on these.  Or maybe not,
because they'd have noticed by now. :)

Thanks,
Amit




RE: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Moon, Insung
Dear Hackers.

> -Original Message-
> From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> Sent: Wednesday, January 23, 2019 9:38 AM
> To: Amit Langote
> Cc: Andres Freund; Pg Hackers
> Subject: Re: Typo: llvm*.cpp files identified as llvm*.c
> 
> On Wed, Jan 23, 2019 at 1:16 PM Amit Langote  
> wrote:
> > On 2019/01/23 4:51, Andres Freund wrote:
> > > On 2019-01-22 13:43:32 +0900, Amit Langote wrote:
> > >> Attached find a patch to fix $subject.
> > >
> > > Thanks, pushed to master and 11.
> >
> > Thank you.
> 
> It's not only the ending that's wrong.  Here are some more source files whose 
> IDENTIFICATION heading doesn't exactly match
> their path:

I found the same problem as you while checking the typo patch.
So I attached a patch that corrects the file paths in IDENTIFICATION.

Regards.
Moon.

> 
> $ git grep -A 1 IDENTIFICATION | grep -v IDENTIFICATION | grep -v --
> -- | sed 's/-[^a-z][^a-z]*/ /' | awk '{ if ($1 != $2) print; }'
> doc/src/sgml/lobj.sgml src/test/examples/testlo.c 
> src/backend/catalog/pg_publication.c pg_publication.c
> src/backend/commands/publicationcmds.c publicationcmds.c 
> src/backend/commands/subscriptioncmds.c subscriptioncmds.c
> src/backend/jit/llvm/llvmjit_inline.cpp
> src/backend/lib/llvmjit/llvmjit_inline.cpp
> src/backend/jit/llvm/llvmjit_wrap.cpp src/backend/lib/llvm/llvmjit_wrap.cpp
> src/backend/optimizer/util/appendinfo.c 
> src/backend/optimizer/path/appendinfo.c
> src/backend/optimizer/util/inherit.c src/backend/optimizer/path/inherit.c
> src/backend/replication/logical/logicalfuncs.c
> src/backend/replication/logicalfuncs.c
> src/backend/replication/logical/reorderbuffer.c
> src/backend/replication/reorderbuffer.c
> src/backend/replication/logical/snapbuild.c 
> src/backend/replication/snapbuild.c
> src/backend/replication/pgoutput/Makefile src/backend/replication/pgoutput 
> src/backend/utils/adt/version.c
> src/include/replication/pgoutput.h pgoutput.h
> 
> This could be really confusing for erm, future people reading a dot matrix 
> print-out of the source code?
> 
> --
> Thomas Munro
> http://www.enterprisedb.com



typo-identification-filepath.patch
Description: Binary data


Re: speeding up planning with partitions

2019-01-22 Thread Amit Langote
On 2019/01/22 18:47, David Rowley wrote:
> On Tue, 22 Jan 2019 at 20:01, Imai, Yoshikazu
>> What I understand so far is about 10,000 while loops at total 
>> (4098+4098+some extra) is needed in hash_seq_search() in EXECUTE query after 
>> the creation of the generic plan.
>> 10,000 while loops takes about 10 microsec (of course, we can't estimate 
>> correct time), and the difference of the latency between 5th and 7th EXECUTE 
>> is about 8 microsec, I currently think this causes the difference.
>  >
>> I don't know this problem relates to Amit-san's patch, but I'll continue to 
>> investigate it.
> 
> I had another thought... when you're making a custom plan you're only
> grabbing locks on partitions that were not pruned (just 1 partition in
> your case), but when making the generic plan, locks will be acquired
> on all partitions (all 4000 of them). This likely means that when
> building the generic plan for the first time that the
> LockMethodLocalHash table is expanded to fit all those locks, and
> since we never shrink those down again, it'll remain that size for the
> rest of your run.  I imagine the reason for the slowdown is that
> during LockReleaseAll(), a sequential scan is performed over the
> entire hash table. I see from looking at the hash_seq_search() code
> that the value of max_bucket is pretty critical to how it'll perform.
> The while ((curElem = segp[segment_ndx]) == NULL) loop will need to
> run fewer times with a lower max_bucket.

I too think that that might be behind that slight drop in performance.
So, it's good to know what one of the performance bottlenecks is when
dealing with a large number of relations in queries.
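
For intuition only, a toy sketch (nothing like PostgreSQL's dynahash, and all
names here are made up) of the effect described above: the bucket array grows
under load and never shrinks, so a full sequential scan stays proportional to
the bucket count even when the table is nearly empty.

#include <stdio.h>
#include <stdlib.h>

typedef struct Entry
{
    int           key;
    struct Entry *next;
} Entry;

typedef struct
{
    Entry  **buckets;       /* array of bucket heads */
    size_t   nbuckets;      /* grows when many entries arrive, never shrinks */
} HashTable;

/* Walk every bucket, the way a full sequential scan of the table must. */
static size_t
seq_scan(const HashTable *ht)
{
    size_t probes = 0;

    for (size_t b = 0; b < ht->nbuckets; b++)
    {
        probes++;           /* every bucket slot is examined ... */
        for (Entry *e = ht->buckets[b]; e != NULL; e = e->next)
            probes++;       /* ... plus whatever entries it holds */
    }
    return probes;
}

int
main(void)
{
    /* Both tables hold zero live entries ... */
    HashTable small = {calloc(16, sizeof(Entry *)), 16};
    HashTable grown = {calloc(8192, sizeof(Entry *)), 8192};

    /* ... but scan cost tracks the bucket count, which only ever grew. */
    printf("small table: %zu probes per scan\n", seq_scan(&small));
    printf("grown table: %zu probes per scan\n", seq_scan(&grown));

    free(small.buckets);
    free(grown.buckets);
    return 0;
}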

Thanks,
Amit




Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Michael Paquier
On Wed, Jan 23, 2019 at 01:37:41PM +1300, Thomas Munro wrote:
> It's not only the ending that's wrong.  Here are some more source
> files whose IDENTIFICATION heading doesn't exactly match their path:

Good point.

> $ git grep -A 1 IDENTIFICATION | grep -v IDENTIFICATION | grep -v --
> -- | sed 's/-[^a-z][^a-z]*/ /' | awk '{ if ($1 != $2) print; }'
> doc/src/sgml/lobj.sgml src/test/examples/testlo.c

Some noise?

> src/backend/catalog/pg_publication.c pg_publication.c
> src/backend/commands/publicationcmds.c publicationcmds.c
> src/backend/commands/subscriptioncmds.c subscriptioncmds.c
> src/backend/jit/llvm/llvmjit_inline.cpp
> src/backend/lib/llvmjit/llvmjit_inline.cpp
> src/backend/jit/llvm/llvmjit_wrap.cpp src/backend/lib/llvm/llvmjit_wrap.cpp
> src/backend/replication/logical/logicalfuncs.c
> src/backend/replication/logicalfuncs.c
> src/backend/replication/logical/reorderbuffer.c
> src/backend/replication/reorderbuffer.c
> src/backend/replication/logical/snapbuild.c 
> src/backend/replication/snapbuild.c
> src/backend/replication/pgoutput/Makefile src/backend/replication/pgoutput
> src/include/replication/pgoutput.h pgoutput.h

Logical decoding, JIT and logical replication.

> src/backend/optimizer/util/appendinfo.c 
> src/backend/optimizer/path/appendinfo.c
> src/backend/optimizer/util/inherit.c src/backend/optimizer/path/inherit.c

These two are recent, from b60c397.

> src/backend/utils/adt/version.c

This complains about the indentation?

> This could be really confusing for erm, future people reading a dot
> matrix print-out of the source code?

Yes, it would be nice to make all that consistent.  Perhaps the
authors of the related commits would prefer to fix that themselves?
--
Michael


signature.asc
Description: PGP signature


Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Andres Freund
On 2019-01-23 14:43:15 +1300, Thomas Munro wrote:
> The function name comments are similar, though less consistent so I'm
> too lazy to write a script to find one that is actually wrong (with
> which to trigger Andres's let's-delete-them-all response :-D).

I wish function comment styles were more consistent, but there's *SO*
many styles, that I think it's hard to nicely automate it. And it's much
more likely to cause conflicts than removing IDENTIFICATION. So...

Greetings,

Andres Freund



Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 2:02 PM Andres Freund  wrote:
> On 2019-01-23 09:55:22 +0900, Michael Paquier wrote:
> > On Wed, Jan 23, 2019 at 01:37:41PM +1300, Thomas Munro wrote:
> > > This could be really confusing for erm, future people reading a dot
> > > matrix print-out of the source code?
>
> I think we should just rip them out. It's useless noise.

+1

IDENTIFICATION used to show magic CVS keywords, which was standard
practice back then and probably useful.  Now it's just a useless path,
because Magnus removed the keywords 10 years ago in commit
9f2e2113869.

The function name comments are similar, though less consistent so I'm
too lazy to write a script to find one that is actually wrong (with
which to trigger Andres's let's-delete-them-all response :-D).  At
least in the planner code, I think those might originally have been
documentation of the names of the Lisp functions that each C function
was ported from (for example in 4.2 source, sort_inner_and_outer's
comment says "sort-inner-and-outer", a Lispy spelling).  Bruce
delispified some of them 20 years ago in commit 6724a507874.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-23 09:55:22 +0900, Michael Paquier wrote:
> On Wed, Jan 23, 2019 at 01:37:41PM +1300, Thomas Munro wrote:
> > This could be really confusing for erm, future people reading a dot
> > matrix print-out of the source code?

I think we should just rip them out. It's useless noise.


> Yes, it would be nice to make all that consistent.  Perhaps the
> authors of the related commits would prefer to fix that themselves?

That sounds like a waste of effort. This is fixing up useless
anachronisms, I don't understand what'd be gained by splitting this up
over multiple people.

Greetings,

Andres Freund



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-22 Thread Michael Paquier
On Tue, Jan 22, 2019 at 06:11:23PM -0300, Alvaro Herrera wrote:
> On 2019-Jan-22, Andres Freund wrote:
>> Largely because I think it's an independent patch from the CXXOPT need
>> from Christopher / Debian packaging.  It's a larger patch, that needs
>> more docs etc.  If whoever applies that wants to backpatch it - I'm not
>> going to protest, I just wouldn't myself, unless somebody pipes up that
>> it'd help them.
> 
> Ah, I see.  No arguments against.

Thanks Andres and Alvaro for the input.  No issues with the way
Andres proposes.

The new PGXS flags would, I think, be useful to make sure
that things for CFLAGS and LDFLAGS get propagated without having to
hijack the original flags, so I can handle that part.  Which ones would
be wanted though?
- PG_CXXFLAGS
- PG_LDFLAGS
- PG_CFLAGS

I'd see value in all of them; still, everybody likely has a different
opinion, so I would not mind discarding the ones that are not thought
to be that useful.  New PGXS infrastructure usually only finds its way
into HEAD, so I'd rather not back-patch that part.  No issues with the
back-patch portion for CXXOPT from me, as that helps Debian.

Thanks,
--
Michael


signature.asc
Description: PGP signature


Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 1:16 PM Amit Langote
 wrote:
> On 2019/01/23 4:51, Andres Freund wrote:
> > On 2019-01-22 13:43:32 +0900, Amit Langote wrote:
> >> Attached find a patch to fix $subject.
> >
> > Thanks, pushed to master and 11.
>
> Thank you.

It's not only the ending that's wrong.  Here are some more source
files whose IDENTIFICATION heading doesn't exactly match their path:

$ git grep -A 1 IDENTIFICATION | grep -v IDENTIFICATION | grep -v --
-- | sed 's/-[^a-z][^a-z]*/ /' | awk '{ if ($1 != $2) print; }'
doc/src/sgml/lobj.sgml src/test/examples/testlo.c
src/backend/catalog/pg_publication.c pg_publication.c
src/backend/commands/publicationcmds.c publicationcmds.c
src/backend/commands/subscriptioncmds.c subscriptioncmds.c
src/backend/jit/llvm/llvmjit_inline.cpp
src/backend/lib/llvmjit/llvmjit_inline.cpp
src/backend/jit/llvm/llvmjit_wrap.cpp src/backend/lib/llvm/llvmjit_wrap.cpp
src/backend/optimizer/util/appendinfo.c src/backend/optimizer/path/appendinfo.c
src/backend/optimizer/util/inherit.c src/backend/optimizer/path/inherit.c
src/backend/replication/logical/logicalfuncs.c
src/backend/replication/logicalfuncs.c
src/backend/replication/logical/reorderbuffer.c
src/backend/replication/reorderbuffer.c
src/backend/replication/logical/snapbuild.c src/backend/replication/snapbuild.c
src/backend/replication/pgoutput/Makefile src/backend/replication/pgoutput
src/backend/utils/adt/version.c
src/include/replication/pgoutput.h pgoutput.h

This could be really confusing for erm, future people reading a dot
matrix print-out of the source code?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: COPY FROM WHEN condition

2019-01-22 Thread Tomas Vondra


On 1/22/19 10:00 AM, Surafel Temesgen wrote:
> 
> 
On Mon, Jan 21, 2019 at 6:22 PM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> 
> 
> I think the condition can be just
> 
>     if (contain_volatile_functions(cstate->whereClause)) { ... }
> 
> 

I've pushed a fix for the volatility check.

Attached is a patch for the other issue, creating a separate batch
context along the lines outlined in the previous email. It's a bit too
late for me to push it now, especially right before a couple of days
off. So I'll push that in a couple of days.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 03745cca75..41dbcd5b42 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2323,9 +2323,9 @@ CopyFrom(CopyState cstate)
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
+	MemoryContext batchcontext;
 
 	PartitionTupleRouting *proute = NULL;
-	ExprContext *secondaryExprContext = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
@@ -2639,20 +2639,10 @@ CopyFrom(CopyState cstate)
 		 * Normally, when performing bulk inserts we just flush the insert
 		 * buffer whenever it becomes full, but for the partitioned table
 		 * case, we flush it whenever the current tuple does not belong to the
-		 * same partition as the previous tuple, and since we flush the
-		 * previous partition's buffer once the new tuple has already been
-		 * built, we're unable to reset the estate since we'd free the memory
-		 * in which the new tuple is stored.  To work around this we maintain
-		 * a secondary expression context and alternate between these when the
-		 * partition changes.  This does mean we do store the first new tuple
-		 * in a different context than subsequent tuples, but that does not
-		 * matter, providing we don't free anything while it's still needed.
+		 * same partition as the previous tuple.
 		 */
 		if (proute)
-		{
 			insertMethod = CIM_MULTI_CONDITIONAL;
-			secondaryExprContext = CreateExprContext(estate);
-		}
 		else
 			insertMethod = CIM_MULTI;
 
@@ -2685,6 +2675,14 @@ CopyFrom(CopyState cstate)
 	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;
 
+	/*
+	 * Set up memory context for batches. For cases without batching we could
+	 * use the per-tuple context, but it does not seem worth the complexity.
+	 */
+	batchcontext = AllocSetContextCreate(CurrentMemoryContext,
+		 "batch context",
+		 ALLOCSET_DEFAULT_SIZES);
+
 	for (;;)
 	{
 		TupleTableSlot *slot;
@@ -2692,18 +2690,14 @@ CopyFrom(CopyState cstate)
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (nBufferedTuples == 0)
-		{
-			/*
-			 * Reset the per-tuple exprcontext. We can only do this if the
-			 * tuple buffer is empty. (Calling the context the per-tuple
-			 * memory context is a bit of a misnomer now.)
-			 */
-			ResetPerTupleExprContext(estate);
-		}
+		/*
+		 * Reset the per-tuple exprcontext. We do this after every tuple, to
+		 * clean-up after expression evaluations etc.
+		 */
+		ResetPerTupleExprContext(estate);
 
-		/* Switch into its memory context */
-		MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		/* Switch into per-batch memory context. */
+		MemoryContextSwitchTo(batchcontext);
 
 		if (!NextCopyFrom(cstate, econtext, values, nulls))
 			break;
@@ -2756,7 +2750,7 @@ CopyFrom(CopyState cstate)
 	 */
 	if (nBufferedTuples > 0)
 	{
-		ExprContext *swapcontext;
+		MemoryContext	oldcontext;
 
 		CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 			prevResultRelInfo, myslot, bistate,
@@ -2765,29 +2759,26 @@ CopyFrom(CopyState cstate)
 		nBufferedTuples = 0;
 		bufferedTuplesSize = 0;
 
-		Assert(secondaryExprContext);
-
 		/*
-		 * Normally we reset the per-tuple context whenever
-		 * the bufferedTuples array is empty at the beginning
-		 * of the loop, however, it is possible since we flush
-		 * the buffer here that the buffer is never empty at
-		 * the start of the loop.  To prevent the per-tuple
-		 * context from never being reset we maintain a second
-		 * context and alternate between them when the
-		 * partition changes.  We can now reset
-		 * secondaryExprContext as this is no longer needed,
-		 * since we just flushed any tuples stored in it.  We
-		 * also now switch over to the other context.  This
-		 * does mean that the first tuple in the buffer won't
-		 * be in the same context as the others, but that does
-		 * not matter since we only reset it after the flush.
+		 * The tuple is allocated in the batch context, which we
+		 * want to reset. So to keep the tuple we copy the tuple
+		 * 

Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Amit Langote
On 2019/01/23 4:51, Andres Freund wrote:
> Hi,
> 
> On 2019-01-22 13:43:32 +0900, Amit Langote wrote:
>> Attached find a patch to fix $subject.
> 
> Thanks, pushed to master and 11.

Thank you.

Regards,
Amit





Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-22 Thread David Rowley
On Wed, 23 Jan 2019 at 03:43, David Rowley  wrote:
> I made another pass over the 0001 patch. I've not read through mcv.c
> again yet. Will try to get to that soon.
>
> 0001-multivariate-MCV-lists-20190117.patch

I started on mcv.c this morning. I'm still trying to build myself a
picture of how it works, but I have noted a few more things while I'm
reading.

24. These macros are still missing parentheses around the arguments:

#define ITEM_INDEXES(item) ((uint16*)item)
#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))

While I don't see any reason to put parentheses around the macro's
argument when passing it to another macro, since that macro should do it,
there is a good reason to have the additional parentheses when it's
not passed to another macro.

Also, there are a number of places, including these macros, where the
whitespace does not conform to project standards, e.g.
((uint16*)item) should be ((uint16 *) (item))  (including fixing the
missing parentheses).
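
To make the hazard concrete, a minimal standalone sketch (hypothetical code,
not the patch's actual data layout, with uint16_t standing in for uint16) of
how an unparenthesized macro argument changes the meaning of an expression
argument:

#include <stdint.h>
#include <stdio.h>

#define ITEM_INDEXES_BAD(item)  ((uint16_t *) item)     /* argument unparenthesized */
#define ITEM_INDEXES(item)      ((uint16_t *) (item))   /* argument parenthesized */

int
main(void)
{
    char  buf[64];
    char *base = buf;
    int   off = 8;

    /*
     * Expands to ((uint16_t *) base + off): the cast binds to "base" alone,
     * so the pointer advances by off uint16_t elements (16 bytes), not by
     * off bytes.
     */
    printf("bad:  %p\n", (void *) ITEM_INDEXES_BAD(base + off));

    /* Expands to ((uint16_t *) (base + off)): advance 8 bytes, then cast. */
    printf("good: %p\n", (void *) ITEM_INDEXES(base + off));
    return 0;
}

The two printed addresses differ, which is exactly the kind of silent breakage
the parentheses prevent.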

25. In statext_mcv_build() I'm trying to figure out what the for loop
does below the comment:

* If we can fit all the items onto the MCV list, do that. Otherwise
* use get_mincount_for_mcv_list to decide which items to keep in the
* MCV list, based on the number of occurences in the sample.

The comment explains only as far as the get_mincount_for_mcv_list()
call so the following is completely undocumented:

for (i = 0; i < nitems; i++)
{
if (mcv_counts[i] < mincount)
{
nitems = i;
break;
}
}

I was attempting to figure out if the break should be there, or if the
code should continue and find the 'i' for the smallest mcv_counts, but
I don't really understand what the code is meant to be doing.

Also:  occurences  -> occurrences

26. Again in statext_mcv_build(), I'm a bit puzzled as to why mcv_counts
needs to exist at all. It's built from:

mcv_counts = (int *) palloc(sizeof(int) * nitems);

for (i = 0; i < nitems; i++)
mcv_counts[i] = groups[i].count;

Then only used in the loop mentioned in #25 above.  Can't you just use
groups[i].count?

(Stopped in statext_mcv_build(). Need to take a break)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Rare SSL failures on eelpout

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 4:07 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Hmm.  Why is psql doing two sendto() calls without reading a response
> > in between, when it's possible for the server to exit after the first,
> > anyway?  Seems like a protocol violation somewhere?
>
> Keep in mind this is all down inside the SSL handshake, so if any
> protocol is being violated, it's theirs not ours.

The sendto() of 1115 bytes is SSL_connect()'s last syscall, just
before it returns 1 to indicate success (even though it wasn't
successful?), without waiting for a further response.  The sendto() of
107 bytes is our start-up packet, which either succeeds and is
followed by reading a "certificate revoked" message from the server,
or fails with ECONNRESET if the socket has already been shut down at
the server end due to the racing exit.

It seems very strange to me that the error report is deferred until we
send our start-up packet.  It seems like a response that belongs to
the connection attempt, not our later data sending.  Bug in OpenSSL?
Unintended consequence of our switch to blocking IO at that point?

I tried to find out how this looked on 1.0.2, but it looks like Debian
has just removed the older version from the buster distro and I'm out
of time to hunt this on other OSes today.

> The whole thing reminds me of the recent bug #15598:
>
> https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk

Yeah, if errors get moved to later exchanges but the server might exit
and close its end of the socket before we can manage to initiate a
later exchange, it starts to look just like that.

A less interesting bug is the appearance of 3 nonsensical "Success"
(glibc) or "No error: 0" (FreeBSD) error messages in the server logs
on systems running OpenSSL 1.1.1, much like this, which I guess might
mean EOF:

https://www.postgresql.org/message-id/CAEepm=3cc5wYv=x4nzy7voukdhbijs9bplzqtqjwxddup5d...@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Strange query behaviour

2019-01-22 Thread Isaac Morland
On Tue, 22 Jan 2019 at 15:32, Andrew Gierth 
wrote:

> > "Isaac" == Isaac Morland  writes:
>
>  Isaac> So it is as if checking the whole tuple for NULL requires
>  Isaac> reading the PDF bytea columns, but checking just the primary key
>  Isaac> for NULL or even reading the lengths of the PDFs does not.
>
> That is almost certainly exactly what happens. If the PDF columns are of
> type bytea, or if they are of type text and the database encoding is
> single-byte, then length() does not need to detoast the column in order
> to get the size (the byte size of the toasted datum is stored in the
> toast pointer).
>

Thanks very much for the detailed response! I just tested and indeed query
performance has gone back to something like what I would expect. I feel
more confident, however, with your confirmation and elaboration on the
underlying details.
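
To restate the quoted point with a toy sketch (this is not PostgreSQL's
varlena/TOAST format, just the general idea): the stored size travels with the
pointer to the out-of-line value, so a length() query never has to fetch the
payload, while anything that inspects the contents does.

#include <stdio.h>
#include <string.h>

typedef struct
{
    size_t      stored_size;    /* known without fetching the payload */
    const char *external_data;  /* fetched ("detoasted") only on demand */
} ToastedValue;

/* Cheap: answered from the pointer alone. */
static size_t
value_length(const ToastedValue *v)
{
    return v->stored_size;
}

/* Expensive: has to pull in the payload. */
static const char *
value_contents(const ToastedValue *v)
{
    printf("(fetching %zu bytes of external data)\n", v->stored_size);
    return v->external_data;
}

int
main(void)
{
    const char  *payload = "pretend this is a large PDF";
    ToastedValue pdf = {strlen(payload), payload};

    printf("length: %zu\n", value_length(&pdf));          /* no fetch needed */
    printf("first byte: %c\n", value_contents(&pdf)[0]);  /* fetch required */
    return 0;
}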

What you should actually use in these cases for your IS NULL check is
> one of the columns of the join condition. That allows the planner to
> detect that the query is in fact an anti-join, and optimize accordingly.
>
> The other, and IMO better, way to write anti-join queries is to use an
> explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
> issues with NULL handling.)
>

Thanks for the hint. As it happens, in the actual situation, I had a view
which is defined something like:

CREATE ... AS
SELECT ..., NOT y IS NULL AS has_y
FROM x LEFT JOIN y USING (primary_key_field);

I used the simpler example when I found it would exhibit the same symptoms.
However, using a join key field as you suggest still seems to be working to
fix my problem.

 Isaac> 1) when checking an entire row for null,
>
> This isn't a very common operation and the SQL-standard semantics for it
> are actually quite weird (for example, x IS NULL is not the opposite
> condition to x IS NOT NULL). So I don't think we need to encourage it.
>

In my use case, I think "y IS NULL" is better code than
"y.primary_key_field IS NULL". In the second snippet, it raises the
question, "why primary_key_field?" The answer is, "because otherwise the
query planner will get confused". The first snippet is what I really mean,
and also happens to be shorter.

This does require people reading the code to understand that "IS NULL" and
"IS NOT NULL" are not logical negations of each other.


> The planner can do even better than this if you apply the IS NULL test
> _specifically to one of the join columns_. When a join condition is
> strict (which it almost always is), then testing the column from the
> nullable side is provably (to the planner) equivalent to testing the
> existence of the matching row, which allows it to transform the join
> from an outer join to an anti-join.
>

What is confusing me is why the planner can't convert "[entire row] IS
NULL" into a test for existence of the matching row (assuming there is at
least one NOT NULL column).

OK, let's put it this way: if I manage to dig into the planner internals
enough to figure out how to make the planner understand this, and write a
decent patch, would I have a good chance of getting it accepted? From here
the figuring out part seems like a long-shot: messing with planner code
scares me a little and I already have a feature request that I want to work
on and a reasonable workaround for this one, but I'd like an assessment
nevertheless. Maybe I'll have more energy this year!

Thanks again for taking the time to provide a detailed response. I very
much appreciate it.


Re: Delay locking partitions during query execution

2019-01-22 Thread David Rowley
On Sat, 12 Jan 2019 at 23:42, David Rowley  wrote:
> I've attached a rebase version of this. The previous version
> conflicted with some changes made in b60c397599.

I've attached another rebased version. This one fixes up the conflict
with e0c4ec07284.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v3-0001-Allow-lock-acquisitions-for-partitions-to-be-dela.patch
Description: Binary data


Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-22 Thread Alvaro Herrera
Hello

On 2019-Jan-22, Andres Freund wrote:

> On 2019-01-22 17:10:58 -0300, Alvaro Herrera wrote:

> > I don't understand why you don't want to backpatch the PGXS bits.
> 
> Largely because I think it's an independent patch from the CXXOPT need
> from Christopher / Debian packaging.  It's a larger patch, that needs
> more docs etc.  If whoever applies that wants to backpatch it - I'm not
> going to protest, I just wouldn't myself, unless somebody pipes up that
> it'd help them.

Ah, I see.  No arguments against.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 14:53:11 -0600, Kevin Grittner wrote:
> On Tue, Jan 22, 2019 at 2:38 PM Andres Freund  wrote:
> 
> > close() doesn't trigger an fsync() in general
> 
> What sort of a performance hit was observed when testing the addition
> of fsync or fdatasync before any PostgreSQL close() of a writable
> file, or have we not yet checked that?

I briefly played with it, and it was so atrocious (as in, less than
something like 0.2x the throughput) that I didn't continue far down that
path.  The two ways I tried, IIRC (and it's really just from memory), were:

a) Short lived connections that do a bunch of writes to files each. That
   turns each disconnect into an fsync of most files.
b) Workload with > max_files_per_process files (IIRC I just used a bunch
   of larger tables with a few indexes each) in a read/write workload
   that's a bit larger than shared buffers. That lead to most file
   closes being integrity writes, with obvious downsides.

Greetings,

Andres Freund



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 2:38 PM Andres Freund  wrote:

> close() doesn't trigger an fsync() in general

What sort of a performance hit was observed when testing the addition
of fsync or fdatasync before any PostgreSQL close() of a writable
file, or have we not yet checked that?

> https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
> is, I think, a good overview, with a bunch of links.

Thanks!  Will review.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: Install JIT headers

2019-01-22 Thread David Fetter
On Tue, Jan 22, 2019 at 11:42:35AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2019-01-22 11:08:40 -0800, Donald Dong wrote:
> > In the standard_planner, we set the proper JIT flags on the resulting
> > PlannedStmt node. We also offered a planer hook so extensions can
> > add customized planners. The JIT flags in jit/jit.h however, is
> > not present on the system. I think it's probably better to
> > install the jit headers?
> > 
> > 
> > diff --git a/src/include/Makefile b/src/include/Makefile
> > index 6bdfd7db91..041702809e 100644
> > --- a/src/include/Makefile
> > +++ b/src/include/Makefile
> > @@ -19,7 +19,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
> >  # Subdirectories containing installable headers
> >  SUBDIRS = access bootstrap catalog commands common datatype \
> > executor fe_utils foreign \
> > -   lib libpq mb nodes optimizer parser partitioning postmaster \
> > +   jit lib libpq mb nodes optimizer parser partitioning postmaster \
> > regex replication rewrite \
> > statistics storage tcop snowball snowball/libstemmer tsearch \
> > tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \
> 
> Seems like a good idea. I think we ought to backpatch that to 11?  Will
> do tomorrow if nobody protests.

+1 for back-patching

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 14:29:23 -0600, Kevin Grittner wrote:
> On Tue, Jan 22, 2019 at 12:17 PM Andres Freund  wrote:
> 
> > Unfortunately, unless something has changed recently, that patch is
> > *not* sufficient to really solve the issue - we don't guarantee that
> > there's always an fd preventing the necessary information from being
> > evicted from memory:
> 
> But we can't lose an FD without either closing it or suffering an
> abrupt termination that would trigger a PANIC, can we?  And close()
> always calls fsync().  And I thought our "PANIC on fsync" patch paid
> attention to close().  How do you see this happening???

close() doesn't trigger an fsync() in general (although it does on many
NFS implementations), and doing so would be *terrible* for
performance. Given that, it's pretty clear how you can get all FDs
closed, right? You just need sufficient open files that files get closed
due to max_files_per_process, and you can run into the issue.  A
thousand open files is pretty easy to reach with forks, indexes,
partitions etc, so this isn't particularly artificial.
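
To sketch the shape of that (a hypothetical toy fd cache, not fd.c's actual
VFD machinery): once the cap forces a quiet close(), the process no longer
holds any descriptor through which a later fsync() could surface a writeback
error for that file.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_OPEN 3              /* stand-in for max_files_per_process */

static int cache[MAX_OPEN];
static int ncached = 0;

static int
cached_open(const char *path)
{
    if (ncached == MAX_OPEN)
    {
        /*
         * Evict the oldest fd; note there is no fsync() here.  If the
         * kernel later hits a writeback error for that file, this process
         * has no descriptor left on which the error could be reported.
         */
        close(cache[0]);
        for (int i = 1; i < MAX_OPEN; i++)
            cache[i - 1] = cache[i];
        ncached--;
    }
    cache[ncached] = open(path, O_WRONLY | O_CREAT, 0644);
    return cache[ncached++];
}

int
main(void)
{
    const char *names[] = {"a.tmp", "b.tmp", "c.tmp", "d.tmp"};

    for (int i = 0; i < 4; i++)
    {
        int fd = cached_open(names[i]);

        if (write(fd, "x", 1) != 1)
            perror("write");
        printf("opened %s as fd %d (%d fds cached)\n", names[i], fd, ncached);
    }
    return 0;
}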


> >  Note that we might still lose the error if the inode gets evicted from
> >  the cache before anything can reopen it, but that was the case before
> >  errseq_t was merged. At LSF/MM we had some discussion about keeping
> >  inodes with unreported writeback errors around in the cache for longer
> >  (possibly indefinitely), but that's really a separate problem"
> >
> > And that's entirely possibly in postgres.
> 
> Is it possible for an inode to be evicted while there is an open FD
> referencing it?

No, but we don't guarantee that there's always an FD open, due to the 


> > The commit was dicussed on list too, btw...
> 
> Can you point to a post explaining how the inode can be evicted?

https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
is, I think, a good overview, with a bunch of links.  As is the
referenced lwn article [1] and the commit message you linked.

Greetings,

Andres Freund

[1] https://lwn.net/Articles/752063/



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


Hello Tom,


Here is a POC which defines an internal interface for a PRNG, and use it
within pgbench, with several possible implementations which default to
rand48.



I seriously dislike this patch.  pgbench's random support is quite
overengineered already IMO, and this proposes to add a whole batch of
new code and new APIs to fix a very small bug.



My intention is rather to discuss postgres' PRNG, in passing. Full success
on this point:-)


Our immediate problem is to fix a portability failure, which we need to
back-patch into at least one released branch, ergo conservatism is
warranted.


Sure, the patch I sent is definitely not for backpatching, it is for 
discussion.



 I had in mind something more like the attached.


Yep.

I'm not too happy that it mixes API levels, and about the int/double/int 
path.


Attached an updated version which relies on pg_jrand48 instead. Also, as 
the pseudo-random state is fully controlled, seeded test results are 
deterministic so the expected value can be fully checked.


I did a few sanity tests which were all ok.

I think that this version is appropriate for backpatching. I also think 
that it would be appropriate to consider having a better PRNG to replace 
rand48 in a future release.


--
Fabien.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b5c75ce1c6..27aac479e3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -185,7 +185,7 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
-/* random seed used when calling srandom() */
+/* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
 /*
@@ -287,6 +287,9 @@ typedef struct RandomState
 	unsigned short xseed[3];
 } RandomState;
 
+/* Various random sequences are initialized from this one. */
+static RandomState base_random_sequence;
+
 /*
  * Connection state machine states.
  */
@@ -598,6 +601,8 @@ static const BuiltinScript builtin_script[] =
 
 
 /* Function prototypes */
+static void initRandomState(RandomState *random_state);
+static int64 getrand(RandomState *random_state, int64 min, int64 max);
 static void setNullValue(PgBenchValue *pv);
 static void setBoolValue(PgBenchValue *pv, bool bval);
 static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -833,16 +838,28 @@ strtodouble(const char *str, bool errorOK, double *dv)
 
 /*
  * Initialize a random state struct.
+ *
+ * We derive the seed from base_random_sequence, which must be set up already.
  */
 static void
 initRandomState(RandomState *random_state)
 {
-	random_state->xseed[0] = random();
-	random_state->xseed[1] = random();
-	random_state->xseed[2] = random();
+	random_state->xseed[0] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0xFFFF;
+	random_state->xseed[1] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0xFFFF;
+	random_state->xseed[2] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0xFFFF;
 }
 
-/* random number generator: uniform distribution from min to max inclusive */
+/*
+ * Random number generator: uniform distribution from min to max inclusive.
+ *
+ * Although the limits are expressed as int64, you can't generate the full
+ * int64 range in one call, because the difference of the limits mustn't
+ * overflow int64.  In practice it's unwise to ask for more than an int32
+ * range, because of the limited precision of pg_erand48().
+ */
 static int64
 getrand(RandomState *random_state, int64 min, int64 max)
 {
@@ -5126,12 +5143,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
-/* call srandom based on some seed. NULL triggers the default behavior. */
+/*
+ * Set up a random seed according to seed parameter (NULL means default),
+ * and initialize base_random_sequence for use in initializing other sequences.
+ */
 static bool
 set_random_seed(const char *seed)
 {
-	/* srandom expects an unsigned int */
-	unsigned int iseed;
+	uint64		iseed;
 
 	if (seed == NULL || strcmp(seed, "time") == 0)
 	{
@@ -5139,7 +5158,7 @@ set_random_seed(const char *seed)
 		instr_time	now;
 
 		INSTR_TIME_SET_CURRENT(now);
-		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+		iseed = (uint64) INSTR_TIME_GET_MICROSEC(now);
 	}
 	else if (strcmp(seed, "rand") == 0)
 	{
@@ -5155,7 +5174,7 @@ set_random_seed(const char *seed)
 		/* parse seed unsigned int value */
 		char		garbage;
 
-		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		if (sscanf(seed, UINT64_FORMAT "%c", &iseed, &garbage) != 1)
 		{
 			fprintf(stderr,
 	"unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
@@ -5165,10 +5184,14 @@ set_random_seed(const char *seed)
 	}
 
 	if (seed != NULL)
-		fprintf(stderr, "setting random seed to %u\n", iseed);
-	srandom(iseed);
-	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+		fprintf(stderr, "setting random seed to " UINT64_FORMAT "\n", iseed);
 	random_seed = iseed;
+
+	/* Fill base_random_sequence with 

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 9:29 AM Kevin Grittner  wrote:
> Can you point to a post explaining how the inode can be evicted?

Hi Kevin,

To recap the (admittedly confusing) list of problems with Linux fsync
or our usage:

1.  On Linux < 4.13, the error state can be cleared in various
surprising ways so that we never hear about it.  Jeff Layton
identified and fixed this problem for 4.13+ by switching from an error
flag to an error counter that is tracked in such a way that every fd
hears about every error in the file.

2.  Layton's changes originally assumed that you only wanted to hear
about errors that happened after you opened the file (ie it set the
fd's counter to the inode's current level at open time).  Craig Ringer
complained about this.  Everyone complained about this.  A fix was
then made so that one fd also reports errors that happened before
opening, if no one else has seen them yet.  This is the change that
was back-patched as far as Linux 4.14.  So long as no third program
comes along and calls fsync on a file that we don't have open
anywhere, thereby eating the "not seen" flag before the checkpointer
gets around to opening the file, all is well.

3.  Regardless of the above changes, we also learned that pages are
unceremoniously dropped from the page cache after write-back errors,
so that calling fsync() again after a failure is a bad idea (it might
report success, but your dirty data written before the previous
fsync() call is gone).  We handled that by introducing a PANIC after
any fsync failure:

  
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1

So did MySQL, MongoDB, and probably everyone else who spat out their
cornflakes while reading articles like "PostgreSQL's fsync() surprise"
in the Linux Weekly News that resulted from Craig's report:

  
https://github.com/mysql/mysql-server/commit/8590c8e12a3374eeccb547359750a9d2a128fa6a

  
https://github.com/wiredtiger/wiredtiger/commit/ae8bccce3d8a8248afa0e4e0cf67674a43dede96

4.  Regardless of all of the above changes, there is still one way to
lose track of an error, as Andres mentioned: during a period of time
when neither the writing backend nor the checkpointer has the file
open, the kernel may choose to evict the inode from kernel memory, and
thereby forget about an error that we haven't received yet.

Problems 1-3 are solved by changes to Linux and PostgreSQL.

Problem 4 would be solved by this "fd-passing" scheme (file
descriptors are never closed until after fsync has been called,
existing in the purgatory of Unix socket ancillary data until the
checkpointer eventually deals with them), but it's complicated and not
quite fully baked yet.

It could also be solved by the kernel agreeing not to evict inodes
that hold error state, or to promote the error to device level, or
something like that.  IIUC those kinds of ideas were rejected so far.

(It can also be solved by using FreeBSD and/or ZFS, so you don't have
problem 3 and therefore don't have the other problems.)

I'm not sure how likely that failure mode actually is, but I guess you
need a large number of active files, a low PostgreSQL max_safe_fds so
we close descriptors aggressively, a kernel that is low on memory or
has a high vfs_cache_pressure setting so that it throws out recently
used inodes aggressively, enough time between checkpoints for all of
the above to happen, and then some IO errors when the kernel is
writing back dirty data asynchronously while you don't have the file
open anywhere.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Strange query behaviour

2019-01-22 Thread Andrew Gierth
> "Isaac" == Isaac Morland  writes:

 Isaac> So it is as if checking the whole tuple for NULL requires
 Isaac> reading the PDF bytea columns, but checking just the primary key
 Isaac> for NULL or even reading the lengths of the PDFs does not.

That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer).

However, constructing a whole-row datum does require detoasting any
externally-stored columns (this used not to be the case, but that caused
a lot of bugs).

 Isaac> For the moment I'm going to fix it by just using
 Isaac> "y.primary_key_column IS NULL" instead of "y IS NULL" where I
 Isaac> want to check whether I have a row from y corresponding to a
 Isaac> given row in x.

What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.

The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)
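
As an illustration (a sketch only; the table names x and y come from the
original report, but the join column "id" is hypothetical, since the report
used NATURAL LEFT JOIN), the two anti-join-friendly forms would look
roughly like:

  -- IS NULL test on a column of the join condition: the planner can
  -- turn this outer join into an anti-join
  SELECT x.*
  FROM x
  LEFT JOIN y ON y.id = x.id
  WHERE y.id IS NULL;

  -- equivalent explicit NOT EXISTS form
  SELECT x.*
  FROM x
  WHERE NOT EXISTS (SELECT 1 FROM y WHERE y.id = x.id);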

 Isaac> 1) when checking an entire row for null,

This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.
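
For a quick illustration of that weirdness (standard row-wise semantics,
easy to verify in psql):

  SELECT ROW(1, NULL) IS NULL     AS is_null,      -- false: not all fields are NULL
         ROW(1, NULL) IS NOT NULL AS is_not_null;  -- false: not all fields are non-NULL

so a row value can be neither NULL nor NOT NULL at the same time.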

 Isaac> start with a primary key field or other NOT NULL field. In the
 Isaac> common case of checking what happened with a left join, this is
 Isaac> all that needs to be done - either there is a row, in which case
 Isaac> the field cannot be NULL, or there is no row and all the other
 Isaac> fields must also be NULL.

The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.

-- 
Andrew (irc:RhodiumToad)



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 12:17 PM Andres Freund  wrote:

> Unfortunately, unless something has changed recently, that patch is
> *not* sufficient to really solve the issue - we don't guarantee that
> there's always an fd preventing the necessary information from being
> evicted from memory:

But we can't lose an FD without either closing it or suffering an
abrupt termination that would trigger a PANIC, can we?  And close()
always calls fsync().  And I thought our "PANIC on fsync" patch paid
attention to close().  How do you see this happening???

>  Note that we might still lose the error if the inode gets evicted from
>  the cache before anything can reopen it, but that was the case before
>  errseq_t was merged. At LSF/MM we had some discussion about keeping
>  inodes with unreported writeback errors around in the cache for longer
>  (possibly indefinitely), but that's really a separate problem"
>
> And that's entirely possible in postgres.

Is it possible for an inode to be evicted while there is an open FD
referencing it?

> The commit was discussed on list too, btw...

Can you point to a post explaining how the inode can be evicted?
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 17:10:58 -0300, Alvaro Herrera wrote:
> On 2019-Jan-22, Andres Freund wrote:
> 
> > I think its plain wrong to add COPT to CXXFLAGS. Re PROFILE I'm on the
> > fence.  I personally think the pgxs stuff is a bit separate, and I'm
> > doubtful we ought to backpatch that.  I'm basically planning to apply
> > https://www.postgresql.org/message-id/20190107091734.GA1582%40msg.credativ.de
> > to 11-, minus the PGXS stuff. If we want that, we ought to apply it to
> > master only IMO.
> 
> I don't understand why you don't want to backpatch the PGXS bits.

Largely because I think it's an independent patch from the CXXOPT need
from Christopher / Debian packaging.  It's a larger patch, that needs
more docs etc.  If whoever applies that wants to backpatch it - I'm not
going to protest, I just wouldn't myself, unless somebody pipes up that
it'd help them.

Greetings,

Andres Freund



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-22 Thread Alvaro Herrera
On 2019-Jan-22, Andres Freund wrote:

> I think its plain wrong to add COPT to CXXFLAGS. Re PROFILE I'm on the
> fence.  I personally think the pgxs stuff is a bit separate, and I'm
> doubtful we ought to backpatch that.  I'm basically planning to apply
> https://www.postgresql.org/message-id/20190107091734.GA1582%40msg.credativ.de
> to 11-, minus the PGXS stuff. If we want that, we ought to apply it to
> master only IMO.

I don't understand why you don't want to backpatch the PGXS bits.  Is
there something working today that would be broken by it?  I think
you're worried about places that invoke makefiles with PG_CXXFLAGS set
and expecting the value not to be propagated.  Is that a scenario we
need to worry about?

The patch neglects to update extend.sgml with the new pgxs variable,
though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: Google Summer Of Code

2019-01-22 Thread Stephen Frost
Greetings,

* Vikramsingh Kushwaha (vskushw...@mitaoe.ac.in) wrote:
> I, Vikramsingh Kushwaha, am currently studying in B.Tech 3rd year computer
> engineering at MIT Pune, India. I am very much interested in contributing to
> open source projects, but I am new to this, so I need some guidance.
> I also want to participate in Google Summer Of Code, so I wanted your
> organization to be my mentor.

This isn't how GSoC works.  I encourage you to read up on GSoC here:

https://summerofcode.withgoogle.com/how-it-works/

Note that students will need to write up a project proposal of
specifically what they're interested in working on, the timeline,
details about the approach they're going to use, etc.

However, before all of that, Google has to select the organizations and
their selection won't be announced until February 26.  While you're
welcome to work on your proposal, there's no guarantee that any
particular organization will be selected to participate in 2019.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-22 Thread Andres Freund
On 2019-01-22 15:26:21 +0900, Michael Paquier wrote:
> On Thu, Jan 17, 2019 at 02:55:39PM +0900, Michael Paquier wrote:
> > Personally I see pgxs as something completely different than what COPT
> > and PROFILE are as we are talking about two different facilities: one
> > which is part of the core installation, and the other which can be
> > used for extension modules, so having PG_CFLAGS, PG_CXXFLAGS and
> > PG_LDFLAGS, but leaving CXXFLAGS out of COPT and PROFILE looks like
> > the better long-term move in terms of pluggability.  My 2c.
> 
> It's been a couple of days since this message, and while my opinion
> has not really changed, there are many other opinions.  So perhaps we
> could reduce the proposal to a strict minimum and find an agreement
> for the options that we think are absolutely worth adding?  Even if we
> cannot agree on what COPT of PROFILE should do more, perhaps we could
> still agree with only a portion of the flags we think are worth it?

I think its plain wrong to add COPT to CXXFLAGS. Re PROFILE I'm on the
fence.  I personally think the pgxs stuff is a bit separate, and I'm
doubtful we ought to backpatch that.  I'm basically planning to apply
https://www.postgresql.org/message-id/20190107091734.GA1582%40msg.credativ.de
to 11-, minus the PGXS stuff. If we want that, we ought to apply it to
master only IMO.

Greetings,

Andres Freund



Fwd: Google Summer Of Code

2019-01-22 Thread Vikramsingh Kushwaha
-- Forwarded message -
From: Vikramsingh Kushwaha 
Date: Mon, Jan 21, 2019 at 11:51 PM
Subject: Google Summer Of Code
To: 


Respected sir/madam
I, Vikramsingh Kushwaha, am currently studying in B.Tech 3rd year computer
engineering at MIT Pune, India. I am very much interested in contributing to
open source projects, but I am new to this, so I need some guidance.
I also want to participate in Google Summer Of Code, so I wanted your
organization to be my mentor.
Kindly be my mentor; *I am ready for any challenge, task, or test, whatever
you want to set.* I shall be sharing my GitHub and CodeChef profiles. I am
an average coder but a dedicated hard worker.
Kindly guide me and be my mentor for Google Summer Of Code.

Github Profile: https://github.com/vikramsinghkushwaha
Codechef Profile: https://www.codechef.com/users/vskushwaha_15


Regards,
Vikramsingh Kushwaha
MIT Academy Of Engineering
Pune


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


It's not demonstrably slower than 2.5 either.  (1.1 is measurably slower; 
probably not by enough for anyone to care, but 1.5 is good enough for me.)


Good if it fails quick enough for you.


Attached a patch with the zipf doc update & the TAP test parameter change.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   frequently values to the beginning of the interval are drawn.
   The closer to 0 parameter is,
   the flatter (more uniform) the access distribution.
+  The distribution is such that, assuming the range starts from 1,
+  the ratio of probability of drawing k versus
+  drawing k+1 is
+  ((k+1)/k)**parameter.
+  For instance random_zipfian(1, ..., 2.5) draws
+  value 1 about (2/1)**2.5 = 5.66 times more frequently
+  than 2, which itself is drawn (3/2)**2.5 = 2.76 times more
+  frequently than 3, and so on.
  
 

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c87748086a..c0cdfbf5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -471,7 +471,7 @@ for my $i (1, 2)
 \set ur random(1000, 1999)
 \set er random_exponential(2000, 2999, 2.0)
 \set gr random_gaussian(3000, 3999, 3.0)
-\set zr random_zipfian(4000, 4999, 2.5)
+\set zr random_zipfian(4000, 4999, 1.5)
 INSERT INTO seeded_random(seed, rand, val) VALUES
   (:random_seed, 'uniform', :ur),
   (:random_seed, 'exponential', :er),


Re: Typo: llvm*.cpp files identified as llvm*.c

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 13:43:32 +0900, Amit Langote wrote:
> Attached find a patch to fix $subject.

Thanks, pushed to master and 11.

Greetings,

Andres Freund



Re: Install JIT headers

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 11:08:40 -0800, Donald Dong wrote:
> In the standard_planner, we set the proper JIT flags on the resulting
> PlannedStmt node. We also offer a planner hook so extensions can
> add customized planners. The JIT flags in jit/jit.h, however, are
> not present on the system. I think it's probably better to
> install the jit headers?
> 
> 
> diff --git a/src/include/Makefile b/src/include/Makefile
> index 6bdfd7db91..041702809e 100644
> --- a/src/include/Makefile
> +++ b/src/include/Makefile
> @@ -19,7 +19,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
>  # Subdirectories containing installable headers
>  SUBDIRS = access bootstrap catalog commands common datatype \
> executor fe_utils foreign \
> -   lib libpq mb nodes optimizer parser partitioning postmaster \
> +   jit lib libpq mb nodes optimizer parser partitioning postmaster \
> regex replication rewrite \
> statistics storage tcop snowball snowball/libstemmer tsearch \
> tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \

Seems like a good idea. I think we ought to backpatch that to 11?  Will
do tomorrow if nobody protests.

Greetings,

Andres Freund



Re: [HACKERS] proposal: schema variables

2019-01-22 Thread Pavel Stehule
Hi

fresh rebased patch, no other changes

Pavel


schema-variables-20190122-01.patch.gz
Description: application/gzip


Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-22 Thread Tomas Vondra



On 1/22/19 7:36 PM, Arthur Zakirov wrote:
> Mon, Jan 21, 2019 at 19:42, Arthur Zakirov :
>>
>> On 21.01.2019 17:56, Tomas Vondra wrote:
>>> I wonder if we could devise some simple cache eviction policy. We don't
>>> have any memory limit GUC anymore, but maybe we could use unload
>>> dictionaries that were unused for sufficient amount of time (a couple of
>>> minutes or so). Of course, the question is when exactly would it happen
>>> (it seems far too expensive to invoke on each dict access, and it should
>>> happen even when the dicts are not accessed at all).
>>
>> Yes, I thought about such feature too. Agree, it could be expensive
>> since we need to scan pg_ts_dict table to get list of dictionaries (we
>> can't scan dshash_table).
>>
>> I haven't a good solution yet. I just had a thought to return
>> max_shared_dictionaries_size. Then we can unload dictionaries (and scan
>> the pg_ts_dict table) that were accessed a lot time ago if we reached
>> the size limit.
>> We can't set exact size limit since we can't release the memory
>> immediately. So max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL has
>> unlimited space for dictionaries.
> 
> I want to propose cleaning up segments during vacuum/autovacuum. I'm not
> aware of the policy on cleaning up objects besides relations during
> vacuum/autovacuum. Could it be a good idea?
> 

I doubt that's a good idea, for a couple of reasons. For example, would
it be bound to autovacuum on a particular object or would it happen as
part of each vacuum run? If the dict cleanup happens only when vacuuming
a particular object, then which one? If it happens on each autovacuum
run, then that may easily be far too frequent (it essentially makes the
cases with too frequent autovacuum runs even worse).

But also, what happens when there is only minimal write activity and thus no
regular autovacuum runs? Surely we should still do the dict cleanup.

> Vacuum might unload dictionaries when the total size of loaded dictionaries
> exceeds a threshold. When that happens, vacuum scans the loaded dictionaries and
> unloads (unpins segments and removes hash table entries) those dictionaries
> which aren't mapped into any backend process anymore (they only stay around
> because dsm_pin_segment() was called).
> 

Then why to bound that to autovacuum at all? Why not just make it part
of loading the dictionary?

> max_shared_dictionaries_size can be renamed to
> shared_dictionaries_cleanup_threshold.
> 

That really depends on what exactly the threshold does. If it only
triggers cleanup but does not enforce maximum amount of memory used by
dictionaries, then this name seems OK. If it ensures max amount of
memory, the max_..._size name would be better.

I think there are essentially two ways:

(a) Define max amount of memory available for shared dictionarires, and
come up with an eviction algorithm. This will be tricky, because when
the frequently-used dictionaries need a bit more memory than the limit,
this will result in thrashing (evict+load over and over).

(b) Define what "unused" means for dictionaries, and unload dictionaries
that become unused. For example, we could track timestamp of the last
time each dict was used, and decide that dictionaries unused for 5 or
more minutes are unused. And evict those.

The advantage of (b) is that it adapts automatically, more or less. When
you have a bunch of frequently used dictionaries, the amount of shared
memory increases. If you stop using them, it decreases after a while.
And rarely used dicts won't force eviction of the frequently used ones.

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-21 19:41:26 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > It'd be more
> > realistic to create a new zone at UINT32_MAX - something, but that'd
> > likely still conflict in plenty installations (thanks to toast and WITH
> > OIDS tables).   I'm curious as to how to solve that, if you have a
> > sketch - less because of this, and more because I think it's not
> > unlikely that we'll encounter the need for this at some point not too
> > far away.
> 
> I have no idea how we'd move table or type OIDs, given that those are
> potentially on-disk.  (Actually ... are table OIDs really on-disk
> anywhere in user data?  Types yes, but tables?)

Not quite the same, but toast table oids are on-disk, inside toast
datums.

Greetings,

Andres Freund



Install JIT headers

2019-01-22 Thread Donald Dong
Hi,

In the standard_planner, we set the proper JIT flags on the resulting
PlannedStmt node. We also offer a planner hook so extensions can
add customized planners. The JIT flags in jit/jit.h, however, are
not present on the system. I think it's probably better to
install the jit headers?


diff --git a/src/include/Makefile b/src/include/Makefile
index 6bdfd7db91..041702809e 100644
--- a/src/include/Makefile
+++ b/src/include/Makefile
@@ -19,7 +19,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
 # Subdirectories containing installable headers
 SUBDIRS = access bootstrap catalog commands common datatype \
executor fe_utils foreign \
-   lib libpq mb nodes optimizer parser partitioning postmaster \
+   jit lib libpq mb nodes optimizer parser partitioning postmaster \
regex replication rewrite \
statistics storage tcop snowball snowball/libstemmer tsearch \
tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \

 
Regards,
Donald Dong



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO



The first value is taken about 75% of the time for N=1000 and s=2.5, which
means that a non deterministic implementation would succeed about 0.75² ~
56% of the time on that one.


Right, that's about what we've been seeing on OpenBSD.


Also, the drawing procedure is less efficient when the parameter is close
to 1 because it is more likely to loop,


That might be something to fix, but I agree it's a reason not to go
overboard trying to flatten the test case's distribution right now.


Probably you would have to invent a new method to draw a zipfian 
distribution for that, which would be nice.



If you want something more drastic, using 1.5 instead of 2.5 would reduce
the probability of accidentaly passing the test by chance to about 20%, so
it would fail 80% of the time.


I think your math is off;


Argh. Although I confirm my computation, ISTM that with 1.5 the first
value has a 39% chance of being drawn, so a collision on 15% of cases; the
second value 14%, so a collision on 2%; ... total cumulative probability
about 18%.
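
Spelling the computation out (a sketch, assuming two independent draws and
the usual zipfian probabilities p_k = k^{-s} / H_{N,s}, with
H_{N,s} = \sum_{j=1}^{N} j^{-s}):

  P(accidental pass) = \sum_{k=1}^{N} p_k^2

For N = 1000 this gives roughly 0.58 for s = 2.5 and roughly 0.18 for
s = 1.5, consistent with the ~56% (first value alone) and ~18% figures
discussed in this thread.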


1.5 works quite well here.  I saw one failure to produce distinct values 
in 20 attempts.


With about 3 failures expected out of 20, that is plausible.

It's not demonstrably slower than 2.5 either.  (1.1 is measurably 
slower; probably not by enough for anyone to care, but 1.5 is good 
enough for me.)


Good if it fails quick enough for you.

--
Fabien.

Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-22 Thread Arthur Zakirov
Mon, Jan 21, 2019 at 19:42, Arthur Zakirov :
>
> On 21.01.2019 17:56, Tomas Vondra wrote:
> > I wonder if we could devise some simple cache eviction policy. We don't
> > have any memory limit GUC anymore, but maybe we could use unload
> > dictionaries that were unused for sufficient amount of time (a couple of
> > minutes or so). Of course, the question is when exactly would it happen
> > (it seems far too expensive to invoke on each dict access, and it should
> > happen even when the dicts are not accessed at all).
>
> Yes, I thought about such feature too. Agree, it could be expensive
> since we need to scan pg_ts_dict table to get list of dictionaries (we
> can't scan dshash_table).
>
> I haven't a good solution yet. I just had a thought to return
> max_shared_dictionaries_size. Then we can unload dictionaries (and scan
> the pg_ts_dict table) that were accessed a lot time ago if we reached
> the size limit.
> We can't set exact size limit since we can't release the memory
> immediately. So max_shared_dictionaries_size can be renamed to
> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL has
> unlimited space for dictionaries.

I want to propose cleaning up segments during vacuum/autovacuum. I'm not
aware of the policy on cleaning up objects besides relations during
vacuum/autovacuum. Could it be a good idea?

Vacuum might unload dictionaries when the total size of loaded dictionaries
exceeds a threshold. When that happens, vacuum scans the loaded dictionaries and
unloads (unpins segments and removes hash table entries) those dictionaries
which aren't mapped into any backend process anymore (they only stay around
because dsm_pin_segment() was called).

max_shared_dictionaries_size can be renamed to
shared_dictionaries_cleanup_threshold.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: COPY FROM WHEN condition

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 18:35:21 +0100, Tomas Vondra wrote:
> On 1/21/19 11:15 PM, Tomas Vondra wrote:
> > On 1/21/19 7:51 PM, Andres Freund wrote:
> >> I'm *not* convinced by this. I think it's bad enough that we do this for
> >> normal COPY, but for WHEN, we could end up *never* resetting before the
> >> end. Consider a case where a single tuple is inserted, and then *all*
> >> rows are filtered.  I think this needs a separate econtext that's reset
> >> every round. Or alternatively you could fix the code not to rely on
> >> per-tuple not being reset when tuples are buffered - that actually ought
> >> to be fairly simple.
> >>
> > 
> > I think separating the per-tuple and per-batch contexts is the right
> > thing to do, here. It seems the batching was added somewhat later and
> > using the per-tuple context is rather confusing.
> > 
> 
> OK, here is a WIP patch doing that. It creates a new "batch" context,
> and allocates tuples in it (instead of the per-tuple context). The
> per-tuple context is now reset always, irrespectedly of nBufferedTuples.
> And the batch context is reset every time the batch is emptied.
> 
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.
> 
> Overall, separating the contexts makes it quite a bit clearer. I'm not
> entirely happy about the per-tuple context being "implicit" (hidden in
> executor context) while the batch context being explicitly created, but
> there's not much I can do about that.

I think the extra copy is probably OK for now - as part of the pluggable
storage series I've converted COPY to use slots, which should make that
less of an issue - I've not done that, but I actually assume we could
remove the whole batch context afterwards.

Greetings,

Andres Freund



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Andres Freund
Hi,

On 2019-01-22 08:27:48 -0600, Kevin Grittner wrote:
> With the help of VMware's Dirk Hohndel (VMware's Chief Open Source
> Officer, a VP position near the top of the organization, and a
> personal friend of Linus), I have been fortunate enough to make
> contact directly with Linus Torvalds to discuss this issue.  In emails
> to me he has told me that this patch is no longer provisional:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff75eb2a08c2ac96404a2d79685668f3cf5a7a3

Unfortunately, unless something has changed recently, that patch is
*not* sufficient to really solve the issue - we don't guarantee that
there's always an fd preventing the necessary information from being
evicted from memory:

 Note that we might still lose the error if the inode gets evicted from
 the cache before anything can reopen it, but that was the case before
 errseq_t was merged. At LSF/MM we had some discussion about keeping
 inodes with unreported writeback errors around in the cache for longer
 (possibly indefinitely), but that's really a separate problem"

And that's entirely possible in postgres.  The commit was discussed on
list too, btw...

Greetings,

Andres Freund



Re: proposal - plpgsql unique statement id

2019-01-22 Thread Pavel Stehule
út 22. 1. 2019 v 14:55 odesílatel Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> napsal:

> There are still a handful of places where a statement is created but no
> stmtid is assigned.  Can you check those?
>
> src/pl/plpgsql/src/pl_exec.c:2815
> src/pl/plpgsql/src/pl_exec.c:4580
>

These statements are "fake" very short life statements without processing
via statement switch.  Should not be counted, because are dynamically
created, dropped, and stmtid should be 0 or -1 (that means it should be int
again).
Now, for both cases, the stmtid is 0, due memset to 0.


> src/pl/plpgsql/src/pl_gram.y:907
>

It was missing; fixed, thank you for the check.

Regards

Pavel



> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 2454386af8..e7de54fc93 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -368,6 +368,8 @@ do_compile(FunctionCallInfo fcinfo,
 
 	function->fn_prokind = procStruct->prokind;
 
+	function->nstatements = 0;
+
 	/*
 	 * Initialize the compiler, particularly the namespace stack.  The
 	 * outermost namespace contains function parameters and other special
@@ -911,6 +913,8 @@ plpgsql_compile_inline(char *proc_source)
  true);
 	function->found_varno = var->dno;
 
+	function->nstatements = 0;
+
 	/*
 	 * Now parse the function's text
 	 */
@@ -1020,6 +1024,7 @@ add_dummy_return(PLpgSQL_function *function)
 
 		new = palloc0(sizeof(PLpgSQL_stmt_block));
 		new->cmd_type = PLPGSQL_STMT_BLOCK;
+		new->stmtid = ++function->nstatements;
 		new->body = list_make1(function->action);
 
 		function->action = new;
@@ -1031,6 +1036,7 @@ add_dummy_return(PLpgSQL_function *function)
 
 		new = palloc0(sizeof(PLpgSQL_stmt_return));
 		new->cmd_type = PLPGSQL_STMT_RETURN;
+		new->stmtid = ++function->nstatements;
 		new->expr = NULL;
 		new->retvarno = function->out_param_varno;
 
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 59063976b3..20aded93aa 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -414,6 +414,7 @@ pl_block		: decl_sect K_BEGIN proc_sect exception_sect K_END opt_label
 
 		new->cmd_type	= PLPGSQL_STMT_BLOCK;
 		new->lineno		= plpgsql_location_to_lineno(@2);
+		new->stmtid		= ++plpgsql_curr_compile->nstatements;
 		new->label		= $1.label;
 		new->n_initvars = $1.n_initvars;
 		new->initvarnos = $1.initvarnos;
@@ -905,6 +906,7 @@ stmt_perform	: K_PERFORM expr_until_semi
 		new = palloc0(sizeof(PLpgSQL_stmt_perform));
 		new->cmd_type = PLPGSQL_STMT_PERFORM;
 		new->lineno   = plpgsql_location_to_lineno(@1);
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
 		new->expr  = $2;
 
 		$$ = (PLpgSQL_stmt *)new;
@@ -918,6 +920,7 @@ stmt_call		: K_CALL
 		new = palloc0(sizeof(PLpgSQL_stmt_call));
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
+		new->stmtid = ++plpgsql_curr_compile->nstatements;
 		new->expr = read_sql_stmt("CALL ");
 		new->is_call = true;
 
@@ -932,6 +935,7 @@ stmt_call		: K_CALL
 		new = palloc0(sizeof(PLpgSQL_stmt_call));
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
+		new->stmtid = ++plpgsql_curr_compile->nstatements;
 		new->expr = read_sql_stmt("DO ");
 		new->is_call = false;
 
@@ -947,6 +951,7 @@ stmt_assign		: assign_var assign_operator expr_until_semi
 		new = palloc0(sizeof(PLpgSQL_stmt_assign));
 		new->cmd_type = PLPGSQL_STMT_ASSIGN;
 		new->lineno   = plpgsql_location_to_lineno(@1);
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
 		new->varno = $1->dno;
 		new->expr  = $3;
 
@@ -962,6 +967,7 @@ stmt_getdiag	: K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';'
 		new = palloc0(sizeof(PLpgSQL_stmt_getdiag));
 		new->cmd_type = PLPGSQL_STMT_GETDIAG;
 		new->lineno   = plpgsql_location_to_lineno(@1);
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
 		new->is_stacked = $2;
 		new->diag_items = $4;
 
@@ -1149,6 +1155,7 @@ stmt_if			: K_IF expr_until_then proc_sect stmt_elsifs stmt_else K_END K_IF ';'
 		new = palloc0(sizeof(PLpgSQL_stmt_if));
 		new->cmd_type	= PLPGSQL_STMT_IF;
 		new->lineno		= plpgsql_location_to_lineno(@1);
+		new->stmtid		= ++plpgsql_curr_compile->nstatements;
 		new->cond		= $2;
 		new->then_body	= $3;
 		new->elsif_list = $4;
@@ -1253,6 +1260,7 @@ stmt_loop		: opt_loop_label K_LOOP loop_body
 		new = palloc0(sizeof(PLpgSQL_stmt_loop));
 		new->cmd_type = PLPGSQL_STMT_LOOP;
 		new->lineno   = plpgsql_location_to_lineno(@2);
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
 		new->label	  = $1;
 		new->body	  = $3.stmts;
 
@@ -1270,6 +1278,7 @@ stmt_while		: opt_loop_label K_WHILE 

Re: Thread-unsafe coding in ecpg

2019-01-22 Thread Andres Freund



On January 22, 2019 9:50:19 AM PST, Andrew Dunstan 
 wrote:
>
>On 1/21/19 10:00 PM, Andrew Dunstan wrote:
>> On 1/21/19 3:25 PM, Tom Lane wrote:
>>> "Joshua D. Drake"  writes:
 On 1/21/19 12:05 PM, Tom Lane wrote:
> Is there a newer version of mingw that does have this
>functionality?
 Apparently this can be done with the 64-bit version:

>https://stackoverflow.com/questions/33647271/how-to-use-configthreadlocale-in-mingw
>>> Hmm, the followup question makes it sound like it still didn't work
>:-(.
>>>
>>> However, since the mingw build is autoconf-based, seems like we can
>>> install a configure check instead of guessing.  Will make it so.
>>>
>>> Task for somebody else: run a MinGW64 buildfarm member.
>>>
>>> 
>>
>> I could set that up in just about two shakes of a lamb's tail - I
>have a
>> script to do so all tested on vagrant/aws within the last few months.
>>
>>
>> What I don't have is resources. My Windows resources are pretty much
>> tapped out. I would need either some modest hardware or a Windows VM
>> somewhere in the cloud - could be anywhere but I'm most at home on
>AWS.
>>
>
>
>Incidentally, jacana *is* running mingw64, but possibly not a young
>enough version. I just ran a test on an up to date version and it found
>the config setting happily, and passed the make stage.
>
>
>I can look at upgrading jacana to a more modern compiler. The version
>it's running is 4.9.1, apparently built around Oct 2014, so it's not
>that old.

The thread about the introduction is from 2016, so that's kind of expected.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Thread-unsafe coding in ecpg

2019-01-22 Thread Andrew Dunstan


On 1/21/19 10:00 PM, Andrew Dunstan wrote:
> On 1/21/19 3:25 PM, Tom Lane wrote:
>> "Joshua D. Drake"  writes:
>>> On 1/21/19 12:05 PM, Tom Lane wrote:
 Is there a newer version of mingw that does have this functionality?
>>> Apparently this can be done with the 64-bit version:
>>> https://stackoverflow.com/questions/33647271/how-to-use-configthreadlocale-in-mingw
>> Hmm, the followup question makes it sound like it still didn't work :-(.
>>
>> However, since the mingw build is autoconf-based, seems like we can
>> install a configure check instead of guessing.  Will make it so.
>>
>> Task for somebody else: run a MinGW64 buildfarm member.
>>
>>  
>
> I could set that up in just about two shakes of a lamb's tail - I have a
> script to do so all tested on vagrant/aws within the last few months.
>
>
> What I don't have is resources. My Windows resources are pretty much
> tapped out. I would need either some modest hardware or a Windows VM
> somewhere in the cloud - could be anywhere but I'm most at home on AWS.
>


Incidentally, jacana *is* running mingw64, but possibly not a young
enough version. I just ran a test on an up to date version and it found
the config setting happily, and passed the make stage.


I can look at upgrading jacana to a more modern compiler. The version
it's running is 4.9.1, apparently built around Oct 2014, so it's not
that old.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Strange query behaviour

2019-01-22 Thread Isaac Morland
I'm finding a massive difference in query execution time between two
queries that should be identical:

Very slow:
select ... from x natural left join y where y is null

Fast:
select ... from x natural left join y where y.primary_key_column is null

A fact that I suspect is important is that y has columns whose contents are
PDFs with a total size of 35608033659 bytes. However, I can query that size
using a query that looks like this:

select sum (length (pdf_field_1) + length (pdf_field_2)) from y

This runs very fast (2.8ms for 2324 rows).

So it is as if checking the whole tuple for NULL requires reading the PDF
bytea columns, but checking just the primary key for NULL or even reading
the lengths of the PDFs does not.

For the moment I'm going to fix it by just using "y.primary_key_column IS
NULL" instead of "y IS NULL" where I want to check whether I have a row
from y corresponding to a given row in x. But this seems like strange
behaviour. I can think of a couple of potential enhancements that this
suggests:

1) when checking an entire row for null, start with a primary key field or
other NOT NULL field. In the common case of checking what happened with a
left join, this is all that needs to be done - either there is a row, in
which case the field cannot be NULL, or there is no row and all the other
fields must also be NULL.

2) when checking a field for NULL, is it really necessary to load the field
contents? It feels like whether or not a value is NULL should be possible
to determine without de-toasting (if I have the right terminology).

Any ideas anybody might have would be much appreciated.


Re: COPY FROM WHEN condition

2019-01-22 Thread Tomas Vondra


On 1/21/19 11:15 PM, Tomas Vondra wrote:
> 
> 
> On 1/21/19 7:51 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
>>>
>>>
>>> On 1/21/19 4:33 AM, Tomas Vondra wrote:


 On 1/21/19 3:12 AM, Andres Freund wrote:
> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>
>>>
>>> On 1/20/19 8:24 PM, Andres Freund wrote:
 Hi,

 On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>
>>>
>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
>>>
>>>
>>>  Can you also update the docs to mention that the functions 
>>> called from
>>>  the WHERE clause does not see effects of the COPY itself?
>>>
>>>
>>> /Of course, i  also add same comment to insertion method selection
>>> /
>>
>> FWIW I've marked this as RFC and plan to get it committed this week.
>>
>
> Pushed, thanks for the patch.

 While rebasing the pluggable storage patch ontop of this I noticed that
 the qual appears to be evaluated in query context. Isn't that a bad
 idea? ISMT it should have been evaluated a few lines above, before the:

/* Triggers and stuff need to be invoked in query 
 context. */
MemoryContextSwitchTo(oldcontext);

 Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?

>>>
>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but 
>>> I'll
>>> fix that tomorrow.
>>
>> NP. On second thought, the problem is probably smaller than I thought at
>> first, because ExecQual() switches to the econtext's per-tuple memory
>> context. But it's only reset once for each batch, so there's some
>> wastage. At least worth a comment.
>
> I'm tired, but perhaps its actually worse - what's being reset currently
> is the ESTate's per-tuple context:
>
>   if (nBufferedTuples == 0)
>   {
>   /*
>* Reset the per-tuple exprcontext. We can only do this 
> if the
>* tuple buffer is empty. (Calling the context the 
> per-tuple
>* memory context is a bit of a misnomer now.)
>*/
>   ResetPerTupleExprContext(estate);
>   }
>
> but the quals are evaluated in the ExprContext's:
>
> ExecQual(ExprState *state, ExprContext *econtext)
> ...
>   ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>
>
> which is created with:
>
> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> #define GetPerTupleExprContext(estate) \
>   ((estate)->es_per_tuple_exprcontext ? \
>(estate)->es_per_tuple_exprcontext : \
>MakePerTupleExprContext(estate))
>
> and creates its own context:
>   /*
>* Create working memory for expression evaluation in this context.
>*/
>   econtext->ecxt_per_tuple_memory =
>   AllocSetContextCreate(estate->es_query_cxt,
> "ExprContext",
> 
> ALLOCSET_DEFAULT_SIZES);
>
> so this is currently just never reset.

 Actually, no. The ResetPerTupleExprContext boils down to

 MemoryContextReset((econtext)->ecxt_per_tuple_memory)

 and ExecEvalExprSwitchContext does this

 MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

 So it's resetting the right context, although only on batch boundary.
>>
> Seems just using ExecQualAndReset() ought to be sufficient?
>

 That may still be the right thing to do.

>>>
>>> Actually, no, because that would reset the context far too early (and
>>> it's easy to trigger segfaults). So the reset would have to happen after
>>> processing the row, not this early.
>>
>> Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
>> up the pluggable storage patch into individual pieces...
>>
>>
>>> But I think the current behavior is actually OK, as it matches what we
>>> do for defexprs. And the comment before ResetPerTupleExprContext says this:
>>>
>>> /*
>>>  * Reset the per-tuple exprcontext. We can only do this if the
>>>  * tuple buffer is empty. (Calling the context the per-tuple
>>>  * memory context is a bit of a misnomer now.)
>>>  */
>>>
>>> So the per-tuple context is not quite per-tuple anyway. Sure, we might
>>> rework that but I don't 

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Tom Lane
Fabien COELHO  writes:
>> I'm not following this argument.  The test case is basically useless
>> for its intended purpose with that parameter, because it's highly
>> likely that the failure mode it's supposedly checking for will be
>> masked by the "random" function's tendency to spit out the same
>> value all the time.

> The first value is taken about 75% of the time for N=1000 and s=2.5, which 
> means that a non deterministic implementation would succeed about 0.75² ~ 
> 56% of the time on that one.

Right, that's about what we've been seeing on OpenBSD.

> Also, the drawing procedure is less efficient when the parameter is close 
> to 1 because it is more likely to loop,

That might be something to fix, but I agree it's a reason not to go
overboard trying to flatten the test case's distribution right now.

> If you want something more drastic, using 1.5 instead of 2.5 would reduce 
> the probability of accidentaly passing the test by chance to about 20%, so 
> it would fail 80% of the time.

I think your math is off; 1.5 works quite well here.  I saw one failure
to produce distinct values in 20 attempts.  It's not demonstrably slower
than 2.5 either.  (1.1 is measurably slower; probably not by enough for
anyone to care, but 1.5 is good enough for me.)

regards, tom lane



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-22 Thread James Coleman
> > This comment seems wrong:
> >
> > + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
> > + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply 
> > non-falsity.
> >
> > "non-falsity does not imply non-falsity"?  I suppose one of those
> > negations should be different ...
>
> Earlier in the file weak implication (comments above
> predicate_implied_by) is defined as "non-falsity of A implies
> non-falsity of B". In this example we have NULL for A (non-false) but
> false for B, so that definition doesn't hold. So I think the comment
> is accurate, but I can reword if you have an idea of what you'd like
> to see (I've tweaked a bit in the attached patch to start).

I forgot to update in v8 so attaching v9.

James Coleman


saop_is_not_null-v9.patch
Description: Binary data


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-22 Thread James Coleman
On Tue, Jan 22, 2019 at 4:26 AM Alvaro Herrera  wrote:
>
> Hello, I gave this patch a very quick scan.  I didn't check the actual
> logic behind it.
>
> This comment seems wrong:
>
> + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
> + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply 
> non-falsity.
>
> "non-falsity does not imply non-falsity"?  I suppose one of those
> negations should be different ...

Earlier in the file weak implication (comments above
predicate_implied_by) is defined as "non-falsity of A implies
non-falsity of B". In this example we have NULL for A (non-false) but
false for B, so that definition doesn't hold. So I think the comment
is accurate, but I can reword if you have an idea of what you'd like
to see (I've tweaked a bit in the attached patch to start).
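
A concrete illustration of the failing weak case (standard three-valued
logic, with explicit casts just to keep the types unambiguous):

  SELECT (NULL::int = ANY (ARRAY[NULL]::int[])) AS a,  -- NULL, i.e. non-false
         (NULL::int IS NOT NULL)                AS b;  -- false

Here A is non-false while B is false, so non-falsity of A does not imply
non-falsity of B.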

> I think the name clause_proved_for_null_test() is a bit weird, being in
> the past tense.  I'd maybe change "proved" to "proves".

Changed.

> s/exppresions/expresions/ in the test files.

Fixed.

James Coleman


saop_is_not_null-v8.patch
Description: Binary data


Re: Delay locking partitions during INSERT and UPDATE

2019-01-22 Thread John Naylor
The cfbot reports this patch no longer applies.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

2019-01-22 Thread Fabien COELHO



Hello Surafel,


okay .thank you for explaining. i attach a patch corrected as such


About this v9: applies cleanly, compiles, global and local "make check" 
ok.


The option is not exercised in the TAP tests. I'd suggest that it should be
tested on a small table with zero rows, one row, and more rows than the
configured value. Maybe use -t and other options to reduce the output to the minimum.


About the documentation:

 +   When using --rows-per-insert, this allows the maximum number
 +   of rows per INSERT statement to be specified.

I'd suggest a more direct and simple style, something like:

  Set the maximum number of rows per INSERT statement.
  This option implies --inserts.
  Defaults to 1.
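
For illustration (made-up table and values, and the exact output format is
up to the patch), a dump produced with --rows-per-insert=3 would emit
statements shaped roughly like:

  INSERT INTO t (a, b) VALUES
      (1, 'one'),
      (2, 'two'),
      (3, 'three');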

About the help message, the new option expects an argument, but it does 
not show:


 +  printf(_("  --rows-per-insert            number of row per INSERT command\n"));

About the code, maybe avoid using an int as a bool, eg:

... && !dopt.dump_inserts_multiple)
 -> ... && dopt.dump_inserts_multiple == 0)

Spacing around operators, eg: "!=1)" -> "!= 1)"

ISTM that the "dump_inserts_multiple" field is useless, you can reuse 
"dump_inserts" instead, i.e. --inserts sets it to 1 *if zero*, and 
--rows-per-insert=XXX sets it to XXX. That would simplify the code 
significantly.


ISTM that there are indentation issues, eg on added

  if (dopt->dump_inserts_multiple == 1) {

The old code is not indented properly.

--
Fabien.



Re: pg_dump multi VALUES INSERT

2019-01-22 Thread Alvaro Herrera
Nice stuff.

Is it possible to avoid the special case for 0 columns by using the
UNION ALL syntax I showed?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rare SSL failures on eelpout

2019-01-22 Thread Tom Lane
Thomas Munro  writes:
> Hmm.  Why is psql doing two sendto() calls without reading a response
> in between, when it's possible for the server to exit after the first,
> anyway?  Seems like a protocol violation somewhere?

Keep in mind this is all down inside the SSL handshake, so if any
protocol is being violated, it's theirs not ours.

My gut reaction is that this probably indicates that in the "certificate
verify failed" code path, we're exiting the server too soon without
letting openssl finish out its handshake fully.  But that could be all
wet, or even if true it might not be convenient to postpone exit (e.g.,
we'd have to save the SSL error code somewhere, I suspect).

The whole thing reminds me of the recent bug #15598:

https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk

regards, tom lane



Re: TestForOldSnapshot() seems to be in the wrong place

2019-01-22 Thread Kevin Grittner
On Fri, Jan 18, 2019 at 3:13 PM Andres Freund  wrote:

> TestForOldSnapshot()

> to me it seems wrong from a layering POV to have this in
> bufmgr.[ch]. Due to the inline functions this makes bufmgr.h have a
> dependency on snapmgr.h and tqual.h, which to me seems wrong from a
> layer POV.  Why isn't this declared in snapmgr.h or a new header file?

I will take a look.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-22 Thread David Rowley
On Fri, 18 Jan 2019 at 10:03, Tomas Vondra  wrote:
> thanks for the review. The attached patches address most of the issues
> mentioned in the past several messages, both in the MCV and histogram parts.

I made another pass over the 0001 patch. I've not read through mcv.c
again yet. Will try to get to that soon.

0001-multivariate-MCV-lists-20190117.patch

1. The following mentions "multiple functions", but lists just 1 function.

   
To inspect statistics defined using CREATE STATISTICS
command, PostgreSQL provides multiple functions.
   

2. There's a mix of usages of MCV and
MCV around the docs. Should these be the same?

3. analyze_mcv_list() is modified to make it an external function, but
it's not used anywhere outside of analyze.c

4. The following can be simplified further:

* We can also leave the record as it is if there are no statistics
* including the datum values, like for example MCV lists.
*/
if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
reset_stats = true;

/*
* If we can leave the statistics as it is, just do minimal cleanup
* and we're done.
*/
if (!reset_stats)
{
ReleaseSysCache(oldtup);
return;
}

to just:

/*
* When none of the defined statistics types contain datum values
* from the table's columns then there's no need to reset the stats.
* Functional dependencies and ndistinct stats should still hold true.
*/
if (!statext_is_kind_built(oldtup, STATS_EXT_MCV))
{
ReleaseSysCache(oldtup);
return;
}

5. "so that we can ignore them below." seems misplaced now since
you've moved all the code below into clauselist_selectivity_simple().
   Maybe you can change it to "so that we can inform
clauselist_selectivity_simple about clauses that it should ignore" ?

* filled with the 0-based list positions of clauses used that way, so
* that we can ignore them below.

6. README.mcv: multi-variate -> multivariate

are large the list may be quite large. This is especially true for multi-variate

7. README.mcv: similar -> a similar

it impossible to use anyarrays. It might be possible to produce similar

8. I saw you added IS NOT NULL to README.mcv, but README just mentions:

(b) MCV lists - equality and inequality clauses (AND, OR, NOT), IS NULL

Should that mention IS NOT NULL too?

9. The header comment for build_attnums_array() claims that it
"transforms an array of AttrNumber values into a bitmap", but it does
the opposite.

 * Transforms an array of AttrNumber values into a bitmap.

10. The following Assert is not entirely useless.  The bitmapset could
have a 0 member, but it can't store negative values.

while ((j = bms_next_member(attrs, j)) >= 0)
{
/* user-defined attributes only */
Assert(AttrNumberIsForUserDefinedAttr(j));

Just checking you thought of that when you added it?

11. XXX comments are normally reserved for things we may wish to
reconsider later, but the following seems more like a "Note:"

 * XXX All the memory is allocated in a single chunk, so that the caller
 * can simply pfree the return value to release all of it.

12. In statext_is_compatible_clause_internal() there's still a comment
that mentions "Var op Const", but Const op Var is okay too.

13. This is not fall-through.  Generally, such a comment is reserved
to confirm that the "break;" is meant to be missing.

default:
/* fall-through */
return false;

https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/
mentions various comment patterns that are used for that case.  Your
case seems misplaced since it's right above a return, and not another
case.

14. The header comment for statext_is_compatible_clause() is not
accurate.  It mentions only opexprs with equality operations are
allowed, but none of those are true.

 * Only OpExprs with two arguments using an equality operator are supported.
 * When returning True attnum is set to the attribute number of the Var within
 * the supported clause.

15. statext_clauselist_selectivity(): "a number" -> "the number" ?

 * Selects the best extended (multi-column) statistic on a table (measured by
 * a number of attributes extracted from the clauses and covered by it), and

16. I understand you're changing this to a bitmask in the 0002 patch,
but int is the wrong type here;
/* we're interested in MCV lists */
int types = STATS_EXT_MCV;

Maybe just pass the STATS_EXT_MCV directly, or at least make it a char.

17. bms_membership(clauses_attnums) != BMS_MULTIPLE seems better here.
It can stop once it finds 2. No need to count them all.

/* We need at least two attributes for MCV lists. */
if (bms_num_members(clauses_attnums) < 2)
return 1.0;
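
To illustrate the point, a toy standalone sketch (plain C on a single-word
bitmap, nothing to do with the real Bitmapset API; the names are made up):
it answers "at least two members?" without counting them all.

#include <stdint.h>
#include <stdio.h>

/* true as soon as at least two bits are set; no full count needed */
static int
has_at_least_two_members(uint64_t word)
{
    /* clearing the lowest set bit leaves something only if >= 2 bits are set */
    return (word & (word - 1)) != 0;
}

int
main(void)
{
    printf("%d %d %d\n",
           has_at_least_two_members(0),                /* no members  -> 0 */
           has_at_least_two_members(UINT64_C(1) << 3), /* one member  -> 0 */
           has_at_least_two_members(5));               /* two members -> 1 */
    return 0;
}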

18. The following comment in statext_is_compatible_clause_internal()
does not seem to be true.  I see OpExprs are supported and NULL test,
including others too.

/* We only support plain Vars for now */

19. The header comment for clauselist_selectivity_simple() does not
mention what estimatedclauses is for.

20. New line. Also, missing "the" before "maximum"

+ * We
+ * iteratively search for 

Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
On Tue, Jan 22, 2019 at 8:27 AM Kevin Grittner  wrote:

> With the help of VMware's Dirk Hohndel (VMware's Chief Open Source
> Officer, a VP position near the top of the organization, and a
> personal friend of Linus), I have been fortunate enough to make
> contact directly with Linus Torvalds to discuss this issue.  In emails
> to me he has told me that this patch is no longer provisional:
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff75eb2a08c2ac96404a2d79685668f3cf5a7a3
>
> Linus has given me permission to quote him, so here is a quote from an
> email he sent 2019-01-17:

> > That commit (b4678df184b3: "errseq: Always report a writeback error
> > once") was already backported to the stable trees (4.14 and 4.16), so
> > yes, everything should be fine. We did indeed miss old errors for a
> > while.

Sorry, but somehow I got the parent link rather than the intended
commit.  Linus got it right, of course.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b4678df184b314a2bd47d2329feca2c2534aa12b
> --
> Kevin Grittner
> VMware vCenter Server
> https://www.vmware.com/



--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: Refactoring the checkpointer's fsync request queue

2019-01-22 Thread Kevin Grittner
With the help of VMware's Dirk Hohndel (VMware's Chief Open Source
Officer, a VP position near the top of the organization, and a
personal friend of Linus), I have been fortunate enough to make
contact directly with Linus Torvalds to discuss this issue.  In emails
to me he has told me that this patch is no longer provisional:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff75eb2a08c2ac96404a2d79685668f3cf5a7a3

Linus has given me permission to quote him, so here is a quote from an
email he sent 2019-01-17:

> That commit (b4678df184b3: "errseq: Always report a writeback error
> once") was already backported to the stable trees (4.14 and 4.16), so
> yes, everything should be fine. We did indeed miss old errors for a
> while.
>
> > The latest information I could find on this said this commit was 
> > "provisional" but
> > also that it might be back-patched to 4.13 and on.  Can you clarify the 
> > status of
> > this patch in either respect?
>
> It was definitely backported to both 4.14 and 4.16, I see it in my
> email archives.
>
> The bug may remain in 4.13, but that isn't actually maintained any
> more, and I don't think any distro uses it (distros tend to use the
> long-term stable kernels that are maintained, or sometimes maintain
> their own patch queue).

I think that eliminates the need for this patch.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: proposal - plpgsql unique statement id

2019-01-22 Thread Peter Eisentraut
There are still a handful of places where a statement is created but no
stmtid is assigned.  Can you check those?

src/pl/plpgsql/src/pl_exec.c:2815
src/pl/plpgsql/src/pl_exec.c:4580
src/pl/plpgsql/src/pl_gram.y:907

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

2019-01-22 Thread Surafel Temesgen
On Tue, Jan 22, 2019 at 3:35 PM David Rowley 
wrote:

> On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen 
> wrote:
> > if you specified --inserts option you already specified the number of
> rows per statement which is 1 .
> > if more than one rows per statement needed it must be specified using
> --rows-per-insert
> > and specifying one row per statement using --inserts option at the same
> time specify
> > different number of rows per statement with --rows-per-insert option
> seems conflicting to me.
>
> So you're saying an INSERT, where you insert multiple rows in a single
> statement is not an insert? That logic surprises me.  --inserts makes
> pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
> INSERTs. I'd love to know why you think there's some conflict with
> that.
>
> By your logic, you could say --column-inserts and --inserts should
> also conflict, but they don't. --column-inserts happens to be coded to
> imply --inserts. I really suggest we follow the lead from that. Doing
> it this way reduces the complexity of the code where we build the
> INSERT statement.  Remember that a patch that is overly complex has
> much less chance of making it.  I'd really suggest you keep this as
> simple as possible.
>
>
Okay, I understand it now. Fabien also commented about it upthread; I
misunderstood it as using a separate new option.

It also seems perfectly logical to me to default --rows-per-insert to
> 1 so that when --inserts is specified we do 1 row per INSERT. If the
> user changes that value to something higher then nothing special needs
> to happen as the function building the INSERT statement will always be
> paying attention to whatever the --rows-per-insert value is set to.
> That simplifies the logic meaning you don't need to check if --inserts
> was specified.
>
>
Okay, thank you for explaining. I attach a patch corrected as such.
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 9e0bb93f08..4195fb81a2 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --rows-per-insert
+  
+   
+When using --rows-per-insert, this allows the maximum number
+of rows per INSERT statement to be specified.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 4a2e122e2d..73a243ecb0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -72,6 +72,7 @@ typedef struct _restoreOptions
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
@@ -144,6 +145,7 @@ typedef struct _dumpOptions
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2b1a94733b..e23f5cc70f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -313,6 +313,7 @@ main(int argc, char **argv)
 	int			plainText = 0;
 	ArchiveFormat archiveFormat = archUnknown;
 	ArchiveMode archiveMode;
+	char   *p;
 
 	static DumpOptions dopt;
 
@@ -359,6 +360,7 @@ main(int argc, char **argv)
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"if-exists", no_argument, _exists, 1},
 		{"inserts", no_argument, _inserts, 1},
+		{"rows-per-insert", required_argument, NULL, 8},
 		{"lock-wait-timeout", required_argument, NULL, 2},
 		{"no-tablespaces", no_argument, , 1},
 		{"quote-all-identifiers", no_argument, _all_identifiers, 1},
@@ -557,6 +559,27 @@ main(int argc, char **argv)
 dosync = false;
 break;
 
+			case 8:			/* inserts row number */
+errno = 0;
+dopt.dump_inserts_multiple = strtol(optarg, &p, 10);
+if (p == optarg || *p != '\0')
+{
+	write_msg(NULL, "argument of --rows-per-insert must be a number\n");
+	exit_nicely(1);
+}
+if (errno == ERANGE)
+{
+	write_msg(NULL, "argument of --rows-per-insert exceeds integer range.\n");
+	exit_nicely(1);
+}
+if (dopt.dump_inserts_multiple <= 0)
+{
+	write_msg(NULL, "argument of --rows-per-insert must be positive number\n");
+	exit_nicely(1);
+}
+
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -584,6 +607,9 @@ main(int argc, char **argv)
 	if (dopt.column_inserts)
 		dopt.dump_inserts = 1;
 
+	if (dopt.dump_inserts && !dopt.dump_inserts_multiple)
+		dopt.dump_inserts_multiple = 1;
+
 	/*
 	 * Binary upgrade mode implies dumping sequence data even in schema-only
 	 * mode.  This is not exposed as a separate option, but kept separate
@@ -607,8 +633,9 @@ main(int argc, char **argv)
 	if (dopt.if_exists && 

Re: [HACKERS] Block level parallel vacuum

2019-01-22 Thread Haribabu Kommi
On Fri, Jan 18, 2019 at 11:42 PM Masahiko Sawada 
wrote:

> On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
>  wrote:
> >
> >
> > On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada 
> wrote:
> >>
> >>
> >> Rebased.
> >
> >
> > I started reviewing the patch, I didn't finish my review yet.
> > Following are some of the comments.
>
> Thank you for reviewing the patch.
>
> >
> > +PARALLEL  class="parameter">N
> > +
> > + 
> > +  Execute index vacuum and cleanup index in parallel with
> >
> > I doubt that user can understand the terms index vacuum and cleanup
> index.
> > May be it needs some more detailed information.
> >
>
> Agreed. Table 27.22 "Vacuum phases" has a good description of vacuum
> phases. So maybe adding a reference to it would work.
>

OK.


> >
> > -typedef enum VacuumOption
> > +typedef enum VacuumOptionFlag
> >  {
> >
> > I don't find the new name quite good, how about VacuumFlags?
> >
>
> Agreed with removing "Option" from the name but I think VacuumFlag
> would be better because this enum represents only one flag. Thoughts?
>

OK.


>
> > postgres=# vacuum (parallel 2, verbose) tbl;
> >
> > With verbose, no parallel workers related information is available.
> > I feel giving that information is required even when it is not parallel
> > vacuum also.
> >
>
> Agreed. How about the following verbose output? I've added the number
> of launched, planned and requested vacuum workers and purpose (vacuum
> or cleanup).
>
> postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
> 'test' has 3 indexes
> INFO:  vacuuming "public.test"
> INFO:  launched 2 parallel vacuum workers for index vacuum (planned:
> 2, requested: 30)
> INFO:  scanned index "test_idx1" to remove 2000 row versions
> DETAIL:  CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
> INFO:  scanned index "test_idx2" to remove 2000 row versions by
> parallel vacuum worker
> DETAIL:  CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
> INFO:  scanned index "test_idx3" to remove 2000 row versions by
> parallel vacuum worker
> DETAIL:  CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
> INFO:  "test": removed 2000 row versions in 10 pages
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> INFO:  launched 2 parallel vacuum workers for index cleanup (planned:
> 2, requested: 30)
> INFO:  index "test_idx1" now contains 991151 row versions in 2745 pages
> DETAIL:  2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO:  index "test_idx2" now contains 991151 row versions in 2745 pages
> DETAIL:  2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO:  index "test_idx3" now contains 991151 row versions in 2745 pages
> DETAIL:  2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO:  "test": found 2000 removable, 367 nonremovable row versions in
> 41 out of 4425 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 500
> There were 6849 unused item pointers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
> VACUUM
>

The verbose output is good.

Since the previous patch conflicts with 285d8e12 I've attached the
> latest version patch that incorporated the review comment I got.
>

Thanks for the latest patch. I have some more minor comments.

+  Execute index vacuum and cleanup index in parallel with

Better to use "vacuum index" and "cleanup index"? That would be consistent
with the description of the vacuum phases; it is better to follow the same
notation in the patch.


+ dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);

With the change, lazy_space_alloc takes care of initializing the
parallel vacuum; can we write something related to that in the comments?


+ initprog_val[2] = dead_tuples->max_dead_tuples;

The dead_tuples variable may need a rename for better readability?



+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
+ else
+ copy_result = true;


I don't see a need for the copy_result variable; how about directly using
the updated flag to decide whether to copy or not? Once the result is
copied, update the flag.
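
To sketch the idea, a tiny standalone example of that pattern (toy types and
names, not the patch's actual structures): the flag itself says whether there
is something to copy, and is cleared once the copy is done.

#include <stdbool.h>
#include <stdio.h>

/* stand-in for the shared per-index slot */
typedef struct StatsSlot
{
    bool updated;               /* set by whoever filled in the stats */
    int  stats;                 /* stand-in for the real stats struct */
} StatsSlot;

static void
collect_stats(StatsSlot *slot, int *dest)
{
    if (slot->updated)
    {
        *dest = slot->stats;    /* copy the result */
        slot->updated = false;  /* mark it consumed; no extra variable needed */
    }
}

int
main(void)
{
    StatsSlot slot = {true, 42};
    int       result = 0;

    collect_stats(&slot, &result);
    printf("copied: %d, updated now: %d\n", result, (int) slot.updated);
    return 0;
}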


+use Test::More tests => 34;

I don't find any new tests added in this patch.

I am wondering about the performance penalty if we use the parallel option
of vacuum on a small-sized table.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Alternative to \copy in psql modelled after \g

2019-01-22 Thread Fabien COELHO




  BEGIN;
  UPDATE pgbench_branches SET bbalance = bbalance + 1 RETURNING * \g /bad
  // the update is performed, the transaction is not rollbacked,
  // *but* the output file was not written, "COMMIT" save changes.


if PQexec() could not store the results for lack of memory, it would
yield a PGRES_FATAL_ERROR, then :ERROR would be set to true, and yet
the server-side operation would have been performed. Additionally, if
that BEGIN was not there, the statement would also have been
committed, so its effect would be durable independently of the value
of :ERROR.


Indeed, OOM is a special case when the client is unable to know whether 
the command was actually executed. However ISTM that the connection is 
likely to be aborted, so if there is an explicit transaction it would be 
aborted as well.



My point is that client-side issues already affect :ERROR,
so it can't be assumed that :ERROR=true implies that the SQL
statement did not have effects on the server.


Not from my reading of the doc (which I probably wrote, and did the 
implementation without a clear view of the different client-specific error 
conditions, so all this is really my fault BTW), where I understand that 
ERROR as true says that the SQL failed.



In that sense, the patch in its current state does not break this
guarantee, since it does not exist in the first place.


Sure, but the patch in its current state creates an inconsistency between 
an SQL command with "\g /BAD" and the same command in "COPY (...) TO 
STDOUT \g /BAD". I think that it must at the minimum be consistent.


OTOH I hope that :ERROR = false is a true guarantee that there have been 
no problem whatsoever in the execution of the last statement.



on the contrary I'm basically ok with changing ERROR
documentation and implementation (what I called option 1),


The doc only says:

  ERROR
 true if the last SQL query failed, false if it succeeded.
 See also SQLSTATE.

If the failure to retrieve results is included in "query failed", which
seems a reasonable interpretation to me, it doesn't need to be changed.


I understand "SQL query failed" as a server issue, especially as SQLSTATE 
is the reported command status at the PQ protocol level.



What's not right is "SELECT ... \g /bad" having a different effect on
:ERROR than "COPY... \g /bad".


Yep, I've been saying that.


I plan to follow up on that because there are other problems with
SELECT ... \g anyway, for instance, when a disk full occurs,
it's not reported at all. But that problem is not in the code path
of COPY.


Sure. As there are several bugs (doubtful features) uncovered, a first 
patch could fix "COPY ... TO STDOUT \g file", but probably replicate the
current ERROR behavior, however debatable it is (i.e. your patch without the
ERROR change, which is unrelated to the bug being fixed), and then another 
patch should fix/modify the behavior around ERROR (everywhere and 
consistently), and probably IMHO add an SQL_ERROR.


--
Fabien.



Re: pg_dump multi VALUES INSERT

2019-01-22 Thread David Rowley
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen  wrote:
> if you specified --inserts option you already specified the number of rows 
> per statement which is 1 .
> if more than one rows per statement needed it must be specified using 
> --rows-per-insert
> and specifying one row per statement using --inserts option at the same time 
> specify
> different number of rows per statement with --rows-per-insert option seems 
> conflicting to me.

So you're saying an INSERT, where you insert multiple rows in a single
statement is not an insert? That logic surprises me.  --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
INSERTs. I'd love to know why you think there's some conflict with
that.

By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement.  Remember that a patch that is overly complex has
much less chance of making it.  I'd really suggest you keep this as
simple as possible.

It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.
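
For illustration only, a rough standalone sketch of the batching that implies
(hypothetical names, not the pg_dump code): with rows_per_insert = 1 it
degenerates to the classic one-row-per-INSERT output, so no separate code
path for --inserts is needed.

#include <stdio.h>

/* print nrows rows for "table", batching them rows_per_insert per INSERT */
static void
dump_rows(const char *table, const char *const *rows, int nrows,
          int rows_per_insert)
{
    int i;

    for (i = 0; i < nrows; i++)
    {
        if (i % rows_per_insert == 0)
            printf("INSERT INTO %s VALUES\n", table);

        printf("    (%s)", rows[i]);

        /* close the statement at a batch boundary or at the last row */
        if ((i + 1) % rows_per_insert == 0 || i + 1 == nrows)
            printf(";\n");
        else
            printf(",\n");
    }
}

int
main(void)
{
    const char *rows[] = {"1, 'a'", "2, 'b'", "3, 'c'", "4, 'd'", "5, 'e'"};

    dump_rows("tab", rows, 5, 3);       /* two INSERTs: 3 rows, then 2 rows */
    return 0;
}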

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Alternative to \copy in psql modelled after \g

2019-01-22 Thread Daniel Verite
Fabien COELHO wrote:

> > Now if you download data with SELECT or COPY and we can't even
> > create the file, how is that a good idea to intentionally have the
> > script fail to detect it? What purpose does it satisfy?
> 
> It means that the client knows that the SQL command, and possible 
> associated side effects, were executed server-side, and that if we are in 
> a transaction it is still going on:
> 
>   BEGIN;
>   UPDATE pgbench_branches SET bbalance = bbalance + 1 RETURNING * \g /bad
>   // the update is performed, the transaction is not rollbacked,
>   // *but* the output file was not written, "COMMIT" save changes.

if PQexec() could not store the results for lack of memory, it would
yield a PGRES_FATAL_ERROR, then :ERROR would be set to true, and yet
the server-side operation would have been performed. Additionally, if
that BEGIN was not there, the statement would also have been
committed, so its effect would be durable independently of the value
of :ERROR.

My point is that client-side issues already affect :ERROR,
so it can't be assumed that :ERROR=true implies that the SQL
statement did not have effects on the server.

In that sense, the patch in its current state does not break this
guarantee, since it does not exist in the first place.

OTOH I hope that :ERROR = false is a true guarantee that there
have been no problem whatsoever in the execution of
the last statement.

> on the contrary I'm basically ok with changing ERROR
> documentation and implementation (what I called option 1),

The doc only says:

   ERROR
  true if the last SQL query failed, false if it succeeded. 
  See also SQLSTATE. 

If the failure to retrieve results is included in "query failed", which
seems a reasonable interpretation to me, it doesn't need to be changed.

What's not right is "SELECT ... \g /bad" having a different effect on
:ERROR than "COPY... \g /bad".
I plan to follow up on that because there are other problems with
SELECT ... \g anyway, for instance, when a disk full occurs,
it's not reported at all. But that problem is not in the code path
of COPY.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: add_partial_path() may remove dominated path but still in use

2019-01-22 Thread Kohei KaiGai
Let me send a reminder about this thread.
If there are no more comments, objections, or better ideas, please commit this fix.

Thanks,

2019年1月17日(木) 18:29 Kyotaro HORIGUCHI :
>
> Hello, sorry for the absence.
>
> At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas  wrote 
> in 
> > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai  wrote:
> > > 2019年1月11日(金) 5:52 Robert Haas :
> > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  
> > > > wrote:
> > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > > front of the generate_gather_paths?
> > > > > If we have no use case for the second hook, here is little necessity
> > > > > to have the post_rel_pathlist_hook() here.
> > > > > (At least, PG-Strom will use the first hook only.)
> > > >
> > > > +1.  That seems like the best way to be consistent with the principle
> > > > that we need to have all the partial paths before generating any
> > > > Gather paths.
> > > >
> > > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > > Please check it.
> >
> > Seems reasonable to me.
>
> Also seems reasonable to me.  The extension can call
> generate_gather_paths redundantly as is, but that does almost no
> harm, so it is acceptable even in a minor release.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-22 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> 1. easier to read and maintain

 Tom> The SQL-level API that I'm imagining would look roughly like
 Tom> a command like this at the end of an extension's script:

 Tom> ALTER EXTENSION extname SET MAP
 Tom>   OBJECT 1 IS FUNCTION foo(int, int),
 Tom>   OBJECT 2 IS OPERATOR +(float, float), ...

That's what I thought and I had something similar in mind except not
with numbers.

This is obviously the same situation we have with operator and function
numbers in opclasses right now, which is something I personally find
annoying: the fact that (for example) GiST operator members are assigned
some non-self-documenting number that you can only resolve by looking at
the opclass implementation to find out what it thinks the numbers mean.

-- 
Andrew (irc:RhodiumToad)



Rare SSL failures on eelpout

2019-01-22 Thread Thomas Munro
(Moving this discussion from the pgsql-committers thread "pgsql:
Update ssl test certificates and keys", which is innocent.)

On Wed, Jan 16, 2019 at 10:37 AM Thomas Munro
 wrote:
> On Fri, Jan 4, 2019 at 10:08 AM Thomas Munro
>  wrote:
> > Thanks.  FWIW I've just updated eelpout (a Debian testing BF animal
> > that runs all the extra tests including SSL) to use libssl-dev
> > (instead of libssl1.0-dev), and cleared its accache files.  Let's see
> > if that works...
>
> Since that upgrade (to libssl 1.1.1a-1), there are have been a few
> intermittent failures in the SSL tests on that animal (thanks Tom for
> pointing that out off-list).

I reproduced this manually.  From time to time (less than 1% of the
times I tried), the server hangs up without sending a goodbye message,
thereby causing a failure to match the expected psql error message.
From the kernel's perspective on the psql side, a working-as-expected
case looks like this:

sendto(3, 
"\27\3\3\2\356\313\334\372\201\353\201h\204\353\240A\4\355\232\340\375\0X\220\326V\374\253\222i\2724"...,
1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3,
revents=POLLOUT}])
sendto(3, 
"\27\3\3\0f\335\313\224\263\256r\371\215\177\273,N\345\342\200\257\r\321\6\323@\316\241\316\17\204\26"...,
107, MSG_NOSIGNAL, NULL, 0) = 107
ppoll([{fd=3, events=POLLIN|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3,
revents=POLLIN}])
recvfrom(3, "\27\3\3\0\23", 5, 0, NULL, NULL) = 5
recvfrom(3, "I\"\t;\3006\276\347\344]7>\2\234m\340]B\241", 19, 0,
NULL, NULL) = 19
close(3)= 0
write(2, "psql: SSL error: sslv3 alert cer"..., 49psql: SSL error:
sslv3 alert certificate revoked
) = 49

... and the unexpected failure case looks like this:

sendto(3, 
"\27\3\3\2\356/\254\307\277\342G?\321\f\314\245\22\246U\337\263;\203f\302s\306\37\276"...,
1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3,
revents=POLLOUT|POLLERR|POLLHUP}])
sendto(3, 
"\27\3\3\0f\177\335\20\305\353\234\306\211#\343\321\336\22111J\312\323F\210\6U\331\264GAr"...,
107, MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)
write(2, "psql: server closed the connecti"..., 199psql: server closed
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
could not send startup packet: Connection reset by peer
) = 199

To the kernel on the server side, a good case looks like this:

[pid 13740] sendto(8,
"\27\3\3\0\23\252\21\227\2\232s\354\21\243\236\207\301\3B\341\253\306k\346",
24, 0, NULL, 0) = 24
[pid 13740] write(2, "2019-01-22 09:23:04.425 UTC [137"...,
1112019-01-22 09:23:04.425 UTC [13740] [unknown] LOG:  could not
accept SSL connection: certificate verify failed
) = 111
[pid 13740] exit_group(0)   = ?
[pid 13740] +++ exited with 0 +++

I didn't manage to trace a bad case on the server side (maybe the
strace interferes with the required timing), but I noticed that
SSL_shutdown()'s return code (which we don't check) is always -1, and
I noticed that SSL_get_error() after that is always SSL_ERROR_SSL.  I
wondered if we might somehow be reaching exit() when the alert
response is still buffered inside OpenSSL, but that doesn't seem right
-- the port has noblock=0 at that stage.

Hmm.  Why is psql doing two sendto() calls without reading a response
in between, when it's possible for the server to exit after the first,
anyway?  Seems like a protocol violation somewhere?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Pluggable Storage - Andres's take

2019-01-22 Thread Amit Khandekar
On Tue, 22 Jan 2019 at 15:29, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Mon, Jan 21, 2019 at 9:33 AM Amit Khandekar  
> > wrote:
> >
> > Regression tests that use \d+ to show the table details might
> > not be interested specifically in table access method. But these will
> > fail if run with a modified default access method.
>
> I see your point, but if a test is not interested specifically in a table am,
> then I guess it wouldn't use a custom table am in the first place, right?

Right. It wouldn't use a custom table am. But I mean, despite not
using a custom table am, the test would fail if the regression tests run
with a changed default access method, because the regression output
file contains only one particular am value.

> Anyway, I don't have a strong opinion here, so if everyone agrees that
> HIDE_TABLEAM will show/hide the access method unconditionally, I'm fine
> with that.

Yeah, I agree it's subjective.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-01-22 Thread Fabien COELHO



  - works around pgbench command splitting on spaces,
   if postgres sources are in a strangely named directory…
   I tested within a directory named "pg .* dir".


I've also tested it with the initial case (including a +) and it works.


Good, thanks for checking!

--
Fabien.

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-22 Thread Alvaro Herrera
Hello, I gave this patch a very quick scan.  I didn't check the actual
logic behind it.

This comment seems wrong:

+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply 
non-falsity.

"non-falsity does not imply non-falsity"?  I suppose one of those
negations should be different ...

I think the name clause_proved_for_null_test() is a bit weird, being in
the past tense.  I'd maybe change "proved" to "proves".

s/exppresions/expressions/ in the test files.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-01-22 Thread Raúl Marín Rodríguez
Hi,

>   - works around pgbench command splitting on spaces,
>if postgres sources are in a strangely named directory…
>I tested within a directory named "pg .* dir".

I've also tested it with the initial case (including a +) and it works.


-- 
Raúl Marín Rodríguez
carto.com



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


Hello Tom,


BTW, did you look at the question of the range of zipfian?


Yep.

I confirmed here that as used in the test case, it's generating a range way 
smaller than the other ones: repeating the insertion snippet 1000x produces 
stats like this: [...]



I have no idea whether that indicates an actual bug, or just poor
choice of parameter in the test's call.  But the very small number
of distinct outputs is disheartening at least.


The Zipf distribution is highly skewed, somewhat close to an exponential. To
reduce the decreasing probability the parameter must be closer to 1, e.g. 1.05
or something. However as far as the test is concerned I do not see this as a 
significant issue. I was rather planning to submit a documentation 
improvement to provide more precise hints about how the distribution behaves 
depending on the parameter, and possibly reduce the parameter used in the 
test in passing, but I see this as not very urgent.


Attached are a documentation patch and scripts to check the distribution
(here for N = 10 & s = 2.5), the kind of thing I used when checking the 
initial patch:


  sh> psql < zipf_init.sql
  sh> pgbench -t 50 -c 2 -M prepared -f zipf_test.sql -P 1
  -- close to 29000 tps on my laptop
  sh> psql < zipf_end.sql
 ┌────┬────────┬────────────────────┬────────────────────────┐
 │ i  │  cnt   │       ratio        │        expected        │
 ├────┼────────┼────────────────────┼────────────────────────┤
 │  1 │ 756371 │                  • │                      • │
 │  2 │ 133431 │ 5.6686302283577280 │ 5.65685424949238019521 │
 │  3 │  48661 │ 2.7420521567579787 │     2.7556759606310754 │
 │  4 │  23677 │ 2.0552012501583816 │     2.0528009571186693 │
 │  5 │  13534 │ 1.7494458401063987 │     1.7469281074217107 │
 │  6 │   8773 │ 1.5426877920893651 │     1.5774409656148784 │
 │  7 │   5709 │ 1.5366964442108951 │     1.4701680288054869 │
 │  8 │   4247 │ 1.3442429950553332 │     1.3963036312159316 │
 │  9 │   3147 │ 1.3495392437241818 │     1.3423980299088363 │
 │ 10 │   2450 │ 1.2844897959183673 │     1.3013488313450120 │
 └────┴────────┴────────────────────┴────────────────────────┘
  sh> psql < zipf_clean.sql

Given these results, I do not think that it is useful to change the
random_zipfian TAP test parameter from 2.5 to something else.
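
The "expected" column can be recomputed directly from that ratio; a small
standalone check (plain C, s = 2.5, link with -lm):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    const double s = 2.5;
    int          k;

    /* expected ratio of drawing k-1 versus k is (k/(k-1))^s */
    for (k = 2; k <= 10; k++)
        printf("%2d  %.4f\n", k, pow((double) k / (k - 1), s));

    return 0;
}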


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   frequently values to the beginning of the interval are drawn.
   The closer to 0 parameter is,
   the flatter (more uniform) the access distribution.
+  The distribution is such that, assuming the range starts from 1,
+  the ratio of probability of drawing k versus
+  drawing k+1 is
+  ((k+1)/k)**parameter.
+  For instance random_zipfian(1, ..., 2.5) draws
+  value 1 about (2/1)**2.5 = 5.66 times more frequently
+  than 2, which itself is drawn (3/2)**2.5 = 2.76 times more
+  frequently than 3, and so on.
  
 



zipf_init.sql
Description: application/sql


zipf_test.sql
Description: application/sql


zipf_end.sql
Description: application/sql


zipf_clean.sql
Description: application/sql


Re: Pluggable Storage - Andres's take

2019-01-22 Thread Dmitry Dolgov
> On Mon, Jan 21, 2019 at 3:01 AM Andres Freund  wrote:
>
> The patchset is now pretty granularly split into individual pieces.

Wow, thanks!

> On Mon, Jan 21, 2019 at 9:33 AM Amit Khandekar  wrote:
>
> Regression tests that use \d+ to show the table details might
> not be interested specifically in table access method. But these will
> fail if run with a modified default access method.

I see your point, but if a test is not interested specifically in a table am,
then I guess it wouldn't use a custom table am in the first place, right?
Anyway, I don't have a strong opinion here, so if everyone agrees that
HIDE_TABLEAM will show/hide the access method unconditionally, I'm fine with
that.



Re: speeding up planning with partitions

2019-01-22 Thread David Rowley
On Tue, 22 Jan 2019 at 20:01, Imai, Yoshikazu
 wrote:
> I didn't use it yet, but I just used perf to clarify the difference of before 
> and after the creation of the generic plan, and I noticed that usage of 
> hash_seq_search() is increased about 3% in EXECUTE queries after the creation 
> of the generic plan.
>
> What I understand so far is about 10,000 while loops at total (4098+4098+some 
> extra) is needed in hash_seq_search() in EXECUTE query after the creation of 
> the generic plan.
> 10,000 while loops takes about 10 microsec (of course, we can't estimate 
> correct time), and the difference of the latency between 5th and 7th EXECUTE 
> is about 8 microsec, I currently think this causes the difference.
 >
> I don't know this problem relates to Amit-san's patch, but I'll continue to 
> investigate it.

I had another thought... when you're making a custom plan you're only
grabbing locks on partitions that were not pruned (just 1 partition in
your case), but when making the generic plan, locks will be acquired
on all partitions (all 4000 of them). This likely means that when
building the generic plan for the first time, the
LockMethodLocalHash table is expanded to fit all those locks, and
since we never shrink those down again, it'll remain that size for the
rest of your run.  I imagine the reason for the slowdown is that
during LockReleaseAll(), a sequential scan is performed over the
entire hash table. I see from looking at the hash_seq_search() code
that the value of max_bucket is pretty critical to how it'll perform.
The while ((curElem = segp[segment_ndx]) == NULL) loop will need to
run fewer times with a lower max_bucket.
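
Here's a minimal standalone sketch (not the actual dynahash code) of why
max_bucket matters so much: a sequential scan has to walk every bucket up to
the table's high-water mark, no matter how few entries are still live.

#include <stdio.h>
#include <stdlib.h>

typedef struct Entry
{
    int           value;
    struct Entry *next;
} Entry;

int
main(void)
{
    int     max_bucket = 5000;      /* grown while planning 5000 partitions */
    Entry **buckets = calloc(max_bucket + 1, sizeof(Entry *));
    Entry   one = {42, NULL};
    long    buckets_visited = 0;
    int     i;

    buckets[17] = &one;             /* only one live entry remains */

    /* "hash_seq_search": walk every bucket, skipping the empty ones */
    for (i = 0; i <= max_bucket; i++)
    {
        Entry  *e;

        buckets_visited++;
        for (e = buckets[i]; e != NULL; e = e->next)
            printf("found entry %d in bucket %d\n", e->value, i);
    }

    printf("visited %ld buckets to find 1 entry\n", buckets_visited);
    free(buckets);
    return 0;
}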

I just did a quick and crude test by throwing the following line into
the end of hash_seq_init():

elog(NOTICE, "%s %u", hashp->tabname, hashp->hctl->max_bucket);

With a 5000 partition table I see:

postgres=# set plan_cache_mode = 'auto';
postgres=# prepare q1 (int) as select * from listp where a = $1;
postgres=# explain analyze execute q1(1);
NOTICE:  Portal hash 15
NOTICE:  LOCALLOCK hash 15
NOTICE:  LOCALLOCK hash 15
...


postgres=# explain analyze execute q1(1);
NOTICE:  LOCALLOCK hash 5002
NOTICE:  Portal hash 15
NOTICE:  LOCALLOCK hash 5002
NOTICE:  LOCALLOCK hash 5002
   QUERY PLAN

 Append  (cost=0.00..41.94 rows=13 width=4) (actual time=0.005..0.005
rows=0 loops=1)
   ->  Seq Scan on listp1  (cost=0.00..41.88 rows=13 width=4) (actual
time=0.005..0.005 rows=0 loops=1)
 Filter: (a = 1)
 Planning Time: 440.822 ms
 Execution Time: 0.017 ms
(5 rows)

I've not gone as far to try to recreate the performance drop you've
mentioned but I believe there's a very high chance that this is the
cause, and if so it's not for Amit to do anything with on this patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonpath

2019-01-22 Thread Oleg Bartunov
On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov
 wrote:
>
> The next revision is attached.
>
> Nikita made code and documentation improvements, renamed functions
> from "jsonpath_" prefix to "jsonb_path_" prefix.  He also renamed
> jsonpath_predicate() to jsonb_path_match() (that looks better for me
> too).
>
> I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and
> changed that behavior to always wrap result into array.

Agree with the new names.

So it mimics the behaviour of JSON_QUERY with the UNCONDITIONAL WRAPPER option.

 Also, I've
> introduced new function jsonb_query_first(), which returns first item
> from result.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pgsql: Restrict the use of temporary namespace in two-phase transaction

2019-01-22 Thread Masahiko Sawada
On Tue, Jan 22, 2019 at 2:17 PM Michael Paquier  wrote:
>
> On Tue, Jan 22, 2019 at 01:47:05PM +0900, Masahiko Sawada wrote:
> > Or can we make the test script set force_parallel_mode = off? Since
> > the failure case is a very rare in real world I think that it might be
> > better to change the test scripts rather than changing properly of
> > current_schema().
>
> Please see 396676b, which is in my opinion a quick workaround to the
> problem.

Oops, sorry for the very late response. Thank you.

> Even if that's a rare case, it would be confusing to the
> user to see it :(

Indeed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: COPY FROM WHEN condition

2019-01-22 Thread Surafel Temesgen
On Mon, Jan 21, 2019 at 6:22 PM Tomas Vondra 
wrote:

>
> I think the condition can be just
>
> if (contain_volatile_functions(cstate->whereClause)) { ... }
>
>
>
Yes, it can be.

regards
Surafel


Re: Allowing extensions to find out the OIDs of their member objects

2019-01-22 Thread Komяpa
>
>
> Thoughts?


I have a feeling this is over-engineering in a slightly different direction,
paving the way for the hack to work instead of solving the original problem.

What's currently happening in PostGIS is that there are functions that need
to perform index-based lookups.

Postgres is unable to plan this for functions, only for operators.

Operators have only two sides; some PostGIS functions have more arguments, and
you can't squeeze these into an operator.
Well, you can squeeze two of your parameters into one, but it will be ugly
too - you'll invent some "query" argument type and alternative grammar
instead of SQL (see tsquery).

ST_DWithin itself is also another way of working around the planner limitation
and squeezing something into both sides of an operator, since you don't know
which side of your query is going to have an index. It's not perfect either.

A perfect solution would be a way to perform a clean index scan on
ST_Distance(a.geom, b.geom) < 10, which is what ST_DWithin is trying to
express in the limited logic of "you only have two sides of an operator".

If you need example from another world: imagine jsonb key-value lookup.
It's currently done via

select ... where tags @> '{"highway":"residential"}';

 - which is hard: you have to remember which side the rose should lean
towards, which {} [] to use, how to quote around json and inside and more.

A more intuitive way for many programmers to write this is similar to this:

select ... where (tags->>'highway') = 'residential';

 - but this does not end up with an index lookup.

I'd be happy if we can deprecate ST_DWithin in PostGIS and just allow
ST_Distance(a.geom, b.geom) <  10.

ST_Distance is defined in standard as function, however, there is
equivalent operator <-> that exists for sole purpose of KNN lookups. So,
when you write:

... order by ST_Distance(geom, 'your_position')
 - you're not getting an index scan, and when writing

... order by geom <-> 'your_position'

- you're getting an index scan, but it is not the form you would intuitively
write, knowing that ST_Distance is the standard-defined way to measure the
distance between two spatial objects.

May it happen to direct you to some other thoughts?


-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-01-22 Thread Fabien COELHO


Hello Tom,


I did a little bit of googling on this subject last night, and it
seems that at least some people believe that the answer is to not
use glob, period, but read the directory for yourself.

As a short-term move to un-break the buildfarm, I'm just going to
revert that patch altogether.

We can reapply it once we've figured out how to do the glob part correctly.


Here is a proposal patch which:

 - works around pgbench command splitting on spaces,
   if postgres sources are in a strangely named directory…
   I tested within a directory named "pg .* dir".

 - works around "glob" lack of portability by reading the directory
   directly and filtering with a standard re (see list_files).

 - adds a few comments

 - removes a spurious i option on an empty re

--
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c87748086a..53a6d0c926 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -10,12 +10,19 @@ my $node = get_new_node('main');
 $node->init;
 $node->start;
 
-# invoke pgbench
+# invoke pgbench, with parameters:
+#   $opts: options as a string to be split on spaces
+#   $stat: expected exit status
+#   $out: reference to a regexp list that must match stdout
+#   $err: reference to a regexp list that must match stderr
+#   $name: name of test for error messages
+#   $files: reference to filename/contents dictionary
+#   @args: further raw options or arguments
 sub pgbench
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	my ($opts, $stat, $out, $err, $name, $files) = @_;
+	my ($opts, $stat, $out, $err, $name, $files, @args) = @_;
 	my @cmd = ('pgbench', split /\s+/, $opts);
 	my @filenames = ();
 	if (defined $files)
@@ -40,6 +47,9 @@ sub pgbench
 			append_to_file($filename, $$files{$fn});
 		}
 	}
+
+	push @cmd, @args;
+
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
 
 	# cleanup?
@@ -868,20 +878,32 @@ pgbench(
 		qr{type: .*/001_pgbench_sleep},
 		qr{above the 1.0 ms latency limit: [01]/}
 	],
-	[qr{^$}i],
+	[qr{^$}],
 	'pgbench late throttling',
 	{ '001_pgbench_sleep' => q{\sleep 2ms} });
 
+# return a list of files from directory $dir matching regexpr $re
+# this works around glob portability and escaping issues
+sub list_files
+{
+	my ($dir, $re) = @_;
+	opendir my $dh, $dir or die "cannot opendir $dir: $!";
+	my @files = grep /$re/, readdir $dh;
+	closedir $dh or die "cannot closedir $dir: $!";
+	return map { $dir . '/' . $_ } @files;
+}
+
 # check log contents and cleanup
 sub check_pgbench_logs
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-	my ($prefix, $nb, $min, $max, $re) = @_;
+	my ($dir, $prefix, $nb, $min, $max, $re) = @_;
 
-	my @logs = glob "$prefix.*";
+	# $prefix is simple enough, thus does not need escaping
+	my @logs = list_files($dir, qr{^$prefix\..*$});
 	ok(@logs == $nb, "number of log files");
-	ok(grep(/^$prefix\.\d+(\.\d+)?$/, @logs) == $nb, "file name format");
+	ok(grep(/\/$prefix\.\d+(\.\d+)?$/, @logs) == $nb, "file name format");
 
 	my $log_number = 0;
 	for my $log (sort @logs)
@@ -905,22 +927,25 @@ my $bdir = $node->basedir;
 
 # with sampling rate
 pgbench(
-	"-n -S -t 50 -c 2 --log --log-prefix=$bdir/001_pgbench_log_2 --sampling-rate=0.5",
+	"-n -S -t 50 -c 2 --log --sampling-rate=0.5",
 	0,
 	[ qr{select only}, qr{processed: 100/100} ],
-	[qr{^$}],
-	'pgbench logs');
+	[ qr{^$} ],
+	'pgbench logs',
+	undef,
+	"--log-prefix=$bdir/001_pgbench_log_2");
 
-check_pgbench_logs("$bdir/001_pgbench_log_2", 1, 8, 92,
+check_pgbench_logs($bdir, '001_pgbench_log_2', 1, 8, 92,
 	qr{^0 \d{1,2} \d+ \d \d+ \d+$});
 
 # check log file in some detail
 pgbench(
-	"-n -b se -t 10 -l --log-prefix=$bdir/001_pgbench_log_3",
-	0, [ qr{select only}, qr{processed: 10/10} ],
-	[qr{^$}], 'pgbench logs contents');
+	"-n -b se -t 10 -l",
+	0, [ qr{select only}, qr{processed: 10/10} ], [ qr{^$} ],
+	'pgbench logs contents', undef,
+	"--log-prefix=$bdir/001_pgbench_log_3");
 
-check_pgbench_logs("$bdir/001_pgbench_log_3", 1, 10, 10,
+check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
 	qr{^\d \d{1,2} \d+ \d \d+ \d+$});
 
 # done


index_build does not need its isprimary argument

2019-01-22 Thread Michael Paquier
Hi all,

While working on REINDEX CONCURRENTLY, I have noticed that
index_build() does not need its argument isprimary.  Perhaps it is
not worth bothering, but for the REINDEX CONCURRENTLY business this
removes the need to open an index when triggering a concurrent
build.

The flag was introduced in 3fdeb189, but f66e8bf actually forgot to
finish the cleanup as index_update_stats() has simplified its
interface.

Are there any objections if I cleanup that stuff as per the attached?

Thanks,
--
Michael
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 6677926ae6..4d7ed8ad1a 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1130,7 +1130,7 @@ build_indices(void)
 		heap = table_open(ILHead->il_heap, NoLock);
 		ind = index_open(ILHead->il_ind, NoLock);
 
-		index_build(heap, ind, ILHead->il_info, false, false, false);
+		index_build(heap, ind, ILHead->il_info, false, false);
 
 		index_close(ind, NoLock);
 		table_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 1153688a1c..cc865de627 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3063,7 +3063,7 @@ RelationTruncateIndexes(Relation heapRelation)
 
 		/* Initialize the index and rebuild */
 		/* Note: we do not need to re-establish pkey setting */
-		index_build(heapRelation, currentIndex, indexInfo, false, true, false);
+		index_build(heapRelation, currentIndex, indexInfo, true, false);
 
 		/* We're done with this index */
 		index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5ca0b1eb96..225c078018 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1189,8 +1189,7 @@ index_create(Relation heapRelation,
 	}
 	else
 	{
-		index_build(heapRelation, indexRelation, indexInfo, isprimary, false,
-	true);
+		index_build(heapRelation, indexRelation, indexInfo, false, true);
 	}
 
 	/*
@@ -2220,13 +2219,9 @@ index_update_stats(Relation rel,
  * entries of the index and heap relation as needed, using statistics
  * returned by ambuild as well as data passed by the caller.
  *
- * isprimary tells whether to mark the index as a primary-key index.
  * isreindex indicates we are recreating a previously-existing index.
  * parallel indicates if parallelism may be useful.
  *
- * Note: when reindexing an existing index, isprimary can be false even if
- * the index is a PK; it's already properly marked and need not be re-marked.
- *
  * Note: before Postgres 8.2, the passed-in heap and index Relations
  * were automatically closed by this routine.  This is no longer the case.
  * The caller opened 'em, and the caller should close 'em.
@@ -2235,7 +2230,6 @@ void
 index_build(Relation heapRelation,
 			Relation indexRelation,
 			IndexInfo *indexInfo,
-			bool isprimary,
 			bool isreindex,
 			bool parallel)
 {
@@ -3702,7 +3696,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 
 		/* Initialize the index and rebuild */
 		/* Note: we do not need to re-establish pkey setting */
-		index_build(heapRelation, iRel, indexInfo, false, true, true);
+		index_build(heapRelation, iRel, indexInfo, true, true);
 	}
 	PG_CATCH();
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3edc94308e..5b2b8d2969 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1168,7 +1168,7 @@ DefineIndex(Oid relationId,
 	indexInfo->ii_BrokenHotChain = false;
 
 	/* Now build the index */
-	index_build(rel, indexRelation, indexInfo, stmt->primary, false, true);
+	index_build(rel, indexRelation, indexInfo, false, true);
 
 	/* Close both the relations, but keep the locks */
 	table_close(rel, NoLock);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 8daac5663c..330c481a8b 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -107,7 +107,6 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 extern void index_build(Relation heapRelation,
 			Relation indexRelation,
 			IndexInfo *indexInfo,
-			bool isprimary,
 			bool isreindex,
 			bool parallel);
 


signature.asc
Description: PGP signature