Re: Parallel copy

2020-02-15 Thread Andrew Dunstan


On 2/15/20 7:32 AM, Amit Kapila wrote:
> On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner  wrote:
>> On Sat, 15 Feb 2020 at 04:55, Amit Kapila  wrote:
>>> On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner  wrote:
>> ...
>>>> Parsing rows from the raw input (the work done by CopyReadLine()) in a
>>>> single process would accommodate line returns in quoted fields. I don't
>>>> think there's a way of getting parallel workers to manage the
>>>> in-quote/out-of-quote state required.

>>> AFAIU, the whole of this in-quote/out-of-quote state is managed inside
>>> CopyReadLineText which will be done by each of the parallel workers,
>>> something on the lines of what Thomas did in his patch [1].
>>> Basically, we need to invent a mechanism to allocate chunks to
>>> individual workers and then the whole processing will be done as we
>>> are doing now except for special handling for partial tuples which I
>>> have explained in my previous email.  Am I missing something here?
>>>
>> The problem case that I see is the chunk boundary falling in the
>> middle of a quoted field where
>>  - The quote opens in chunk 1
>>  - The quote closes in chunk 2
>>  - There is an EoL character between the start of chunk 2 and the closing 
>> quote
>>
>> When the worker processing chunk 2 starts, it believes itself to be in
>> out-of-quote state, so only data between the start of the chunk and
>> the EoL is regarded as belonging to the partial line. From that point
>> on the parsing of the rest of the chunk goes off track.
>>
>> Some of the resulting errors can be avoided by, for instance,
>> requiring a quote to be preceded by a delimiter or EoL. That answer
>> fails when fields end with EoL characters, which happens often enough
>> in the wild.
>>
>> Recovering from an incorrect in-quote/out-of-quote state assumption at
>> the start of parsing a chunk just seems like a hole with no bottom. So
>> it looks to me like it's best done in a single process which can keep
>> track of that state reliably.
>>
> Good point and I agree with you that having a single process would
> avoid any such stuff.   However, I will think some more on it and if
> you/anyone else gets some idea on how to deal with this in a
> multi-worker system (where we can allow each worker to read and
> process the chunk) then feel free to share your thoughts.
>


IIRC, in_quote only matters here in CSV mode (because CSV fields can
have embedded newlines). So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
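
To make the ambiguity concrete, here is a standalone toy (not COPY code; it
assumes CSV's double-quote quoting and newline line ends) showing that the
same chunk bytes yield a different partial-line boundary depending on the
assumed starting state:

#include <stdio.h>
#include <stdbool.h>

/* Offset of the first end-of-line under an assumed starting quote state,
 * or -1 if the whole buffer appears to sit inside a quoted field. */
static int
first_eol(const char *buf, bool in_quote)
{
    for (int i = 0; buf[i] != '\0'; i++)
    {
        if (buf[i] == '"')
            in_quote = !in_quote;
        else if (buf[i] == '\n' && !in_quote)
            return i;
    }
    return -1;
}

int
main(void)
{
    /* Chunk 2 of the scenario quoted above: the quote opened in chunk 1
     * closes here, with an EoL before the closing quote. */
    const char *chunk2 = "mid\nend\",b\nnext,row\n";

    printf("true state (in quote): eol at %d\n", first_eol(chunk2, true));
    printf("worker's assumption:   eol at %d\n", first_eol(chunk2, false));
    return 0;
}

With the true state the partial line runs to offset 10; with the assumed
out-of-quote state the worker stops at offset 3, and everything after the
stray quote parses off track, exactly as described above.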


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: jsonb_object() seems to be buggy. jsonb_build_object() is good.

2020-02-15 Thread Andrew Dunstan


On 2/15/20 12:06 AM, Bryn Llewellyn wrote:
> Thank you both, Vik, and David, for being so quick to respond. All is
> clear now. It seems to me that the price (giving up the ability to say
> explicitly what primitive JSON values you want) is too great to pay
> for the benefit (being able to build the semantic equivalent of a
> variadic list of actual arguments as text).
>
> So I wrote my own wrapper for jsonb_build_array()
> and jsonb_build_object():
>
> create function my_jsonb_build(
>   kind in varchar,
>   variadic_elements in varchar)
>   returns jsonb
>   immutable
>   language plpgsql
> as $body$
> declare
>   stmt varchar :=
>     case kind
>      when 'array' then
>        'select jsonb_build_array('||variadic_elements||')'
>      when 'object' then
>        'select jsonb_build_object('||variadic_elements||')'
>     end;
>   j jsonb;
> begin
>   execute stmt into j;
>   return j;
> end;
> $body$;
>

Please don't top-post on PostgreSQL lists.  See



The function above has many deficiencies, including lack of error
checking and use of 'execute' which will significantly affect
performance. Still, if it works for you, that's your affair.


These functions were written to accommodate PostgreSQL limitations. We
don't have a heterogeneous array type.  So json_object() will return an
object where all the values are strings, even if they look like numbers,
booleans etc. And indeed, this is shown in the documented examples.
jsonb_build_object() and jsonb_build_array() overcome that issue, but there
the PostgreSQL limitation is that you can't pass in an actual array as
the variadic element, again because we don't have heterogeneous arrays.
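
For anyone following along, the contrast looks like this (a sketch; output
abbreviated from the documented behavior):

select json_object('{a, 1, b, true}');
-- {"a" : "1", "b" : "true"}      -- every value coerced to a string
select jsonb_build_object('a', 1, 'b', true);
-- {"a": 1, "b": true}            -- types preserved, but the pairs must be
--                                -- spelled out rather than passed as an array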


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: pgsql: walreceiver uses a temporary replication slot by default

2020-02-15 Thread Michael Paquier
On Wed, Jan 22, 2020 at 06:58:46PM +0300, Sergei Kornilov wrote:
> I would like to cross-post here a patch with such changes that I posted in 
> "allow online change primary_conninfo" thread.
> This thread is more appropriate for discussion about 
> wal_receiver_create_temp_slot.
> 
> PS: I posted this patch in both threads mostly to make cfbot happy.

Thanks for posting this patch, Sergei.  Here is a review to make
things move on.

-* Create temporary replication slot if no slot name is configured or
-* the slot from the previous run was temporary, unless
-* wal_receiver_create_temp_slot is disabled.  We also need to handle
-* the case where the previous run used a temporary slot but
-* wal_receiver_create_temp_slot was changed in the meantime.  In that
-* case, we delete the old slot name in shared memory.  (This would
+* Create temporary replication slot if requested.  In that
+* case, we update slot name in shared memory. (This would

The set of comments you are removing from walreceiver.c to decide if a
temporary slot needs to be created or not should be moved to
walreceiverfuncs.c as you move the logic from the WAL receiver startup
phase to the moment the WAL receiver spawn is requested.

I agree with the simplifications in WalReceiverMain() now that you have
switched wal_receiver_create_temp_slot to be PGC_POSTMASTER, so
runtime modifications are no longer a state that matters.

It would be more consistent with primary_conninfo and
primary_slot_name if wal_receiver_create_temp_slot were passed down as
an argument of RequestXLogStreaming().
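
Concretely, that suggestion would amount to something like this (a sketch
of the proposed signature, not committed code):

extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
                                 const char *conninfo,
                                 const char *slotname,
                                 bool create_temp_slot);

The WAL receiver would then consume the value it was spawned with instead
of consulting the GUC itself at startup.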

As per the discussion done on this thread, let's also switch the
parameter default to be disabled.  Peter, as the committer of 3297308,
it would be good if you could chime in.
--
Michael




Re: New messages from Priscilla Ip

2020-02-15 Thread Dave Cramer
Hi,

Feel free to send out the email blast.

There are a number of other channels: the Postgres Slack, the Postgres
mailing lists, @PostgreSQL Hackers, and Twitter with the Postgres tag.

Cheers,


Dave Cramer


On Sat, 15 Feb 2020 at 19:44, 'Meetup Messages' via Meetup <
mee...@postgresql.us> wrote:

> Priscilla Ip
>
> Hi, my name is Priscilla and I am a McGill student in my final year of
> electrical engineering. As a part of my capstone design project, I am
> conducting a user study on the impact of collaborative features in database
> tools along with my partner and the SoftwareREBELs research group at McGill
> University. We are looking for participants for the study and were hoping
> to send out an email blast to your group members if that is possible.
>
> Please see the details below and feel free to contact me with any
> questions. Thank you for your consideration and I look forward to hearing
> from you.
>
> The study will evaluate the impact of collaborative features on
> database-related tasks but participants with all levels of experience with
> databases are welcome! Participation involves a 45-minute in-person session
> on McGill campus where you’ll complete simple database-related tasks. Those
> attending the session will be entered in a raffle for $20 Amazon gift cards
> where the odds of winning are about 1 in 5.
> Sign up for the study here:
> https://docs.google.com/forms/d/e/1FAIpQLScDaa8J...
> 
> February 15, 2020 4:43 PM

Re: explain HashAggregate to report bucket and memory stats

2020-02-15 Thread Justin Pryzby
On Mon, Feb 03, 2020 at 06:53:01AM -0800, Andres Freund wrote:
> On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the 
> > > Aggregate
> > > node would report...anything.
> 
> Yea, that'd be amazing. It probably should be something every
> execGrouping.c using node can opt into.

Do you think it should be implemented in execGrouping/TupleHashTableData (as I
did)?  I also did an experiment moving it into the higher-level nodes, but I
guess that's not actually desirable.  There's currently a difference in test
output between the implementation inside execGrouping.c and the one outside it,
so there's at least an issue with grouping sets.

> > +   hashtable->hinstrument.nbuckets_original = nbuckets;
> > +   hashtable->hinstrument.nbuckets = nbuckets;
> > +   hashtable->hinstrument.space_peak = entrysize * hashtable->hashtab->size;
> 
> That's not actually an accurate accounting of memory, because for filled
> entries a lot of memory is used to store actual tuples:

Thanks - I think I finally understood this.
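
So the accounting needs to cover the stored tuples as well as the bucket
array - roughly like this sketch (hinstrument is the struct the patch
introduces; using MemoryContextMemAllocated() as the helper that totals a
context's allocations is my assumption):

static void
UpdateTupleHashStats(TupleHashTable hashtable)
{
    /* Buckets alone miss most of the usage once the table fills up. */
    Size        bucket_bytes = sizeof(TupleHashEntryData) *
                               hashtable->hashtab->size;
    /* The stored MinimalTuples all live in the table's tuple context. */
    Size        tuple_bytes = MemoryContextMemAllocated(hashtable->tablecxt,
                                                        true);

    hashtable->hinstrument.nbuckets = hashtable->hashtab->size;
    hashtable->hinstrument.space_peak = Max(hashtable->hinstrument.space_peak,
                                            bucket_bytes + tuple_bytes);
}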

I updated some existing tests to show the new output.  I imagine that's a
throwaway commit, and we should eventually add new tests for each of these
node types under explain analyze.

I've been testing the various nodes like:

--heapscan:
DROP TABLE t; CREATE TABLE t (i int unique) WITH(autovacuum_enabled=off); 
INSERT INTO t SELECT generate_series(1,9); SET enable_seqscan=off; SET 
parallel_tuple_cost=0; SET parallel_setup_cost=0; SET enable_indexonlyscan=off; 
explain analyze verbose SELECT * FROM t WHERE i BETWEEN 999 and ;

--setop:
explain( analyze,verbose) SELECT * FROM generate_series(1,999) EXCEPT (SELECT 
NULL UNION ALL SELECT * FROM generate_series(1,9));
   Buckets: 2048 (originally 256)  Memory Usage: hashtable: 48kB, tuples: 8Kb

--recursive union:
explain analyze verbose WITH RECURSIVE t(n) AS ( SELECT 'foo' UNION SELECT n || 
' bar' FROM t WHERE length(n) < ) SELECT n, n IS OF (text) AS is_text FROM 
t;

--subplan
explain analyze verbose SELECT i FROM generate_series(1,999)i WHERE (i,i) NOT 
IN (SELECT 1,1 UNION ALL SELECT j,j FROM generate_series(1,9)j);
   Buckets: 262144 (originally 131072)  Memory Usage: hashtable: 6144kB, 
tuples: 782Kb
explain analyze verbose select i FROM generate_series(1,999)i WHERE(1,i) NOT in 
(select i,null::int from t) ;

--Agg:
explain (analyze,verbose) SELECT A,COUNT(1) FROM generate_series(1,9)a 
GROUP BY 1;
   Buckets: 262144 (originally 256)  Memory Usage: hashtable: 6144kB, tuples: 
782Kb

explain (analyze, verbose) select i FROM generate_series(1,999)i WHERE(1,1) not 
in (select a,null from (SELECT generate_series(1,9) a)x) ;

explain analyze verbose select * from (SELECT a FROM generate_series(1,99)a)v 
left join lateral (select v.a, four, ten, count(*) from (SELECT b four, 2 ten, 
b FROM generate_series(1,999)b)x group by cube(four,ten)) s on true order by 
v.a,four,ten;

--Grouping sets:
explain analyze verbose   select unique1,
 count(two), count(four), count(ten),
 count(hundred), count(thousand), count(twothousand),
 count(*)
from tenk1 group by grouping sets 
(unique1,twothousand,thousand,hundred,ten,four,two);

-- 
Justin
>From dff7109e4d82fd498ae8493caa0e4c84b0f04c74 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Feb 2020 12:03:11 -0600
Subject: [PATCH v2 1/7] Run some existing tests with explain (ANALYZE)..

..in a separate, earlier patch, to better show what bits are added by later
patches for hashtable instrumentation.
---
 src/test/regress/expected/aggregates.out  |  20 +-
 src/test/regress/expected/groupingsets.out| 298 ++
 src/test/regress/expected/select_parallel.out |  20 +-
 src/test/regress/expected/subselect.out   |  69 ++
 src/test/regress/expected/union.out   |  71 +++---
 src/test/regress/sql/aggregates.sql   |   2 +-
 src/test/regress/sql/groupingsets.sql |  44 ++--
 src/test/regress/sql/select_parallel.sql  |   4 +-
 src/test/regress/sql/subselect.sql|  25 +++
 src/test/regress/sql/union.sql|   6 +-
 10 files changed, 341 insertions(+), 218 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b..b3dcbaa 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2342,18 +2342,20 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqification purposes
 -- does not lead to array overflow due to unexpected duplicate hash keys
 -- see 

Re: Just for fun: Postgres 20?

2020-02-15 Thread Andrew Dunstan
On Fri, Feb 14, 2020 at 7:19 PM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > I also object because 20 is *my* unlucky number ...
>
> Not sure how serious Andrew is being here, but it does open up an
> important point: there are varying opinions on which numbers are unlucky.
> The idea that 13 is unlucky is Western, and maybe even only common in
> English-speaking countries.  In Asia, numbers containing the digit 4
> are considered unlucky [1], and there are probably other rules in other
> cultures.  If we establish a precedent that we'll skip release numbers
> for non-technical reasons, I'm afraid we'll be right back in the mess
> we sought to avoid, whereby nearly every year we had an argument about
> what the next release number would be.  So let's not go there.
>
>


Yes, I was being flippant, in an attempt to make the exact point
you're making cogently but less pithily here.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: assert pg_class.relnatts is consistent

2020-02-15 Thread Tom Lane
I wrote:
> So that leads me to the attached.
> ...
> (I agree with Alvaro's thought of shortening AddDefaultValues,
> but didn't do that here.)

Pushed both of those.  I also did something with the stale comment
that Justin referred to in the initial message (it wasn't really
good practice to try to deal with both things in one thread).

I think we're done here, though maybe the difficulty of finding a clean
way to get genbki.pl to do this suggests that AddDefaultValues needs
to be redesigned.  Not sure what that'd look like.

regards, tom lane




Re: Use LN_S instead of "ln -s" in Makefile

2020-02-15 Thread Tom Lane
Andreas Karlsson  writes:
> On 2/15/20 1:57 AM, Tom Lane wrote:
>> Hm.  So, these oversights are certainly bugs in narrow terms.  However,
>> they've all been like that for a *long* time --- the three instances
>> you found date to 2005, 2006, and 2008 according to "git blame".
>> The complete lack of complaints shows that nobody cares.  So I think
>> a fairly strong case could be made for going the other way, ie
>> s/$(LN_S)/ln -s/g and get rid of the configure-time cycles wasted in
>> setting up that variable.

> Here is a patch which does that.

I was just about to push that when I noticed something that gave me
pause: the "ln -s" in backend/Makefile is wrapped in 
ifneq ($(PORTNAME), win32)
This means there's one popular platform where we *don't* know for
sure that people aren't building in environments that don't support
"ln -s".  (The other two direct uses that Ashwin found are in test
code that a non-developer person very likely would never exercise,
so I don't think they prove much.)

I'm still on balance inclined to push this.  We have no buildfarm
animals exercising the case (they all report "ln -s" as supported,
even the Windows animals), and these days I think most people who
are building for Windows use the MSVC scripts not the makefiles.

Moreover, $(LN_S) is a horribly error-prone macro, because of the
fact that "ln -s" and "cp" don't have the same semantics for the
source argument.  The Autoconf manual says

 If you make a link in a directory other than the current
 directory, its meaning depends on whether `ln' or `ln -s' is used.
 To safely create links using `$(LN_S)', either find out which
 form is used and adjust the arguments, or always invoke `ln' in
 the directory where the link is to be created.

 In other words, it does not work to do:
  $(LN_S) foo /x/bar

 Instead, do:

  (cd /x && $(LN_S) foo bar)

So Ashwin's original patch would, far from fixing the code for
symlink-less systems, just have caused them to fail in a different way.
I could do without having that sort of gotcha in our build system,
especially if the set of people it would help is so close to empty,
and most especially when we have no testing that would catch mistakes.

Nonetheless, it looks like the current makefiles do work, for moderate
values of "work", on non-symlink Windows.  If we apply this patch
then they won't.

An alternative we could consider is to go back to Ashwin's patch,
after fixing it to use the "cd && ln" approach.  I noticed though
while chasing through the git history that that approach was in place
there originally and was specifically rejected in commit ccca61b5f.
That commit is quite old enough to drink, so maybe the underlying
concern no longer applies --- certainly we're using "cd && ln"
elsewhere.  But this seems like another point in favor of the whole
business being too complex/error-prone to want to support.

regards, tom lane




Re: Parallel copy

2020-02-15 Thread David Fetter
On Sat, Feb 15, 2020 at 06:02:06PM +0530, Amit Kapila wrote:
> On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner  wrote:
> >
> > On Sat, 15 Feb 2020 at 04:55, Amit Kapila  wrote:
> > >
> > > On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner  
> > > wrote:
> > > >
> > ...
> > > >
> > > > Parsing rows from the raw input (the work done by CopyReadLine()) in a 
> > > > single process would accommodate line returns in quoted fields. I don't 
> > > > think there's a way of getting parallel workers to manage the 
> > > > in-quote/out-of-quote state required.
> > > >
> > >
> > > AFAIU, the whole of this in-quote/out-of-quote state is managed inside
> > > CopyReadLineText which will be done by each of the parallel workers,
> > > something on the lines of what Thomas did in his patch [1].
> > > Basically, we need to invent a mechanism to allocate chunks to
> > > individual workers and then the whole processing will be done as we
> > > are doing now except for special handling for partial tuples which I
> > > have explained in my previous email.  Am I missing something here?
> > >
> > The problem case that I see is the chunk boundary falling in the
> > middle of a quoted field where
> >  - The quote opens in chunk 1
> >  - The quote closes in chunk 2
> >  - There is an EoL character between the start of chunk 2 and the closing 
> > quote
> >
> > When the worker processing chunk 2 starts, it believes itself to be in
> > out-of-quote state, so only data between the start of the chunk and
> > the EoL is regarded as belonging to the partial line. From that point
> > on the parsing of the rest of the chunk goes off track.
> >
> > Some of the resulting errors can be avoided by, for instance,
> > requiring a quote to be preceded by a delimiter or EoL. That answer
> > fails when fields end with EoL characters, which happens often enough
> > in the wild.
> >
> > Recovering from an incorrect in-quote/out-of-quote state assumption at
> > the start of parsing a chunk just seems like a hole with no bottom. So
> > it looks to me like it's best done in a single process which can keep
> > track of that state reliably.
> >
> 
> Good point and I agree with you that having a single process would
> avoid any such stuff.   However, I will think some more on it and if
> you/anyone else gets some idea on how to deal with this in a
> multi-worker system (where we can allow each worker to read and
> process the chunk) then feel free to share your thoughts.

I see two pieces of this puzzle: an input format we control, and the
ones we don't.

In the former case, we could encode all fields with base85 (or
something similar that reduces the input alphabet efficiently), then
reserve bytes that denote delimiters of various types. ASCII has
separators for file, group, record, and unit that we could use as
inspiration.
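
A toy of that controlled-format idea (standalone C; hex encoding stands in
for base85 purely to keep the sketch short): once every field byte is
re-encoded into a restricted alphabet, the unit (0x1F) and record (0x1E)
separators can never occur inside a field, so a worker can split its chunk
on 0x1E with no quote-state tracking at all.

#include <stdio.h>

#define US 0x1F                 /* ASCII unit separator: between fields */
#define RS 0x1E                 /* ASCII record separator: between rows */

/* Re-encode a field into [0-9a-f], which cannot collide with US/RS. */
static void
emit_field(FILE *out, const char *data)
{
    const unsigned char *p = (const unsigned char *) data;

    for (; *p; p++)
        fprintf(out, "%02x", *p);
}

int
main(void)
{
    /* One row whose fields contain every byte that trips up CSV. */
    const char *fields[] = {"a,b", "line1\nline2", "quoted \"x\""};

    for (int i = 0; i < 3; i++)
    {
        emit_field(stdout, fields[i]);
        putchar(i < 2 ? US : RS);
    }

    /* A reader finds record boundaries with memchr(buf, RS, len),
     * no state machine required. */
    return 0;
}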

I don't have anything to offer for free-form input other than to agree
that it looks like a hole with no bottom, and maybe we should just
keep that process serial, at least until someone finds a bottom.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use LN_S instead of "ln -s" in Makefile

2020-02-15 Thread Andreas Karlsson

On 2/15/20 1:57 AM, Tom Lane wrote:

> Hm.  So, these oversights are certainly bugs in narrow terms.  However,
> they've all been like that for a *long* time --- the three instances
> you found date to 2005, 2006, and 2008 according to "git blame".
> The complete lack of complaints shows that nobody cares.  So I think
> a fairly strong case could be made for going the other way, ie
> s/$(LN_S)/ln -s/g and get rid of the configure-time cycles wasted in
> setting up that variable.


Here is a patch which does that.

Andreas
diff --git a/configure b/configure
index 37aa82dcd4..ea8ec36d3b 100755
--- a/configure
+++ b/configure
@@ -681,7 +681,6 @@ FLEX
 BISONFLAGS
 BISON
 MKDIR_P
-LN_S
 TAR
 install_bin
 INSTALL_DATA
@@ -9156,17 +9155,6 @@ $as_echo_n "checking for TAR... " >&6; }
 $as_echo "$TAR" >&6; }
 fi
 
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5
-$as_echo_n "checking whether ln -s works... " >&6; }
-LN_S=$as_ln_s
-if test "$LN_S" = "ln -s"; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
-$as_echo "yes" >&6; }
-else
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5
-$as_echo "no, using $LN_S" >&6; }
-fi
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for a thread-safe mkdir -p" >&5
 $as_echo_n "checking for a thread-safe mkdir -p... " >&6; }
 if test -z "$MKDIR_P"; then
diff --git a/configure.in b/configure.in
index 8adb409558..1184f165fa 100644
--- a/configure.in
+++ b/configure.in
@@ -1001,7 +1001,6 @@ esac
 AC_SUBST(install_bin)
 
 PGAC_PATH_PROGS(TAR, tar)
-AC_PROG_LN_S
 AC_PROG_MKDIR_P
 # When Autoconf chooses install-sh as mkdir -p program it tries to generate
 # a relative path to it in each makefile where it substitutes it. This clashes
diff --git a/contrib/uuid-ossp/Makefile b/contrib/uuid-ossp/Makefile
index 777f988a41..4a14a1eb42 100644
--- a/contrib/uuid-ossp/Makefile
+++ b/contrib/uuid-ossp/Makefile
@@ -33,4 +33,4 @@ include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
 md5.c sha1.c: % : $(pgcrypto_src)/%
-	rm -f $@ && $(LN_S) $< .
+	rm -f $@ && ln -s $< .
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e4db3e80af..e01dec0267 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -338,7 +338,6 @@ perl_embed_ldflags	= @perl_embed_ldflags@
 # Miscellaneous
 
 AWK	= @AWK@
-LN_S	= @LN_S@
 MSGFMT  = @MSGFMT@
 MSGFMT_FLAGS = @MSGFMT_FLAGS@
 MSGMERGE = @MSGMERGE@
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 526361f31b..38d10dfbfb 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -295,12 +295,12 @@ ifdef shlib_major
 # If we're using major and minor versions, then make a symlink to major-version-only.
 ifneq ($(shlib), $(shlib_major))
 	rm -f $(shlib_major)
-	$(LN_S) $(shlib) $(shlib_major)
+	ln -s $(shlib) $(shlib_major)
 endif
 # Make sure we have a link to a name without any version numbers
 ifneq ($(shlib), $(shlib_bare))
 	rm -f $(shlib_bare)
-	$(LN_S) $(shlib) $(shlib_bare)
+	ln -s $(shlib) $(shlib_bare)
 endif
 endif # shlib_major
 
@@ -435,12 +435,12 @@ ifneq ($(PORTNAME), win32)
 ifneq ($(shlib), $(shlib_major))
 	cd '$(DESTDIR)$(libdir)' && \
 	rm -f $(shlib_major) && \
-	$(LN_S) $(shlib) $(shlib_major)
+	ln -s $(shlib) $(shlib_major)
 endif
 ifneq ($(shlib), $(shlib_bare))
 	cd '$(DESTDIR)$(libdir)' && \
 	rm -f $(shlib_bare) && \
-	$(LN_S) $(shlib) $(shlib_bare)
+	ln -s $(shlib) $(shlib_bare)
 endif
 endif # not win32
 endif # not cygwin
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 9706a95848..2862c03cb8 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -167,12 +167,12 @@ generated-headers: $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src
 $(top_builddir)/src/include/parser/gram.h: parser/gram.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
 	  cd '$(dir $@)' && rm -f $(notdir $@) && \
-	  $(LN_S) "$$prereqdir/$(notdir $<)" .
+	  ln -s "$$prereqdir/$(notdir $<)" .
 
 $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
 	  cd '$(dir $@)' && rm -f $(notdir $@) && \
-	  $(LN_S) "$$prereqdir/$(notdir $<)" .
+	  ln -s "$$prereqdir/$(notdir $<)" .
 
 utils/probes.o: utils/probes.d $(SUBDIROBJS)
 	$(DTRACE) $(DTRACEFLAGS) -C -G -s $(call expand_subsys,$^) -o $@
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index f8f0b4841c..59ffe22c61 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -117,7 +117,7 @@ bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_
 $(top_builddir)/src/include/catalog/header-stamp: bki-stamp
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
 	cd '$(dir $@)' && for file in $(GENERATED_HEADERS); do \
-	  rm -f $$file && $(LN_S) "$$prereqdir/$$file" . ; \
+	  rm -f $$file && ln -s "$$prereqdir/$$file" . ; \
 	done
 	touch $@
 
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index b91028ddfd..f8e13e0f6f 100644
--- 

Re: Parallel copy

2020-02-15 Thread Amit Kapila
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner  wrote:
>
> On Sat, 15 Feb 2020 at 04:55, Amit Kapila  wrote:
> >
> > On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner  wrote:
> > >
> ...
> > >
> > > Parsing rows from the raw input (the work done by CopyReadLine()) in a 
> > > single process would accommodate line returns in quoted fields. I don't 
> > > think there's a way of getting parallel workers to manage the 
> > > in-quote/out-of-quote state required.
> > >
> >
> > AFAIU, the whole of this in-quote/out-of-quote state is managed inside
> > CopyReadLineText which will be done by each of the parallel workers,
> > something on the lines of what Thomas did in his patch [1].
> > Basically, we need to invent a mechanism to allocate chunks to
> > individual workers and then the whole processing will be done as we
> > are doing now except for special handling for partial tuples which I
> > have explained in my previous email.  Am I missing something here?
> >
> The problem case that I see is the chunk boundary falling in the
> middle of a quoted field where
>  - The quote opens in chunk 1
>  - The quote closes in chunk 2
>  - There is an EoL character between the start of chunk 2 and the closing 
> quote
>
> When the worker processing chunk 2 starts, it believes itself to be in
> out-of-quote state, so only data between the start of the chunk and
> the EoL is regarded as belonging to the partial line. From that point
> on the parsing of the rest of the chunk goes off track.
>
> Some of the resulting errors can be avoided by, for instance,
> requiring a quote to be preceded by a delimiter or EoL. That answer
> fails when fields end with EoL characters, which happens often enough
> in the wild.
>
> Recovering from an incorrect in-quote/out-of-quote state assumption at
> the start of parsing a chunk just seems like a hole with no bottom. So
> it looks to me like it's best done in a single process which can keep
> track of that state reliably.
>

Good point and I agree with you that having a single process would
avoid any such stuff.   However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-02-15 Thread Alastair Turner
On Sat, 15 Feb 2020 at 04:55, Amit Kapila  wrote:
>
> On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner  wrote:
> >
...
> >
> > Parsing rows from the raw input (the work done by CopyReadLine()) in a 
> > single process would accommodate line returns in quoted fields. I don't 
> > think there's a way of getting parallel workers to manage the 
> > in-quote/out-of-quote state required.
> >
>
> AFAIU, the whole of this in-quote/out-of-quote state is managed inside
> CopyReadLineText which will be done by each of the parallel workers,
> something on the lines of what Thomas did in his patch [1].
> Basically, we need to invent a mechanism to allocate chunks to
> individual workers and then the whole processing will be done as we
> are doing now except for special handling for partial tuples which I
> have explained in my previous email.  Am I missing something here?
>
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
 - The quote opens in chunk 1
 - The quote closes in chunk 2
 - There is an EoL character between the start of chunk 2 and the closing quote

When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.

Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.

Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.

--
Alastair




Re: [Proposal] Global temporary tables

2020-02-15 Thread Pavel Stehule
> postgres=# insert into foo select generate_series(1,1);
> INSERT 0 1
> postgres=# \dt+ foo
>   List of relations
> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
> (1 row)
>
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# \dt+ foo
>   List of relations
> ┌┬──┬───┬───┬─┬───┬─┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
> ╞╪══╪═══╪═══╪═╪═══╪═╡
> │ public │ foo  │ table │ pavel │ session │ 16 kB │ │
> └┴──┴───┴───┴─┴───┴─┘
> (1 row)
>
> I expect zero size after truncate.
>
> Thanks for review.
>
> I can explain; I don't think it's a bug.
> The current implementation of truncate for a GTT retains two blocks of FSM
> pages.
> The same is true when truncating a regular table in a subtransaction.
> This implementation truncates the table without changing the table's
> relfilenode.
>
>
This is not an extra important feature - it is just a little surprising,
because I was not inside a transaction.

Changing the relfilenode is, I think, necessary, at minimum for future VACUUM
FULL support.

Regards

Pavel Stehule


>
> Wenjing
>
>
> Regards
>
> Pavel
>
>
>>>
>>> Wenjing
>>>
>>>
>>>
>>>
>>> >
>>> > --
>>> > Robert Haas
>>> > EnterpriseDB: http://www.enterprisedb.com
>>> > The Enterprise PostgreSQL Company
>>>
>>>
>


Re: [Proposal] Global temporary tables

2020-02-15 Thread 曾文旌(义从)


> On Feb 14, 2020, at 5:19 PM, Pavel Stehule wrote:
> 
> 
> 
> On Thu, Jan 30, 2020 at 15:21, Pavel Stehule wrote:
> 
> 
> On Thu, Jan 30, 2020 at 15:17, 曾文旌(义从) wrote:
> 
> 
> > On Jan 29, 2020, at 9:48 PM, Robert Haas wrote:
> > 
> > On Tue, Jan 28, 2020 at 12:12 PM 曾文旌(义从)  > > wrote:
> >>> Opinion by Pavel
> >>> + rel->rd_islocaltemp = true;  <<< if this is valid, then the name of
> >>> the field "rd_islocaltemp" is probably not the best
> >>> I renamed rd_islocaltemp
> >> 
> >> I don't see any change?
> >> 
> >> Rename rd_islocaltemp to rd_istemp  in global_temporary_table_v8-pg13.patch
> > 
> > In view of commit 6919b7e3294702adc39effd16634b2715d04f012, I think
> > that this has approximately a 0% chance of being acceptable. If you're
> > setting a field in a way that is inconsistent with the current use of
> > the field, you're probably doing it wrong, because the field has an
> > existing purpose to which new code must conform. And if you're not
> > doing that, then you don't need to rename it.
> Thank you for pointing it out.
> I've rolled back the rename.
> But I still need rd_islocaltemp to be true. The reasons are:
> 1. A GTT needs to support DML in read-only transactions, like a local temp
> table.
> 2. A GTT does not need to hold the lock before modifying the index buffer,
> also like a local temp table.
> 
> Please give me feedback.
> 
> maybe something like
> 
> rel->rd_globaltemp = true;
> 
> and somewhere else
> 
> if (rel->rd_islocaltemp || rel->rd_globaltemp)
> {
>   ...
> }
> 
> 
> I tested this patch again and I am very satisfied with its behavior.
> 
> What still doesn't work is the TRUNCATE statement:
> 
> postgres=# insert into foo select generate_series(1,1);
> INSERT 0 1
> postgres=# \dt+ foo
>   List of relations
> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
> (1 row)
> 
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# \dt+ foo
>   List of relations
> ┌────────┬──────┬───────┬───────┬─────────────┬───────┬─────────────┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
> ╞════════╪══════╪═══════╪═══════╪═════════════╪═══════╪═════════════╡
> │ public │ foo  │ table │ pavel │ session     │ 16 kB │             │
> └────────┴──────┴───────┴───────┴─────────────┴───────┴─────────────┘
> (1 row)
> 
> I expect zero size after truncate.
Thanks for review.

I can explain; I don't think it's a bug.
The current implementation of truncate for a GTT retains two blocks of FSM
pages.
The same is true when truncating a regular table in a subtransaction.
This implementation truncates the table without changing the table's
relfilenode.
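
For comparison, the regular-table analogue of that in-place path can be
demonstrated like this (a sketch; the table name is made up):

begin;
create table trunc_demo (i int);
insert into trunc_demo select generate_series(1, 10000);
select pg_relation_filenode('trunc_demo');  -- note the filenode
truncate trunc_demo;                        -- created in this xact: in-place
select pg_relation_filenode('trunc_demo');  -- same filenode, no rewrite
commit;

Outside such a transaction, the same TRUNCATE assigns a fresh relfilenode
(a rewrite), which would leave a zero-size relation behind.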


Wenjing

> 
> Regards
> 
> Pavel
> 
> 
> 
> Wenjing
> 
> 
> 
> 
> > 
> > -- 
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com 
> > The Enterprise PostgreSQL Company
> 



Re: proposal: schema variables

2020-02-15 Thread Pavel Stehule
On Mon, Feb 10, 2020 at 19:47, Pavel Stehule wrote:

>
>
> On Fri, Feb 7, 2020 at 17:09, Pavel Stehule wrote:
>
>> Hi
>>
>> rebase
>>
>> Regards
>>
>> Pavel
>>
>
> Hi
>
> another rebase; fixed the \dV statement (in the 0001 patch)
>

rebase

Pavel


> Regards
>
> Pavel
>


0001-schema-variables-20200215.patch.gz
Description: application/gzip