Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-16 Thread Alvaro Herrera
Thomas Munro wrote:
> New version attached, merging recent changes.

I wonder about the TailMatches and Matches macros --- wouldn't it be
better to have a single one, renaming TailMatches to Matches and
replacing the current Matches() with an initial token that corresponds
to anchoring to start of command?  Just wondering, not terribly attached
to the idea.
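
Just to illustrate what I mean, a rough sketch (not actual patch code; the
sentinel and the word-list handling are hand-waved here):

/* current style:    Matches2("ALTER", "TABLE")                          */
/* suggested style:  a single tail-anchored macro, with the start of the */
/* command expressed as an explicit leading token                        */
#define START_OF_CMD "^"    /* hypothetical sentinel prepended to the word list */

    if (TailMatches3(START_OF_CMD, "ALTER", "TABLE"))
        ;                   /* ... offer table names ... */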

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-11-16 Thread Masahiko Sawada
On Fri, Nov 13, 2015 at 12:52 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada  
> wrote in 
>> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao  wrote:
>> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada  
>> > wrote:
> ...
>> >> This patch is a tool for discussion, so I'm not going to fix this bug
>> >> until we reach consensus.
>> >>
>> >> We are still discussing in order to find a solution that can get consensus.
>> >> I felt that it's difficult to choose between the two approaches within
>> >> this development cycle, and there would not be time to implement such a
>> >> big feature even if we did.
>> >> But this feature is obviously needed by many users.
>> >> So I'm considering a simpler and more extensible solution; the
>> >> idea I posted is one of them.
>> >> Another approach worth considering is just specifying the
>> >> number of sync standbys. It can also cover the main use cases.
>> >
>> > Yes, it covers the main and simple use case of "I want to have multiple
>> > synchronous replicas!". Even if we miss quorum commit in the first
>> > version, the feature is still very useful.
>
> +1
>
>> It can cover not only the case you mentioned but also the main use case
>> Michael mentioned, by setting the same application_name.
>> And that first-version patch is almost implemented, so it just needs to
>> be reviewed.
>>
>> I think it would be good to implement the simple feature in the
>> first version, and then refine the design based on opinions and
>> feedback from more users and use cases.
>
> Yeah, I agree with that. And I have two proposals in this
> direction.
>
> - Notation
>
>  Keeping synchronous_standby_names, and adding
>  synchronous_replication_method as a variable to select other
>  syntaxes, is probably uncontroversial except for its name. But I
>  feel synchronous_standby_num looks a bit too specific.
>
>  I'd like to propose the following, if it doesn't reprise the
>  argument on notation for replication definitions :p
>
>  The following two GUCs would be enough to accommodate future
>  expansion of the notation syntax and/or method.
>
>  synchronous_standby_names :  as it is
>
>  synchronous_replication_method:
>
>default is "1-priority", which means the same as the current
>behavior.  Possible additional values so far would be:
>
> "n-priority": the format of s_s_names is "n, <name>, <name>, ...",
>   where n is the number of required acknowledgements.

One question: what is the difference between the leading "n" in
s_s_names and the "n" of "n-priority"?

>
> "n-quorum":   the format of s_s_names is the same as above, but
>   it is read in quorum context.
>
>  These can be expanded, for example, as follows, but in future.
>
> "complex" : Michael's format.
> "json": JSON?
> "json-ext": specify JSON in external file.
>
> Even after we have complex notations, I suppose that many use
> cases are covered by the first three notations.

I'm not sure it's desirable to implement all kinds of methods in core.
I think it's better to make replication more extensible, for example by
adding a hook function, so that other approaches can be implemented as
contrib modules.

>
> - Internal design
>
>  What SyncRepReleaseWaiters() should do is calculate a pair of
>  LSNs that can be regarded as synced, decide whether *this*
>  walsender has advanced that LSN pair, and then try to release
>  backends waiting for those LSNs *if* this walsender has
>  advanced them.
>
>  From that point of view, the proposed patch makes redundant
>  attempts to release backends.
>
>  In addition, the patch looks like a mixture of the current
>  implementation and the new feature. These serve the same
>  objective, so they cannot coexist, I think. As a result, code
>  for both quorum and priority judgement appears at multiple
>  levels in the call tree. This would be an obstacle for future
>  (possible) expansion.
>
>  So, I think this feature should be implemented as follows:
>
>  SyncRepInitConfig reads the configuration and stores the
>  resulting structure somewhere like WalSnd->syncrepset_definition,
>  instead of WalSnd->sync_standby_priority, which should be
>  removed. Nothing would be stored if the current walsender is
>  not a member of the defined replication set. Storing a pointer
>  to a matching function there would increase flexibility, but
>  such an implementation would in turn make the code harder to
>  read. (I often have to look for the entity of
>  xlogreader->read_page() ;)
>
>  Then SyncRepSyncedLsnAdvancedTo(), instead of
>  SyncRepGetSynchronousStandbys(), returns an LSN pair that can be
>  regarded as 'synced' according to the specified definition of
>  the replication set, and whether this walsender has advanced the
>  LSNs.
>
>  Finally, 

Re: [HACKERS] [DESIGN] ParallelAppend

2015-11-16 Thread Kouhei Kaigai
> On Thu, Nov 12, 2015 at 12:09 AM, Kouhei Kaigai  wrote:
> > I'm now designing the parallel feature of Append...
> >
> > Here is one challenge. How do we determine whether each sub-plan
> > allows execution in the background worker context?
> 
> I've been thinking about these questions for a bit now, and I think we
> can work on improving Append in multiple phases.  The attached patch
> shows what I have in mind for phase 1.  Currently, if you set up an
> inheritance hierarchy, you might get an Append some of whose children
> are Gather nodes with Parallel Seq Scans under them, like this:
> 
> Append
> -> Gather
>   -> Parallel Seq Scan
> -> Gather
>   -> Parallel Seq Scan
> -> Seq Scan
> 
> This is a crappy plan.  Each Gather node will try to launch its own
> bunch of workers, which sucks.  The attached patch lets you get this
> plan instead:
> 
> Gather
> -> Append
>   -> Partial Seq Scan
>   -> Partial Seq Scan
>   -> Partial Seq Scan
> 
> That's a much better plan.
> 
> To make this work, this plan introduces a couple of new concepts.
> Each RelOptInfo gets a new partial_pathlist field, which stores paths
> that need to be gathered to produce a workable plan.  Once we've
> populated the partial_pathlist with whatever partial paths we know how
> to generate, we can either push a Gather node on top of one of those
> partial paths to create a real path; or we can use those partial paths
> to build more partial paths.  The current patch handles only the
> simplest case: we can build a partial path for an appendrel by
> appending a partial path for each member rel.  But eventually I hope
> to handle joinrels this way as well: you can join a partial path with
> an ordinary path to form a partial path for the joinrel.
>
This idea solves my concern gracefully.
The new partial_pathlist keeps candidate path-nodes to be gathered
at this level or above. Unlike path-nodes already in the pathlist, we
don't need to rip off a GatherPath later.

Can we expect that path-nodes in the partial_pathlist never contain
an underlying GatherPath, even when we eventually apply this design
to joinrels as well?

If we can ensure that the path-nodes in the partial_pathlist are safe
to run under a Gather node - that they never contain a Gather themselves
or anything else that must not run in background worker context -
it will make path consideration much simpler than I expected.
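
Just to capture my understanding, a minimal sketch of the two uses of
partial_pathlist described above (the helper names and signatures here,
such as create_append_path/create_gather_path and the childrels argument,
are my assumptions for illustration, not necessarily the actual patch):

/*
 * Sketch only: build an appendrel partial path from the children's
 * partial paths, then expose a complete path by putting Gather on top.
 */
static void
sketch_build_append_paths(PlannerInfo *root, RelOptInfo *rel, List *childrels)
{
    List       *partial_subpaths = NIL;
    ListCell   *lc;

    /* 1. Append the children's partial paths into one partial path. */
    foreach(lc, childrels)
    {
        RelOptInfo *childrel = (RelOptInfo *) lfirst(lc);

        partial_subpaths = lappend(partial_subpaths,
                                   linitial(childrel->partial_pathlist));
    }
    rel->partial_pathlist = lappend(rel->partial_pathlist,
                                    create_append_path(rel, partial_subpaths,
                                                       NULL));

    /* 2. Make a real path by pushing a Gather node on top of a partial one. */
    add_path(rel, (Path *)
             create_gather_path(root, rel,
                                (Path *) linitial(rel->partial_pathlist),
                                NULL));
}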

> This requires some way of figuring out how many workers to request for
> the append-path, so this patch also adds a parallel_degree field to
> the path object, which is 0 for non-parallel things and the number of
> workers that the path believes to be ideal otherwise.  Right now, it
> just percolates the highest worker count of any child up to the
> appendrel, which might not be right, especially once the append node
> knows how to balance workers, but it seems like a reasonable place to
> start.
>
I agree.  The new parallel_degree will give useful information,
and one worker per child relation is a good start.

> > Type-A) It can be performed in background worker context and
> >    picked up by multiple worker processes concurrently.
> >   (e.g. Parallel SeqScan)
> > Type-B) It can be performed in background worker context but
> >   cannot be picked up by multiple worker processes.
> >   (e.g. a non-parallel-aware node)
> > Type-C) It should not be performed in background worker context.
> >    (e.g. a plan/path node with volatile functions)
> 
> At the time that we're forming an AppendPath, we can identify what
> you're calling type-A paths very easily: they're the ones that are
> coming from the partial_pathlist.  We can distinguish between type-B
> and type-C paths coming from the childrel's pathlist based on the
> childrel's consider_parallel flag: if it's false, it's type-C, else
> type-B.  At some point, we might need a per-path flag to distinguish
> cases where a particular path is type-C even though some other plan
> for that relation might be type-B.  But right now that case doesn't
> arise.
>
I also think we will eventually need a per-path flag when we support
parallel capability on joinrels, although that is not an immediate action.

> Now, of course, it's not good enough to have this information
> available when we're generating the AppendPath; it has to survive
> until execution time.  But that doesn't mean the paths need to be
> self-identifying.  We could, of course, decide to make them so, and
> maybe that's the best design, but I'm thinking about another approach:
> suppose the append node itself, instead of having one list of child
> plans, has a list of type-A plans, a list of type-B plans, and a list
> of type-C plans.  This actually seems quite convenient, because at
> execution time, you presumably want the leader to prioritize type-C
> plans over any of the others, since it's the only one that can execute
> them, and the workers to prioritize type-B plans, since they can only
> take one worker each and are 
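
For reference, the "three lists" idea above might look roughly like this
(a sketch only; the field names are invented for illustration):

/* Rough sketch of an Append node carrying the three lists. */
typedef struct Append
{
    Plan        plan;
    List       *partial_plans;      /* type-A: parallel-aware children */
    List       *safe_plans;         /* type-B: runnable by any one process */
    List       *leader_only_plans;  /* type-C: must run in the leader */
} Append;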

Re: [HACKERS] check for interrupts in set_rtable_names

2015-11-16 Thread Tom Lane
I wrote:
> I experimented with using a hash table to avoid the O(N^2) behavior.
> This seems to work quite well, and I think it doesn't change the results
> (leastwise, it does not break the regression tests).

Looking at this again in the light of morning, it occurred to me that it's
pretty broken in the face of long (approaching NAMEDATALEN) input
identifiers.  If the de-duplication digits are beyond the NAMEDATALEN
threshold, it would actually get into an infinite loop because hash_search
would ignore them as not part of the key.  That can be fixed by truncating
the name appropriately.

However: actually, this code had a problem already with long identifiers.
What happened before was that it would blithely add digits and thereby
produce an overlength identifier, which indeed looks distinct from the
other ones in the query --- but if we're dumping a view/rule, the
identifier won't be distinct after truncation, which means that
dump/reload could fail, or even worse, silently produce something with
different semantics than intended.

So the attached updated patch takes care of that problem, not only for
relation names but also for column names.

I had been leaning towards not back-patching this, but I now feel that
we must back-patch at least the truncation fix, and we probably might as
well back-patch it in toto.  Comments?

regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3388f7c..771b14b 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***
*** 38,43 
--- 38,44 
  #include "commands/tablespace.h"
  #include "executor/spi.h"
  #include "funcapi.h"
+ #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
***
*** 55,60 
--- 56,62 
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"
  #include "utils/rel.h"
  #include "utils/ruleutils.h"
*** typedef struct
*** 267,272 
--- 269,283 
  #define deparse_columns_fetch(rangetable_index, dpns) \
  	((deparse_columns *) list_nth((dpns)->rtable_columns, (rangetable_index)-1))
  
+ /*
+  * Entry in set_rtable_names' hash table
+  */
+ typedef struct
+ {
+ 	char		name[NAMEDATALEN];		/* Hash key --- must be first */
+ 	int			counter;		/* Largest addition used so far for name */
+ } NameHashEntry;
+ 
  
  /* --
   * Global data
*** static void print_function_rettype(Strin
*** 312,319 
  static void print_function_trftypes(StringInfo buf, HeapTuple proctup);
  static void set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
   Bitmapset *rels_used);
- static bool refname_is_unique(char *refname, deparse_namespace *dpns,
-   List *parent_namespaces);
  static void set_deparse_for_query(deparse_namespace *dpns, Query *query,
  	  List *parent_namespaces);
  static void set_simple_column_names(deparse_namespace *dpns);
--- 323,328 
*** static void
*** 2676,2690 
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
   Bitmapset *rels_used)
  {
  	ListCell   *lc;
- 	int			rtindex = 1;
  
  	dpns->rtable_names = NIL;
  	foreach(lc, dpns->rtable)
  	{
  		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
  		char	   *refname;
  
  		if (rels_used && !bms_is_member(rtindex, rels_used))
  		{
  			/* Ignore unreferenced RTE */
--- 2685,2745 
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
   Bitmapset *rels_used)
  {
+ 	HASHCTL		hash_ctl;
+ 	HTAB	   *names_hash;
+ 	NameHashEntry *hentry;
+ 	bool		found;
+ 	int			rtindex;
  	ListCell   *lc;
  
  	dpns->rtable_names = NIL;
+ 	/* nothing more to do if empty rtable */
+ 	if (dpns->rtable == NIL)
+ 		return;
+ 
+ 	/*
+ 	 * We use a hash table to hold known names, so that this process is O(N)
+ 	 * not O(N^2) for N names.
+ 	 */
+ 	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
+ 	hash_ctl.keysize = NAMEDATALEN;
+ 	hash_ctl.entrysize = sizeof(NameHashEntry);
+ 	hash_ctl.hcxt = CurrentMemoryContext;
+ 	names_hash = hash_create("set_rtable_names names",
+ 			 list_length(dpns->rtable),
+ 			 &hash_ctl,
+ 			 HASH_ELEM | HASH_CONTEXT);
+ 	/* Preload the hash table with names appearing in parent_namespaces */
+ 	foreach(lc, parent_namespaces)
+ 	{
+ 		deparse_namespace *olddpns = (deparse_namespace *) lfirst(lc);
+ 		ListCell   *lc2;
+ 
+ 		foreach(lc2, olddpns->rtable_names)
+ 		{
+ 			char	   *oldname = (char *) lfirst(lc2);
+ 
+ 			if (oldname == NULL)
+ continue;
+ 			hentry = (NameHashEntry *) hash_search(names_hash,
+    oldname,
+    HASH_ENTER,
+    &found);
+ 			/* we do not complain about duplicate names in parent namespaces */
+ 			hentry->counter = 0;
+ 		}
+ 	}
+ 
+ 	/* Now we can scan the rtable */
+ 	rtindex = 1;
  	foreach(lc, dpns->rtable)
  	{
  		

Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-16 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> Auto-generating from the grammar should be the ultimate solution but
> I don't think it will be available. But still I found that the
> word-splitting-then-match-word-by-word-for-each-matching is
> terribly unmaintainable and poorly capable. So, how about
> regular expressions?

I don't think this is an improvement.  It seems to me that a lot more
work is required to maintain these regular expressions, which while
straightforward are not entirely trivial (harder to read).

If we could come up with a reasonable format that is pre-processed to a
regexp, that might be better.  I think Thomas' proposed patch is closer
to that than Horiguchi-san's.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-16 Thread Catalin Iacob
On Sun, Nov 15, 2015 at 3:53 PM, Andrew Dunstan  wrote:
> I suggest you review the original thread on this subject before a line was
> ever written. "multiple" (see subject line on this whole thread) is clearly
> what is being asked for. Making people turn that into a single argument is
> not what was envisaged. See for example Pavel's original example involving
> use of xargs where that's clearly not at all easy.

I couldn't see why it would matter to have multiple -C, but xargs
having -n which consumes more than 1 stdin item is indeed a use case.
When reading the thread I didn't notice it since I didn't know what -n
does.

But multiple -C is confusing since it suggests the groupings matter.

To me at least, it feels weird that -C "SELECT 1; SELECT 2;" -C
"SELECT 3;" is the same as -C "SELECT 1; SELECT 2; SELECT 3" and lots
of other combinations. It feels like the split in groups must mean
something, otherwise why would you support/use multiple groups?

Upthread at least somebody thought each -C group would/should be a
transaction and I can see this confusion coming up again and again,
enough to question whether this patch is an improvement over the
current situation. And if a single -C is too small of an improvement,
maybe it means the whole idea should be dropped. I think the same of
multiple -f as well: to me they're more confusing than helpful for the
same reason.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-16 Thread Andrew Dunstan



On 11/15/2015 08:24 PM, Peter Eisentraut wrote:

On 11/15/15 9:53 AM, Andrew Dunstan wrote:

I suggest you review the original thread on this subject before a line
was ever written. "multiple" (see subject line on this whole thread) is
clearly what is being asked for. Making people turn that into a single
argument is not what was envisaged. See for example Pavel's original
example involving use of xargs where that's clearly not at all easy.

I can see (small) value in having a new option that is like -c but
interprets the string as a fully-featured script like -f.  (Small
because the same behavior can already be had with here strings in bash.)

The behavior should be exactly like -f, including all the behavior with
single-transaction and single-step modes or whatever.

But then I will point out that we currently don't handle multiple -f
options.





If we can only have one I would say the value is vanishingly small.

As to -f, I don't see why we shouldn't allow multiple such options, only 
that nobody has bothered to do it.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-16 Thread Ildus Kurbangaliev
On Sun, 15 Nov 2015 16:24:24 -0500
Robert Haas  wrote:

> > I have some questions about next steps on other tranches.
> > First of all, I think we can have two API calls, something like:
> >
> > 1) LWLockRequestTranche(char *tranche_name, int locks_count)
> > 2) LWLockGetTranche(char *tranche_name)
> >
> > LWLockRequestTranche reserves an item in the main tranches array in shared
> > memory, and allocates space for the name, the LWLockTranche, and the LWLocks.
> > Here is the first question: is it OK if this call is made from *ShmemSize
> > functions? We would keep those requests, calculate the required size in
> > LWLockShmemSize (by moving this call to the end), and create all these
> > tranches in CreateLWLocks.
> 
> I think what we should do about the buffer locks is polish up this
> patch and get it applied:
> 
> http://www.postgresql.org/message-id/20150907175909.gd5...@alap3.anarazel.de
> 
> I think it needs to be adapted to use predefined constants for the
> tranche IDs instead of hardcoded values, maybe based on the attached
> tranche-constants.patch.  This approach could be extended for the
> other stuff in the main tranche, and I think that would be cleaner
> than inventing LWLockRequestTranche as you are proposing.  Just make
> the tranche IDs constants, and then the rest fits nicely into the
> framework we've already got.

What if we just create a control struct in shared memory, as in other places?
BufferDescriptors and BufferBlocks can be kept there along with the tranche
definitions and LWLocks. The buffer locks that are currently located in
MainLWLockArray by offset can be moved there too.
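
Roughly, something like this (struct and field names are invented, just to
illustrate the idea):

/* Sketch of a per-subsystem control struct holding its tranche and locks. */
typedef struct BufferLWLockCtlData
{
    LWLockTranche tranche;      /* tranche registration for these locks */
    LWLockPadded *locks;        /* the buffer content/io locks themselves */
} BufferLWLockCtlData;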

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Conversion error of floating point numbers in pl/pgsql

2015-11-16 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Hello. I found that 9.5 has an undocumented difference from 9.4
> in type casts in pl/pgsql, and I think it might be better to mention it
> as a change of behavior in the release notes.

> Do you think it is worth mentioning in the release notes?

This seems unnecessarily alarmist to me.  Anybody who's in the habit
of converting between float4 and float8 will already be used to this
behavior, because it is what has always happened everywhere else in
the system.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2015-11-16 Thread Artur Zakirov

On 10.11.2015 13:23, Artur Zakirov wrote:


Link to patch in commitfest:
https://commitfest.postgresql.org/8/420/

Link to regression tests:
https://dl.dropboxusercontent.com/u/15423817/HunspellDictTest.tar.gz



Hello!

Do you have any remarks or comments about my patch?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 4:35 AM, Robert Haas  wrote:
>
> On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:
> >>> And perhaps associated PIDs?
> >>
> >> Yeah, that can be useful, if others also feel like it is important, I
can
> >> look into preparing a patch for the same.
> >
> > Thanks.
>
> Thom, what do you think the EXPLAIN output should look like,
> specifically?  Or anyone else who feels like answering.
>
> I don't think it would be very useful to repeat the entire EXPLAIN
> output n times, once per worker.  That sounds like a loser.
>

Yes, it doesn't seem like a good idea to repeat the information, but what
about the cases where different workers perform scans on different
relations (partitions, in the case of an Append node), or perhaps perform
a different operation under Sort or join node parallelism?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Pavel Stehule
2015-11-16 14:17 GMT+01:00 Amit Kapila :

> On Mon, Nov 16, 2015 at 4:35 AM, Robert Haas 
> wrote:
> >
> > On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:
> > >>> And perhaps associated PIDs?
> > >>
> > >> Yeah, that can be useful, if others also feel like it is important, I
> can
> > >> look into preparing a patch for the same.
> > >
> > > Thanks.
> >
> > Thom, what do you think the EXPLAIN output should look like,
> > specifically?  Or anyone else who feels like answering.
> >
> > I don't think it would be very useful to repeat the entire EXPLAIN
> > output n times, once per worker.  That sounds like a loser.
> >
>
> Yes, it doesn't seem like a good idea to repeat the information, but what
> about the cases where different workers perform scans on different
> relations (partitions, in the case of an Append node), or perhaps perform
> a different operation under Sort or join node parallelism?
>

+1

Pavel


>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-16 Thread Michael Paquier
On Thu, Nov 12, 2015 at 1:16 PM, Kyotaro HORIGUCHI wrote:
> 1. 0001-Allow-regex-module-to-be-used-outside-server.patch
>
>   This small change makes pg_regex possible to be used in
>   frontend.

This is generic enough to live in src/common; psql could then directly
reuse it via libpgcommon.

> 2. 0002-Replace-previous-matching-rule-with-regexps.patch
>
>   Simply replaces existing matching rules almost one-by-one with
>   regular expression matches.

This makes the situation messier. At least with Thomas' patch one can
immediately see the list of words being matched for a given code path,
while with this patch we need to look at the regex where they are listed.
And this would get more and more complicated as new commands are added.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread Kevin Grittner
On Sunday, November 15, 2015 8:51 PM, Craig Ringer  
wrote:

> I'd prefer to omit fields if explicitly assigned to NULL. You can
> always use coalesce if you want the string 'NULL'; I agree with
> others here that the vast majority of users will want the field
> just omitted.

+1

Unfortunately those writing the SQL standard chose to have a single
flag (NULL) to indicate either "unknown" or "not applicable".  That
causes problems where it's not clear which way the value should be
interpreted, but in this case it seems pretty clear that someone
passing a NULL parameter for hint to a function like this doesn't
mean "there is likely to be a valid value for hint, but I don't
know what it is" -- they mean there is no available hint, so please
don't show one.  Any other behavior seems rather silly.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-11-16 Thread Michael Paquier
Stephen,

On Tue, Jul 14, 2015 at 9:22 PM, Fujii Masao  wrote:
>
> On Tue, Jul 14, 2015 at 3:46 AM, Stephen Frost  wrote:
> > Fujii,
> >
> > * Fujii Masao (masao.fu...@gmail.com) wrote:
> >> The documentation of the functions for which the corresponding default
> >> roles are added by this patch needs to be updated. For example, the
> >> description of pg_xlog_replay_pause() says "Pauses recovery immediately
> >> (restricted to superusers).". I think that the description needs to mention
> >> the corresponding default role "pg_replay". Otherwise, it's difficult for
> >> users to see which default role is related to the function they want to
> >> use.
> >> Or perhaps we could add a table explaining all the relationships between
> >> default roles and the corresponding operations. That would be useful.
> >
> > Certainly, totally agree that we need to make it clear in the function
> > descriptions also.
> >
> >> Why do we allow users to change the attributes of default roles?
> >> For example, ALTER ROLE default_role or GRANT ... TO default_role.
> >> Those changes are not dumped by pg_dumpall. So if users change
> >> the attributes for some reason but they disappear via pg_dumpall,
> >> the system may end up in an unexpected state.
> >
> > Good point.  I'm fine with simply disallowing that completely; does
> > anyone want to argue that we should allow superusers to ALTER or GRANT
> > to these roles?  I have a hard time seeing the need for that and it
> > could make things quite ugly.
> >
> >> I think that it's better to allow the roles with pg_monitor to
> >> execute pgstattuple functions. They are usually used for monitoring.
> >> Thought?
> >
> > Possibly, but I'd need to look at them more closely than I have time to
> > right now.  Can you provide a use-case?  That would certainly help.
>
> I have seen the monitoring system which periodically executes
> the statement like
>
> SELECT * FROM pgstattuple('hoge');
>
> to monitor the relation's physical length, the number of dead
> tuples, etc. Then, for example, if the number of dead tuples is
> increased unexpectedly and the relation becomes bloated, DBA tries
> to find out the cause and execute the maintenance commands
> if necessary to alleviate the situation. The monitoring system calls
> pgstattuple() at off-peak times because pgstattuple() needs to
> scan all the pages in the relation and its performance penalty
> might be big.
>
> > Also, we are mostly focused on things which are currently superuser-only
> > capabilities, if you don't need to be superuser today then the
> > monitoring system could be granted access using the normal mechanisms.
>
> Currently only superusers can call pgstattuple().


Will there be any work on this patch for this commitfest or not? It has
been carried around for a couple of months now with not much progress,
and this thread has been idle for 4 months.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN cost estimate

2015-11-16 Thread Alvaro Herrera
Somebody wrote to me a few days ago that the BRIN cost estimation is
rather poor.  One immediately obvious issue which I think is easily
fixed is the correlation estimate, which is currently hardcoded to 1.  

Since a BRIN scan always entails a scan of the relation in physical
order, it's simple to produce a better estimate for that, per the
attached patch.  (Note: trying to run this on expression indexes will
cause a crash.  I need to complete that part ...)

There are other improvements we've been discussing, but I'm not sure I will
have time to study them and produce a useful patch quickly enough.  If
somebody else wants to mess with this, please get in touch.

Thoughts?

-- 
Álvaro Herrera                33.5S 70.5W
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 37fad86..5506f67 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7531,6 +7531,7 @@ brincostestimate(PG_FUNCTION_ARGS)
 	Cost		spc_random_page_cost;
 	double		qual_op_cost;
 	double		qual_arg_cost;
+	VariableStatData vardata;
 
 	/* Do preliminary analysis of indexquals */
 	qinfos = deconstruct_indexquals(path);
@@ -7544,6 +7545,8 @@ brincostestimate(PG_FUNCTION_ARGS)
 	 * BRIN indexes are always read in full; use that as startup cost.
 	 *
 	 * XXX maybe only include revmap pages here?
+	 * XXX we should consider the revmap at seqpage cost, and regular pages
+	 * at random page cost.
 	 */
 	*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
 
@@ -7558,7 +7561,88 @@ brincostestimate(PG_FUNCTION_ARGS)
 		clauselist_selectivity(root, indexQuals,
 			   path->indexinfo->rel->relid,
 			   JOIN_INNER, NULL);
-	*indexCorrelation = 1;
+
+	/*
+	 * Compute index correlation.
+	 *
+	 * Because we can use all index quals equally when scanning, we can use the
+	 * largest correlation (in absolute value) among columns used by the query.
+	 * Start at zero, the worst possible case.
+	 */
+	*indexCorrelation = 0;
+
+	{
+		RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+		Oid			relid;
+		ListCell   *cell;
+
+		Assert(rte->rtekind == RTE_RELATION);
+		relid = rte->relid;
+		Assert(relid != InvalidOid);
+
+		foreach (cell, qinfos)
+		{
+			IndexQualInfo  *qinfo = (IndexQualInfo *) lfirst(cell);
+			AttrNumber		colnum = index->indexkeys[qinfo->indexcol];
+
+			if (colnum != 0)
+			{
+/* Simple variable -- look to stats for the underlying table */
+if (get_relation_stats_hook &&
+	(*get_relation_stats_hook) (root, rte, colnum, &vardata))
+{
+	/*
+	 * The hook took control of acquiring a stats tuple.  If it did
+	 * supply a tuple, it'd better have supplied a freefunc.
+	 */
+	if (HeapTupleIsValid(vardata.statsTuple) &&
+		!vardata.freefunc)
+		elog(ERROR, "no function provided to release variable stats with");
+}
+else
+{
+	vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+		 ObjectIdGetDatum(relid),
+		 Int16GetDatum(colnum),
+		 /* XXX no inh */
+		 BoolGetDatum(false));
+	vardata.freefunc = ReleaseSysCache;
+}
+
+if (HeapTupleIsValid(vardata.statsTuple))
+{
+	float4	   *numbers;
+	int			nnumbers;
+
+	/* XXX is InvalidOID reqop fine?? */
+	if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+		 STATISTIC_KIND_CORRELATION,
+		 InvalidOid,
+		 NULL,
+		 NULL, NULL,
+		 &numbers, &nnumbers))
+	{
+		double	varCorrelation;
+
+		Assert(nnumbers == 1);
+		varCorrelation = numbers[0];
+
+		if (Abs(varCorrelation) > Abs(*indexCorrelation))
+			*indexCorrelation = varCorrelation;
+
+		free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+	}
+}
+			}
+			else
+			{
+/* Expression --- maybe there are stats for the index itself */
+Assert(false);
+			}
+
+			ReleaseVariableStats(vardata);
+		}
+	}
 
 	/*
 	 * Add on index qual eval costs, much as in genericcostestimate.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-11-16 Thread Michael Paquier
On Sat, Sep 5, 2015 at 9:31 AM, Bruce Momjian wrote:
> On Fri, Sep  4, 2015 at 04:51:33PM -0400, Stephen Frost wrote:
>> > Coming in late, but can you explain how multiple passwords allow for
>> > easier automated credential rotation?  If you have five applications
>> > with stored passwords, I imagine you can't change them all at once, so
>> > with multiples you could change it on one, then go to the others and
>> > change it there, and finally, remove the old password.  Is that the
>> > process?  I am not seeing that, without multiple passwords, this is a
>> > hard problem.
>>
>> That's exactly the process if multiple passwords can be used.  If
>> there's only one account and one password supported then you have to
>> change all the systems all at once and that certainly can be a hard
>> problem.
>>
>> One way to deal with this is to have a bunch of different accounts, but
>> that's certainly not simple either and can get quite painful.
>
> OK, for me, if we can explain the benefit for users, it seems worth
> doing just to allow that.

Reviving an old thread for a patch still registered in this commitfest,
to move the discussion forward.

Supporting multiple verifiers for a single role has IMO clear advantages:
- it helps the transition to new protocols and the decommissioning of old
ones, without the need to create alternative roles in the backend when
switching from one to the other.
- it allows moving to a more complex password aging system. The patch
currently submitted allows only one verifier per type and per role, so this
is not a complete system. Still, the new catalog table pg_auth_verifiers
could later be easily extended with other aging properties that we
consider suited to this goal.

There are clear concerns about protocol aging and about making it easier
for users to switch to a new system. Hence, I think that the patch could
include the following:
- A compatibility GUC allowed_password_verifiers defaulting to a list
of verifier protocols that we think are safe. This would be added in
the patch adding infrastructure for multiple verifiers, with the default
being md5. In the patch adding SCRAM, the value of this parameter is
changed to md5,scram. If a user creates a password verifier with a
protocol not listed in this parameter, we return a WARNING.
ERROR may be too much, but opinions are welcome.
- A superuser cleanup function for pg_auth_verifiers, aimed at being
used by pg_upgrade to do the needed cleanup of this catalog based on the
previous GUC to remove outdated verifiers. Optionally, we could have
an argument taking a list of protocols to clean up.
Using both, we could smooth the upgrade experience and the transition
between systems. Even if at some point we decide to decommission SCRAM,
we could reuse the same infrastructure to move on in a new major version.

Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-16 Thread Pavel Stehule
Hi

2015-11-16 17:16 GMT+01:00 Catalin Iacob :

> On Sun, Nov 15, 2015 at 3:53 PM, Andrew Dunstan 
> wrote:
> > I suggest you review the original thread on this subject before a line
> was
> > ever written. "multiple" (see subject line on this whole thread) is
> clearly
> > what is being asked for. Making people turn that into a single argument
> is
> > not what was envisaged. See for example Pavel's original example
> involving
> > use of xargs where that's clearly not at all easy.
>
> I couldn't see why it would matter to have multiple -C, but xargs
> having -n which consumes more than 1 stdin item is indeed a use case.
> When reading the thread I didn't notice it since I didn't know what -n
> does.
>
> But multiple -C is confusing since it suggests the groupings matter


I disagree.

The user can choose the grouping that gives the best readability and
maintainability.

There is no real reason to limit

a) the number of times the -C option is used
b) the number of commands inside a -C option.

Multiple uses of the -C option are necessary - backslash commands with
parameters have to be alone or last in a group.

But where that is not necessary, requiring only one command per option
is unfriendly.

Regards

Pavel


> .
>
> To me at least, it feels weird that -C "SELECT 1; SELECT 2;" -C
> "SELECT 3;" is the same as -C "SELECT 1; SELECT 2; SELECT 3" and lots
> of other combinations. It feels like the split in groups must mean
> something, otherwise why would you support/use multiple groups?
>



>
> Upthread at least somebody thought each -C group would/should be a
> transaction and I can see this confusion coming up again and again,
> enough to question whether this patch is an improvement over the
> current situation. And if a single -C is too small of an improvement,
> maybe it means the whole idea should be dropped. I think the same of
> multiple -f as well: to me they're more confusing than helpful for the
> same reason.
>


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Thomas Munro
On Tue, Nov 17, 2015 at 12:44 AM, Simon Riggs  wrote:

> On 15 November 2015 at 10:41, Simon Riggs  wrote:
>
>
>>  So anyway, consider me nudged to finish my patch to provide capability
>> for that by 1 Jan.
>>
>
> My earlier patch aimed to allow WALReceiver to wait on both a latch and a
> socket as well as allow WALWriter to be active, so that WALReceiver could
> reply more quickly and handle greater workload. As I explained previously
> when we discussed that in recent posts, it is necessary infrastructure to
> have anybody wait on anything higher than remote-fsync. I aim to complete
> that by 1 Jan.
>

Right, handing write/fsync work off to WALWriter in standbys makes a lot of
sense for any kind of writer-waits system, so that WALReceiver doesn't
spend time in long syscalls which wouldn't play nicely with signals
(whether from 'kill' or SetLatch) and can deal with network IO with the
lowest possible latency.  I would like to help test/review that, if that
could be useful.

The SIGUSR1 code in the WalReceiverMain and WalRecvWakeup in this patch
works well enough for now for proof-of-concept purposes until then.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Jeff Janes
On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila  wrote:
> On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes  wrote:
>>
>> On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas 
>> wrote:
>> >
>> > I've committed most of this, except for some planner bits that I
>> > didn't like, and after a bunch of cleanup.  Instead, I committed the
>> > consider-parallel-v2.patch with some additional planner bits to make
>> > up for the ones I removed from your patch.  So, now we have parallel
>> > sequential scan!
>>
>> Pretty cool.  All I had to do is mark my slow plperl functions as
>> being parallel safe, and bang, parallel execution of them for seq
>> scans.
>>
>> But, there does seem to be a memory leak.
>>
>
> Thanks for the report.
>
> I think main reason of the leak in workers seems to be due the reason
> that one of the buffer used while sending tuples (in function
> BuildRemapInfo)
> from worker to master is not getting freed and it is allocated for each
> tuple worker sends back to master.  I couldn't find use of such a buffer,
> so I think we can avoid the allocation of same or atleast we need to free
> it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> issue.

Thanks, that patch (as committed) has fixed the problem for me.  I
don't understand the second one.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog: spurious error message connecting to 9.3

2015-11-16 Thread Marco Nenciarini
On 10/11/15 07:35, Craig Ringer wrote:
> On 10 November 2015 at 01:47, Marco Nenciarini
>  wrote:
> 
>> I've attached a little patch that removes the errors when connected to 9.3.
> 
> Looks good to me. No point confusing users.
> 
> The other callers of RunIdentifySystem are pg_basebackup and
> pg_receivelogical.
> 
> pg_basebackup doesn't ask for the db_name (passes null).
> 
> pg_receivelogical handles it being null already (and if it didn't,
> it'd die with or without this patch).
> 
> pg_receivexlog expects it to be null and fails gracefully if it isn't.
> 
> So this change just removes some pointless noise.
> 

I've added it to the next commit fest to keep track of it. I've also set Craig 
as reviewer, as he has already sent a review for it (and it's a really simple 
patch).

Thanks,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/15/15 9:50 PM, Craig Ringer wrote:
>> I'd prefer to omit fields if explicitly assigned to NULL. You can always
>> use coalesce if you want the string 'NULL'; I agree with others here
>> that the vast majority of users will want the field just omitted.

> I think the problem was that you can't omit the primary message.

If you ask me, that would be a feature not a bug.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-11-16 Thread Peter Eisentraut
On 11/16/15 2:37 AM, Haribabu Kommi wrote:
> On Mon, Nov 16, 2015 at 2:30 PM, Peter Eisentraut  wrote:
>> On 7/21/15 5:15 AM, Haribabu Kommi wrote:
>>> With the output of this view, administrator can identify the lines
>>> that are matching for the given
>>> criteria easily without going through the file.
>>
>> How is this useful?  I could see the use if you want to debug cases of
>> user foo on host bar says they can't connect, but you can't impersonate
>> them to verify it.  But then all you need is a function with a scalar
>> result, not a result set.
> 
> Do you mean the function should return true or false based on the connection
> status with the provided arguments?
> 
> I also feel it would be difficult to understand the function result compared to a view.

An hba lookup is essentially a lookup by user name, database name,
client address, yielding an authentication method (possibly with
parameters).  So I think this function should work that way as well:
arguments are user name, database name, and so on, and the return value
is an authentication method.  Maybe it would be some kind of record,
with line number and some parameters.

That would address the use case I put forth above.  I don't know whether
that's what you were going for.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-16 Thread Peter Eisentraut
On 11/15/15 11:29 PM, Pavel Stehule wrote:
> 
> 
2015-11-16 5:20 GMT+01:00 Peter Eisentraut :
> 
> I don't think it's right to reuse SPIError for this.  SPIError is
> clearly meant to signal an error in the SPI calls.  Of course, we can't
> stop users from raising whatever exception they want, but if we're going
> to advertise that users can raise exceptions, then we should create
> separate exception classes.
> 
> I suppose the proper way to set this up would be to create a base class
> like plpy.Error and derive SPIError from that.
> 
> 
> Do you have some ideas about the name of this class?

I think plpy.Error is fine.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread Peter Eisentraut
On 11/16/15 5:10 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 11/15/15 9:50 PM, Craig Ringer wrote:
>>> I'd prefer to omit fields if explicitly assigned to NULL. You can always
>>> use coalesce if you want the string 'NULL'; I agree with others here
>>> that the vast majority of users will want the field just omitted.
> 
>> I think the problem was that you can't omit the primary message.
> 
> If you ask me, that would be a feature not a bug.

Right.

Frankly, I have lost track of what the problem here is.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DOCS] max_worker_processes on the standby

2015-11-16 Thread Alvaro Herrera
I paraphrase Fujii Masao, who wrote:

> 1. Start the master and standby servers with track_commit_timestamp enabled.
> 2. Disable track_commit_timestamp in the master and restart the master server.
> 3. Run checkpoint in the master.
> 4. Run restartpoint in the standby after the checkpoint WAL record generated
> 5. Restart the standby server.
> 6. Enable track_commit_timestamp in the master and restart the master server.
> 7. Disable track_commit_timestamp in the master and restart the master server.

> What I think strange is that pg_last_committed_xact() behaves differently
> in #2, #5, and #7 though the settings of track_commit_timestamp are same
> in both servers, i.e., it's disabled in the master but enabled in the standby.

Interesting, thanks.  You're right that this behaves oddly.

I think in order to fix these two points (#5 and #7), we need to make
the standby not honour the GUC at all, i.e. only heed what the master
setting is.

> 8. Promote the standby server to new master.
> Since committs is still inactive even after the promotion,
> pg_last_committed_xact() keeps causing an ERROR though
> track_commit_timestamp is on.
>
> I was thinking that whether committs is active or not should depend on
> the setting of track_commit_timestamp *after* the promotion.
> The behavior in #8 looked strange.

To fix this problem, we can re-run StartupCommitTs() after recovery is
done, this time making sure to honour the GUC setting.

I haven't tried the scenarios we fixed with the previous round of
patching, but this patch seems to close the problems just reported.
(My next step will be looking over the recovery test framework by
Michael et al, so that I can apply a few tests for this stuff.)
In the meantime, if you can look this over I would appreciate it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 349228d..03d9743 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -510,7 +510,7 @@ BootStrapCommitTs(void)
 	/*
 	 * Nothing to do here at present, unlike most other SLRU modules; segments
 	 * are created when the server is started with this module enabled. See
-	 * StartupCommitTs.
+	 * ActivateCommitTs.
 	 */
 }
 
@@ -544,13 +544,13 @@ ZeroCommitTsPage(int pageno, bool writeXlog)
  * configuration.
  */
 void
-StartupCommitTs(bool force_enable)
+StartupCommitTs(bool honour_config, bool force_enable)
 {
 	/*
 	 * If the module is not enabled, there's nothing to do here.  The module
 	 * could still be activated from elsewhere.
 	 */
-	if (track_commit_timestamp || force_enable)
+	if ((honour_config && track_commit_timestamp) || force_enable)
 		ActivateCommitTs();
 }
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 08d1682..0ed6822 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6567,7 +6567,7 @@ StartupXLOG(void)
 			 * maintained during recovery and need not be started yet.
 			 */
 			StartupCLOG();
-			StartupCommitTs(ControlFile->track_commit_timestamp);
+			StartupCommitTs(false, ControlFile->track_commit_timestamp);
 			StartupSUBTRANS(oldestActiveXID);
 
 			/*
@@ -7336,7 +7336,7 @@ StartupXLOG(void)
 	if (standbyState == STANDBY_DISABLED)
 	{
 		StartupCLOG();
-		StartupCommitTs(false);
+		StartupCommitTs(true, false);
 		StartupSUBTRANS(oldestActiveXID);
 	}
 
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 3844bb3..991dd42 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -34,7 +34,7 @@ extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
-extern void StartupCommitTs(bool force_enable);
+extern void StartupCommitTs(bool honour_config, bool force_enable);
 extern void CommitTsParameterChange(bool xlrecvalue, bool pgcontrolvalue);
 extern void CompleteCommitTsInitialization(void);
 extern void ShutdownCommitTs(void);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-16 Thread Robert Haas
On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai  wrote:
> I agree that we have no reason why only custom-scan should be allowed to
> have serialize/deserialize capability. I can implement equivalent stuff
> for foreign-scan also, and it is helpful for extension authors, especially
> those who try to implement a remote join feature, because an FDW driver
> has to keep a larger number of private fields than a plain scan does.
>
> Also, what is the reason why we allow extensions to define a larger
> structure which contains CustomPath or CustomScanState? It seems to
> me that these types are not (fully) supported by the current copyfuncs,
> outfuncs and readfuncs, are they?
> (Although outfuncs.c supports path-nodes, thus CustomPathMethods has
> a TextOut callback but no copy/read handler at this moment.)

I feel like we're sort of plunging blindly down a path of trying to
make certain parts of the system extensible and pluggable without
really having a clear vision of where we want to end up.  Somehow I
feel like we ought to start by thinking about a framework for *node*
extensibility; then, within that, we can talk about what that means
for particular types of nodes.

For example, suppose we do something like this:

typedef struct
{
   NodeTag tag;
   char *extnodename;
} ExtensibleNode;

typedef struct
{
    char *extnodename;
    char *library_name;
    char *function_name;
    Size node_size;
    ExtensibleNode *(*nodeCopy)(ExtensibleNode *);
    bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *);
    void (*nodeOut)(StringInfo str, ExtensibleNode *);
    ExtensibleNode *(*nodeRead)(void);
} ExtensibleNodeMethods;

extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *);
extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename);

This provides a generic infrastructure so that we don't have to keep
inventing the same stuff over and over, rather than coming up with one
way to do it for ForeignScans and another way for CustomScan and
another way for CustomScanState.
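
To make that concrete, here is a hypothetical usage sketch (the MyPlanInfo
node and all my_* names are invented for illustration, not part of any
existing patch):

/* An extension-defined node embedding ExtensibleNode as its first field. */
typedef struct MyPlanInfo
{
    ExtensibleNode node;            /* tag and extnodename */
    int         nworkers;           /* extension-private data */
} MyPlanInfo;

static ExtensibleNode *my_node_copy(ExtensibleNode *from);
static bool my_node_equal(ExtensibleNode *a, ExtensibleNode *b);
static void my_node_out(StringInfo str, ExtensibleNode *node);
static ExtensibleNode *my_node_read(void);

static ExtensibleNodeMethods my_node_methods = {
    "MyPlanInfo",                   /* extnodename, used for later lookup */
    "my_extension",                 /* library_name */
    NULL,                           /* function_name */
    sizeof(MyPlanInfo),             /* node_size */
    my_node_copy,
    my_node_equal,
    my_node_out,
    my_node_read
};

void
_PG_init(void)
{
    RegisterExtensibleNodeMethods(&my_node_methods);
}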

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Robert Haas
On Mon, Nov 16, 2015 at 5:44 AM, Simon Riggs  wrote:
> On 15 November 2015 at 14:50, Robert Haas  wrote:
>> On Sun, Nov 15, 2015 at 5:41 AM, Simon Riggs 
>> wrote:
>> > Hmm, if that's where we're at, I'll summarize my thoughts.
>> >
>> > All of this discussion presupposes we are distributing/load balancing
>> > queries so that reads and writes might occur on different nodes.
>>
>> Agreed.  I think that's a pretty common pattern, though certainly not
>> the only one.
> It looks to me this functionality is only of use in a pooler. Please explain
> how else this would be used.

I think you're right.  I mean, you could have the pooling built into
the application, but some place connections have to be being farmed
out to different nodes, or there's no point to using this feature.
Some people may not want to use this feature, but those who do are
using some form of pooling at some level.

>> > Your option (2) is wider but also worse in some ways. It can be
>> > implemented
>> > in a pooler.
>> >
>> > Your option (3) doesn't excite me much. You've got a load of stuff that
>> > really should happen in a pooler. And at its core we have
>> > synchronous_commit
>> > = apply but with a timeout rather than a wait.
>>
>> I don't see how either option (2) or option (3) could be implemented
>> in a pooler.  How would that work?
>
> My starting thought was that (1) was the only way forwards. Through
> discussion, I now see that its not the best solution for the general case.
>
> The pooler knows which statements are reads and writes, it also knows about
> transaction boundaries, so it is possible for it to perform the waits for
> either (2) or (3).

As Craig says, it may not: pgbouncer, for example, won't.  pgpool
will, except when it's wrong because some apparently read-only
function is actually writing data.  But even if the pooler does know,
that isn't enough for it to perform the waits for (2) or (3) without
some support for the server.  If it wants to wait for a particular
transaction to be applied on the standby, it needs to know how long to
wait, and without some support for the server, it has no way of
knowing.  Now that could be done by doing (1) and then having the
pooler perform the waits, but now every pooler has to be taught how to
do that.  pgpool needs to know, pgbouncer needs to know, every
JDBC-based connection pooler needs to know.  Uggh.  Thomas's solution
is nice because it works with any pooler.

The other point I'd make, which I think may be what you said before
but I think is worth making very explicit, is that (1) supposes that
we know which reads are dependent on which previous writes.  In the
real world, that's probably frequently untrue.  If someone does SELECT
sum(salary) FROM emp on the standby, there's no particular write to
the emp table that they want to wait for: they want to wait for ALL
such writes previously acknowledged as committed.  Now, even when the
dependent writes can be identified, it may be convenient to just wait
for all of them instead of a particular subset that we know are
related.  But I bet there will be many cases where identification is
impractical or impossible, and thus I suspect (1) won't be very
useful.

I think (2) and (3) both have good prospects for being useful, but I
suspect that the performance consequences of (3), which is what Thomas
actually implemented, although possibly severe, are still likely to be
only a fraction of the cost of (2).  Having to potentially wait every
time a standby takes a snapshot just sounds awful to me.

> I would like to see a load balancing pooler in Postgres.

Me, too, but I don't expect that to be possible in the near future,
and I think this is awfully useful until then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses

2015-11-16 Thread Peter Geoghegan
On Thu, Nov 12, 2015 at 4:38 PM, Peter Geoghegan  wrote:
> * bytea default opclass. This is a type that, like the others, shares
> its representation with text (a varlena header and some data bytes --
> a string, essentially). Its comparator already behaves identically to
> that of the text comparator when the "C" collation is used. I've
> actually seen a specific request for this [1].

I probably should have given an explanation for why it's okay that NUL
bytes can exist in strings from the point of view of the generalized
SortSupport for text worker function. The next revision will have
comments along those lines.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> But you may notice that original TransactionIdSetTreeStatus function is void
> - it is not intended to return anything.
> It is called in RecordTransactionCommit in critical section where it is not
> expected that commit may fail.
> But in case of DTM transaction may be rejected by arbiter. XTM API allows to
> control access to CLOG, so everybody will see that transaction is aborted.
> But we in any case have to somehow notify client about abort of transaction.

I think you'll need to rethink how a transaction commits rather
completely, rather than consider localized tweaks to specific functions.
For one thing, the WAL record about transaction commit has already been
written by XactLogCommitRecord much earlier than calling
TransactionIdCommitTree.  So if you were to crash at that point, it
doesn't matter how much the arbiter has rejected the transaction, WAL
replay would mark it as committed.  Also, what about the replication
origin stuff and the TransactionTreeSetCommitTsData() call?

I think you need to involve the arbiter earlier, so that the commit
process can be aborted earlier than those things.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Konstantin Knizhnik

On 11/16/2015 10:54 PM, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


But you may notice that original TransactionIdSetTreeStatus function is void
- it is not intended to return anything.
It is called in RecordTransactionCommit in critical section where it is not
expected that commit may fail.
But in case of DTM transaction may be rejected by arbiter. XTM API allows to
control access to CLOG, so everybody will see that transaction is aborted.
But we in any case have to somehow notify client about abort of transaction.

I think you'll need to rethink how a transaction commits rather
completely, rather than consider localized tweaks to specific functions.
For one thing, the WAL record about transaction commit has already been
written by XactLogCommitRecord much earlier than calling
TransactionIdCommitTree.  So if you were to crash at that point, it
doesn't matter how much the arbiter has rejected the transaction, WAL
replay would mark it as committed.


Yes, WAL replay will recover this transaction and try to mark it in CLOG as 
completed, but ... we have taken control over CLOG using XTM.
And instead of writing directly to CLOG, DTM will contact the arbiter and ask its 
opinion concerning this transaction.
If arbiter doesn't think that it was committed, then it will not be marked as 
committed in local CLOG.



  Also, what about the replication
origin stuff and the TransactionTreeSetCommitTsData() call?

I think you need to involve the arbiter earlier, so that the commit
process can be aborted earlier than those things.





Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Kevin Grittner
On Monday, November 16, 2015 2:47 AM, Konstantin Knizhnik 
 wrote:

> Some time ago at PgConn.Vienna we have proposed eXtensible
>Transaction Manager API (XTM).
> The idea is to be able to provide custom implementation of
>transaction managers as standard Postgres extensions,
> primary goal is implementation of distributed transaction manager.
> It should not only support 2PC, but also provide consistent
>snapshots for global transaction executed at different nodes.
>
> Actually, current version of XTM API doesn't propose any particular 2PC
>model. It can be implemented either at coordinator side
> (as it is done in our pg_tsdtm implementation based on timestamps
> and not requiring centralized arbiter), or by arbiter
> (pg_dtm).

I'm not entirely clear on what you're saying here.  I admit I've
not kept in close touch with the distributed processing discussions
lately -- is there a write-up and/or diagram to give an overview of
where we're at with this effort?

> In the last case 2PC logic is hidden under XTM
> SetTransactionStatus method:
>
>  bool (*SetTransactionStatus)(TransactionId xid, int nsubxids,
>TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
>
> which encapsulates TransactionIdSetTreeStatus in clog.c.
> But you may notice that original TransactionIdSetTreeStatus function
>is void - it is not intended to return anything.
> It is called in RecordTransactionCommit in critical section where it
>is not expected that commit may fail.

This issue, though, seems clear enough.  At some point a
transaction must cross a hard line between when it is not committed
and when it is, since after commit subsequent transactions can then
see the data and modify it.  There has to be some "point of no
return" in order to have any sane semantics.  Entering that critical
section is it.

> But in case of DTM transaction may be rejected by arbiter. XTM API
>allows to control access to CLOG, so everybody will see that
>transaction is aborted. But we in any case have to somehow notify
>client about abort of transaction.

If you are saying that DTM tries to roll back a transaction after
any participating server has entered the RecordTransactionCommit()
critical section, then IMO it is broken.  Full stop.  That can't
work with any reasonable semantics as far as I can see.

> We can not just call elog(ERROR,...) in SetTransactionStatus
>implementation because inside critical section it cause Postgres
>crash with panic message. So we have to remember that transaction is
>rejected and report error later after exit from critical section:

I don't believe that is a good plan.  You should not enter the
critical section for recording that a commit is complete until all
the work for the commit is done except for telling all the
servers that all servers are ready.

> There is one more problem - at this moment the state of transaction
>is TRANS_COMMIT.
> If ERROR handler will try to abort it, then we get yet another fatal
>error: attempt to rollback committed transaction.
> So we need to hide the fact that transaction is actually committed
>in local XLOG.

That is one of pretty much an infinite variety of problems you have
if you don't have a "hard line" for when the transaction is finally
committed.

> This approach works but looks a little bit like hacker approach. It
>requires not only to replace direct call of
>TransactionIdSetTreeStatus with indirect (through XTM API), but also
>requires to make some non-obvious changes in
>RecordTransactionCommit.
>
> So what are the alternatives?
>
> 1. Move RecordTransactionCommit to XTM. In this case we have to copy
>original RecordTransactionCommit to DTM implementation and patch it
>here. It is also not nice, because it will complicate maintenance of
>DTM implementation.
> The primary idea of XTM is to allow development of DTM as standard
>PostgreSQL extension without creating of specific clones of main
>PostgreSQL source tree. But this idea will be compromised if we have
>copy some pieces of PostgreSQL code.
> In some sense it is even worse than maintaining a separate branch -
>in the last case at least we have some way to perform automatic merge.

You can have a call in XTM that says you want to record the
commit on all participating servers, but I don't see where that
would involve moving anything we have now out of each participating
server -- it would just need to function like a real,
professional-quality distributed transaction manager doing the
second phase of a two-phase commit.  If any participating server
goes through the first phase and reports that all the heavy lifting
is done, and then is swallowed up in a pyroclastic flow of an
erupting volcano before phase 2 comes around, the DTM must
periodically retry until the administrator cancels the attempt.

> 2. Propose some alternative two-phase commit implementation in
>

Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Bert
Hey,

I've just pulled and compiled the new code.
I'm running a TPC-DS like test on different PostgreSQL installations, but
running (max) 12 queries in parallel on a server with 12 cores.
I've configured max_parallel_degree to 2, and I get messages that backend
processes crash.
I am running the same test now with 6 queries in parallel, and parallel
degree to 2, and they seem to work. for now. :)

This is the output I get in /var/log/messages
Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
7fa3437bf104 ip 00490b56 sp 7ffdf2f083a0 error 6 in
postgres[40+5b5000]

Is there something else I should get?

cheers,
Bert

On Mon, Nov 16, 2015 at 6:06 PM, Jeff Janes  wrote:

> On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila 
> wrote:
> > On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes 
> wrote:
> >>
> >> On Wed, Nov 11, 2015 at 6:53 AM, Robert Haas 
> >> wrote:
> >> >
> >> > I've committed most of this, except for some planner bits that I
> >> > didn't like, and after a bunch of cleanup.  Instead, I committed the
> >> > consider-parallel-v2.patch with some additional planner bits to make
> >> > up for the ones I removed from your patch.  So, now we have parallel
> >> > sequential scan!
> >>
> >> Pretty cool.  All I had to do is mark my slow plperl functions as
> >> being parallel safe, and bang, parallel execution of them for seq
> >> scans.
> >>
> >> But, there does seem to be a memory leak.
> >>
> >
> > Thanks for the report.
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a buffer,
> > so I think we can avoid the allocation of same or atleast we need to free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Thanks, that patch (as committed) has fixed the problem for me.  I
> don't understand the second one.
>
> Cheers,
>
> Jeff
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Bert Desmet
0477/305361


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Robert Haas
On Mon, Nov 16, 2015 at 2:51 PM, Bert  wrote:
> I've just pulled and compiled the new code.
> I'm running a TPC-DS like test on different PostgreSQL installations, but
> running (max) 12 queries in parallel on a server with 12 cores.
> I've configured max_parallel_degree to 2, and I get messages that backend
> processes crash.
> I am running the same test now with 6 queries in parallel, and parallel
> degree to 2, and they seem to work. for now. :)
>
> This is the output I get in /var/log/messages
> Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at 7fa3437bf104
> ip 00490b56 sp 7ffdf2f083a0 error 6 in postgres[40+5b5000]
>
> Is there something else I should get?

Can you enable core dumps e.g. by passing the -c option to pg_ctl
start?  If you can get a core file, you can then get a backtrace
using:

gdb /path/to/postgres /path/to/core
bt full
q

That should be enough to find and fix whatever the bug is.  Thanks for testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread Peter Eisentraut
On 11/15/15 9:50 PM, Craig Ringer wrote:
> On 16 November 2015 at 09:50, Peter Eisentraut  > wrote:
> 
> 
> I haven't seen this discussed before, but I don't find the name
> pg_report_log particularly good.  Why not jut pg_log?
> 
> 
> Sounds like a better name to me. 'report' is noise that adds nothing useful.
> 
> I'd like to have this functionality.
> 
> I'd prefer to omit fields if explicitly assigned to NULL. You can always
> use coalesce if you want the string 'NULL'; I agree with others here
> that the vast majority of users will want the field just omitted.

I think the problem was that you can't omit the primary message.
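
A sketch of how that could be reconciled (argument positions and names here
are guesses for illustration, not taken from the actual patch): insist on the
message, and drop, rather than print, any other field whose argument is NULL:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

Datum
pg_report_log(PG_FUNCTION_ARGS)
{
    char   *message;
    char   *hint = NULL;

    if (PG_ARGISNULL(0))
        ereport(ERROR,
                (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                 errmsg("message must not be null")));
    message = text_to_cstring(PG_GETARG_TEXT_PP(0));

    if (!PG_ARGISNULL(1))
        hint = text_to_cstring(PG_GETARG_TEXT_PP(1));

    /* only mention the hint if one was actually supplied */
    ereport(LOG,
            (errmsg("%s", message),
             hint ? errhint("%s", hint) : 0));

    PG_RETURN_VOID();
}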



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread dinesh kumar
On Mon, Nov 16, 2015 at 3:58 PM, Kevin Grittner  wrote:

> On Sunday, November 15, 2015 8:51 PM, Craig Ringer 
> wrote:
>
> > I'd prefer to omit fields if explicitly assigned to NULL. You can
> > always use coalesce if you want the string 'NULL'; I agree with
> > others here that the vast majority of users will want the field
> > just omitted.
>
> +1
>
> Unfortunately those writing the SQL standard chose to have a single
> flag (NULL) to indicate either "unknown" or "not applicable".  That
> causes problems where it's not clear which way the value should be
> interpreted, but in this case it seems pretty clear that someone
> passing a NULL parameter for hint to a function like this doesn't
> mean "there is likely to be a valid value for hint, but I don't
> know what it is" -- they mean there is no available hint, so please
> don't show one.  Any other behavior seems rather silly.
>
> Thanks Kevin/Craig for your comments.



> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] Proposing COPY .. WITH PERMISSIVE

2015-11-16 Thread Michael Paquier
On Mon, Nov 16, 2015 at 11:27 AM, Michael Paquier wrote:
> In short, it seems that this patch is better rejected. And I am
> planning to do so if there are no complaints.

OK, done so in the CF app.
Dinesh, please do not be discouraged by that. Everybody has patches
rejected, and I know a bit of it :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: function parse_ident

2015-11-16 Thread Marko Tiikkaja

On 9/11/15 12:25 PM, Pavel Stehule wrote:

new update of parse_ident function patch


Nice!  I've certainly wanted something like this a number of times.

Some comments about the v2 of the patch:

   - The patch doesn't apply anymore, so it should be rebased.
   - The docs don't even try and explain what the "strictmode" 
parameter does.
   - The comment before the parse_ident function is not up to date 
anymore, since "the rest" was removed from the interface.
   - I can't immediately grep for any uses of  do { .. } while (true) 
from our code base.  AFAICT the majority look like  for (;;);  I see no 
reason not to be consistent here.
   - What should happen if the input is a string like 
'one.two.three.four.five.six'?  Do we want to accept input like that?
   - I haven't reviewed the actual parsing code carefully, but didn't 
we already have a function which splits identifiers up?  I of course 
can't find one with my grepping right now, so I might be wrong.



.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-16 Thread Andrew Dunstan



On 11/16/2015 11:16 AM, Catalin Iacob wrote:

On Sun, Nov 15, 2015 at 3:53 PM, Andrew Dunstan  wrote:

I suggest you review the original thread on this subject before a line was
ever written. "multiple" (see subject line on this whole thread) is clearly
what is being asked for. Making people turn that into a single argument is
not what was envisaged. See for example Pavel's original example involving
use of xargs where that's clearly not at all easy.

I couldn't see why it would matter to have multiple -C, but xargs
having -n which consumes more than 1 stdin item is indeed a use case.
When reading the thread I didn't notice it since I didn't know what -n
does.

But multiple -C is confusing since it suggests the groupings matter.

To me at least, it feels weird that -C "SELECT 1; SELECT 2;" -C
"SELECT 3;" is the same as -C "SELECT 1; SELECT 2; SELECT 3" and lots
of other combinations. It feels like the split in groups must mean
something, otherwise why would you support/use multiple groups?

Upthread at least somebody thought each -C group would/should be a
transaction and I can see this confusion coming up again and again,
enough to question whether this patch is an improvement over the
current situation. And if a single -C is too small of an improvement,
maybe it means the whole idea should be dropped. I think the same of
multiple -f as well: to me they're more confusing than helpful for the
same reason.




I honestly don't see what's so confusing about it, and if there is any 
confusion then surely we could make sure what's happening is well 
documented.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-16 Thread Jeff Janes
On Mon, Nov 9, 2015 at 6:37 AM, Tom Lane  wrote:
> kawami...@tkl.iis.u-tokyo.ac.jp writes:
>>   - cost parameter calibration: random_page_cost = 92.89
>
> TBH, you lost me there already.  I know of no hardware on which that would
> be a sane depiction of reality, so I think you've probably overfitted the
> model to some particular case it was already inaccurate on.

I can easily get a ratio of random to sequential of 50, and my RAID is
nothing special.  I don't see why a high-end RAID couldn't justify 100
or more, as long as they know the cache-hit is very low. (The default
value of 4 seems to bake in the notion that 90% of even random IO is
going to be satisfied from the cache, which might be a good estimate
if you have frequently used smallish lookup tables that always get
joined to the RAM-busters, but some people aren't going to have that
type of database queries as their main load).

With the current code, a single scan out of several can get estimated
to have a higher cost than just a free-standing single scan
(loop_count > 1), and I don't see how that can ever make sense.

Right now it can only benefit from assumed cache hits (driven by
effective_cache_size) via Mackert and Lohman.

I think that, at least, it should get to claim the greater of either
the Mackert and Lohman benefit between inner scans, or the benefit of
converting some random IO to sequential within each separate inner
scan.

And really, I don't see why it should not get both benefits.  If the
pages are still in cache between inner scans, that's great.  But if
the one time they do need to be read in from disk they are read in
mostly sequentially, why is that benefit not also fully justified? I
don't see where the worry about "double-counting the cache effects"
comes from.  The effective_cache_size and io read-ahead can both apply
and both give benefits simultaneously and cumulatively.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-16 Thread Robert Haas
On Mon, Nov 16, 2015 at 7:32 AM, Ildus Kurbangaliev
 wrote:
> What if we just create a control struct in shared memory like in other places? 
> BufferDescriptors
> and BufferBlocks can be kept there along with tranches definitions
> and lwlocks. Buffer locks that are located in MainLWLockArray by offset
> can be moved there too.

Yeah, we could do that, but what's the advantage of it?  The alignment
of the buffer descriptors is kinda finnicky and matters to
performance, so it seems better not to prefix them with something that
might perturb it.  If we just rebase Andres' patch over what I just
committed and add in something so that the buffer numbers are fed from
#defines or an enum instead of being random integers, I think we're
done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Conversion error of floating point numbers in pl/pgsql

2015-11-16 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 16 Nov 2015 09:49:54 -0500, Tom Lane  wrote in 
<32508.1447685...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > Hello. I found that 9.5 has an undocumented difference from 9.4
> > in type cast in pl/pgsql and I think it might better be mentioned
> > as a change of behavior in release notes.
> 
> > Whether do you think it is worth mentioning or not in release notes?
> 
> This seems unnecessarily alarmist to me.  Anybody who's in the habit
> of converting between float4 and float8 will already be used to this
> behavior, because it is what has always happened everywhere else in
> the system.

I guess quite a few users won't have enough insight
to anticipate such an effect, but I'll agree to omit this if such
users are not the target of the release notes.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-11-16 Thread Peter Geoghegan
On Mon, Sep 7, 2015 at 9:20 PM, Peter Geoghegan  wrote:
>>> This matters because a major cost during CREATE INDEX CONCURRENTLY is
>>> a TID-based datum sort (this is probably most of the cost over and
>>> above a conventional CREATE INDEX).
>>
>> Might be better to hack a special case right there (ie, embed TIDs into
>> int8s and sort the int8s) rather than try to change the type's SQL
>> declaration.
>
> That sounds at least as effective as what I originally sketched.

> I hope someone picks this up soon.

I suggested to someone else that he take a look at this as a project,
but I guess he was busy too. I decided to just do it myself. Patch is
attached. This has the bulkdelete callback that gathers TIDs from the
index during CREATE INDEX CONCURRENTLY encode them as int8 values
ahead of the sort, while sorting the values as int8 values. They're
later decoded as needed.
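
(To make the encoding concrete: an ItemPointerData is a 32-bit block number
plus a 16-bit offset, so it packs into the low 48 bits of an int64 while
preserving heap order.  Roughly, and glossing over details of the attached
patch, the scheme amounts to:)

#include "storage/itemptr.h"

static inline int64
itemptr_encode(ItemPointer itemptr)
{
    BlockNumber  block = ItemPointerGetBlockNumber(itemptr);
    OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);

    /* offset in the low 16 bits, block number in the 32 bits above it */
    return ((int64) block << 16) | (uint16) offset;
}

static inline void
itemptr_decode(ItemPointer itemptr, int64 encoded)
{
    BlockNumber  block = (BlockNumber) (encoded >> 16);
    OffsetNumber offset = (OffsetNumber) (encoded & 0xffff);

    ItemPointerSet(itemptr, block, offset);
}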

This turns out to be a relatively simple patch. Testing shows it
removes *most* of the overhead of CREATE INDEX CONCURRENTLY over
CREATE INDEX. In absolute terms, a test case involving a CREATE INDEX
CONCURRENTLY on a single integer column takes about 30% - 40% less
time with the patch applied. The TID sort itself is about 3 times
faster, and that's without the benefit of the patch making the TID
sort an internal sort where it would otherwise have been an external
sort. Sorting item pointers as TIDs naively currently wastes a lot of
memory (not just memory bandwidth) -- a pass-by-value datum sort only
ever allocates memory for SortTuples, avoiding allocating any memory
for a "tuple proper". Clearly just having the sort be pass-by-value is
*much* faster. As comments above process_ordered_aggregate_single()
put it:

 * The one-input case is handled separately from the multi-input case
 * for performance reasons: for single by-value inputs, such as the
 * common case of count(distinct id), the tuplesort_getdatum code path
 * is around 300% faster.  (The speedup for by-reference types is less
 * but still noticeable.)

Another way of stating how things are improved here is: my CREATE
INDEX CONCURRENTLY test case had about a 2.1x overhead over an
equivalent CREATE INDEX on master, but with the patch applied the
CREATE INDEX CONCURRENTLY has an overhead of only about 1.3x. The
extra logical I/O for CONCURRENTLY's second scan, and for the
physical-order index scan (to "bulk delete", to gather TIDs to sort)
is a surprisingly small cost.

I'll add the patch to the open commitfest. Think of all these patches
of mine as giving reviewers a choice of what to review. This patch
does seem like a slam dunk, even if I do say so myself, so at least
it's relatively easy to review. The external sort work remains my most
interesting pending work, though.

-- 
Peter Geoghegan
From 3c7551435394090a4c2d3e7588d018ea9dca2482 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 14 Nov 2015 16:57:20 -0800
Subject: [PATCH] Speed up CREATE INDEX CONCURRENTLY's TID sort

Add a simple TID encoding scheme.  This is used for encoding and
decoding TIDs to and from int8/int64 values, for the benefit of an
expensive tuplesort operation performed by CREATE INDEX CONCURRENTLY.
The sort now uses the default int8 opclass to sort values in the ad-hoc
encoding, and so benefits from the availability of int8 SortSupport.

Despite the fact that item pointers take up only 6 bytes on disk, the
TID type is always pass-by-reference due to considerations about how
Datums are represented and accessed.  On the other hand, int8 is
usually, in practice, a pass-by-value type.

Testing shows this approach makes CREATE INDEX CONCURRENTLY
significantly faster on platforms where int8 is pass-by-value,
especially when the big saving in memory within tuplesort.c prevents the
sort from being performed externally.  The approach will help to some
degree on all platforms, though.
---
 src/backend/catalog/index.c | 56 +
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e59b163..7429f3c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -109,6 +109,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 	Relation indexRelation,
 	IndexInfo *indexInfo);
+static inline int64 itemptr_encode(ItemPointer itemptr);
+static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
 static bool validate_index_callback(ItemPointer itemptr, void *opaque);
 static void validate_index_heapscan(Relation heapRelation,
 		Relation indexRelation,
@@ -2832,7 +2834,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
 
-	state.tuplesort = tuplesort_begin_datum(TIDOID, TIDLessOperator,
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting

Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Simon Riggs
On 15 November 2015 at 10:41, Simon Riggs  wrote:


>  So anyway, consider me nudged to finish my patch to provide capability
> for that by 1 Jan.
>

My earlier patch aimed to allow WALReceiver to wait on both a latch and a
socket as well as allow WALWriter to be active, so that WALReceiver could
reply more quickly and handle greater workload. As I explained previously
when we discussed that in recent posts, it is necessary infrastructure to
have anybody wait on anything higher than remote-fsync. I aim to complete
that by 1 Jan.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [DESIGN] ParallelAppend

2015-11-16 Thread Robert Haas
On Mon, Nov 16, 2015 at 10:10 AM, Kouhei Kaigai  wrote:
> This idea will solve my concern gracefully.
> The new partial_pathlist keeps candidate of path-nodes to be gathered
> in this level or upper. Unlike path-nodes in the pathlist already, we
> don't need to rip off GatherPath later.

Cool, yes.

> Can we expect any path-nodes in the partial_pathlist don't contain
> underlying GatherPath even if and when we would apply this design on
> joinrel also?

Yes.  A path that's already been gathered is not partial any more.
However, to create a partial path for a joinrel, we must join a
partial path to a complete path.  The complete path mustn't be one
which internally contains a Gather.   This is where we need another
per-path flag, I think.

> I'd like to agree with this idea. If Append can handle restricted_plans
> concurrently with safe_plans and parallel_plans, we don't need to give
> up parallelism even if any of child relation has neither safe- nor
> parallel-plans.

Right.

> One thing we need to pay attention is, we have to inform Gather node
> to kick local sub-plans if underlying Append node has any restricted
> plans. It also needs to distinguish the case when Gather node cannot
> launch any background workers, because the first case runs only type-C
> but the second case has to run all the sub-plans in local context.

I don't think that Gather needs to know anything about what's under
the Append.  What I think we want is that when we execute the Append:

(1) If we're the leader or not in parallel mode, run restricted plans,
then parallel plans, then safe plans.
(2) If we're a worker, run safe plans, then parallel plans.
(3) Either way, never run a safe plan if the leader or some other
worker has already begun to execute it.

The reason to have the leader prefer parallel plans to safe plans is
that it is more likely to become a bottleneck than the workers.  Thus
it should prefer to do work which can be split up rather than claiming
a whole plan for itself.  But in the case of restricted plans it has
no choice, since no one else can execute those, and it should do them
first, since they may be the limiting factor in finishing the whole
plan.
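
(To spell that out in pseudo-C -- plan_class(), claim() and the CLASS_* tags
are invented names, and the bookkeeping for exhausted subplans is omitted --
the per-participant choice might look like this:)

static int
choose_next_subplan(AppendState *node, bool am_leader)
{
    static const int leader_order[] = { CLASS_RESTRICTED, CLASS_PARALLEL, CLASS_SAFE };
    static const int worker_order[] = { CLASS_SAFE, CLASS_PARALLEL };
    const int  *order = am_leader ? leader_order : worker_order;
    int         norder = am_leader ? 3 : 2;
    int         c;
    int         i;

    for (c = 0; c < norder; c++)
    {
        for (i = 0; i < node->as_nplans; i++)
        {
            if (plan_class(node, i) != order[c])
                continue;
            if (order[c] == CLASS_PARALLEL)
                return i;       /* partial plans can be worked on by everyone */
            if (claim(node, i))
                return i;       /* non-partial plans must run in exactly one place */
        }
    }
    return -1;                  /* nothing left for this participant to run */
}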

>> Incidentally, I think it's subtly wrong to think of the parallel_aware
>> flag as telling you whether the plan can absorb multiple workers.
>> That's not really what it's for.  It's to tell you whether the plan is
>> doing *something* parallel aware - that is, whether its Estimate,
>> InitializeDSM, and InitializeWorker callbacks should do anything.  For
>> SeqScan, flipping parallel_aware actually does split the input among
>> all the workers, but for Append it's probably just load balances and
>> for other nodes it might be something else again.  The term I'm using
>> to indicate a path/plan that returns only a subset of the results in
>> each worker is "partial".
>>
> Therefore, a NestLoop that takes underlying ParallelSeqScan and IndexScan
> may not be parallel aware by itself, however, it is exactly partial.

Right.

> This NestLoop will has parallel_degree likely larger than "1", won't it?

Larger than 0.

> It seems to me the "partial" is more clear concept to introduce how sub-
> plan will perform.

Good.

>> Whether or not a path is partial is, in the
>> design embodied in this patch, indicated both by whether
>> path->parallel_degree > 0 and whether the path is in rel->pathlist or
>> rel->partial_pathlist.
>>
> We should have Assert to detect paths with parallel_degree==0 but in
> the rel->partial_pathlist or parallel_degree > 1 but not appear in
> the rel->partial_pathlist?

parallel_degree==0 in the partial_pathlist is bad, but
parallel_degree>0 in the regular pathlist is OK, at least if it's a
Gather node.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] check for interrupts in set_rtable_names

2015-11-16 Thread Jeff Janes
On Mon, Nov 16, 2015 at 8:35 AM, Tom Lane  wrote:
> I wrote:
>> I experimented with using a hash table to avoid the O(N^2) behavior.
>> This seems to work quite well, and I think it doesn't change the results
>> (leastwise, it does not break the regression tests).
>
> Looking at this again in the light of morning, it occurred to me that it's
> pretty broken in the face of long (approaching NAMEDATALEN) input
> identifiers.  If the de-duplication digits are beyond the NAMEDATALEN
> threshold, it would actually get into an infinite loop because hash_search
> would ignore them as not part of the key.  That can be fixed by truncating
> the name appropriately.
>
> However: actually, this code had a problem already with long identifiers.
> What happened before was that it would blithely add digits and thereby
> produce an overlength identifier, which indeed looks distinct from the
> other ones in the query --- but if we're dumping a view/rule, the
> identifier won't be distinct after truncation, which means that
> dump/reload could fail, or even worse, silently produce something with
> different semantics than intended.
>
> So the attached updated patch takes care of that problem, not only for
> relation names but also for column names.
>
> I had been leaning towards not back-patching this, but I now feel that
> we must back-patch at least the truncation fix, and we probably might as
> well back-patch it in toto.  Comments?

As-committed it has solved the problem, as best as I can tell.

Thanks,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-16 Thread Robert Haas
On Sun, Nov 15, 2015 at 7:20 PM, and...@anarazel.de  wrote:
>>  /*
>> + * We reserve a few predefined tranche IDs.  These values will never be
>> + * returned by LWLockNewTrancheId.
>> + */
>> +#define LWTRANCHE_MAIN   0
>> +#define LWTRANCHE_BUFFER_CONTENT 1
>> +#define LWTRANCHE_BUFFER_IO_IN_PROGRESS  2
>> +#define LWTRANCHE_LAST_BUILTIN_ID  LWTRANCHE_BUFFER_IO_IN_PROGRESS
>
> Nitpick: I'm inclined to use an enum to avoid having to adjust the last
> builtin id when adding a new builtin tranche.

I prefer to do it this way because sometimes enums require a cast.
But if you do the work, I'm not going to fight you over this.

(If I do the work, on the other hand, ...)

> Looks mis-indented now, similarly in a bunch of other places. Maybe
> pg-indent afterwards?

pgindent doesn't change anything for me.

> So, looks good to me.

Great.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-11-16 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 7:39 AM, Robert Haas  wrote:
> On Sun, Nov 15, 2015 at 1:12 AM, Amit Kapila 
wrote:
> > Thanks for the report.
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a
buffer,
> > so I think we can avoid the allocation of same or atleast we need to
free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Oops.  Committed.
>

Thanks!

> > Another thing I have noticed is that we need to build the remap info
> > only when the target list contains record type of attrs, so ideally it
> > should not even go in
> > this path when such attrs are not present.  The reason for the same was
> > that the tuple descriptor stored in TQueueDestReceiver was not updated,
> > attached patch fix_initialization_tdesc_v1 fixes this issue.
>
> I don't understand this part.
>

The code in question is as below:

tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
{
..
	if (tqueue->tupledesc != tupledesc ||
		tqueue->remapinfo->natts != tupledesc->natts)
	{
		if (tqueue->remapinfo != NULL)
			pfree(tqueue->remapinfo);
		tqueue->remapinfo = BuildRemapInfo(tupledesc);
	}
..
}

Here the above check always passes because tqueue->tupledesc is never
set, due to which it always tries to rebuild the remap info.  Is there any
reason for doing so?
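
For clarity, the kind of change I mean (just a sketch, not necessarily
identical to the attached fix_initialization_tdesc_v1 patch) is to remember
the descriptor after rebuilding the remap info, so the branch is only taken
when the descriptor actually changes:

	if (tqueue->tupledesc != tupledesc ||
		tqueue->remapinfo->natts != tupledesc->natts)
	{
		if (tqueue->remapinfo != NULL)
			pfree(tqueue->remapinfo);
		tqueue->remapinfo = BuildRemapInfo(tupledesc);
		tqueue->tupledesc = tupledesc;	/* remember it, so we don't redo this */
	}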


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-11-16 Thread Amit Kapila
On Mon, Nov 16, 2015 at 10:36 PM, Jeff Janes  wrote:
>
> On Sat, Nov 14, 2015 at 10:12 PM, Amit Kapila 
wrote:
> > On Fri, Nov 13, 2015 at 11:05 PM, Jeff Janes 
wrote:
> >
> > I think main reason of the leak in workers seems to be due the reason
> > that one of the buffer used while sending tuples (in function
> > BuildRemapInfo)
> > from worker to master is not getting freed and it is allocated for each
> > tuple worker sends back to master.  I couldn't find use of such a
buffer,
> > so I think we can avoid the allocation of same or atleast we need to
free
> > it.  Attached patch remove_unused_buf_allocation_v1.patch should fix the
> > issue.
>
> Thanks, that patch (as committed) has fixed the problem for me.
>

Thanks to you as well for verification.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] vacuumdb sentence

2015-11-16 Thread Peter Eisentraut
On 9/17/15 11:00 PM, Euler Taveira wrote:
> Hi,
> 
> Is there a reason to quote "jobs" at this sentence?
> 
> 190 fprintf(stderr, _("%s: number of parallel \"jobs\" must be at least
> 1\n"),
> progname);
> 
> AFAICS "jobs" is neither an identifier nor an option to justify the
> quotation. Also, another message a few lines above (correctly) does not
> use quotes.

Fixed.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-11-16 Thread Robert Haas
On Sun, Nov 15, 2015 at 1:47 AM, Amit Kapila  wrote:
> On Sat, Nov 14, 2015 at 1:19 AM, Andres Freund  wrote:
>> On 2015-10-31 11:02:12 +0530, Amit Kapila wrote:
>> > On Thu, Oct 8, 2015 at 11:05 PM, Simon Riggs 
>> > wrote:
>> > >
>> > > On 1 October 2015 at 23:30, Josh Berkus  wrote:
>> > >>
>> > >> On 10/01/2015 07:43 AM, Robert Haas wrote:
>> > >> > On Thu, Oct 1, 2015 at 9:44 AM, Fujii Masao 
>> > wrote:
>> > >> >> I wonder how much it's worth renaming only the file extension
>> > >> >> while
>> > >> >> there are many places where "visibility map" and "vm" are used,
>> > >> >> for example, log messages, function names, variables, etc.
>> > >> >
>> > >> > I'd be inclined to keep calling it the visibility map (vm) even if
>> > >> > it
>> > >> > also contains freeze information.
>> > >> >
>> >
>> > What is your main worry about changing the name of this map, is it
>> > about more code churn or is it about that we might introduce new issues
>> > or is it about that people are already accustomed to call this map as
>> > visibility map?
>>
>> Several:
>> * Visibility map is rather descriptive, none of the replacement terms
>>   imo come close. Few people will know what a 'freeze' map is.
>> * It increases the size of the patch considerably
>> * It forces tooling that knows about the layout of the database
>>   directory to change their tools
>>
>
> All these points are legitimate.
>
>> On the benefit side the only argument I've heard so far is that it allows
>> to disambiguate the format. But, uh, a look at the major version does
>> that just as well, for far less trouble.
>>
>> > It seems to me quite logical for understanding purpose as well.  Any new
>> > person who wants to work in this area or is looking into it will always
>> > wonder why this map is named as visibility map even though it contains
>> > information about visibility of page as well as frozen state of page.
>>
>> Being frozen is about visibility as well.
>>
>
> OTOH being visible doesn't mean page is frozen.  I understand that frozen is
> related to visibility, but still it is a separate state of page and used for
> different
> purpose.  I think this is a subjective point and we could go either way, it
> is
> just a matter in which way more people are comfortable.

I'm stickin' with what I said before, and what I think Andres is
saying too: renaming the map is a horrible idea.  It produces lots of
code churn for no real benefit.  We usually avoid such changes, and I
think we should do so here, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-16 Thread Kouhei Kaigai
> On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai  wrote:
> > I agree with we have no reason why only custom-scan is allowed to have
> > serialize/deserialize capability. I can implement an equivalent stuff
> > for foreign-scan also, and it is helpful for extension authors, especially,
> > who tries to implement remote join feature because FDW driver has to
> > keep larger number of private fields than scan.
> >
> > Also, what is the reason why we allow extensions to define a larger
> > structure which contains CustomPath or CustomScanState? It seems to
> > me that these types are not (fully) supported by the current copyfuncs,
> > outfuncs and readfuncs, aren't it?
> > (Although outfuncs.c supports path-nodes, thus CustomPathMethods has
> > TextOut callback but no copy/read handler at this moment.)
> 
> I feel like we're sort of plunging blindly down a path of trying to
> make certain parts of the system extensible and pluggable without
> really having a clear vision of where we want to end up.  Somehow I
> feel like we ought to start by thinking about a framework for *node*
> extensibility; then, within that, we can talk about what that means
> for particular types of nodes.
> 
> For example, suppose do something like this:
> 
> typedef struct
> {
>NodeTag tag;
>char *extnodename;
> } ExtensibleNode;
> 
> typedef stuct
> {
> char *extnodename;
> char *library_name;
> char *function_name;
> Size node_size;
> ExtensibleNode *(*nodeCopy)(ExtensibleNode *);
> bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *);
> void (*nodeOut)(StringInfo str, ExtensibleNode *);
> ExtensibleNode (*nodeRead)(void);
> } ExtensibleNodeMethods;
> 
> extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *);
> extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename);
> 
> This provides a generic infrastructure so that we don't have to keep
> inventing the same stuff over and over, rather than coming up with one
> way to do it for ForeignScans and another way for CustomScan and
> another way for CustomScanState.
>
Wow! I love the idea.

I guess we will put a pointer to static ExtensibleNodeMethods structure
on ForeignScan, CustomScan, CustomScanState and etc...

Like:
  typedef struct ForeignScan
  {
  Scanscan;
  Oid fs_server;  /* OID of foreign server */
  List   *fdw_exprs;  /* expressions that FDW may evaluate */
  List   *fdw_private;/* private data for FDW */
  List   *fdw_scan_tlist; /* optional tlist describing scan tuple */
  List   *fdw_recheck_quals;  /* original quals not in scan.plan.qual */
  Bitmapset  *fs_relids;  /* RTIs generated by this scan */
  boolfsSystemCol;/* true if any "system column" is needed */
+ const ExtensibleNodeMethods *methods;
  } ForeignScan;

We may be able to put ExtensibleNodeMethods on the head of CustomScanMethods
structure in case of CustomScan.

Let me write down the steps for deserialization. Please correct me if it is
not what you are thinking.
First, stringToNode picks up a token that carries the node label, like
"SEQSCAN", "CUSTOMSCAN" and others. Once we can identify that the following
tokens belong to a CustomScan node, it can advance the pg_strtok pointer
until the "methods" field. The methods field consists of a pair of library_name
and symbol_name, then it tries to load the library and resolve the symbol.
Even if the library was not loaded on the worker side, this step ensures
the library gets loaded, thus _PG_init() can call RegisterExtensibleNodeMethods.
Next, it looks up the table of extensible nodes by the pair of extnodename,
library_name and symbol_name, to get the ExtensibleNodeMethods table in this
process address space.
Once it has the nodeRead() callback, we can continue deserialization of the
private fields.
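
As a hand-written sketch of both ends of that handshake (the names come only
from the proposal above; GpuJoinInfo and the gpujoin_* callbacks are
placeholders, not existing code):

/* extension side: register the methods at library load time */
void
_PG_init(void)
{
    static ExtensibleNodeMethods methods = {
        "GpuJoinInfo",              /* extnodename */
        "pg_strom",                 /* library_name */
        "gpujoin_methods",          /* function_name (symbol to resolve) */
        sizeof(GpuJoinInfo),        /* node_size */
        gpujoin_node_copy,
        gpujoin_node_equal,
        gpujoin_node_out,
        gpujoin_node_read
    };

    RegisterExtensibleNodeMethods(&methods);
}

/* core side: during stringToNode(), after reading extnodename/library/symbol */
(void) load_external_function(library_name, function_name, false, NULL);
methods = GetExtensibleNodeMethods(extnodename);
local_node = methods->nodeRead();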

I have two concerns on the above proposition.
1) The 'node_size' field implies the user-defined structure has a fixed length.
   An extension may want to use a variable-length structure. The example below
   shows GpuJoinState has multiple innerState structures according to the
   number of inner relations joined at once.
   https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240

2) It may be valuable if 'nodeRead(void)' can take an argument with the
   known/common portion of ForeignScan etc. It allows the extension
   to adjust the number of private fields according to the known part.
   For example, I can expect a flexible number of private fields according
   to the length of the custom_subplans list.

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-16 Thread Kyotaro HORIGUCHI
Hello, thank you for many valuable opinions.

I am convinced that bare regular expressions cannot be applied
here:)

At Mon, 16 Nov 2015 18:59:06 +1300, Thomas Munro 
 wrote in 

Re: [HACKERS] Rework the way multixact truncations work

2015-11-16 Thread Noah Misch
On Sun, Nov 08, 2015 at 11:59:33AM -0800, Andres Freund wrote:
> On November 8, 2015 11:52:05 AM PST, Noah Misch  wrote:
> >On Sun, Nov 08, 2015 at 11:11:42AM -0800, Andres Freund wrote:
> >> On November 8, 2015 12:54:07 AM PST, Noah Misch  wrote:
> >> 
> >> >I have pushed a stack of branches to
> >> >https://github.com/nmisch/postgresql.git:
> >> >
> >> >mxt0-revert - reverts commits 4f627f8 and aa29c1c
> >> >mxt1-disk-independent - see below
> >> >mxt2-cosmetic - update already-wrong comments and formatting
> >> >mxt3-main - replaces commit 4f627f8
> >> >mxt4-rm-legacy - replaces commit aa29c1c
> >> >
> >> >The plan is to squash each branch into one PostgreSQL commit.  In
> >> >addition to
> >> >examining overall "git diff mxt2-cosmetic mxt3-main", I recommend
> >> >reviewing
> >> >itemized changes and commit log entries in "git log -p --reverse
> >> >--no-merges
> >> >mxt2-cosmetic..mxt3-main".  In particular, when a change involves
> >> >something
> >> >you discussed upthread or was otherwise not obvious, I put a
> >statement
> >> >of
> >> >rationale in the commit log.
> >> 
> >> I'm not following along right now - in order to make cleanups the
> >plan is to revert a couple commits and then redo them prettyfied?
> >
> >Yes, essentially.  Given the volume of updates, this seemed neater than
> >framing those updates as in-tree incremental development.
> 
> I don't like that plan. I don't have a problem doing that in some development 
> branch somewhere, but I fail to see any benefit doing that in 9.5/master. 
> It'll just make the history more convoluted for no benefit.
> 
> I'll obviously still review the changes.

Cleanliness of history is precisely why I did it this way.  If I had framed
the changes as in-tree incremental development, no one "git diff" command
would show the truncation rework or a coherent subset.  To review the whole,
students of this code might resort to a cherry-pick of the repair commits onto
aa29c1c.  That, too, proves dissatisfying; the history would nowhere carry a
finished version of legacy truncation support.  A hacker opting to back-patch
in the future, as commit 4f627f8 contemplated, would need to dig through this
thread for the bits added in mxt3-main and removed in mxt4-rm-legacy.

The benefits may become clearer as you continue to review the branches.

> Even for review it's nor particularly convenient, because now the entirety of 
> the large changes essentially needs to be reviewed anew, given they're not 
> the same. 

Agreed; I optimized for future readers, and I don't doubt this is less
convenient for you and for others already familiar with commits 4f627f8 and
aa29c1c.  I published branches, not squashed patches, mostly because I think
the individual branch commits will facilitate your study of the changes.  I
admit the cost to you remains high.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Konstantin Knizhnik

Hello,

Some time ago at PgConn.Vienna we have proposed eXtensible Transaction 
Manager API (XTM).
The idea is to be able to provide custom implementation of transaction 
managers as standard Postgres extensions,

primary goal is implementation of distributed transaction manager.
It should not only support 2PC, but also provide consistent snapshots 
for global transactions executed at different nodes.


Actually, current version of XTM API doesn't propose any particular 2PC model. 
It can be implemented either at coordinator side
(as it is done in our pg_tsdtm 
implementation based on timestamps and not requiring centralized 
arbiter), or by arbiter
(pg_dtm ). In the last case 2PC 
logic is hidden under XTM SetTransactionStatus method:


 bool (*SetTransactionStatus)(TransactionId xid, int nsubxids, 
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);


which encapsulates TransactionIdSetTreeStatus in clog.c.
But you may notice that original TransactionIdSetTreeStatus function is 
void - it is not intended to return anything.
It is called in RecordTransactionCommit in critical section where it is 
not expected that commit may fail.
But in case of DTM transaction may be rejected by arbiter. XTM API 
allows to control access to CLOG, so everybody will see that transaction 
is aborted. But we in any case have to somehow notify client about abort 
of transaction.


We can not just call elog(ERROR,...) in SetTransactionStatus 
implementation because inside critical section it causes Postgres crash 
with panic message. So we have to remember that transaction is rejected 
and report error later after exit from critical section:



/*
 * Now we may update the CLOG, if we wrote a COMMIT record above
 */
if (markXidCommitted) {
    committed = TransactionIdCommitTree(xid, nchildren, children);
}
...
/*
 * If we entered a commit critical section, leave it now, and let
 * checkpoints proceed.
 */
if (markXidCommitted)
{
    MyPgXact->delayChkpt = false;
    END_CRIT_SECTION();
    if (!committed) {
        CurrentTransactionState->state = TRANS_ABORT;
        CurrentTransactionState->blockState = TBLOCK_ABORT_PENDING;
        elog(ERROR, "Transaction commit rejected by XTM");
    }
}

There is one more problem - at this moment the state of transaction is 
TRANS_COMMIT.
If the ERROR handler tries to abort it, then we get yet another fatal 
error: attempt to rollback committed transaction.
So we need to hide the fact that transaction is actually committed in 
local XLOG.


This approach works but looks a little bit like a hack. It 
requires not only to replace direct call of TransactionIdSetTreeStatus 
with indirect (through XTM API), but also requires to make some 
non-obvious changes in RecordTransactionCommit.


So what are the alternatives?

1. Move RecordTransactionCommit to XTM. In this case we have to copy 
original RecordTransactionCommit to DTM implementation and patch it 
here. It is also not nice, because it will complicate maintenance of DTM 
implementation.
The primary idea of XTM is to allow development of DTM as standard 
PostgreSQL extension without creating of specific clones of main 
PostgreSQL source tree. But this idea will be compromised if we have 
to copy some pieces of PostgreSQL code.
In some sense it is even worse than maintaining a separate branch - in 
the last case at least we have some way to perform automatic merge.


2. Propose some alternative two-phase commit implementation in 
PostgreSQL core. The main motivation for such "lightweight" 
implementation of 2PC in pg_dtm is that original mechanism of prepared 
transactions in PostgreSQL adds too much overhead.
In our benchmarks we have found that simple credit-debit banking test 
(without any DTM) works almost 10 times slower with PostgreSQL 2PC than 
without it. This is why we try to propose alternative solution (right 
now pg_dtm is 2 times slower than vanilla PostgreSQL, but it not only 
performs 2PC but also provides consistent snapshots).


Maybe somebody can suggest some other solution?
Or give some comments concerning current approach?

Thank in advance,
Konstantin,
Postgres Professional



[HACKERS] Conversion error of floating point numbers in pl/pgsql

2015-11-16 Thread Kyotaro HORIGUCHI
Hello. I found that 9.5 has an undocumented difference from 9.4
in type casts in pl/pgsql and I think it might be better to mention it
as a change of behavior in the release notes.

Do you think it is worth mentioning in the release notes or not?


=
9.4 and 9.5 differ in casting between floating point
numbers.

CREATE OR REPLACE FUNCTION hoge () RETURNS text AS $$
DECLARE
  vr  real;
  vf8 float8;
BEGIN
  vr := 0.1;
  vf8 = vr;
  RETURN 'hoge = ' || vf8;
END;
$$ LANGUAGE plpgsql;

9.5=# select hoge();
   hoge   
--
 hoge = 0.10001490116

9.4=# select hoge();
hoge

 hoge = 0.1

This is stemming from the difference between '0.1'::real::float8
and '0.1'::real::text::float8, made in exec_cast_value().

=# select '0.1'::real::float8, '0.1'::real::text::float8;
  float8   | float8 
---+
 0.10001490116 |0.1

This example itself looks somewhat artificial but it would be
rather common to load real values in a table into float8
variables in a function for further calculations.

This is a side effect of the commit
1345cc67bbb014209714af32b5681b1e11eaf964 and the release notes have
the following description corresponding to this commit in the
Migration section, with no mention of this.

> Use assignment cast behavior for data type conversions in
> PL/pgSQL assignments, rather than converting to and from text
> (Tom Lane)
> 
> This change causes conversions of Booleans to strings to produce
> true or false, not t or f. Other type conversions may succeed in
> more cases than before; for example, assigning a numeric value
> 3.9 to an integer variable will now assign 4 rather than
> failing. If no assignment-grade cast is defined for the
> particular source and destination types, PL/pgSQL will fall back
> to its old I/O conversion behavior.

Do you think it is worth mentioning?

The attached patch adds a description for that, though it should
be rewritten even if this is worth mentioning.

+ This change also may bring different results of type casts
+ between floating point numbers having different conversion
+ errors.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
index 64057c3..5c86222 100644
--- a/doc/src/sgml/release-9.5.sgml
+++ b/doc/src/sgml/release-9.5.sgml
@@ -108,7 +108,9 @@
   an integer variable will now assign 4 rather than failing.  If no
   assignment-grade cast is defined for the particular source and
   destination types, PL/pgSQL will fall back to its old
-  I/O conversion behavior.
+  I/O conversion behavior. This change also may bring different results of
+  type casts between floating point numbers having different conversion
+  errors.
  
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2015-11-16 Thread Fabien COELHO



Hmmm...

Maybe I'm a little bit too optimistic here, because it seems that I'm
suggesting creating a deadlock if the checkpointer has buffers waiting to be
flushed and wishes to close the very same file that holds them.


So when it wants to close the file, the checkpointer should rather flush the
outstanding writes that are waiting and then close the fd, which suggests
some global variable holding the flush context so that this can be done.


Hmmm.


On third (fourth, fifth:-) thoughts:

The vfd (virtual file descriptor?) structure in the checkpointer could
keep a pointer to the current flush if it concerns this fd. If the
checkpointer decides to close the fd while there is a write in progress
(I'm still baffled at why and when the checkpointer process would take such
a decision, maybe while responding to some signals, because it seems that
there is no such event in the checkpointer loop itself...), then on close
the process could flush before closing, or just close, which probably would
induce flushing, but at least clean up the structure so that the closed fd
would not be flushed after being closed and result in an error.
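
Something along these lines, just to sketch the idea (names and layout are
made up here, this is not actual fd.c code):

#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Pending writeback that references a particular file descriptor. */
typedef struct FlushContextSketch
{
    int     fd;
    off_t   offset;
    off_t   nbytes;
} FlushContextSketch;

/* Minimal stand-in for a vfd entry: it remembers whether the current
 * flush context still refers to this fd. */
typedef struct VfdSketch
{
    int                 fd;
    FlushContextSketch *pending;    /* NULL if no outstanding flush */
} VfdSketch;

/* Placeholder for sync_file_range()/msync() or whatever is used. */
extern void issue_pending_flush(FlushContextSketch *ctx);

static void
close_vfd(VfdSketch *vfd)
{
    if (vfd->pending != NULL)
    {
        /* Complete the outstanding flush while the fd is still valid... */
        issue_pending_flush(vfd->pending);
        /* ...and forget it, so nothing tries to flush a closed fd later. */
        vfd->pending = NULL;
    }
    close(vfd->fd);
}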


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change

2015-11-16 Thread Marti Raudsepp
Hi Haribabu Kommi

Thank you so much for the review and patch update. I should have done that
myself, but I've been really busy for the last few weeks. :(

Regards,
Marti

On Mon, Nov 16, 2015 at 4:46 AM, Haribabu Kommi 
wrote:

> On Thu, Nov 5, 2015 at 10:20 PM, Haribabu Kommi
>  wrote:
> > On Tue, Sep 29, 2015 at 12:17 AM, Marti Raudsepp 
> wrote:
> >> Hi list
> >>
> >> The attached patch changes the behavior of multiple ALTER x SET SCHEMA
> >> commands, to skip, rather than fail, when the old and new schema is
> >> the same.
> >>
> >> The advantage is that it's now easier to write DDL scripts that are
> idempotent.
> >>
> >> This already matches the behavior of ALTER EXTENSION SET SCHEMA in
> >> earlier versions, as well as many other SET-ish commands, e.g. ALTER
> >> TABLE SET TABLESPACE, OWNER TO, CLUSTER ON, SET (storage_parameter...)
> >> etc. I don't see why SET SCHEMA should be treated any differently.
> >>
> >> The code is written such that object_access_hook is still called for
> >> each object.
> >>
> >> Regression tests included. I couldn't find any documentation that
> >> needs changing.
> >
> > I went through the patch; the following are my observations:
> >
> > The patch applied with hunks and compiled without warnings.
> > Basic tests passed.
> >
> > In the AlterTableNamespaceInternal function, if a table or matview is
> > the target of SET SCHEMA and the object contains any constraints, the
> > constraints get updated with the new schema.
> >
> > In the AlterTypeNamespaceInternal function, the InvokeObjectPostAlterHook
> > function doesn't get called if the type is a composite, domain or array
> > type, because it just returns from the top of the function.
>
> Most of the community members didn't find any problem in changing the
> behavior, so here I attached updated patch with the above two corrections.
>
> Regards,
> Hari Babu
> Fujitsu Australia
>


Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-16 Thread Ashutosh Bapat
On Mon, Nov 9, 2015 at 9:39 PM, Robert Haas  wrote:

> On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner  wrote:
> > Has anyone taken a close look at what happens if the two sides of
> > the merge join have different implementations of the same collation
> > name?  Is there anything we should do to defend against the
> > problem?
>
> The issue of FDWs vs. collations has been thought about to some degree
> (see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
> stuff in detail.

Collations arising from a foreign table's var are considered safe
(FDW_COLLATE_SAFE) to push down to the foreign server, since they are
either the local default collation or are assumed to be the same on the
foreign and local server as per the user's declaration. The onus of making
sure of that is on the user who declares a particular collation for a
foreign table var. As said upthread, different glibc implementations can
cause collation ordering mismatches, so this patch will be susceptible to
the same problem as the master/standby case.
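
For example (server and table names are made up), the declaration below is
the user asserting that the remote "en_US" sorts exactly like the local
one, which is what allows the sort to be pushed down:

CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb');

CREATE FOREIGN TABLE remote_people (
    id   int,
    name text COLLATE "en_US"   -- user-declared: must match the remote side
) SERVER remote_srv OPTIONS (table_name 'people');

-- A merge join plan may now ask the remote server for
-- "SELECT ... ORDER BY name" and trust that the ordering agrees locally.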

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Simon Riggs
On 15 November 2015 at 14:50, Robert Haas  wrote:

> On Sun, Nov 15, 2015 at 5:41 AM, Simon Riggs 
> wrote:
> > Hmm, if that's where we're at, I'll summarize my thoughts.
> >
> > All of this discussion presupposes we are distributing/load balancing
> > queries so that reads and writes might occur on different nodes.
>
> Agreed.  I think that's a pretty common pattern, though certainly not
> the only one.
>

It looks to me like this functionality is only of use in a pooler. Please
explain how else this would be used.


> > Your option (2) is wider but also worse in some ways. It can be
> implemented
> > in a pooler.
> >
> > Your option (3) doesn't excite me much. You've got a load of stuff that
> > really should happen in a pooler. And at its core we have
> synchronous_commit
> > = apply but with a timeout rather than a wait.
>
> I don't see how either option (2) or option (3) could be implemented
> in a pooler.  How would that work?
>

My starting thought was that (1) was the only way forwards. Through
discussion, I now see that it's not the best solution for the general case.

The pooler knows which statements are reads and writes, and it also knows
about transaction boundaries, so it is possible for it to perform the waits
for either (2) or (3). The pooler *needs* to know which nodes it can route
queries to, so it looks to me that the pooler is the best place to put
waits and track status of nodes, no matter when we wait. I don't see any
benefit in having other nodes keep track of node status since that will
just replicate work that *must* be performed in the pooler.

I would like to see a load balancing pooler in Postgres.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Simon Riggs
On 16 November 2015 at 11:01, Craig Ringer  wrote:

> On 16 November 2015 at 18:44, Simon Riggs  wrote:
>
>>
>> The pooler knows which statements are reads and writes
>>
>
> I think that's an iffy assumption.
>

It's not an assumption, it's a requirement. If it can't do this in some
manner then you can't use a load balancing pooler.

Randomly submitting things works as well, since it leads to a write error
when you try to write data on a read-only server, so you then learn
whether it is a read or a write. Once you know it's a write, you submit it
to the master. But you still need to be careful of other effects, so that
isn't recommended.

> It's one we tend to make because otherwise read/write pooling won't work,
> but in PostgreSQL there's really no way to know when calling a function.
>
> What does
>
> SELECT get_user_stats()
>
> do? The pooler has _no_ _idea_ unless manually configured with knowledge
> about particular user defined functions.
>
> In the absence of such knowledge it can:
>
> - send the work to a replica and report the ERROR to the user if it fails
> due to an attempted write;
> - send the work to a replica, capture an ERROR due to attempted write, and
> retry on the master;
> - send everything it's not sure about to the master
>
> If a pooler had insight into the catalogs and if we had readonly /
> readwrite attributes on functions, it could be smarter.
>

pgpool supports white/black function listing for exactly this reason.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-16 Thread Craig Ringer
On 16 November 2015 at 17:47, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:
>
>
> Collations arising from a foreign table's var are considered safe
> (FDW_COLLATE_SAFE) to push down to the foreign server, since they are
> either the local default collation or are assumed to be the same on the
> foreign and local server as per the user's declaration. The onus of making
> sure of that is on the user who declares a particular collation for a
> foreign table var. As said upthread, different glibc implementations can
> cause collation ordering mismatches, so this patch will be susceptible to
> the same problem as the master/standby case.
>
>
Yeah. It's less bad than the related problems we already have:

* Upgrading glibc can cause your indexes to no longer match your local
collations
* Different glibc versions on master and replica(s) can have the same effect

I don't see a problem with adding another way this same issue can be
expressed, since there's no sane way to fix it _properly_ without either
versioned collations in glibc or bringing collation onboard into Pg.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Craig Ringer
On 16 November 2015 at 18:44, Simon Riggs  wrote:

>
> The pooler knows which statements are reads and writes
>

I think that's an iffy assumption. It's one we tend to make because
otherwise read/write pooling won't work, but in PostgreSQL there's really
no way to know when calling a function.

What does

SELECT get_user_stats()

do? The pooler has _no_ _idea_ unless manually configured with knowledge
about particular user defined functions.

In the absence of such knowledge it can:

- send the work to a replica and report the ERROR to the user if it fails
due to an attempted write;
- send the work to a replica, capture an ERROR due to attempted write, and
retry on the master;
- send everything it's not sure about to the master

If a pooler had insight into the catalogs and if we had readonly /
readwrite attributes on functions, it could be smarter.


> I would like to see a load balancing pooler in Postgres.
>

Given the number of times I say "no, no, don't raise max_connections to
2000 to solve your performance problems, lower it to around 100 and put
pgbouncer in front if your application doesn't support connection pooling
internally"  yes!

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Atri Sharma
> I think the general idea is that if Commit is WAL logged, then the
> operation is considered to be committed on the local node, and commit should
> happen on any node only once prepare from all nodes is successful.
> And after that the transaction is not supposed to abort.  But I think you are
> trying to optimize the DTM in some way to not follow that kind of protocol.
> By the way, how will the arbiter do the recovery in a scenario where it
> crashes? Won't it need to contact all nodes for the status of in-progress or
> prepared transactions?
> I think it would be better if a more detailed design of DTM with respect to
> transaction management and recovery could be put on the wiki for having a
> discussion on this topic.  I have seen that you have already updated many
> details of the system, but still the complete picture of DTM is not clear.

I agree.

I have not been following this discussion but from what I have read above I
think the recovery model in this design is broken. You have to follow some
protocol, whichever you choose.

I think you can try using something like Paxos, if you are looking at a
more reliable model but don't want the overhead of 3PC.


Re: [HACKERS] [PATCH] SQL function to report log message

2015-11-16 Thread Jim Nasby

On 11/15/15 10:56 PM, dinesh kumar wrote:

So, shall we make this pg_report_log TO pg_write_log OR pg_ereport OR
 from you.


Why not pg_raise to mirror plpgsql? (The function does have the same 
semantics, right? It's not doing something like only sending to the log 
and not the client?)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-16 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 16 Nov 2015 12:19:23 -0300, Alvaro Herrera  
wrote in <20151116151923.GX614468@alvherre.pgsql>
> Thomas Munro wrote:
> > New version attached, merging recent changes.
> 
> I wonder about the TailMatches and Matches macros --- wouldn't it be
> better to have a single one, renaming TailMatches to Matches and
> replacing the current Matches() with an initial token that corresponds
> to anchoring to start of command?  Just wondering, not terribly attached
> to the idea.

Does it look like this?

> if (Match(BOL, "ALTER", "TABLE", EOL)) ...

It would be doable by giving special meaning to
word_matches(BOL/EOL, *_wd). And I give +1 to that over having so
many similar macros.
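
A self-contained sketch of what I mean (simplified; the real
word_matches()/TailMatches machinery would need equivalent special cases):

#include <stdbool.h>
#include <string.h>
#include <strings.h>

#define BOL "\001BOL"   /* sentinel word: anchor to start of the command */
#define EOL "\001EOL"   /* sentinel word: anchor to end of the command */

/*
 * Left-to-right matcher: without BOL/EOL it behaves like a prefix match,
 * which is enough to illustrate the sentinel idea.
 */
static bool
matches(int npat, const char **pat, int nin, const char **in)
{
    int j = 0;

    for (int i = 0; i < npat; i++)
    {
        if (strcmp(pat[i], BOL) == 0)
        {
            if (j != 0)
                return false;       /* must be before the first word */
        }
        else if (strcmp(pat[i], EOL) == 0)
            return j == nin;        /* every input word must be consumed */
        else
        {
            if (j >= nin || strcasecmp(pat[i], in[j]) != 0)
                return false;
            j++;
        }
    }
    return true;
}

/*
 * matches(4, (const char *[]){BOL, "ALTER", "TABLE", EOL},
 *         2, (const char *[]){"ALTER", "TABLE"})   returns true
 */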




The following is just my wild idea..

It could develop into a mini-language like the following, which
is a kind of subset of regular expressions: more powerful than the
current mechanism, but not as messy as full regular expressions because
its functionality is narrowed. In addition, the custom mini-language
could have specialized functionality for matching SQL statements.

> if (Match("^ALTER TABLE \id$"))...

It would be nice to have tokens that match optional words.

> if (Match("^ALTER TABLE(IF EXISTS) \id$"))...
> if (Match("CREATE(OR REPLACE)FUNCTION \id (\CSL)$")

Mmm, this might be another kind of complexity?  This could also be
accomplished by multiple matching descriptions.

> if (Match("^,ALTER,TABLE,\id,$") ||
> Match("^,ALTER,TABLE,IF,EXISTS,\id,$"))...

Translating this kind of mini-language into regular expressions
could be doable...


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposing COPY .. WITH PERMISSIVE

2015-11-16 Thread dinesh kumar
On Tue, Nov 17, 2015 at 1:02 AM, Michael Paquier 
wrote:

> On Mon, Nov 16, 2015 at 11:27 AM, Michael Paquier wrote:
> > In short, it seems that this patch is better rejected. And I am
> > planning to do so if there are no complaints.
>
> OK, done so in the CF app.
> Dinesh, please do not be discouraged by that. Everybody has patches
> rejected, and I know a bit of it :)
>

Sure Michael :-)

Thanks.

> --
> Michael
>



-- 

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Amit Kapila
On Tue, Nov 17, 2015 at 12:12 PM, konstantin knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Thank you for your response.
>
>
> On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
>
> I'm not entirely clear on what you're saying here.  I admit I've
> not kept in close touch with the distributed processing discussions
> lately -- is there a write-up and/or diagram to give an overview of
> where we're at with this effort?
>
>
> https://wiki.postgresql.org/wiki/DTM
>
>
> If you are saying that DTM tries to roll back a transaction after
> any participating server has entered the RecordTransactionCommit()
> critical section, then IMO it is broken.  Full stop.  That can't
> work with any reasonable semantics as far as I can see.
>
>
> DTM is not trying to roll back a committed transaction.
> What it tries to do is to hide this commit.
> As I already wrote, the idea was to implement "lightweight" 2PC because
> the prepared transactions mechanism in PostgreSQL adds too much overhead
> and causes some problems with recovery.
>
> The transaction is normally committed in xlog, so that it can always be
> recovered in case of a node fault.
> But before setting the corresponding bit(s) in CLOG and releasing locks we
> first contact the arbiter to get the global status of the transaction.
> If it is successfully locally committed by all nodes, then the arbiter
> approves the commit and the commit of the transaction completes normally.
> Otherwise the arbiter rejects the commit. In this case DTM marks the
> transaction as aborted in CLOG and returns an error to the client.
> XLOG is not changed, and in case of failure PostgreSQL will try to replay
> this transaction.
> But during recovery it also tries to restore the transaction status in CLOG.
> And at this place DTM contacts the arbiter to learn the status of the
> transaction.
>

I think the general idea is that if Commit is WAL logged, then the
operation is considered to be committed on the local node, and commit should
happen on any node only once prepare from all nodes is successful.
And after that the transaction is not supposed to abort.  But I think you are
trying to optimize the DTM in some way to not follow that kind of protocol.
By the way, how will the arbiter do the recovery in a scenario where it
crashes? Won't it need to contact all nodes for the status of in-progress or
prepared transactions?
I think it would be better if a more detailed design of DTM with respect to
transaction management and recovery could be put on the wiki for having a
discussion on this topic.  I have seen that you have already updated many
details of the system, but still the complete picture of DTM is not clear.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread konstantin knizhnik
Thank you for your response.


On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
> I'm not entirely clear on what you're saying here.  I admit I've
> not kept in close touch with the distributed processing discussions
> lately -- is there a write-up and/or diagram to give an overview of
> where we're at with this effort?

https://wiki.postgresql.org/wiki/DTM

> 
> If you are saying that DTM tries to roll back a transaction after
> any participating server has entered the RecordTransactionCommit()
> critical section, then IMO it is broken.  Full stop.  That can't
> work with any reasonable semantics as far as I can see.

DTM is not trying to roll back a committed transaction.
What it tries to do is to hide this commit.
As I already wrote, the idea was to implement "lightweight" 2PC because the
prepared transactions mechanism in PostgreSQL adds too much overhead and
causes some problems with recovery.

The transaction is normally committed in xlog, so that it can always be
recovered in case of a node fault.
But before setting the corresponding bit(s) in CLOG and releasing locks we
first contact the arbiter to get the global status of the transaction.
If it is successfully locally committed by all nodes, then the arbiter
approves the commit and the commit of the transaction completes normally.
Otherwise the arbiter rejects the commit. In this case DTM marks the
transaction as aborted in CLOG and returns an error to the client.
XLOG is not changed, and in case of failure PostgreSQL will try to replay
this transaction.
But during recovery it also tries to restore the transaction status in CLOG.
And at this place DTM contacts the arbiter to learn the status of the
transaction.
If it is marked as aborted in the arbiter's CLOG, then it will also be marked
as aborted in the local CLOG.
And according to PostgreSQL visibility rules no other transaction will see
changes made by this transaction.
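
In simplified pseudo-code the commit path looks roughly like this (the
arbiter and lock-release calls are placeholders, not the real pg_dtm
functions):

#include "postgres.h"
#include "access/clog.h"
#include "access/xlogdefs.h"

extern void WriteCommitRecordToXLOG(TransactionId xid);
extern bool ArbiterApprovesCommit(TransactionId xid);
extern void ReleaseTransactionLocks(TransactionId xid);

static void
dtm_style_commit(TransactionId xid)
{
    /* 1. The commit record goes to the local XLOG first, so it is durable
     *    and can be replayed after a node fault. */
    WriteCommitRecordToXLOG(xid);

    /* 2. Before exposing the commit through CLOG, ask the arbiter for the
     *    global outcome of the transaction. */
    if (ArbiterApprovesCommit(xid))
        TransactionIdSetTreeStatus(xid, 0, NULL,
                                   TRANSACTION_STATUS_COMMITTED,
                                   InvalidXLogRecPtr);
    else
    {
        /* Hide the local commit: mark the transaction aborted in CLOG and
         * report an error to the client (outside any critical section). */
        TransactionIdSetTreeStatus(xid, 0, NULL,
                                   TRANSACTION_STATUS_ABORTED,
                                   InvalidXLogRecPtr);
    }

    /* 3. Only now release the transaction's locks. */
    ReleaseTransactionLocks(xid);
}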



> 
>> We can not just call elog(ERROR,...) in the SetTransactionStatus
>>   implementation because inside a critical section it causes a Postgres
>>   crash with a panic message. So we have to remember that the transaction
>>   is rejected and report the error later, after exit from the critical
>>   section:
> 
> I don't believe that is a good plan.  You should not enter the
> critical section for recording that a commit is complete until all
> the work for the commit is done except for telling all the
> servers that all servers are ready.

It is a good point.
Maybe it is the reason for the performance scalability problems we have
noticed with DTM.

>> In our benchmarks we have found that simple credit-debit banking
>>   test (without any DTM) works almost 10 times slower with PostgreSQL
>>   2PC than without it. This is why we try to propose alternative
>>   solution (right now pg_dtm is 2 times slower than vanilla
>>   PostgreSQL, but it not only performs 2PC but also provide consistent
>>   snapshots).
> 
> Are you talking about 10x the latency on a commit, or that the
> overall throughput under saturation load is one tenth of running
> without something to guarantee the transactional integrity of the
> whole set of nodes?  The former would not be too surprising, while
> the latter would be rather amazing.

Sorry, some clarification.
We get a 10x performance slowdown caused by 2PC under very heavy load on the
IBM system with 256 cores.
On "normal" servers the slowdown from 2PC is smaller - about 2x.

> 
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-16 Thread Jim Nasby

On 11/12/15 1:11 PM, Thomas Munro wrote:

It's true that a pooling system/middleware could spy on your sessions
and insert causality token handling imposing a global ordering of
visibility for you, so that naive users don't have to deal with them.
Whenever it sees a COMMIT result (assuming they are taught to return
LSNs), it could update a highest-LSN-seen variable, and transparently
insert a wait for that LSN into every transaction that it sees
beginning.  But then you would have to push all your queries through a
single point that can see everything across all Postgres servers, and
maintain this global high LSN.


I think that depends on what you're doing. Frequently you don't care 
about anyone else's writes, just your own. In that case, there's no need 
for a shared connection pooler; you just have to come back to the same one.


There's also a 4th option: until a commit has made it out to some number 
of slaves, re-direct all reads from a session back to the master. That 
might sound horrible for master performance, but in reality I think it'd 
normally be fine. Generally, you only care about this when you're going 
to read data that you've just written, which means the data's still in 
shared buffers.
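
For reference, the token handling Thomas describes can be approximated
today with the existing LSN functions (a rough sketch; a real
implementation would block inside the server rather than poll):

-- On the master, right after the COMMIT, the pooler notes the LSN:
SELECT pg_current_xlog_location();              -- e.g. '0/3000060'

-- Before running the next read on a standby it checks replay progress,
-- retrying until this returns true:
SELECT pg_last_xlog_replay_location() >= '0/3000060'::pg_lsn;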

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers