Re: [HACKERS] Reduce pinning in btree indexes
Hello, I measured the performance of this patch considering markpos/restorepos. The loss seems to be up to about 10%, unfortunately.

At Thu, 26 Feb 2015 17:49:23 + (UTC), Kevin Grittner kgri...@ymail.com wrote in 440831854.629116.1424972963735.javamail.ya...@mail.yahoo.com

> Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> On 02/15/2015 02:19 AM, Kevin Grittner wrote:
>> Hmm. Did your test case actually exercise mark/restore? The memcpy()s
>> are not that small. Especially if it's an index-only scan, you're
>> copying a large portion of the page. Some scans call markpos on every
>> tuple, so that could add up.
>
> I have results from the `make world` regression tests and a 48-hour
> customer test. Unfortunately I don't know how heavily either of those
> exercise this code. Do you have a suggestion for a way to test whether
> there is a serious regression for something approaching a worst case?

ammarkpos/amrestrpos are called in merge joins. By the steps shown below, I
had 1M times of markpos and no restorepos for 1M result rows, and had 500k
times of markpos and the same number of times of restorepos for a 2M-row
result with a slightly different configuration. I suppose we can say that
these are the worst case considering markpos/restrpos. The call counts can
be taken using the attached patch.

Both binaries were compiled with -O2. shared_buffers=1GB, and all shared
pages used in the query were hit when measuring. The numbers were taken 12
times for each configuration; I took averages and stddevs of 10 numbers,
excluding the best and the worst.

Case 1. 500k markpos/restorepos for 2M result rows.

 Index scan: The patched loses about 2%. (1.98%)
   master:  6166 ms (std dev = 3.2 ms)
   Patched: 6288 ms (std dev = 3.7 ms)

 IndexOnly scan: The patched loses about 2%. (2.14%)
   master:  5946 ms (std dev = 5.0 ms)
   Patched: 6073 ms (std dev = 3.3 ms)

The patched version is slower by about 2%. Of course not all of it is the
effect of the memcpy, but I don't know the distribution.

Case 2.
1M markpos, no restorepos for 1M result rows.

 IndexOnly scan: The patched loses about 10%.
   master:  3655 ms (std dev = 2.5 ms)
   Patched: 4038 ms (std dev = 2.6 ms)

The loss is about 10%. The case looks a bit impractical, but unfortunately
the number might not be ignorable. The distribution of the loss is unknown,
too.

regards,

===
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
delete from t1;
delete from t2;
-- This causes 1M times of markpos and no restorepos
INSERT INTO t1 (SELECT a FROM generate_series(0, 999999) a);
INSERT INTO t2 (SELECT a FROM generate_series(0, 999999) a);
-- This causes 500k times of markpos and the same number of restorepos
-- INSERT INTO t1 (SELECT a/2 FROM generate_series(0, 999999) a);
-- INSERT INTO t2 (SELECT a/2 FROM generate_series(0, 999999) a);
CREATE INDEX it1 ON t1 (a);
CREATE INDEX it2 ON t2 (a);
ANALYZE t1;
ANALYZE t2;
SET enable_seqscan to false;
SET enable_material to false;
SET enable_hashjoin to false;
SET enable_nestloop to false;
SET enable_indexonlyscan to false;  -- omit this to do an index-only scan
EXPLAIN (ANALYZE) SELECT t1.a, t2.a FROM t1 JOIN t2 on (t1.a = t2.a);

                                QUERY PLAN
--------------------------------------------------------------------------
 Merge Join  (cost=2.83..322381.82 rows=3031231 width=8)
   (actual time=0.013..5193.566 rows=2000000 loops=1)
   Merge Cond: (t1.a = t2.a)
   ->  Index Scan using it1 on t1  (cost=0.43..65681.87 rows=167 width=4)
         (actual time=0.004..727.557 rows=1000000 loops=1)
   ->  Index Scan using it2 on t2  (cost=0.43..214642.89 rows=3031231 width=4)
         (actual time=0.004..1478.361 rows=1 loops=1)
 Planning time: 25.585 ms
 Execution time: 6299.521 ms
(6 rows)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 3af462b..04d6cec 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -543,15 +543,19 @@ btendscan(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+
 /*
  * btmarkpos() -- save current scan position
  */
+int hoge_markpos_count = 0;
+int hoge_restrpos_count = 0;
 Datum
 btmarkpos(PG_FUNCTION_ARGS)
 {
 	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 
+	hoge_markpos_count++;
 	/* we aren't holding any read locks, but gotta drop the pin */
 	if (BTScanPosIsValid(so->markPos))
 	{
@@ -585,7 +589,7 @@ btrestrpos(PG_FUNCTION_ARGS)
 {
 	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
-
+	hoge_restrpos_count++;
 	/* Restore the marked positions of any array keys */
 	if (so->numArrayKeys)
 		_bt_restore_array_keys(scan);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 33b172b..e7c1b99 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -258,14 +258,19 @@
Re: [HACKERS] Reduce pinning in btree indexes
>>>>> "Kyotaro" == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:

 Kyotaro> ammarkpos/amrestrpos are called in merge joins. By the steps
 Kyotaro> shown below, I had 1M times of markpos and no restorepos for
 Kyotaro> 1M result rows, and had 500k times of markpos and the same
 Kyotaro> number of times of restorepos for 2M rows result by a bit
 Kyotaro> different configuration. I suppose we can say that they are
 Kyotaro> the worst case considering markpos/restrpos. The call counts
 Kyotaro> can be taken using the attached patch.

You might want to try running the same test, but after patching
ExecSupportsMarkRestore to return false for index scans. This will cause
the planner to insert a Materialize node to handle the mark/restore.

This way, you can get an idea of how much gain (if any) the direct support
of mark/restore in the scan is actually providing.

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance improvement for joins where outer side is unique
On 3 February 2015 at 22:23, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

> Hi, I had a look at this patch. Although I haven't understood the whole
> stuff and all of the related things, I will comment as possible.

Great, thank you for taking the time to look at and review the patch.

> Performance: I looked at the performance gain this patch gives. For
> several on-memory joins, I had gains of about 3% for merge join, 5% for
> hash join, and 10% for nestloop (@CentOS6), for simple 1-level joins with
> aggregation similar to what you mentioned in a previous mail, like this:
>
>   =# SELECT count(*) FROM t1 JOIN t2 USING (id);
>
> Of course, the gain would be trivial when returning many tuples, or when
> scans go to disk.

That's true, but joins where rows are filtered after the join condition
will win here too. For example:

  SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t1.value < t2.value;

Also queries with a GROUP BY clause. (Since grouping is always performed
after the join :-( )

> I haven't measured the loss by additional computation when the query is
> regarded as not a single join.

I think this would be hard to measure, but if it is measurable then you'd
likely want to look at just planning time, rather than planning and
execution time.

> Explain representation: I don't like that 'Unique Join:' occupies its own
> line in the result of explain; moreover, it doesn't show the meaning
> clearly. What about a representation like the following? Or, it might not
> be necessary to appear there at all.
>
>    Nested Loop ...
>      Output: ...
>  -   Unique Join: Yes
>      ->  Seq Scan on public.t2  (cost=...
>  -   ->  Seq Scan on public.t1  (cost=...
>  +   ->  Seq Scan on public.t1 (unique)  (cost=...

Yeah, I'm not too big a fan of this either. At one evolution of the patch I
had "Unique Left Join" in the join node's line in the explain output, but I
hated that more and changed it to appear just in the VERBOSE output; and
after I did that I didn't want to change the join node's line only when in
verbose mode.
I do quite like that it's a separate item for the XML and JSON explain
output. That's perhaps quite useful when the explain output must be
processed by software. I'm totally open to better ideas on names, but I
just don't have any at the moment.

> Coding: The style looks OK. Could be applied on master. It looks to work
> as you expected for some cases. Random comments follow.
>
> - Looking at specialjoin_is_unique_join, you seem to assume that
>   !delay_upper_joins is a sufficient condition for not being a unique
>   join. The former simply indicates that it doesn't commute with upper
>   OJs, and the latter seems to indicate that the RHS is unique. Could you
>   tell me what kind of relationship there is between them?

The rationalisation around that is from the (now changed) version of
join_is_removable(), where the code read:

    /*
     * Must be a non-delaying left join to a single baserel, else we aren't
     * going to be able to do anything with it.
     */
    if (sjinfo->jointype != JOIN_LEFT ||
        sjinfo->delay_upper_joins)
        return false;

I have to admit that I didn't go and investigate why delayed upper joins
cannot be removed by the left-join removal code; I really just assumed that
we're unable to prove that a join to such a relation won't match more than
one of the outer side's rows. I kept this to maintain that behaviour, as I
assumed it was there for a good reason.

> - The naming "unique join" seems a bit obscure to me, but I have no
>   better idea :( However, the member name has_unique_join seems better as
>   is_unique_join.

Yeah, I agree with this. I just can't find something short enough that
means "based on the join condition, the inner side of the join will never
produce more than 1 row for any single outer row". "Unique join" was the
best I came up with. I'm totally open to better ideas. I agree that
is_unique_join is better than has_unique_join. I must have just copied the
has_eclass_joins struct member without thinking too hard about it. I've now
changed this in my local copy of the patch.
> - eclassjoin_is_unique_join involves a seemingly semi-exhaustive scan
>   over ec members for every element in joinlist. Even if not, it might be
>   thought a bit too large for the gain. Could you do the equivalent
>   things with smaller code?

I'd imagine some very complex queries could have many equivalence classes
and members, though I hadn't really thought that this number would be so
great that processing here would suffer much. There are quite a few fast
paths out, like the test to ensure that both rels are mentioned in
ec_relids. Though for a query complex enough to have a great number of
eclasses and members, likely there would be quite a few relations involved
and planning would be slower anyway. I'm not quite sure how else I could
write this and still find the unique join cases each time. We can't really
just give up half way through looking.

Thanks again for the review.

Regards
Re: [HACKERS] Reduce pinning in btree indexes
Hi,

At Fri, 27 Feb 2015 05:56:18 +, Andrew Gierth and...@tao11.riddles.org.uk
wrote in 874mq77vuu@news-spur.riddles.org.uk

> >>>>> "Kyotaro" == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
>
>  Kyotaro> ammarkpos/amrestrpos are called in merge joins. By the steps
>  Kyotaro> shown below, I had 1M times of markpos and no restorepos for
>  Kyotaro> 1M result rows, and had 500k times of markpos and the same
>  Kyotaro> number of times of restorepos for 2M rows result by a bit
>  Kyotaro> different configuration. I suppose we can say that they are
>  Kyotaro> the worst case considering markpos/restrpos. The call counts
>  Kyotaro> can be taken using the attached patch.
>
> You might want to try running the same test, but after patching
> ExecSupportsMarkRestore to return false for index scans. This will cause
> the planner to insert a Materialize node to handle the mark/restore.

Mmm? The patch bt-nopin-v1.patch seems not to contain the change for
ExecSupportsMarkRestore, and that very simple function remains returning
true for T_Index(Only)Scan after the patch is applied.

Explain shows that the plan does not involve a materializing step; in
addition, the counters I put into both btmarkpos() and btrestrpos() showed
that they are actually called during the execution, like this, for both
unpatched and patched:

| LOG: markpos=1000000, restrpos=0
| STATEMENT: EXPLAIN (ANALYZE,BUFFERS) SELECT t1.a, t2.a FROM t1 JOIN t2 on (t1.a = t2.a);

To make sure, I looked into btmarkpos and btrestrpos to confirm that they
really do the memcpy() we're talking about, then recompiled the whole
source tree.

> This way, you can get an idea of how much gain (if any) the direct
> support of mark/restore in the scan is actually providing.

Does "the scan" mean a T_Material node? But there is no Material in the
plan, and the counters in bt*pos were in fact incremented, as mentioned
above. Could you point out any other possible mistake in my thought?
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund and...@2ndquadrant.com wrote:

> On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote:
>> On 02/20/2015 05:21 PM, Andres Freund wrote:
>>> There's one bit that I'm not so sure about though: To avoid duplication
>>> I've added Parse(Commit/Abort)Record(), but unfortunately that has to
>>> be available both in front and backend code - so it's currently living
>>> in xactdesc.c. I think we can live with that, but it's certainly not
>>> pretty.
>>
>> Yeah, that's ugly. Why does frontend code need that?
>
> The old format isn't exactly trivial for frontend code to decode either.
> pg_xlogdump outputs subxacts and such; I don't foresee other usages.
> Sure, we could copy the code around, but I think that's worse than having
> it in xactdesc.c. Needs a comment explaining why it's there if I haven't
> added one already.

FWIW, I think they would live better in frontend code for client
applications.

That's a nice patch. +1 for merging them. Here are a couple of comments:

+/* Parse the WAL format of a xact abort into a easier to understand format. */
+void
+ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *parsed)

I think that you mean here "of an xact commit", not abort.

+ * Emit, but don't insert, a abort record.

s/a abort/an abort/

XactEmitAbortRecord has some problems with tabs replaced by 4 spaces at a
couple of places.

+/* free opcode 0x70 */
+
+#define XLOG_XACT_OPMASK 0x70

There is a contradiction here: 0x70 is not free.

Regards,
-- 
Michael
Re: [HACKERS] Reduce pinning in btree indexes
>>>>> "Kyotaro" == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:

 >> You might want to try running the same test, but after patching
 >> ExecSupportsMarkRestore to return false for index scans. This will
 >> cause the planner to insert a Materialize node to handle the
 >> mark/restore.

 Kyotaro> Mmm? The patch bt-nopin-v1.patch seems not contain the change
 Kyotaro> for ExecSupportMarkRestore and the very simple function remain
 Kyotaro> looking to return true for T_Index(Only)Scan after the patch
 Kyotaro> applied.

Right. I'm suggesting you change that, in order to determine what
performance cost, if any, would result from abandoning the idea of doing
mark/restore entirely.

-- 
Andrew (irc:RhodiumToad)
Re: [HACKERS] New CF app deployment
On 02/26/2015 01:59 PM, Michael Paquier wrote:

> On Thu, Feb 26, 2015 at 9:15 PM, Asif Naeem anaeem...@gmail.com wrote:
>> This thread seems relevant. Please guide me on how I can access older
>> CF pages. The MSVC portion of this fix got completely lost in the void:
>> https://commitfest.postgresql.org/action/patch_view?id=1330
>> The above link results in the following error: "Not found - The
>> specified URL was not found." Please do let me know if I missed
>> something. Thanks.
>
> Try commitfest-old instead; that is where the past CF app stores its
> data, like this:
> https://commitfest-old.postgresql.org/action/patch_view?id=1330

hmm, maybe we should have some sort of handler that redirects/reverse
proxies to the old commitfest app for this.

Stefan
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
>>>>> "Tom" == Tom Lane t...@sss.pgh.pa.us writes:

 Tom> We've discussed doing $SUBJECT off and on for nearly ten years, the
 Tom> oldest thread I could find about it being here:
 Tom> http://www.postgresql.org/message-id/flat/1186435268.16321.37.ca...@dell.linuxdev.us.dell.com
 Tom> It's come up again every time we found another leak of dead child
 Tom> contexts, which happened twice last year for example. And that patch
 Tom> I'm pushing for expanded out-of-line objects really needs this to
 Tom> become the default behavior anywhere that we can detoast objects.

So, this also changes (indirectly) the effect of ReScanExprContext so that
it deletes child contexts too. In the grouping sets work I noticed I had to
explicitly add some MemoryContextDeleteChildren after ReScanExprContext in
order to clean up properly; this change obviously makes that redundant.
(That looks like another data point in favour of this change.)

I guess the only question is whether anything currently relies on
ReScanExprContext's current behavior.

-- 
Andrew (irc:RhodiumToad)
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Feb 7, 2015 at 8:26 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

> Hi
>
> I am sending a review of this patch.

Thanks for the review. Sorry for the delay.

> 4. Regress tests test rules... FAILED -- missing info about new view

Thanks. Corrected.

> My objections:
>
> 1. The data type for the database field should be an array of name or an
> array of text. When a name contains a comma, the comma is not escaped
> currently:
>
>   {omega,my stupid extremly, name2,my stupid name}
>
> expected:
>
>   {my stupid name,omega,my stupid extremly, name2}
>
> I see the same issue in the options field.

Changed, including the user column also.

> 2. Reload is not enough for content refresh - logout is necessary. I
> understand why it is, but it is not very friendly, and can be very
> stressful. It should work with the last reloaded data.

At present the view data is populated from the variable parsed_hba_lines,
which contains the already-parsed lines of the pg_hba.conf file. During a
reload, the hba entries are reloaded only in the postmaster process and not
in every backend; for this reason the reloaded data is not visible until
the session is disconnected and reconnected. Instead of changing the SIGHUP
behavior in the backend to load the hba entries as well, whenever a call is
made to the view, the pg_hba.conf file is parsed, so the view shows the
latest reloaded data. This way it doesn't impact the existing behavior, but
it may impact performance because the file is loaded into memory every
time.

Updated patch is attached. Comments?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views
Thank you for your answers. I would be very interested in tracking the
staleness of the MV. You see, I work in a research group on database
tuning, and we have implemented some solutions to take advantage of MVs and
speed up queries. The query rewrite feature would be extremely desirable
for us.

Do you think that implementing the staleness check as suggested by Thomas
could get us started in the query rewrite business? Do you think I should
make a proposal, or are there more interesting subjects for GSoC? I'd be
happy to hear project suggestions, especially related to the optimizer,
tuning, etc.

Eric

2015-02-20 22:35 GMT-02:00 Tomas Vondra tomas.von...@2ndquadrant.com:

> On 21.2.2015 00:20, Kevin Grittner wrote:
>> Tomas Vondra tomas.von...@2ndquadrant.com wrote:
>>> I share the view that this would be very valuable, but the scope far
>>> exceeds what can be done within a single GSoC project. But maybe we
>>> could split that into multiple pieces, and Eric would implement only
>>> the first piece? For example the 'is_stale' flag for a MV would be
>>> really useful, making it possible to refresh only the MVs that
>>> actually need a refresh.
>>
>> You may be on to something there. Frankly, though, I'm not sure that we
>> could even reach consensus within the community on a detailed design
>> for how we intend to track staleness (that will hold up both now and
>> once we have incremental maintenance of materialized views working)
>> within the time frame of a GSoC project. This would need to be done
>> with an eye toward how it might be used in direct references (will we
>> allow a staleness limit on a reference from a query?), for use in a
>> rewrite, and how it will interact with changes to base tables and with
>> both REFRESH statements and incremental maintenance at various levels
>> of eagerness. I'm not sure that staleness management wouldn't be better
>> left until we have some of those other parts for it to work with.
>
> Doing that properly is going to be nontrivial, no doubt about that.
> I was thinking about keeping a simple list of updated tables (oids) and
> then, at commit time, deciding which MVs depend on them and setting some
> sort of flag (or XID) for all those MVs. But maybe there's a better way.
>
>> Questions to consider: Some other products allow materialized views to
>> be partitioned and staleness to be tracked by partition, and will check
>> which partitions will be accessed in determining staleness. Is that
>> something we want to allow for?
>
> I think we need to get this working for simple MVs, especially because
> we don't have partitioned MVs (or the type of declarative partitioning
> the other products do have).
>
>> Once we have incremental maintenance, an MV maintained in an eager
>> fashion (changes are visible in the MV as soon as the transaction
>> modifying the underlying table commits) could be accessed with MVCC
>> snapshots, with different snapshots seeing different versions. It seems
>> pretty clear that such an MV would always be considered fresh, so there
>> would be no need to constantly flip to stale and back again as the
>> underlying table were changed and the changes were reflected in the MV.
>> How do we handle that?
>
> Yes, incrementally updated MVs might be used more easily, without
> tracking staleness. But we don't have that now, and it's going to take a
> significant amount of time to get there. Also, not all MVs can be
> updated incrementally, so either we allow only simple MVs to be used for
> rewrites, or we'll have to implement the 'stale' flag anyway.
>
>> If changes to an MV are less eager (they are queued for application
>> after COMMIT, as time permits), would we want to track the xid of how
>> far along they are, so that we can tell whether a particular snapshot
>> is safe to use? Do we want to allow a non-MVCC snapshot that shows the
>> latest version of each row? Only if staleness is minimal?
>
> Maybe. When I talk about a 'flag' I actually mean a simple way to
> determine whether the MV is up-to-date or not.
> Snapshots and XIDs are probably the right way to do that in an
> MVCC-based system.
>
>> What about MVs which don't have incremental maintenance? We can still
>> determine what xid they are current as of, from the creation or the
>> latest refresh. Do we want to track that instead of a simple boolean
>> flag? How would we use the 'as of' XID?
>
> IMHO it's unacceptable to quietly use stale data unless the user
> explicitly references the MV, so we'd have to assume we can't use that
> MV.
>
> regards
>
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] logical column ordering
On 2/26/15 4:34 PM, Andres Freund wrote:

> On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:
>> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
>>> The reason for doing it this way is that changing the underlying
>>> architecture is really hard, without having to bear an endless hackers
>>> bike shed discussion about the best userland syntax to use. It seems a
>>> much better approach to do the actually difficult part first, then
>>> leave the rest to be argued to death by others and let those others do
>>> the easy part (and take all the credit along with that); that way,
>>> that discussion does not kill other possible uses that the new
>>> architecture allows.
>
> I agree that it's a sane order of developing things. But: I don't think
> it makes sense to commit it without the capability to change the order.
> Not so much because it'll not serve a purpose at that point, but rather
> because it'll be essentially untestable. And this is a patch that needs
> a fair amount of changes, both automated and manual.

It's targeted for 9.6 anyway, so we have a while to decide on what's
committed when. It's possible that there's no huge debate on the syntax.

> At least during development I'd even add a function that randomizes the
> physical order of columns, and keeps the logical one. Running the
> regression tests that way seems likely to catch a fair number of bugs.

Yeah, I've thought that would be a necessary part of testing. Not really
sure how we'd go about it with the existing framework though...

>> +1. This patch is already several years old; let's not delay it
>> further. Plus, I suspect that you could hack the catalog directly if
>> you really wanted to change LCO. Worst case you could create a C
>> function to do it.
>
> I don't think that's sufficient for testing purposes. Not only for the
> patch itself, but also for getting it right in new code.

We'll want to do testing that it doesn't make sense for the API to
support. For example, randomizing the storage ordering; that's not a sane
use case.
Ideally we wouldn't even expose physical ordering, because the code would
be smart enough to figure it out.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] MemoryContext reset/delete callbacks
On 2015-02-27 01:54:27 +0100, Andres Freund wrote:

> On 2015-02-26 19:28:50 -0500, Tom Lane wrote:
>> *** typedef struct MemoryContextData
>> *** 59,72 ****
>>   	MemoryContext firstchild;	/* head of linked list of children */
>>   	MemoryContext nextchild;	/* next child of same parent */
>>   	char	   *name;		/* context name (just for debugging) */
>>   	bool		isReset;	/* T = no space alloced since last reset */
>>   #ifdef USE_ASSERT_CHECKING
>> ! 	bool		allowInCritSection;	/* allow palloc in critical section */
>>   #endif
>>   } MemoryContextData;
>
> It's a bit sad to push AllocSetContextData onto four cachelines from the
> current three... That stuff is hot. But I don't really see a way around
> it right now. And it seems like it'd give us more ammunition to improve
> things than the small loss of speed it implies.

Actually:

struct MemoryContextData {
	NodeTag                    type;                /*     0     4 */

	/* XXX 4 bytes hole, try to pack */

	MemoryContextMethods *     methods;             /*     8     8 */
	MemoryContext              parent;              /*    16     8 */
	MemoryContext              firstchild;          /*    24     8 */
	MemoryContext              nextchild;           /*    32     8 */
	char *                     name;                /*    40     8 */
	bool                       isReset;             /*    48     1 */
	bool                       allowInCritSection;  /*    49     1 */

	/* size: 56, cachelines: 1, members: 8 */
	/* sum members: 46, holes: 1, sum holes: 4 */
	/* padding: 6 */
	/* last cacheline: 56 bytes */
};

If we move isReset and allowInCritSection after type, we'd stay at the same
size...

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New CF app deployment
On Thu, Feb 26, 2015 at 9:15 PM, Asif Naeem anaeem...@gmail.com wrote:

> This thread seems relevant. Please guide me on how I can access older CF
> pages. The MSVC portion of this fix got completely lost in the void:
> https://commitfest.postgresql.org/action/patch_view?id=1330
> The above link results in the following error: "Not found - The
> specified URL was not found." Please do let me know if I missed
> something. Thanks.

Try commitfest-old instead; that is where the past CF app stores its data,
like this:
https://commitfest-old.postgresql.org/action/patch_view?id=1330
-- 
Michael
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 2/25/15 11:40 PM, Alvaro Herrera wrote:

> Fujii Masao wrote:
>> On Tue, Feb 24, 2015 at 1:29 AM, David Steele da...@pgmasters.net wrote:
>>> 1) Follow Oracle's "as session" option and only log each statement
>>> type against an object the first time it happens in a session. This
>>> would greatly reduce logging, but might be too little detail. It would
>>> increase the memory footprint of the module to add the tracking.
>
> Doesn't this mean that a user could conceal malicious activity simply by
> doing an innocuous query that silences all further activity?

True enough, but I thought it might be useful as an option. I tend to lock
down users pretty tightly, so there's not much they can do without somehow
getting superuser access, at which point all bets are off anyway. In the
case where users are highly constrained, the idea here would be to just
keep tabs on the sort of things they are reading/writing, for financial
audits and ISO compliance.

>>> 2) Only log once per call to the backend. Essentially, we would only
>>> log the statement you see in pg_stat_activity. This could be a good
>>> option because it logs what the user accesses directly, rather than
>>> functions, views, etc., which hopefully are already going through a
>>> review process and can be audited that way.
>
> ... assuming the user does not create stuff on the fly behind your back.

Sure, but then the thing they created/modified would also be logged
somewhere earlier (assuming pg_audit.log = 'all'). It would require some
analysis to figure out what they did, but I think that would be true in
many cases.

>>> Would either of those address your concerns?
>>
>> Before discussing how to implement, probably we need to consider the
>> spec about this. For example, should we log even nested statements for
>> the audit purpose? If yes, how should we treat the case where the user
>> changes the setting so that only DDL is logged, and then a user-defined
>> function which internally executes DDL is called?
>> Since the top-level SQL (calling the function) is not the target of
>> audit, should we not log even the nested DDL?
>
> Clearly if you log only DROP TABLE, and then the malicious user hides
> one such call inside a function that executes the drop (which is called
> via a SELECT top-level SQL), you're not going to be happy.

If the purpose of the auditing is to catch malicious activity, then all
statements would need to be logged. If only top-level statements are
logged then it might be harder to detect, but it would still be logged.

-- 
David Steele
da...@pgmasters.net
Re: [HACKERS] mogrify and indent features for jsonb
On 26 February 2015 at 15:09, Dmitry Dolgov 9erthali...@gmail.com wrote:

> Hi, Thom.
>
>> Would this support deleting "type" and the value "dd"?
>
> With this patch you can delete them one by one:
>
>   select '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb
>          - '{c, type}'::text[] - '{d, -1}'::text[];
>                              ?column?
>   -----------------------------------------------------------------
>    {"a": 1, "b": 2, "c": {"stuff": "test"}, "d": ["aa", "bb", "cc"]}
>   (1 row)

Doesn't work for me:

  # select '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb
           - '{c, type}'::text[] - '{d, -1}'::text[];
  ERROR:  operator does not exist: jsonb - text[]
  LINE 1: ...ff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb - '{c, typ...
                                                               ^
  HINT:  No operator matches the given name and argument type(s). You might
  need to add explicit type casts.

>> Is there a way to take the json
>> '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'
>> and add "ee" to "d" without replacing it?
>
> No, it looks like there is no way to add a new element to an array with
> the help of this patch. I suppose this feature could be implemented
> easily enough inside the jsonb_concat function:
>
>   select '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb
>          || '{"d": ["ee"]}'::jsonb
>
> but I'm not sure that it would be the best way.

Yeah, I think that may be problematic. I agree with Josh that there's
probably no sane mix of operators for this, as I would expect your example
to replace "d": ["aa","bb","cc","dd"] with "d": ["ee"] rather than append
to it. Hmm... unless we used a + operator, but then I'm not sure what
should happen in the instance of '{"d": ["aa"]}' + '{"d": "bb"}'.

-- 
Thom
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 2/25/15 10:42 PM, Fujii Masao wrote: On Tue, Feb 24, 2015 at 1:29 AM, David Steele da...@pgmasters.net wrote: On 2/18/15 10:25 AM, David Steele wrote: On 2/18/15 6:11 AM, Fujii Masao wrote: pg_audit doesn't log BIND parameter values when a prepared statement is used. Seems this is an oversight of the patch. Or is this intentional? It's actually intentional - following the model I talked about in my earlier emails, the idea is to log statements only. This also follows 2ndQuadrant's implementation. Unfortunately, I think it's beyond the scope of this module to log bind variables. Maybe I can live with that at least in the first version. I'm following not only 2ndQuadrant's implementation, but Oracle's as well. Oracle's audit_trail (e.g., = db, extended) can log even bind values. Also, log_statement=on in PostgreSQL can log bind values. Maybe we can reuse the same technique that log_statement uses. I'll look at how this is done in the logging code and see if it can be used in pg_audit. Imagine the case where you call a user-defined function which executes many nested statements. In this case, pg_audit logs only the top-level statement (i.e., the one issued directly by the client) every time a nested statement is executed. In fact, one call of such a UDF can cause lots of the *same* log message. I think this is problematic. I agree - not sure how to go about addressing it, though. I've tried to cut down on the verbosity of the logging in general, but of course it can still be a problem. Using security definer and a different logging GUC for the defining role might work. I'll add that to my unit tests and see what happens. That didn't work - but I didn't really expect it to. Here are two options I thought of: 1) Follow Oracle's "as session" option and only log each statement type against an object the first time it happens in a session. This would greatly reduce logging, but might be too little detail. It would increase the memory footprint of the module to add the tracking.
2) Only log once per call to the backend. Essentially, we would only log the statement you see in pg_stat_activity. This could be a good option because it logs what the user accesses directly, rather than functions, views, etc. which hopefully are already going through a review process and can be audited that way. Would either of those address your concerns? Before discussing how to implement, probably we need to consider the spec about this. For example, should we log even nested statements for the audit purpose? If yes, how should we treat the case where the user changes the setting so that only DDL is logged, and then the user-defined function which internally executes DDL is called? Since the top-level SQL (calling the function) is not the target of audit, we should not log even the nested DDL? I think logging nested statements should at least be an option. And yes, I think that nested statements should be logged even if the top-level SQL is not (depending on configuration). The main case for this would be DO blocks which can be run by anybody. -- - David Steele da...@pgmasters.net
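[Editor's note] Option 1 above, Oracle-style "by session" deduplication, can be illustrated with a toy model. All names here are invented for illustration; this is not pg_audit code, just a sketch of the trade-off (less volume, less detail, plus per-session tracking state):

```python
class SessionAuditLog:
    """Toy model of option 1: each (statement type, object) pair is
    logged only the first time it occurs in a session. The 'seen' set
    is the extra per-session memory footprint David mentions."""
    def __init__(self):
        self.seen = set()
        self.lines = []

    def log(self, stmt_type, obj):
        key = (stmt_type, obj)
        if key in self.seen:
            return False            # suppressed: already audited this session
        self.seen.add(key)
        self.lines.append(f"AUDIT: {stmt_type} {obj}")
        return True

audit = SessionAuditLog()
audit.log("SELECT", "accounts")
audit.log("SELECT", "accounts")     # repeated/nested execution: suppressed
audit.log("UPDATE", "accounts")     # new statement type: logged
```

The suppression is what tames the "same log message many times" problem for a UDF executing the same nested statement in a loop, at the cost of hiding how often it ran.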
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
FWIW a fix for this has been posted to all active branches: Author: Andres Freund and...@anarazel.de Branch: master [fd6a3f3ad] 2015-02-26 12:50:07 +0100 Branch: REL9_4_STABLE [d72115112] 2015-02-26 12:50:07 +0100 Branch: REL9_3_STABLE [abce8dc7d] 2015-02-26 12:50:07 +0100 Branch: REL9_2_STABLE [d67076529] 2015-02-26 12:50:07 +0100 Branch: REL9_1_STABLE [5c8dabecd] 2015-02-26 12:50:08 +0100 Branch: REL9_0_STABLE [82e0d6eb5] 2015-02-26 12:50:08 +0100 Reconsider when to wait for WAL flushes/syncrep during commit. Up to now RecordTransactionCommit() waited for WAL to be flushed (if synchronous_commit != off) and to be synchronously replicated (if enabled), even if a transaction did not have a xid assigned. The primary reason for that is that sequence's nextval() did not assign a xid, but are worthwhile to wait for on commit. This can be problematic because sometimes read only transactions do write WAL, e.g. HOT page prune records. That then could lead to read only transactions having to wait during commit. Not something people expect in a read only transaction. This led to such strange symptoms as backends being seemingly stuck during connection establishment when all synchronous replicas are down. Especially annoying when said stuck connection is the standby trying to reconnect to allow syncrep again... This behavior also is involved in a rather complicated <= 9.4 bug where the transaction started by catchup interrupt processing waited for syncrep using latches, but didn't get the wakeup because it was already running inside the same overloaded signal handler. Fixing the issue here doesn't properly solve that issue; it merely papers over the problems. In 9.5 catchup interrupts aren't processed out of signal handlers anymore. To fix all this, make nextval() acquire a top level xid, and only wait for transaction commit if a transaction both acquired a xid and emitted WAL records.
If only a xid has been assigned we don't uselessly want to wait just because of writes to temporary/unlogged tables; if only WAL has been written we don't want to wait just because of HOT prunes. The xid assignment in nextval() is unlikely to cause overhead in real-world workloads. For one it only happens SEQ_LOG_VALS/32 values anyway, for another only usage of nextval() without using the result in an insert or similar is affected. Discussion: 20150223165359.gf30...@awork2.anarazel.de, 369698E947874884A77849D8FE3680C2@maumau, 5CF4ABBA67674088B3941894E22A0D25@maumau Per complaint from maumau and Thom Brown Backpatch all the way back; 9.0 doesn't have syncrep, but it seems better to have consistent behavior across all maintained branches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
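[Editor's note] The rule the commit describes, wait at commit only when the transaction both acquired an xid and wrote WAL, condenses to a small predicate. This is a sketch with invented names, not the actual RecordTransactionCommit() code:

```python
def must_wait_at_commit(assigned_xid, wrote_wal, sync_commit_on=True):
    """Sketch of the backpatched rule: wait for flush/syncrep only when
    the transaction BOTH acquired an xid AND emitted WAL. An xid alone
    (e.g. temp/unlogged-table writes) or WAL alone (e.g. a HOT prune in
    a read-only query) no longer blocks."""
    return sync_commit_on and assigned_xid and wrote_wal

# Read-only query that HOT-pruned a page: WAL written, but no xid.
hot_prune_waits = must_wait_at_commit(assigned_xid=False, wrote_wal=True)
# nextval() under the fix: top-level xid assigned and WAL written.
nextval_waits = must_wait_at_commit(assigned_xid=True, wrote_wal=True)
```

This is why the stuck-at-connection-establishment symptom disappears: a backend that only pruned pages during startup queries never enters the syncrep wait.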
Re: [HACKERS] Precedence of standard comparison operators
Simon Riggs si...@2ndquadrant.com writes: On 20 February 2015 at 20:44, Tom Lane t...@sss.pgh.pa.us wrote: Well, assuming that we're satisfied with just having a way to warn when the behavior changed (and not, in particular, a switch that can select old or new behavior) I'm in favour of your proposed improvements, but I'm having a problem thinking about random application breakage that would result. Having a warn_if_screwed parameter that we disable by default won't help much because if you are affected you can't change that situation. There are too many applications to test all of them and not all applications can be edited, even if they were tested. I find this argument to be unhelpful, because it could be made in exactly the same words against any non-backwards-compatible change whatsoever. Nonetheless, we do make non-backwards-compatible changes all the time. I think the way to do this is to have a pluggable parser, so users can choose 1) old parser, 2) new, better parser, 3) any other parser they fancy that they maintain to ensure application compatibility. We've got a pluggable executor and optimizer, so why not a parser too? Implementing that in the same way we have done for executor and optimizer is a 1 day patch, so easily achievable for 9.5. I don't find that particularly attractive either. It seems quite unlikely to me that anyone would actually use such a hook; replacing the whole parser would be essentially unmaintainable. Nobody uses the hooks you mention to replace the whole planner or whole executor --- there are wrappers out there, which is a different thing altogether. But you could not undo the issue at hand here without at the very least a whole new copy of gram.y, which would need to be updated constantly if you wanted it to keep working across more than one release. regards, tom lane
Re: [HACKERS] mogrify and indent features for jsonb
Hi, Thom. Would this support deleting "type" and the value 'dd'? With this patch you can delete them one by one: select '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb - '{c, type}'::text[] - '{d, -1}'::text[]; ?column? --- {"a": 1, "b": 2, "c": {"stuff": "test"}, "d": ["aa", "bb", "cc"]} (1 row) Is there a way to take the json: '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}' and add "ee" to "d" without replacing it? No, it looks like there is no way to add a new element to an array with the help of this patch. I suppose this feature can be implemented easily enough inside the jsonb_concat function: select '{"a": 1, "b": 2, "c": {"type": "json", "stuff": "test"}, "d": ["aa","bb","cc","dd"]}'::jsonb || '{"d": ["ee"]}'::jsonb but I'm not sure that it will be the best way. On 26 February 2015 at 01:13, Josh Berkus j...@agliodbs.com wrote: On 02/25/2015 03:13 AM, Thom Brown wrote: Can you think of a reasonable syntax for doing that via operators? I can imagine that as a json_path function, i.e.: jsonb_add_to_path(jsonb, text[], jsonb) or where the end of the path is an array: jsonb_add_to_path(jsonb, text[], text|int|float|bool) But I simply can't imagine an operator syntax which would make it clear what the user intended. No, there probably isn't a sane operator syntax for such an operation. A function would be nice. I'd just want to avoid hacking away at arrays by exploding them, adding a value then re-arraying them and replacing the value. Well, anyway, that doesn't seem like a reason to block the patch. Rather, it's a reason to create another one for 9.6 ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Primary not sending to synchronous standby
On 26 February 2015 at 13:08, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-23 17:09:24 +, Thom Brown wrote: On 23 February 2015 at 16:53, Andres Freund and...@2ndquadrant.com wrote: Comments? This is obviously just a POC, but I think something like this does make a great deal of sense. Thom, does that help? Yeah, this appears to eliminate the problem, at least in the case I reported. I've pushed a somewhat more evolved version of this after more testing. Thanks. I'll give it another round of testing later. -- Thom
Re: [HACKERS] Primary not sending to synchronous standby
On 2015-02-23 17:09:24 +, Thom Brown wrote: On 23 February 2015 at 16:53, Andres Freund and...@2ndquadrant.com wrote: Comments? This is obviously just a POC, but I think something like this does make a great deal of sense. Thom, does that help? Yeah, this appears to eliminate the problem, at least in the case I reported. I've pushed a somewhat more evolved version of this after more testing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Partitioning WIP patch
On 2/26/15 3:22 AM, Andres Freund wrote: On 2015-02-26 02:20:21 -0600, Jim Nasby wrote: The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. The problem there imo isn't so much inheritance, but lack of working unique checks across partitions. That's something we can implement independent of this, it's just not trivial. There's been discussion of allowing for uniqueness when we can guarantee no overlap between partitions, and the partition key is part of the unique constraint. That's the particular use case I was thinking of. I suspect there's other partitioning features that would be useful in a generic inheritance setup as well; that's why I'd love to see both features work together... but I fear there's enough work to get there that it may not happen, and I don't want us to accidentally start mixing the two and have users start relying on it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Partitioning WIP patch
On 2/26/15 3:09 AM, Amit Langote wrote: Yes. If it helps, the exact use-case I have in mind is using list-based partitioning + additional columns in some/all children (different between children). For example, if you need to track different types of customer payment methods, you'd have a payment parent table, a list partition for credit debit cards, a different list partition for bank accounts, etc. The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. Unless I'm missing something again, isn't allowing partitions to have heterogeneous rowtypes a problem in the long run? I'm afraid I'm confused as to your stand regarding inheritance vs. new partitioning. To be specific, children with heterogeneous schemas sounds much like what inheritance would be good for as you say. But then isn't that why we have to plan children individually which you said new partitioning should get away from? Apologies if I haven't been clear enough. What I'd like to see is the best of both worlds; fast partitioning when not using inheritance, and perhaps somewhat slower when using inheritance, but still with the features partitioning gives you. But my bigger concern from a project standpoint is that we not put ourselves in a position of supporting something that we really don't want to support (a partitioning system that's got inheritance mixed in). As much as I'd personally like to have both features together, I think it would be bad for the community to go down that road without careful thought. With explicit partitioning, shouldn't we go in direction where we remove some restrictions imposed by inheritance (think multiple inheritance)? 
I recall the link with which Alvaro had started the discussion - I think a link to a remark of Tom's about something very related: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us That post looks like Tom figured out a way to eliminate a problem that hurts inheritance, so that's good. My fear is that at some point we'll hit a problem with partitioning that we can't solve in the inheritance model. If we allow inheritance features into partitioning now we'll be painted into a corner. If we disallow those features now we can always re-enable them if we get to the point where we're in the clear. Does that make sense? Yes, it does. In fact, I do intend to keep them separate, the first attempt of which is to choose NOT to transform a PARTITION OF parent clause into INHERITS parent. Any code that may look like it's trying to do that is because the patch is not fully baked yet. Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN on a partition. If we release something without that being restricted it'll probably cause trouble later on. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] plpgsql versus domains
Hi 2015-02-26 18:31 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: At the behest of Salesforce, I've been looking into improving plpgsql's handling of variables of domain types, which is currently pretty awful. It's really slow, because any assignment to a domain variable goes through the slow double-I/O-conversion path in exec_cast_value, even if no actual work is needed. And I noticed that the domain's constraints get looked up only once per function per session; for example this script gets unexpected results: create domain di as int; create function foo1(int) returns di as $$ declare d di; begin d := $1; return d; end $$ language plpgsql immutable; select foo1(0); -- call once to set up cache alter domain di add constraint pos check (value > 0); select 0::di; -- fails, as expected select foo1(0); -- should fail, does not \c - select foo1(0); -- now it fails The reason for this is that domain_in looks up the domain's constraints and caches them under the calling FmgrInfo struct. That behavior was designed to ensure we'd do just one constraint-lookup per query, which I think is reasonable. But plpgsql sets up an FmgrInfo in the variable's PLpgSQL_type record, which persists for the whole session unless the function's parse tree is flushed for some reason. So the constraints only get looked up once. The rough sketch I have in mind for fixing this is: 1. Arrange to cache the constraints for domain types in typcache.c, so as to reduce the number of trips to the actual catalogs. I think typcache.c will have to flush these cache items upon seeing any sinval change in pg_constraint, but that's still cheaper than searching pg_constraint for every query. 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves.
This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. (I think checking for new constraints once per transaction is sufficient for reasonable situations, though I suppose that could be argued about.) The advantage of this approach is first that we could avoid an I/O conversion when the incoming value to be assigned matches the domain's base type, which is the typical case; and second that a domain without any CHECK constraints would become essentially free, which is also a common case. I like this variant. There could be some good optimizations with scalar types (text, varchar, numbers) by reducing I/O casts. 3. We could still have domains.c responsible for most of the details; the domain_check() API may be good enough as-is, though it seems to lack any simple way to force re-lookup of the domain constraints once per xact. Thoughts, better ideas? Given the lack of field complaints, I don't feel a need to try to back-patch a fix for this, but I do want to see it fixed for 9.5. +1 Regards Pavel regards, tom lane
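[Editor's note] Tom's once-per-transaction re-lookup idea can be illustrated with a toy cache. This is a Python model with invented names; the real mechanism would live in typcache.c and be driven by sinval messages on pg_constraint:

```python
class DomainConstraintCache:
    """Toy model: constraints are fetched from the 'catalog' at most
    once per transaction, so a mid-session ALTER DOMAIN is picked up
    by the next transaction rather than never (the bug in the script
    above, where the stale cache outlives the session)."""
    def __init__(self, catalog):
        self.catalog = catalog          # domain name -> list of checks
        self.cache = {}
        self.cached_xact = None

    def constraints(self, domain, current_xact):
        if self.cached_xact != current_xact:
            self.cache.clear()          # new transaction: force re-lookup
            self.cached_xact = current_xact
        if domain not in self.cache:
            self.cache[domain] = list(self.catalog[domain])
        return self.cache[domain]

catalog = {"di": []}                    # create domain di as int
cache = DomainConstraintCache(catalog)
cache.constraints("di", current_xact=1)           # cached: no constraints
catalog["di"].append(lambda v: v > 0)             # alter domain di add check (value > 0)

stale = cache.constraints("di", current_xact=1)   # same xact: still stale
fresh = cache.constraints("di", current_xact=2)   # next xact: sees the check
```

The `stale` result within the same transaction is the deliberately accepted trade-off ("checking for new constraints once per transaction is sufficient for reasonable situations").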
Re: [HACKERS] mogrify and indent features for jsonb
On 02/26/2015 07:25 AM, Thom Brown wrote: Yeah, I think that may be problematic. I agree with Josh that there's probably no sane mix of operators for this, as I would expect your example to replace "d": ["aa","bb","cc","dd"] with "d": ["ee"] rather than append to it. Hmm... unless we used a + operator, but then I'm not sure what should happen in the instance of '{"d": ["aa"]}' + '{"d": "bb"}'. Yeah, that's what I would expect too. In fact, I would count on replacement. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Partitioning WIP patch
On 02/25/2015 07:15 PM, Amit Langote wrote: On 26-02-2015 AM 05:15, Josh Berkus wrote: On 02/24/2015 12:13 AM, Amit Langote wrote: Here is an experimental patch that attempts to implement this. This looks awesome. Thanks! I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. Well, we don't have long at all to do that. I guess I'm asking what kind of completeness of code we have; is this basically done pending API changes and bugs, or are there major bits (like, say, pg_dump and EXPLAIN support) which are completely unimplemented? where key_spec consists of partition key column names and optional operator class per column. Currently, there are restrictions on the key_spec such as allowing only column names (not arbitrary expressions of them), only one column for list strategy, etc. What's the obstacle to supporting expressions and/or IMMUTABLE functions? I think it's fine to add this feature without them initially, I'm just asking about the roadmap for eventually supporting expressions in the key spec. Only one concern I can remember someone had raised is that having to evaluate an expression for every row during bulk-inserts may end up being pretty expensive. Though, we might have to live with that. Well, it's not more expensive than having to materialize the value from a trigger and store it on disk. The leading one here would be functions over timestamp; for example, the data has a timestamptz, but you want to partition by week. I think one idea is to learn from ability to use expressions in indexes. Sure. So a feature to implement for the 2nd release. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
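[Editor's note] The partition-by-week case Josh raises is essentially routing each row through an immutable expression over the key column. A rough Python sketch of such expression-based tuple routing (all names invented; the patch under discussion only supports plain column names so far, which is exactly the restriction being debated):

```python
from datetime import date

def week_partition_key(ts):
    """Immutable expression over the key column: truncate to the Monday
    of the containing week, akin to date_trunc('week', ts)."""
    return ts.fromordinal(ts.toordinal() - ts.weekday())

partitions = {}

def route_insert(row):
    # The cost concern from the thread: this expression runs once per
    # inserted row, which is what bulk-insert routing has to pay.
    key = week_partition_key(row["created"])
    partitions.setdefault(key, []).append(row)

route_insert({"id": 1, "created": date(2015, 2, 24)})  # Tuesday
route_insert({"id": 2, "created": date(2015, 2, 26)})  # Thursday
route_insert({"id": 3, "created": date(2015, 3, 2)})   # next week
```

The first two rows land in the same weekly partition (keyed by Monday 2015-02-23) while the third starts a new one, without materializing the truncated value as a stored column.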
Re: [HACKERS] Reduce pinning in btree indexes
Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/15/2015 02:19 AM, Kevin Grittner wrote: Interestingly, the btree README points out that using the old TID with a new tuple poses no hazard for a scan using an MVCC snapshot, because the new tuple would not be visible to a snapshot created that long ago. The first question is: Do we really need that interlock for the non-MVCC snapshots either? We don't need the interlock for non-MVCC snapshots if the conditions we limit or filter on at the index level are rechecked when we get to the heap tuple; otherwise we do. If we do: For non-MVCC snapshots, we need to ensure that all index scans that started before the VACUUM did complete before the VACUUM does. Isn't it enough to be sure that if we're not going to recheck any index tests against the heap tuple that any process-local copies of index entries which exist at the start of the index scan phase of the vacuum are not used after the vacuum enters the phase of freeing line pointers -- that is, any scan which has read a page when the vacuum starts to scan the index moves on to another page before vacuum finishes scanning that index? I wonder if we could find a different mechanism to enforce that. Using the pin-interlock for that has always seemed a bit silly to me. It certainly is abusing the semantics of the pin to treat it as a type of lock on the contents. Perhaps grab a new heavy-weight lock on the table whenever a non-MVCC index scan on the table begins, and have VACUUM wait on it. -1 The problem I'm most interested in fixing based on user feedback is a vacuum blocking when a cursor which is using an index scan is sitting idle. Not only does the vacuum of that table stall, but if it is an autovacuum worker, that worker is prevented from cleaning up any other tables. I have seen all autovacuum workers blocked in exactly this way, leading to massive bloat. What you suggest is just using a less awkward way to keep the problem. 
I found that the LP_DEAD hinting would be a problem with an old TID, but I figured we could work around that by storing the page LSN into the scan position structure when the page contents were read, and only doing hinting if that matched when we were ready to do the hinting. That wouldn't work for an index which was not WAL-logged, so the patch still holds pins for those. Or you could use GetFakeLSNForUnloggedRel(). I'm unfamiliar with that, but will take a look. (I will be back at my usual development machine late tomorrow.) Robert pointed out that the visibility information for an index-only scan wasn't checked while the index page READ lock was held, so those scans also still hold the pins. Why does an index-only scan need to hold the pin? Robert already answered this one, but there is a possible solution -- provide some way to check the heap's visibility map for the TIDs read from an index page before releasing the read lock on that page. Finally, there was an "optimization" for marking buffer position for possible restore that was incompatible with releasing the pin. I use quotes because the "optimization" adds overhead to every move to the next page in order to set some variables in a structure when a mark is requested instead of running two fairly small memcpy() statements. The two-day benchmark of the customer showed no performance hit, and looking at the code I would be amazed if the "optimization" yielded a measurable benefit. In general, optimization by adding overhead to moving through a scan to save time in a mark operation seems dubious. Hmm. Did your test case actually exercise mark/restore? The memcpy()s are not that small. Especially if it's an index-only scan, you're copying a large portion of the page. Some scans call markpos on every tuple, so that could add up. I have results from the `make world` regression tests and a 48-hour customer test. Unfortunately I don't know how heavily either of those exercise this code.
Do you have a suggestion for a way to test whether there is a serious regression for something approaching a worst case? At some point we could consider building on this patch to recheck index conditions for heap access when a non-MVCC snapshot is used, check the visibility map for referenced heap pages when the TIDs are read for an index-only scan, and/or skip LP_DEAD hinting for non-WAL-logged indexes. But all those are speculative future work; this is a conservative implementation that just didn't modify pinning where there were any confounding factors. Understood. Still, I'd like to see if we can easily get rid of the pinning altogether. That would be great, but I felt that taking care of the easy cases in on patch and following with patches to handle the trickier cases as separate follow-on patches made more sense than trying to do everything at once. Assuming we sort out the mark/restore issues for the initial patch, it provides infrastructure to keep the
Re: [HACKERS] Partitioning WIP patch
On 2015-02-26 12:15:17 +0900, Amit Langote wrote: On 26-02-2015 AM 05:15, Josh Berkus wrote: On 02/24/2015 12:13 AM, Amit Langote wrote: Here is an experimental patch that attempts to implement this. I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. I think it's absolutely unrealistic to get this into 9.5. There's barely been any progress on the current (last!) commitfest - where on earth should the energy come to make this patch ready? And why would that be fair against all the others that have submitted in time? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] How to create virtual indexes on postgres
On 2/26/15 6:17 AM, Sreerama Manoj wrote: But it runs with the Postgres 9.1 version... I use 9.4, so I think I can't use that. Or, as an alternative, is there any provision in Postgres to know the usefulness (increase in performance) of an index before creating that index? No. It might not be too hard to port the hypothetical index work to 9.4 though. Also, just to let you know, this is really a topic for pgsql-general, not pgsql-hackers. It's also best to reply to list emails in-line, or at the bottom, not at the top. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] plpgsql versus domains
On 2015-02-26 13:53:18 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but It'd still teach plpgsql more about types than it knows right now. But on second thought that ship has sailed long ago - and the amount of knowledge seems relatively small. There's much more stuff about composites there already. and I'm not planning to fight two compatibility battles concurrently ;-) Heh ;) You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. Sure, but my second point was to avoid duplicating that information into ->fcinfo and such and instead reference the typecache entry from there. So that if the typecache entry is being rebuilt (a new mechanism I'm afraid), whatever uses ->fcinfo gets the updated functions/definition. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] plpgsql versus domains
Hi, On 2015-02-26 12:31:46 -0500, Tom Lane wrote: It's really slow, because any assignment to a domain variable goes through the slow double-I/O-conversion path in exec_cast_value, even if no actual work is needed. Hm, that's surely not nice for some types. Doing syscache lookups every time can't help either. And I noticed that the domain's constraints get looked up only once per function per session; for example this script gets unexpected results: The reason for this is that domain_in looks up the domain's constraints and caches them under the calling FmgrInfo struct. That's probably far from the only such case we have :( 1. Arrange to cache the constraints for domain types in typcache.c, so as to reduce the number of trips to the actual catalogs. I think typcache.c will have to flush these cache items upon seeing any sinval change in pg_constraint, but that's still cheaper than searching pg_constraint for every query. That sounds sane. 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. 3. We could still have domains.c responsible for most of the details; the domain_check() API may be good enough as-is, though it seems to lack any simple way to force re-lookup of the domain constraints once per xact. Thoughts, better ideas? You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? 
And then, to avoid repeated lookups, store a reference to that in fcinfo? The lifetime of objects wouldn't be entirely trivial, but it doesn't sound impossible. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] plpgsql versus domains
2015-02-26 19:53 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 12:31:46 -0500, Tom Lane wrote: 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but I'm not sure I see another use-case for this type of logic. To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) I understand, but in this case, a compatibility can be enforced simply - we can support a only know cast mode and IO cast mode. IO cast is simple for lot of people, but there is a lot of performance issues and few errors related to this topic. But this is offtopic now. But, what can be related - for plpgsql_check is necessary some simple check of a assigning - that should to return error or warning This part of plpgsql_check is too complex now. Regards Pavel You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? 
I thought that's what I was proposing in point #1. regards, tom lane
Re: [HACKERS] logical column ordering
On 2/23/15 5:09 PM, Tomas Vondra wrote: Over the time I've heard various use cases for this patch, but in most cases it was quite speculative. If you have an idea where this might be useful, can you explain it here, or maybe point me to a place where it's described? For better or worse, table structure is a form of documentation for a system. As such, it's very valuable to group related fields in a table together. When creating a table, that's easy, but as soon as you need to alter it, your careful ordering can easily end up out the window. Perhaps to some that just sounds like pointless window dressing, but my experience is that on a complex system, the less organized things are, the more bugs you get due to overlooking something. The other reason for this patch (which it maybe doesn't support anymore?) is to allow Postgres to use an optimal physical ordering of fields on a page to reduce space wasted on alignment, as well as taking nullability and varlena into account. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] plpgsql versus domains
Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 12:31:46 -0500, Tom Lane wrote: 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but I'm not sure I see another use-case for this type of logic. To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Composite index and min()
On 2/26/15 1:34 AM, James Sewell wrote: I have the following table: I'm moving this discussion to -general. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Feb 24, 2015 at 6:46 PM, Syed, Rahila rahila.s...@nttdata.com wrote: Hello, I've not read this logic yet, but ISTM there is a bug in that new WAL format because I got the following error and the startup process could not replay any WAL records when I set up replication and enabled wal_compression. LOG: record with invalid length at 0/3B0 LOG: record with invalid length at 0/3000518 LOG: Invalid block length in record 0/30005A0 LOG: Invalid block length in record 0/3000D60 ... Please find attached a patch which replays WAL records. Even with this patch it doesn't work correctly. The standby emits the following error messages. LOG: invalid block_id 255 at 0/3B0 LOG: record with invalid length at 0/30017F0 LOG: invalid block_id 255 at 0/3001878 LOG: record with invalid length at 0/30027D0 LOG: record with invalid length at 0/3002E58 ... Regards, -- Fujii Masao
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 PM 05:20, Jim Nasby wrote: On 2/25/15 7:57 PM, Amit Langote wrote: AIUI, as far as we stay with a design where partitions (children) are individually planned, that might be OK. But, I guess things will get more complicated. I think the role of a parent in planning would remain limited to drive partition-pruning. Am I missing something? Isn't the point of adding explicit partitioning to make it faster than plain inheritance? Presumably as part of that we'll eventually want to NOT plan children individually. Yes, we'd definitely want to get to a point where planning children individually is not necessary. But I am afraid we will have to get there a step at a time. IMHO, solving one problem of partition-pruning would be a good start. And that will definitely be part of parent's planning using partition bounds list (not pruning children one-by-one with relation_excluded_by_constraints()). I would certainly prefer that we support the capabilities of inheritance along with partitioning (because in some cases you want both). But it's going to limit what we can do internally. Just to clarify are you referring to inheritance relationship between a partitioned table and partitions? Yes. If it helps, the exact use-case I have in mind is using list-based partitioning + additional columns in some/all children (different between children). For example, if you need to track different types of customer payment methods, you'd have a payment parent table, a list partition for credit debit cards, a different list partition for bank accounts, etc. The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. Unless I'm missing something again, isn't allowing partitions to have heterogeneous rowtypes a problem in the long run? 
I'm afraid I'm confused as to your stand regarding inheritance vs. new partitioning. To be specific, children with heterogeneous schemas sounds much like what inheritance would be good for, as you say. But then isn't that why we have to plan children individually, which you said new partitioning should get away from? With explicit partitioning, shouldn't we go in a direction where we remove some restrictions imposed by inheritance (think multiple inheritance)? I recall the link with which Alvaro had started the discussion; that link points to a remark of Tom's about something very related: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us That post looks like Tom figured out a way to eliminate a problem that hurts inheritance, so that's good. My fear is that at some point we'll hit a problem with partitioning that we can't solve in the inheritance model. If we allow inheritance features into partitioning now, we'll be painted into a corner. If we disallow those features now, we can always re-enable them if we get to the point where we're in the clear. Does that make sense? Yes, it does. In fact, I do intend to keep them separate, the first attempt of which is to choose to NOT transform a PARTITION OF parent clause into INHERITS parent. Any code that may look like it's trying to do that is because the patch is not fully baked yet. Regards, Amit
[HACKERS] How to create virtual indexes on postgres
Hi, I use a Postgres 9.4 database. I am optimizing queries using the results of EXPLAIN and EXPLAIN ANALYZE, and sometimes I create indexes to speed them up. But I was not always successful: sometimes, even after I create an index, the planner does not use it. So my question is: can we know whether the planner will use an index before actually creating a real one? Can we create virtual or hypothetical indexes that are visible only to the planner and not to the user? Or is there any alternative way to do this? If so, please share it with me.
Re: [HACKERS] Transactions involving multiple postgres foreign servers
Added to the 2015-06 commitfest to attract some reviews and comments. On Tue, Feb 17, 2015 at 2:56 PM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: Hi All, Here are the steps and infrastructure for achieving atomic commits across multiple foreign servers. I have tried to address most of the concerns raised in this mail thread before. Let me know if I have left something out. Attached is a WIP patch implementing the same for postgres_fdw. I have tried to make it FDW-independent. A. Steps during transaction processing 1. When an FDW connects to a foreign server and starts a transaction, it registers that server with a boolean flag indicating whether that server is capable of participating in a two-phase commit. In the patch this is implemented using the function RegisterXactForeignServer(), which raises an error, thus aborting the transaction, if there is at least one foreign server incapable of 2PC in a multiserver transaction. This error is thrown as early as possible. If all the foreign servers involved in the transaction are capable of 2PC, the function just updates the information. As of now, in the patch the function is in the form of a stub. Whether a foreign server is capable of 2PC can be a. an FDW-level decision: e.g. file_fdw, as of now, is incapable of 2PC, but it can build the capabilities which can be used for all the servers using file_fdw b. a decision based on server version, type etc.: the FDW can decide that by looking at the server properties for each server c. a user decision, where the FDW can allow a user to specify it in the form of a CREATE/ALTER SERVER option. Implemented in the patch. For a transaction involving only a single foreign server, the current code remains unaltered as two-phase commit is not needed. The rest of the discussion pertains to a transaction involving more than one foreign server. At commit or abort time, the FDW receives callbacks with the appropriate events. The FDW then takes the following actions on each event. 2.
On XACT_EVENT_PRE_COMMIT event, the FDW coins one prepared transaction id per foreign server involved and saves it along with xid, dbid, foreign server id and user mapping and foreign transaction status = PREPARING in-memory. The prepared transaction id can be anything represented as byte string. Same information is flushed to the disk to survive crashes. This is implemented in the patch as prepare_foreign_xact(). Persistent and in-memory storages and their usages are discussed later in the mail. FDW then prepares the transaction on the foreign server. If this step is successful, the foreign transaction status is changed to PREPARED. If the step is unsuccessful, the local transaction is aborted and each FDW will receive XACT_EVENT_ABORT (discussed later). The updates to the foreign transaction status need not be flushed to the disk, as they can be inferred from the status of local transaction. 3. If the local transaction is committed, the FDW callback will get XACT_EVENT_COMMIT event. Foreign transaction status is changed to COMMITTING. FDW tries to commit the foreign transaction with the prepared transaction id. If the commit is successful, the foreign transaction entry is removed. If the commit is unsuccessful because of local/foreign server crash or network failure, the foreign prepared transaction resolver takes care of the committing it at later point of time. 4. If the local transaction is aborted, the FDW callback will get XACT_EVENT_ABORT event. At this point, the FDW may or may not have prepared a transaction on foreign server as per step 1 above. If it has not prepared the transaction, it simply aborts the transaction on foreign server; a server crash or network failure doesn't alter the ultimate result in this case. If FDW has prepared the foreign transaction, it updates the foreign transaction status as ABORTING and tries to rollback the prepared transaction. If the rollback is successful, the foreign transaction entry is removed. 
If the rollback is not successful, the foreign prepared transaction resolver takes care of aborting it at later point of time. B. Foreign prepared transaction resolver --- In the patch this is implemented as a built-in function pg_fdw_resolve(). Ideally the functionality should be run by a background worker process frequently. The resolver looks at each entry and invokes the FDW routine to resolve the transaction. The FDW routine returns boolean status: true if the prepared transaction was resolved (committed/aborted), false otherwise. The resolution is as follows - 1. If foreign transaction status is COMMITTING or ABORTING, commits or aborts the prepared transaction resp through the FDW routine. If the transaction is successfully resolved, it removes the foreign transaction entry. 2. Else, it checks if the local transaction was committed or
Re: [HACKERS] Partitioning WIP patch
On 2015-02-26 02:20:21 -0600, Jim Nasby wrote: The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. The problem there imo isn't so much inheritance, but lack of working unique checks across partitions. That's something we can implement independent of this, it's just not trivial. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 26 February 2015 at 05:41, Stephen Frost sfr...@snowman.net wrote: Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: Here's an updated patch with a new test for this bug. I've been developing the fixes for these RLS issues as one big patch, but I suppose it would be easy to split up, if that's preferred. Thanks for working on all of this! I've brought this up to date with master and addressed the little bit of bitrot. I'm still reviewing it but I'm generally happy with the approach and would certainly welcome any additional thoughts from you or feedback from others. Thanks for doing that. I had been meaning to update it myself, but kept getting distracted by my day job. I think this should probably be committed as 2 separate patches, because there are really 2 separate issues being fixed here, and the commit comment only mentions the second one: 1). The planner changes and the first new regression test (update with from clause self join). This is a fix to the inheritance planner code, which wasn't properly handling the case of a query containing both target and non-target relations with RLS and inheritance. 2). The rewriter changes and the other regression tests (S.b. view on top of Row-level security). This is more than just a code tidy-up -- there's a real bug there. If you run those tests against HEAD, the update against the s.b. view fails to apply the RLS policy of the underlying table because the old recursion detection code thinks it has already handled that RTE. I should get more time to look at this over the weekend. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 26 February 2015 at 05:43, Stephen Frost sfr...@snowman.net wrote: Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: Attached is a patch to make RLS checks run before attempting to insert/update any data rather than afterwards. Excellent, this I really like and it's a pretty straight-forward change. I wonder if there are some documentation updates which need to be done for this also? I'm planning to look as I vaguely recall mentioning the ordering of operations somewhere along the way. I also addressed the bitrot from the column-priv leak patch. Would be great to have you take a look at the latest and let me know if you have any further comments or suggestions. I'm definitely looking forward to getting these changes in soon. Thanks for the update. I don't recall any mention of ordering in the docs, but if there isn't anything there it makes sense to add something. I haven't thought about this patch in a while, but I still like the look of it. It's definitely neater to do these checks earlier and it's good to be able to get RLS-specific errors too. I'll take a look at the latest updates. Regards, Dean
Re: [HACKERS] Refactoring GUC unit conversions
On Fri, Feb 13, 2015 at 10:26 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the redesigned checkpoint_segments patch, Robert suggested keeping the setting's base unit as a number of segments, but allowing conversions from MB, GB etc. I started looking into that and found that adding a new unit to guc.c is quite messy. The conversions are done with complicated if-switch-case constructs. Attached is a patch to refactor that, making the conversions table-driven. This doesn't change functionality, it just makes the code nicer. Isn't it a good idea to allow even wal_keep_segments to be converted from MB, GB, etc.? Regards, -- Fujii Masao
Re: [HACKERS] Review of GetUserId() Usage
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed I have reviewed the patch. The patch is in excellent shape, does what is expected and discussed, and the changes are straightforward, so it looks good to go in. However, I have one question: what is the motive for splitting the function return value from SIGNAL_BACKEND_NOPERMISSION into SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION? Is that required for some other upcoming patches, or just for simplicity? Currently we have a combined error for both, which is simply split into two. No issue as such, just curious, as it does not go well with the subject. You can mark this as ready for committer. The new status of this patch is: Waiting on Author
Re: [HACKERS] How about to have relnamespace and relrole?
Hello, thank you for reviewing. The attached are the third version of this patch. 0001-Add-regrole_v3.patch 0002-Add-regnamespace_v3.patch - Rearranged into a regrole patch and a regnamespace patch as seen above; each of them consists of changes for code, docs, regtests. The regnamespace patch depends on the other patch. - Removed the irrelevant change and corrected mistakes in comments. - Renamed role_or_oid to role_name_or_oid in regrolein(). - Changed the example name for regrole in the documentation to 'smithee' as an impossible-in-reality-but-well-known name, from 'john', the most common in the US (according to Wikipedia). At Tue, 24 Feb 2015 16:30:44 +0530, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote in CAM2+6=v-fwqfkwf0pi3dacq-zry5wxcplvluqfkf2mf57_6...@mail.gmail.com I agree with Tom's concern on the MVCC issue, but we already hit that when we introduced regclass and the others. So I see no additional issue with these changes as such. About planner slowness, I guess the updated documentation looks perfect for that. So I went ahead reviewing these patches. All patches are straightforward, and since similar code already exists, I did not find any issue at the code level. They are consistent with other functions. One concern is about the arbitrary allocation of OIDs for the new objects - types, functions. They are isolated from other similar reg* types, but there's no help for it without global renumbering of static (system) OIDs. Patches apply with patch -p1. make, make install, make check have no issues. Testing was fine too. However, here are a few review comments on the patches attached: Review points on 0001-Add-regrole.patch --- 1. +#include "utils/acl.h" Can you please add it at the appropriate place, as the #include list is an ordered list? regrolein calls reg_role_oid in acl.c, which is declared in acl.h. 2. +char *role_or_oid = PG_GETARG_CSTRING(0); Can we have the variable named role_name_or_oid, like other similar functions? I might somehow have thought it a bit long. Fixed. 3.
+/* + * Normal case: parse the name into components and see if it matches any + * pg_role entries in the current search path. + */ I believe roles are never searched per search path. Thus the comments here need to be updated. Ouch. I forgot to edit it properly. I edited it. The correct catalog name is pg_authid. + /* Normal case: see if the name matches any pg_authid entry. */ I also edited the comments for regnamespacein. Review points on 0002-Add-regnamespace.patch --- 1. + * regnamespacein - converts classname to class OID Typos. Should be nspname instead of classname and namespace OID instead of class OID. Thank you for pointing that out. I fixed the same mistake in regrolein; p_ts_dict was there instead of pg_authid.. The names of OID kinds appear in multiple forms, that is, oprname and opr_name. Although I don't understand the reason, I followed the convention. Review points on 0003-Check-newpatch --- 1. @@ -1860,6 +1860,7 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, Oid attrdefOid; ObjectAddress colobject, defobject; + Oid exprtype; This seems unrelated. Please remove. It's a trace of the previous code for ruling out the new OID types. Removed. Apart from this, it will be good if you have ONLY two patches, (1) for regrole and (2) for regnamespace, with all related changes in one patch. I mean, all role-related changes, i.e. 0001 + 0003 role-related changes + 0004 role-related changes and docs update, AND 0002 + 0003 nsp-related changes + 0004 nsp-related changes. I prudently separated it since I wasn't confident of the pertinence of the ruling out. I rearranged them as you advised, including docs. regards, -- Kyotaro Horiguchi NTT Open Source Software Center From 4d56a68e2bf2b7ee0da0447ad9f82f0b46277133 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp Date: Wed, 18 Feb 2015 14:38:32 +0900 Subject: [PATCH 1/2] Add regrole Add new OID alias type regrole.
The new type has the scope of the whole database cluster, so concerning object dependency it doesn't behave the same as the existing OID alias types, which have database scope. To get rid of confusion, inhibit constants of the new type from appearing where dependencies are made involving it. Documentation about this new type is added and some existing descriptions are modified according to the restriction of the type. In addition, a note is put about possible MVCC violation and optimization issues, which are general to all reg* types. --- doc/src/sgml/datatype.sgml| 55 +--- src/backend/bootstrap/bootstrap.c | 2 + src/backend/catalog/dependency.c | 22 src/backend/utils/adt/regproc.c | 98 +++ src/backend/utils/adt/selfuncs.c | 2 + src/backend/utils/cache/catcache.c
Re: [HACKERS] How about to have relnamespace and relrole?
Sorry, I fixed a silly typo in the documentation in the previous version. - of theses types has a significance... + of these types has a significance... # My fingers frequently slip as above.. I incremented the version of this revised patch to get rid of confusion. === Hello, thank you for reviewing. The attached are the fourth version of this patch. 0001-Add-regrole_v4.patch 0002-Add-regnamespace_v4.patch - Rearranged into a regrole patch and a regnamespace patch as seen above; each of them consists of changes for code, docs, regtests. The regnamespace patch depends on the other patch. - Removed the irrelevant change and corrected mistakes in comments. - Renamed role_or_oid to role_name_or_oid in regrolein(). - Changed the example name for regrole in the documentation to 'smithee' as an impossible-in-reality-but-well-known name, from 'john', the most common in the US (according to Wikipedia).
regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Bug in pg_dump
On Wed, Feb 25, 2015 at 7:46 AM, Gilles Darold gilles.dar...@dalibo.com wrote: This is a far better patch and the test to export/import the postgis_topology extension works great for me. Thanks for the work. Attached is a patch that uses an even better approach by querying only once all the FK dependencies of tables in extensions whose data is registered as dumpable by getExtensionMembership(). Could you check that it works correctly? My test case passes but an extra check would be nice. The patch footprint is pretty low so we may be able to backport this patch easily. -- Michael From 72e369ee725301003cf065b0bf9875724455dac1 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 17 Feb 2015 07:39:23 + Subject: [PATCH 1/3] Fix ordering of tables part of extensions linked with constraints Additional checks are done on FK constraints that potentially link extension objects to each other, and data dump ordering is ensured. Note that this does not take into account foreign keys of tables that are not part of an extension linking to an extension table. --- src/bin/pg_dump/pg_dump.c | 80 +-- 1 file changed, 78 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 2b53c72..bbcd600 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15236,7 +15236,8 @@ dumpRule(Archive *fout, DumpOptions *dopt, RuleInfo *rinfo) } /* - * getExtensionMembership --- obtain extension membership data + * getExtensionMembership --- obtain extension membership data and check FK + * dependencies among extension tables.
*/ void getExtensionMembership(Archive *fout, DumpOptions *dopt, ExtensionInfo extinfo[], @@ -15368,7 +15369,9 @@ getExtensionMembership(Archive *fout, DumpOptions *dopt, ExtensionInfo extinfo[] parsePGArray(extcondition, extconditionarray, nconditionitems) nconfigitems == nconditionitems) { - int j; + int j, i_conrelid, i_confrelid; + PQExpBuffer query2; + bool first_elt = true; for (j = 0; j nconfigitems; j++) { @@ -15423,6 +15426,79 @@ getExtensionMembership(Archive *fout, DumpOptions *dopt, ExtensionInfo extinfo[] } } } + + /* + * Now that all the TableInfoData objects have been created for + * all the extensions, check their FK dependencies and register + * them to ensure correct data ordering. Note that this is not + * a problem for user tables not included in an extension + * referencing with a FK tables in extensions as their constraint + * is declared after dumping their data. In --data-only mode the + * table ordering is ensured as well thanks to + * getTableDataFKConstraints(). + */ + query2 = createPQExpBuffer(); + + /* + * Query all the foreign key dependencies for all the extension + * tables found previously. Only tables whose data need to be + * have to be tracked. 
+ */ + appendPQExpBuffer(query2, + SELECT conrelid, confrelid + FROM pg_catalog.pg_constraint + WHERE contype = 'f' AND conrelid IN (); + + for (j = 0; j nconfigitems; j++) + { +TableInfo *configtbl; +Oid configtbloid = atooid(extconfigarray[j]); + +configtbl = findTableByOid(configtbloid); +if (configtbl == NULL || configtbl-dataObj == NULL) + continue; + +if (first_elt) +{ + appendPQExpBuffer(query2, %u, configtbloid); + first_elt = false; +} +else + appendPQExpBuffer(query2, , %u, configtbloid); + } + appendPQExpBuffer(query2, );); + + res = ExecuteSqlQuery(fout, query2-data, PGRES_TUPLES_OK); + ntups = PQntuples(res); + + i_conrelid = PQfnumber(res, conrelid); + i_confrelid = PQfnumber(res, confrelid); + + /* Now get the dependencies and register them */ + for (j = 0; j ntups; j++) + { +Oid conrelid, confrelid; +TableInfo *reftable, *contable; + +conrelid = atooid(PQgetvalue(res, j, i_conrelid)); +confrelid = atooid(PQgetvalue(res, j, i_confrelid)); +contable = findTableByOid(conrelid); +reftable = findTableByOid(confrelid); + +if (reftable == NULL || + reftable-dataObj == NULL || + contable == NULL || + contable-dataObj == NULL) + continue; + +/* + * Make referencing TABLE_DATA object depend on the + * referenced table's TABLE_DATA object. + */ +addObjectDependency(contable-dataObj-dobj, + reftable-dataObj-dobj.dumpId); + } + resetPQExpBuffer(query2); } if (extconfigarray) free(extconfigarray); -- 2.3.0 From 1d8fe1c39ec5049c39810f94153d347971738616 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 17 Feb 2015 22:29:28 +0900 Subject: [PATCH 2/3] Make prove_check install contents of current directory as well This is useful for example for TAP tests in extensions. --- src/Makefile.global.in | 1 + 1 file changed, 1 insertion(+) diff --git a/src/Makefile.global.in
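Written out for two hypothetical config-table OIDs (16384 and 16385 are placeholders), the query the patch assembles is simply:

```sql
SELECT conrelid, confrelid
FROM pg_catalog.pg_constraint
WHERE contype = 'f' AND conrelid IN (16384, 16385);
```

Each returned (conrelid, confrelid) pair then becomes an addObjectDependency() call, so the referenced table's data is dumped before the referencing table's data.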
[HACKERS] BDR Multiple database
Hi all, I'm using PostgreSQL BDR 9.4.1 to test BDR capability right now.

$ psql --version
psql (PostgreSQL) 9.4.1

We want to use BDR with multiple databases, but none of the documentation shows an example of how to configure BDR with multiple databases. We've tried many combinations, such as the ones below, but still no luck. Can anyone point us in the right direction?

1st combination
bdr.connections = 'bdrnode02db1'
bdr.bdrnode02db1_dsn = 'dbname=db1 host=172.17.42.1 port=49264 user=postgres'
bdr.connections = 'bdrnode02db2'
bdr.bdrnode02db2_dsn = 'dbname=db2 host=172.17.42.1 port=49264 user=postgres'

2nd combination
bdr.connections = 'bdrnode02'
bdr.bdrnode02_dsn = 'dbname=db1 host=172.17.42.1 port=49264 user=postgres'
bdr.bdrnode02_dsn = 'dbname=db2 host=172.17.42.1 port=49264 user=postgres'

Regards, Jirayut
Re: [HACKERS] Partitioning WIP patch
On 2/25/15 7:57 PM, Amit Langote wrote: On 26-02-2015 AM 10:31, Jim Nasby wrote: On 2/25/15 7:24 PM, Amit Langote wrote: Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still operate the same as today? I'd like to see us continue to support that, but perhaps it would be wise to not paint ourselves into that corner just yet. Nothing prevents that from working, at least at the moment. Ok, but is that what we really want? If we release it that way we'll be stuck with it forever. AIUI, as long as we stay with a design where partitions (children) are individually planned, that might be OK. But, I guess things will get more complicated. I think the role of a parent in planning would remain limited to driving partition-pruning. Am I missing something? Isn't the point of adding explicit partitioning to make it faster than plain inheritance? Presumably as part of that we'll eventually want to NOT plan children individually. I would certainly prefer that we support the capabilities of inheritance along with partitioning (because in some cases you want both). But it's going to limit what we can do internally. Just to clarify, are you referring to the inheritance relationship between a partitioned table and its partitions? Yes. If it helps, the exact use-case I have in mind is using list-based partitioning + additional columns in some/all children (different between children). For example, if you need to track different types of customer payment methods, you'd have a payment parent table, a list partition for credit/debit cards, a different list partition for bank accounts, etc. The reason I'd like to do this with partitioning vs plain inheritance is that presumably, as we build out partitioning, we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. With explicit partitioning, shouldn't we go in a direction where we remove some restrictions imposed by inheritance (think multiple inheritance)?
I recall the link Alvaro started the discussion with; I think it points to a remark of Tom's about something very related: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us That post looks like Tom figured out a way to eliminate a problem that hurts inheritance, so that's good. My fear is that at some point we'll hit a problem with partitioning that we can't solve in the inheritance model. If we allow inheritance features into partitioning now, we'll be painted into a corner. If we disallow those features now, we can always re-enable them if we get to the point where we're in the clear. Does that make sense? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to create virtual indexes on postgres
On Thu, Feb 26, 2015 at 6:20 PM, Sreerama Manoj manoj.sreerama...@gmail.com wrote: So my question was: can we know whether the planner will use the index before actually creating a real index? Or can we create a virtual or hypothetical index that is known only to the planner and not the user? Or is there any alternative to do it? If present, share with me. No, the index needs to be created to allow the planner to use it. This reminds me of this project though, which if I recall correctly has patched Postgres to allow the use of hypothetical indexes: https://sourceforge.net/projects/hypotheticalind/ -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
Hi, This thread seems relevant. Please guide me on how I can access older CF pages. E.g. the thread http://www.postgresql.org/message-id/flat/51f19059.7050...@pgexperts.com#51f19059.7050...@pgexperts.com mentions the following link: The MSVC portion of this fix got completely lost in the void: https://commitfest.postgresql.org/action/patch_view?id=1330 The above link results in the following error: "Not found. The specified URL was not found." Please do let me know if I missed something. Thanks. Regards, Muhammad Asif Naeem On Tue, Feb 24, 2015 at 11:59 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/22/15 3:12 PM, Magnus Hagander wrote: Would you suggest removing the automated system completely, or keep it around and just make it possible to override it (either by removing the note that something is a patch, or by making something that's not listed as a patch become marked as such)? I would remove it completely. It has been demonstrated to not work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to create virtual indexes on postgres
But it runs with the Postgres 9.1 version, and I use 9.4, so I think I can't use that. Alternatively, is there any provision in Postgres to know the use (increase in performance) of an index before creating that index? On Thu, Feb 26, 2015 at 5:37 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Feb 26, 2015 at 6:20 PM, Sreerama Manoj manoj.sreerama...@gmail.com wrote: So my question was: can we know whether the planner will use the index before actually creating a real index? Or can we create a virtual or hypothetical index that is known only to the planner and not the user? Or is there any alternative to do it? If present, share with me. No, the index needs to be created to allow the planner to use it. This reminds me of this project though, which if I recall correctly has patched Postgres to allow the use of hypothetical indexes: https://sourceforge.net/projects/hypotheticalind/ -- Michael
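One patch-free workaround on any version: create the index inside a transaction, look at the plan, and roll back. The index really is built (so this costs build time and takes a lock that blocks writes to the table), but nothing survives the rollback. A sketch, assuming a hypothetical table t(a int):

```sql
BEGIN;
CREATE INDEX tmp_a_idx ON t (a);       -- really built, but only until ROLLBACK
EXPLAIN SELECT * FROM t WHERE a = 42;  -- see whether the planner picks tmp_a_idx
ROLLBACK;                              -- index and its cost disappear
```

This doesn't avoid the build cost the way a true hypothetical index would, but it does answer the "will the planner use it?" question without leaving anything behind.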
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Tom Lane wrote: Thoughts? Any objections to pushing this? Is there any reason at all to keep MemoryContextResetButPreserveChildren()? Since your patch doesn't add any callers, it seems pretty likely that there's none anywhere. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GSoC idea - Simulated annealing to search for query plans
Dear Hackers, I'm Grzegorz Parka, BSc Engineer of Technical Physics and student of Computer Science at WUT, Poland. Last year I was quite involved with evolutionary algorithms, and during my research I found out about GEQO in Postgres. I also found out that there are plans to try a different approach to finding optimal query plans and thought it could be a good project for GSoC. I'm interested in one of the old TODO items related to the optimizer: 'Consider compressed annealing to search for query plans'. I believe this would be potentially beneficial to Postgres, to check whether such a heuristic could really choose better query plans than GEQO does. Judging by the results that simulated annealing gives on the travelling salesman problem, it looks like a simpler and potentially more effective way of doing combinatorial optimization. As deliverables of such a project I would provide a simple implementation of a basic simulated-annealing optimizer and some form of quantitative comparison with GEQO. I see that this may be considerably bigger than most GSoC projects, but I would like to know your opinion. Do you think that this would be beneficial enough to make a proper GSoC project? I would also like to know if you have any additional ideas about this project. Best regards, Grzegorz Parka
Re: [HACKERS] Precedence of standard comparison operators
Andres Freund and...@anarazel.de writes: On February 26, 2015 10:29:18 PM CET, Peter Eisentraut pete...@gmx.net wrote: My suggestion was to treat this like the standard_conforming_string change. That is, warn for many years before changing. I don't think scs is a good example to follow. Yeah. For one thing, there wouldn't be any way to suppress the warning other than to parenthesize your code, which I would find problematic because it would penalize standard-conforming queries. I'd prefer an arrangement whereby once you fix your code to be standard-conforming, you're done. A possible point of compromise would be to leave the warning turned on by default, at least until we get a better sense of how this would play out in the real world. I continue to suspect that we're making a mountain out of, if not a molehill, at least a hillock. I think most sane people would have parenthesized their queries to start with rather than go look up whether IS DISTINCT FROM binds tighter than = ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
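To make the hillock concrete: under the 9.4-era grammar, = binds less tightly than <, so an unparenthesized chain silently gets one particular grouping; under the proposed change the comparison operators would share a single nonassociative precedence level. A hedged sketch (the "proposed" behavior describes the change under discussion, not anything committed):

```sql
SELECT 2 < 3 = true;    -- old grammar: parsed as (2 < 3) = true
SELECT (2 < 3) = true;  -- standard-conforming form; unambiguous under either grammar
-- Under the proposed precedence, the first unparenthesized form would be
-- rejected with a syntax error rather than silently grouped.
```

This is why Tom's point about a one-time fix holds: once the query is parenthesized, it behaves identically before and after the change, with no warning to suppress.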
Re: [HACKERS] logical column ordering
Jim Nasby jim.na...@bluetreble.com writes: On 2/26/15 4:01 PM, Alvaro Herrera wrote: Josh Berkus wrote: Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch. The reason for doing it this way is that changing the underlying architecture is really hard, without having to bear an endless hackers bike shed discussion about the best userland syntax to use. It seems a much better approach to do the actually difficult part first, then let the rest to be argued to death by others and let those others do the easy part (and take all the credit along with that); that way, that discussion does not kill other possible uses that the new architecture allows. +1. This patch is already several years old; lets not delay it further. I tend to agree with that, but how are we going to test things if there's no mechanism to create a table in which the orderings are different? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Precedence of standard comparison operators
On 2/26/15 4:09 PM, Tom Lane wrote: Andres Freund and...@anarazel.de writes: On February 26, 2015 10:29:18 PM CET, Peter Eisentraut pete...@gmx.net wrote: My suggestion was to treat this like the standard_conforming_string change. That is, warn for many years before changing. I don't think scs is a good example to follow. Yeah. For one thing, there wouldn't be any way to suppress the warning other than to parenthesize your code, which I would find problematic because it would penalize standard-conforming queries. I'd prefer an arrangement whereby once you fix your code to be standard-conforming, you're done. A possible point of compromise would be to leave the warning turned on by default, at least until we get a better sense of how this would play out in the real world. I continue to suspect that we're making a mountain out of, if not a molehill, at least a hillock. I think most sane people would have parenthesized their queries to start with rather than go look up whether IS DISTINCT FROM binds tighter than = ... Question of sanity aside, I can tell you that many business queries are written with little more sophistication than monkeys with typewriters, where you keep the monkeys fed with bananas and paper until your query results (not the SQL) look like what someone expects them to look like. Then the output of that version is held as sacrosanct, and any future SQL changes are wrong unless they produce the expected result changes. In my experience this happens because some poor business person just needs to get their job done and either isn't allowed to bother the data people or the data people are just too busy, so they're stuck doing it themselves. From what I've seen, SQL written by developers is often a bit better than this... but not a lot. And part of that salvation is because application queries tend to be a lot less complex than reporting/BI ones. I don't have a great feel for how much of an impact this specific change would have on that...
the examples so far have all been pretty esoteric. I suspect most people writing such wonderful SQL don't know about IS DISTINCT FROM, nor would they try doing things like bool_a > bool_b = bool_c. So there may actually be very little code to fix, but I think we at least need a way for users to verify that. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 17:01:53 -0500, Tom Lane wrote: We've discussed doing $SUBJECT off and on for nearly ten years, the oldest thread I could find about it being here: http://www.postgresql.org/message-id/flat/1186435268.16321.37.ca...@dell.linuxdev.us.dell.com It's come up again every time we found another leak of dead child contexts, which happened twice last year for example. And that patch I'm pushing for expanded out-of-line objects really needs this to become the default behavior anywhere that we can detoast objects. I don't object to the behavioural change per se, rather the contrary, but I find it a pretty bad idea to change the meaning of an existing functioname this way. That's pretty much the whole point I think. Or are you arguing for an alternative proposal in which we remove MemoryContextReset (or at least rename it to something new) and thereby intentionally break all code that uses MemoryContextReset? I can't say that I find that an improvement. ... Without a compiler erroring out people won't notice that suddenly MemoryContextReset deletes much more; leading to possibly hard to find errors. Context resets frequently are in error paths, and those won't necessarily be hit when running with assertions enabled. With all due respect, that's utterly wrong. I have looked at every single MemoryContextReset call in the codebase, and as far as I can see the *only* one that is in an error path is elog.c:336, which I'm quite certain is wrong anyway (the other reset of ErrorContext in that file uses MemoryContextResetAndDeleteChildren, this one should too). I see no reason whatever to believe that this change is likely to do anything except fix heretofore-unnoticed memory leaks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com wrote: Hello, Even this patch doesn't work fine. The standby emits the following error messages. Yes, this bug remains unsolved. I am still working on resolving this. The following chunk IDs have been added in the attached patch as suggested upthread.

+#define XLR_CHUNK_BLOCK_REFERENCE 0x10
+#define XLR_CHUNK_BLOCK_HAS_IMAGE 0x04
+#define XLR_CHUNK_BLOCK_HAS_DATA 0x08

XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references. XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA. Before sending a new version, be sure that this gets fixed by, for example, building up a master with a standby replaying WAL and running make installcheck-world or similar. If the standby does not complain at all, you have good chances of not having bugs. You could also build with WAL_DEBUG to check record consistency. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Precedence of standard comparison operators
On 2015-02-26 20:13:34 +, Simon Riggs wrote: On 26 February 2015 at 15:56, Tom Lane t...@sss.pgh.pa.us wrote: I think the way to do this is to have a pluggable parser, so users can choose 1) old parser, 2) new, better parser, 3) any other parser they fancy that they maintain to ensure application compatibility. We've got a pluggable executor and optimizer, so why not a parser too? Implementing that in the same way we have done for executor and optimizer is a 1 day patch, so easily achievable for 9.5. I don't find that particularly attractive either. It seems quite unlikely to me that anyone would actually use such a hook; replacing the whole parser would be essentially unmaintainable. Nobody uses the hooks you mention to replace the whole planner or whole executor --- there are wrappers out there, which is a different thing altogether. But you could not undo the issue at hand here without at the very least a whole new copy of gram.y, which would need to be updated constantly if you wanted it to keep working across more than one release. I can see a point in having the ability to plug in a parser - I just don't think it has much to do with this topic. It'd be a nightmare to maintain two versions of our parser; I don't think anybody wants to patch more than one. Whole planner replacement is used for extensions by CitusData and NTT, and will definitely be used in the future for other apps. s/planner/parser/? Because replacing the planner can already be done as a whole without much problems. Whole executor replacement is also used by one extension to produce DDL triggers. Hm? In any case, whole planner replacement would be very desirable for people trying to minimize code churn in their applications. As soon as it exists, I will use to make some MySQL-only apps work seamlessly against PostgreSQL, such as WordPress. It doesn't need to be 100% perfect MySQL, it just needs to be enough to make the app work. 
Maintaining a code base that can work against multiple databases is hard. Writing it against one main database and maintaining a parser layer would be much easier. Assuming you meant parser: Maybe. I have my doubts somebody will actually invest the significant amount of time to develop something like that. But I don't have a problem providing the capability; there seems little point in allowing the planner to be replaced but not the parser. I just don't see that it has much to do with this discussion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical column ordering
On 2/26/15 4:01 PM, Alvaro Herrera wrote: Josh Berkus wrote: On 02/26/2015 01:54 PM, Alvaro Herrera wrote: This patch decouples these three things so that they can changed freely -- but provides no user interface to do so. I think that trying to only decouple the thing we currently have in two pieces, and then have a subsequent patch to decouple again, is additional conceptual complexity for no gain. Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch. The reason for doing it this way is that changing the underlying architecture is really hard, without having to bear an endless hackers bike shed discussion about the best userland syntax to use. It seems a much better approach to do the actually difficult part first, then let the rest to be argued to death by others and let those others do the easy part (and take all the credit along with that); that way, that discussion does not kill other possible uses that the new architecture allows. +1. This patch is already several years old; lets not delay it further. Plus, I suspect that you could hack the catalog directly if you really wanted to change LCO. Worst case you could create a C function to do it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
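For anyone wanting to poke at this today: attnum currently serves as both physical position and display position, which is exactly the coupling the patch splits apart. The current ordering can at least be inspected from the catalog (which is also where any direct hack would have to happen); an inspection sketch for a hypothetical table some_table:

```sql
SELECT attname, attnum
FROM pg_attribute
WHERE attrelid = 'some_table'::regclass
  AND attnum > 0        -- skip system columns (ctid, xmin, ...)
ORDER BY attnum;        -- today: physical order and display order coincide
```

Actually updating attnum by hand is unsafe (dependencies, defaults, and indexes reference it), which is why a C function or proper syntax is the realistic route for testing.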
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Hi, On 2015-02-26 17:01:53 -0500, Tom Lane wrote: We've discussed doing $SUBJECT off and on for nearly ten years, the oldest thread I could find about it being here: http://www.postgresql.org/message-id/flat/1186435268.16321.37.ca...@dell.linuxdev.us.dell.com It's come up again every time we found another leak of dead child contexts, which happened twice last year for example. And that patch I'm pushing for expanded out-of-line objects really needs this to become the default behavior anywhere that we can detoast objects. I don't object to the behavioural change per se, rather the contrary, but I find it a pretty bad idea to change the meaning of an existing function name this way. Without a compiler erroring out, people won't notice that suddenly MemoryContextReset deletes much more, leading to possibly hard-to-find errors. Context resets frequently are in error paths, and those won't necessarily be hit when running with assertions enabled. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Andres Freund and...@2ndquadrant.com writes: I'd really not even be surprised if a committer backpatches a MemoryContextReset() addition, not realizing it behaves differently in the back branches. As far as that goes, the only consequence would be a possible memory leak in the back branches; it would not be a really fatal problem. I'd rather live with that risk than with the sort of intentional API break (and ensuing back-patch pain) that you're proposing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 17:45:26 -0500, Tom Lane wrote: With all due respect, that's utterly wrong. I have looked at every single MemoryContextReset call in the codebase, and as far as I can see the *only* one that is in an error path is elog.c:336, which I'm quite certain is wrong anyway (the other reset of ErrorContext in that file uses MemoryContextResetAndDeleteChildren, this one should too). Sure, in the pg codebase. But there definitely are extensions using memory contexts. And there's code that has to work across several branches. Backpatching alone imo presents a risk; there's nothing to warn you three years down the road that the MemoryContextReset() you just backpatched doesn't work the same in the backbranches. If the changes breaks some code it's likely actually a good thing: Because, as you say, using MemoryContextReset() will likely be the wrong thing, and they'll want to fix it for the existing branches as well. Is that likely to happen? I doubt it. They'll just mutter under their breath about useless API breakage and move on. Basically, this is a judgment call, and I disagree with your judgment. I think changing the behavior of MemoryContextReset is exactly what we want to have happen. (And I've got to say that I'm losing patience with backwards-compatibility arguments taken to this degree. We might as well stop development entirely.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On Fri, Feb 27, 2015 at 3:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-26 12:15:17 +0900, Amit Langote wrote: On 26-02-2015 AM 05:15, Josh Berkus wrote: On 02/24/2015 12:13 AM, Amit Langote wrote: Here is an experimental patch that attempts to implement this. I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. I think it's absolutely unrealistic to get this into 9.5. There's barely been any progress on the current (last!) commitfest - where on earth should the energy come to make this patch ready? And why would that be fair against all the others that have submitted in time? +1. There are many other patches pending the in CF app waiting for feedback, while this one showed up after the last CF deadline for 9.5 and needs design and spec decisions that should not be taken lightly at the end of a major release development cycle. Please let's not rush into something we may regret. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
We've discussed doing $SUBJECT off and on for nearly ten years, the oldest thread I could find about it being here: http://www.postgresql.org/message-id/flat/1186435268.16321.37.ca...@dell.linuxdev.us.dell.com It's come up again every time we found another leak of dead child contexts, which happened twice last year for example. And that patch I'm pushing for expanded out-of-line objects really needs this to become the default behavior anywhere that we can detoast objects. So attached is a patch to do it. I've verified that this passes make check-world, not that that proves all that much :-( I did not make an attempt to s/MemoryContextResetAndDeleteChildren/MemoryContextReset/ globally, as it's certainly unnecessary to do that for testing purposes. I'm not sure whether we should do so, or skip the code churn (there's about 30 occurrences in HEAD). We'd want to keep the macro in any case to avoid unnecessary breakage of 3rd-party code. Thoughts? Any objections to pushing this?

regards, tom lane

diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index 45e610d..e01bcd4 100644
*** a/src/backend/utils/mmgr/README
--- b/src/backend/utils/mmgr/README
*** lifetimes that only partially overlap ca
*** 125,133
  from different trees of the context forest (there are some examples in
  the next section).

! For convenience we will also want operations like reset/delete all
! children of a given context, but don't reset or delete that context
! itself.

  Globally Known Contexts
--- 125,137
  from different trees of the context forest (there are some examples in
  the next section).

! Actually, it turns out that resetting a given context should almost
! always imply deleting (not just resetting) any child contexts it has.
! So MemoryContextReset() means that, and if you really do want a forest of
! empty contexts you need to call MemoryContextResetButPreserveChildren().
!
! For convenience we also provide operations like reset/delete all children
! of a given context, but don't reset or delete that context itself.

  Globally Known Contexts
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 202bc78..f40c33e 100644
*** a/src/backend/utils/mmgr/mcxt.c
--- b/src/backend/utils/mmgr/mcxt.c
*** MemoryContextInit(void)
*** 132,145
  /*
   * MemoryContextReset
   *		Release all space allocated within a context and its descendants,
   *		but don't delete the contexts themselves.
   *
   * The type-specific reset routine handles the context itself, but we
   * have to do the recursion for the children.
   */
  void
! MemoryContextReset(MemoryContext context)
  {
  	AssertArg(MemoryContextIsValid(context));
--- 132,176
  /*
   * MemoryContextReset
+  *		Release all space allocated within a context and delete all its
+  *		descendant contexts (but not the named context itself).
+  *
+  * The type-specific reset routine handles the context itself, but we
+  * have to do the recursion for the children.
+  */
+ void
+ MemoryContextReset(MemoryContext context)
+ {
+ 	AssertArg(MemoryContextIsValid(context));
+
+ 	/* save a function call in common case where there are no children */
+ 	if (context->firstchild != NULL)
+ 		MemoryContextDeleteChildren(context);
+
+ 	/* Nothing to do if no pallocs since startup or last reset */
+ 	if (!context->isReset)
+ 	{
+ 		(*context->methods->reset) (context);
+ 		context->isReset = true;
+ 		VALGRIND_DESTROY_MEMPOOL(context);
+ 		VALGRIND_CREATE_MEMPOOL(context, 0, false);
+ 	}
+ }
+
+ /*
+  * MemoryContextResetButPreserveChildren
   *		Release all space allocated within a context and its descendants,
   *		but don't delete the contexts themselves.
   *
+  * Note: this was formerly the behavior of plain MemoryContextReset(), but
+  * we found out that you almost always want to delete children not keep them.
+  * This API may indeed have no use at all, but we keep it for the moment.
+  *
   * The type-specific reset routine handles the context itself, but we
   * have to do the recursion for the children.
   */
  void
! MemoryContextResetButPreserveChildren(MemoryContext context)
  {
  	AssertArg(MemoryContextIsValid(context));
*** MemoryContextResetChildren(MemoryContext
*** 171,177
  	AssertArg(MemoryContextIsValid(context));

  	for (child = context->firstchild; child != NULL; child = child->nextchild)
! 		MemoryContextReset(child);
  }

  /*
--- 202,208
  	AssertArg(MemoryContextIsValid(context));

  	for (child = context->firstchild; child != NULL; child = child->nextchild)
! 		MemoryContextResetButPreserveChildren(child);
  }

  /*
*** MemoryContextDeleteChildren(MemoryContex
*** 226,248
  }

  /*
- * MemoryContextResetAndDeleteChildren
- *		Release all space allocated within a context and delete all
- *		its descendants.
- *
- * This is a common combination case where we want to preserve
Re: [HACKERS] json_populate_record issue - TupleDesc reference leak
Andrew Dunstan and...@dunslane.net writes: This doesn't look quite right. Shouldn't we unconditionally release the Tupledesc before the returns at lines 2118 and 2127, just as we do at the bottom of the function at line 2285?

I think Pavel's patch is probably OK as-is, because the tupdesc returned by get_call_result_type isn't reference-counted; but I agree the code would look cleaner your way. If the main exit isn't bothering to distinguish this then the early exits should not either.

What I'm wondering about, though, is this bit at line 2125:

	/* same logic as for json */
	if (!have_record_arg && rec)
		PG_RETURN_POINTER(rec);

If that's supposed to be the same logic as in the other path, then how is it that !have_record_arg has anything to do with whether the JSON object is empty? Either the code is broken, or the comment is.

			regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical column ordering
On 2015-02-26 16:16:54 -0600, Jim Nasby wrote: On 2/26/15 4:01 PM, Alvaro Herrera wrote: The reason for doing it this way is that changing the underlying architecture is really hard, without having to bear an endless hackers bike shed discussion about the best userland syntax to use. It seems a much better approach to do the actually difficult part first, then leave the rest to be argued to death by others and let those others do the easy part (and take all the credit along with that); that way, that discussion does not kill other possible uses that the new architecture allows.

I agree that it's a sane order of developing things. But: I don't think it makes sense to commit it without the capability to change the order. Not so much because it'll not serve a purpose at that point, but rather because it'll be essentially untestable. And this is a patch that needs a fair amount of changes, both automated, and manual.

At least during development I'd even add a function that randomizes the physical order of columns, and keeps the logical one. Running the regression tests that way seems likely to catch a fair number of bugs.

+1. This patch is already several years old; let's not delay it further. Plus, I suspect that you could hack the catalog directly if you really wanted to change LCO. Worst case you could create a C function to do it.

I don't think that's sufficient for testing purposes. Not only for the patch itself, but also for getting it right in new code.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
On 2015-02-26 18:05:46 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 17:45:26 -0500, Tom Lane wrote: If the changes breaks some code it's likely actually a good thing: Because, as you say, using MemoryContextReset() will likely be the wrong thing, and they'll want to fix it for the existing branches as well. Is that likely to happen? I doubt it. They'll just mutter under their breath about useless API breakage and move on. Maybe. Basically, this is a judgment call, and I disagree with your judgment. Right. That's ok, happens all the time. And I've got to say that I'm losing patience with backwards-compatibility arguments taken to this degree. We might as well stop development entirely. Meh, normally you're on the side I'm on right now... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] MemoryContext reset/delete callbacks
We discussed this idea a couple weeks ago. The core of it is that when a memory context is being deleted, you might want something extra to happen beyond just pfree'ing everything in the context. I'm thinking in particular that this might provide a nice solution to the problem we discussed earlier today of managing cached information about a domain's CHECK constraints: a function could make a reference-counted link to some data in its FmgrInfo cache, and use a memory context callback on the context containing the FmgrInfo to ensure that the reference count gets decremented when needed and not before.

Attached is a draft patch for this. I've not really tested it more than seeing that it compiles, but it's so simple that there are unlikely to be bugs as such in it. There are some design decisions that could be questioned though:

1. I used ilists for the linked list of callback requests. This creates a dependency from memory contexts to lib/ilist. That's all right at the moment since lib/ilist does no memory allocation, but it might be logically problematic. We could just use explicit struct foo * links with little if any notational penalty, so I wonder if that would be better.

2. I specified that callers have to allocate the storage for the callback structs. This saves a palloc cycle in just about every use-case I've thought of, since callers generally will need to palloc some larger struct of their own and they can just embed the MemoryContextCallback struct in that. It does seem like this offers a larger surface for screwups, in particular putting the callback struct in the wrong memory context --- but there's plenty of surface for that type of mistake anyway, if you put whatever the void *arg is pointing at in the wrong context. So I'm OK with this but could also accept a design in which MemoryContextRegisterResetCallback takes just a function pointer and a void *arg and palloc's the callback struct for itself in the target context. I'm unsure whether that adds enough safety to justify a separate palloc.

3. Is the void *arg API for the callback func sufficient? I considered also passing it the MemoryContext, but couldn't really find a use-case to justify that.

4. I intentionally didn't provide a MemoryContextDeregisterResetCallback API. Since it's just a singly-linked list, that could be an expensive operation and so I'd rather discourage it. In the use cases I've thought of, you'd want the callback to remain active for the life of the context anyway, so there would be no need. (And, of course, there is slist_delete for the truly determined ...)

Comments?

			regards, tom lane

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 202bc78..624d08a 100644
*** a/src/backend/utils/mmgr/mcxt.c
--- b/src/backend/utils/mmgr/mcxt.c
*** MemoryContext CurTransactionContext = NU
*** 54,59 ****
--- 54,60 ----
  /* This is a transient link to the active portal's memory context: */
  MemoryContext PortalContext = NULL;
  
+ static void MemoryContextCallResetCallbacks(MemoryContext context);
  static void MemoryContextStatsInternal(MemoryContext context, int level);
  
  /*
*** MemoryContextReset(MemoryContext context
*** 150,155 ****
--- 151,157 ----
  	/* Nothing to do if no pallocs since startup or last reset */
  	if (!context->isReset)
  	{
+ 		MemoryContextCallResetCallbacks(context);
  		(*context->methods->reset) (context);
  		context->isReset = true;
  		VALGRIND_DESTROY_MEMPOOL(context);
*** MemoryContextDelete(MemoryContext contex
*** 196,201 ****
--- 198,211 ----
  	MemoryContextDeleteChildren(context);
  
  	/*
+ 	 * It's not entirely clear whether 'tis better to do this before or after
+ 	 * delinking the context; but an error in a callback will likely result in
+ 	 * leaking the whole context (if it's not a root context) if we do it
+ 	 * after, so let's do it before.
+ 	 */
+ 	MemoryContextCallResetCallbacks(context);
+ 
+ 	/*
  	 * We delink the context from its parent before deleting it, so that if
  	 * there's an error we won't have deleted/busted contexts still attached
  	 * to the context tree. Better a leak than a crash.
*** MemoryContextResetAndDeleteChildren(Memo
*** 243,248 ****
--- 253,308 ----
  }
  
  /*
+  * MemoryContextRegisterResetCallback
+  *		Register a function to be called before next context reset/delete.
+  *		Such callbacks will be called in reverse order of registration.
+  *
+  * The caller is responsible for allocating a MemoryContextCallback struct
+  * to hold the info about this callback request, and for filling in the
+  * func and arg fields in the struct to show what function to call with
+  * what argument. Typically the callback struct should be allocated within
+  * the specified context, since that means it will automatically be freed
+  * when no longer needed.
+  *
+  * There is no API for deregistering a callback once registered. If you
+  * want
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:18, Josh Berkus wrote: On 02/25/2015 07:15 PM, Amit Langote wrote: I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5.

Well, we don't have long at all to do that. I guess I'm asking what kind of completeness of code we have; is this basically done pending API changes and bugs, or are there major bits (like, say, pg_dump and EXPLAIN support) which are completely unimplemented?

I would say I am not entirely sure/satisfied about some decisions I have made (or not) when writing even the basic patch. Yes, pg_dump/EXPLAIN/psql, etc. are not touched. So, it seems it might not be fair to claim it's actually something for 9.5. Let me just call it WIP for a while, while I keep working on it and receiving feedback.

Only one concern I can remember someone had raised is that having to evaluate an expression for every row during bulk-inserts may end up being pretty expensive. Though, we might have to live with that.

Well, it's not more expensive than having to materialize the value from a trigger and store it on disk. The leading one here would be functions over timestamp; for example, the data has a timestamptz, but you want to partition by week.

I think one idea is to learn from the ability to use expressions in indexes.

Sure. So a feature to implement for the 2nd release.

Actually, I'm trying to add that and see how it works. I will post an updated patch soon if it looks good enough.

Thanks, Amit
Re: [HACKERS] logical column ordering
Josh Berkus wrote: On 02/26/2015 01:54 PM, Alvaro Herrera wrote: This patch decouples these three things so that they can be changed freely -- but provides no user interface to do so.

I think that trying to only decouple the thing we currently have in two pieces, and then have a subsequent patch to decouple again, is additional conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without at least SQL syntax for LCO, I don't see why we'd take it; this sounds more like a WIP patch.

The reason for doing it this way is that changing the underlying architecture is really hard, without having to bear an endless hackers bike shed discussion about the best userland syntax to use. It seems a much better approach to do the actually difficult part first, then leave the rest to be argued to death by others and let those others do the easy part (and take all the credit along with that); that way, that discussion does not kill other possible uses that the new architecture allows.

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Thoughts? Any objections to pushing this? Is there any reason at all to keep MemoryContextResetButPreserveChildren()? Since your patch doesn't add any callers, it seems pretty likely that there's none anywhere. The only reason to keep it is to have an out if it turns out that some third-party code actually needs that behavior. On reflection, maybe a better API to offer for that eventuality is a function named something like MemoryContextResetOnly(), which would leave child contexts completely alone. Then, if you want the old functionality, you could get it with MemoryContextResetOnly plus MemoryContextResetChildren. BTW, the original thread discussed the idea of moving context bookkeeping blocks into the immediate parent context, but the usefulness of MemoryContextSetParent() negates the thought that that would be a good plan. So there's no real issue here other than potential backwards compatibility for external code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
On 2015-02-26 23:31:16 +0100, Andres Freund wrote: Without a compiler erroring out people won't notice that suddenly MemoryContextReset deletes much more; leading to possibly hard to find errors. Context resets frequently are in error paths, and those won't necessarily be hit when running with assertions enabled. I'd really not even be surprised if a committer backpatches a MemoryContextReset() addition, not realizing it behaves differently in the back branches. How about MemoryContextReset(bool reset_children)? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical column ordering
Tomas Vondra tomas.von...@2ndquadrant.com wrote: Over the time I've heard various use cases for this patch, but in most cases it was quite speculative. If you have an idea where this might be useful, can you explain it here, or maybe point me to a place where it's described? One use case is to be able to suppress default display of columns that are used for internal purposes. For example, incremental maintenance of materialized views will require storing a count(t) column, and sometimes state information for aggregate columns, in addition to what the users explicitly request. At the developers' meeting there was discussion of whether and how to avoid displaying these by default, and it was felt that when we have this logical column ordering it would be good to have a way to suppress default display. Perhaps this could be as simple as a special value for logical position. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
On 2015-02-26 17:45:26 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Or are you arguing for an alternative proposal in which we remove MemoryContextReset (or at least rename it to something new) and thereby intentionally break all code that uses MemoryContextReset? Yes, that I am. After a bit of thought I just sent the suggestion to add a parameter to MemoryContextReset(). That way the compiler will warn. ... Without a compiler erroring out people won't notice that suddenly MemoryContextReset deletes much more; leading to possibly hard to find errors. Context resets frequently are in error paths, and those won't necessarily be hit when running with assertions enabled. With all due respect, that's utterly wrong. I have looked at every single MemoryContextReset call in the codebase, and as far as I can see the *only* one that is in an error path is elog.c:336, which I'm quite certain is wrong anyway (the other reset of ErrorContext in that file uses MemoryContextResetAndDeleteChildren, this one should too). Sure, in the pg codebase. But there definitely are extensions using memory contexts. And there's code that has to work across several branches. Backpatching alone imo presents a risk; there's nothing to warn you three years down the road that the MemoryContextReset() you just backpatched doesn't work the same in the backbranches. If the changes breaks some code it's likely actually a good thing: Because, as you say, using MemoryContextReset() will likely be the wrong thing, and they'll want to fix it for the existing branches as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Andres Freund and...@2ndquadrant.com writes: ... Without a compiler erroring out people won't notice that suddenly MemoryContextReset deletes much more; leading to possibly hard to find errors. BTW, so far as *data* is concerned, the existing call deletes all data in the child contexts already. The only not-already-buggy operation you could perform before that would no longer work is to allocate fresh data in one of those child contexts, assuming you still had a pointer to such a context. I've not seen any coding pattern in which that's likely. The problem is exactly that whoever's resetting the parent context isn't aware of child contexts having been attached to it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:24, Andres Freund wrote: On 2015-02-26 12:15:17 +0900, Amit Langote wrote: On 26-02-2015 AM 05:15, Josh Berkus wrote: I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. I think it's absolutely unrealistic to get this into 9.5. There's barely been any progress on the current (last!) commitfest - where on earth should the energy come to make this patch ready? And why would that be fair against all the others that have submitted in time? I realize and I apologize that it was irresponsible of me to have said that; maybe got a bit too excited. I do not want to unduly draw people's time on something that's not quite ready while there are other things people have worked hard on to get in time. In all earnestness, I say we spend time perfecting those things. I'll add this into CF-JUN'15. I will keep posting updates meanwhile so that when that commitfest finally starts, we will have something worth considering. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MemoryContext reset/delete callbacks
Hi, On 2015-02-26 19:28:50 -0500, Tom Lane wrote: We discussed this idea a couple weeks ago. Hm, didn't follow that discussion. The core of it is that when a memory context is being deleted, you might want something extra to happen beyond just pfree'ing everything in the context. I've certainly wished for this a couple times... Attached is a draft patch for this. I've not really tested it more than seeing that it compiles, but it's so simple that there are unlikely to be bugs as such in it. There are some design decisions that could be questioned though: 1. I used ilists for the linked list of callback requests. This creates a dependency from memory contexts to lib/ilist. That's all right at the moment since lib/ilist does no memory allocation, but it might be logically problematic. We could just use explicit struct foo * links with little if any notational penalty, so I wonder if that would be better. Maybe I'm partial here, but I don't see a problem. Actually the reason I started the ilist stuff was that I wrote a different memory context implementation ;). Wish I'd time/energy to go back to that... 2. I specified that callers have to allocate the storage for the callback structs. This saves a palloc cycle in just about every use-case I've thought of, since callers generally will need to palloc some larger struct of their own and they can just embed the MemoryContextCallback struct in that. It does seem like this offers a larger surface for screwups, in particular putting the callback struct in the wrong memory context --- but there's plenty of surface for that type of mistake anyway, if you put whatever the void *arg is pointing at in the wrong context. So I'm OK with this but could also accept a design in which MemoryContextRegisterResetCallback takes just a function pointer and a void *arg and palloc's the callback struct for itself in the target context. I'm unsure whether that adds enough safety to justify a separate palloc. I'm unworried about this. 
Yes, one might be confused for a short while, but compared to the complexity of using any such facility sanely it seems barely relevant.

3. Is the void *arg API for the callback func sufficient? I considered also passing it the MemoryContext, but couldn't really find a use-case to justify that.

Hm, seems sufficient to me.

4. I intentionally didn't provide a MemoryContextDeregisterResetCallback API. Since it's just a singly-linked list, that could be an expensive operation and so I'd rather discourage it. In the use cases I've thought of, you'd want the callback to remain active for the life of the context anyway, so there would be no need. (And, of course, there is slist_delete for the truly determined ...)

Yea, that seems fine. If you don't want the callback to do anything anymore, it's easy enough to just set a flag somewhere.

 /*
*** typedef struct MemoryContextData
*** 59,72 ****
  	MemoryContext firstchild;	/* head of linked list of children */
  	MemoryContext nextchild;	/* next child of same parent */
  	char	   *name;			/* context name (just for debugging) */
  	bool		isReset;		/* T = no space alloced since last reset */
  #ifdef USE_ASSERT_CHECKING
! 	bool		allowInCritSection; /* allow palloc in critical section */
  #endif
  } MemoryContextData;

It's a bit sad to push AllocSetContextData onto four cachelines from the current three... That stuff is hot. But I don't really see a way around it right now. And it seems like it'd give us more ammunition to improve things than the small loss of speed it implies.

I guess it might need a warning about being careful what you directly free if you use this. Might also better fit in a layer above this...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical column ordering
On 2/26/15 1:49 PM, Jim Nasby wrote: On 2/23/15 5:09 PM, Tomas Vondra wrote: Over the time I've heard various use cases for this patch, but in most cases it was quite speculative. If you have an idea where this might be useful, can you explain it here, or maybe point me to a place where it's described?

For better or worse, table structure is a form of documentation for a system. As such, it's very valuable to group related fields in a table together. When creating a table that's easy, but as soon as you need to alter it, your careful ordering can easily end up out the window. Perhaps to some that just sounds like pointless window dressing, but my experience is that on a complex system the less organized things are, the more bugs you get due to overlooking something.

I agree with Jim's comments. I've generally followed column ordering that goes something like:

1) primary key
2) foreign keys
3) flags
4) other programmatic data fields (type, order, etc.)
5) non-programmatic data fields (name, description, etc.)

The immediate practical benefit of this is that users are more likely to see fields that they need without scrolling right. Documentation is also clearer because fields tend to go from most to least important (left to right, top to bottom). Also, if you are consistent enough then users just *know* where to look.

I wrote a function a while back that reorders columns in tables (it not only deals with reordering, but triggers, constraints, indexes, etc., though there are some caveats). It's painful and not very efficient, but easy to use. Most dimension tables that I've worked with are in the millions of rows so reordering is not a problem. With fact tables, I assess on a case-by-case basis. It would be nice to not have to do that triage.

The function is attached if anyone is interested.
-- 
- 
David Steele
da...@pgmasters.net

/* CATALOG_TABLE_COLUMN_MOVE Function */
create or replace function _utility.catalog_table_column_move
(
    strSchemaName text,
    strTableName text,
    strColumnName text,
    strColumnNameBefore text
)
returns void as $$
declare
    rIndex record;
    rConstraint record;
    rColumn record;
    strSchemaTable text = strSchemaName || '.' || strTableName;
    strDdl text;
    strClusterIndex text;
begin
    -- Raise notice that a reorder is in progress
    raise notice 'Reorder columns in table %.% (% before %)',
                 strSchemaName, strTableName, strColumnName, strColumnNameBefore;

    -- Get the cluster index
    select pg_index.relname
      into strClusterIndex
      from pg_namespace
           inner join pg_class
                on pg_class.relnamespace = pg_namespace.oid
               and pg_class.relname = strTableName
           inner join pg_index pg_index_map
                on pg_index_map.indrelid = pg_class.oid
               and pg_index_map.indisclustered = true
           inner join pg_class pg_index
                on pg_index.oid = pg_index_map.indexrelid
     where pg_namespace.nspname = strSchemaName;

    if strClusterIndex is null then
        raise exception 'Table %.% must have a cluster index before reordering',
                        strSchemaName, strTableName;
    end if;

    -- Disable all user triggers
    strDdl = 'alter table ' || strSchemaTable || ' disable trigger user';
    raise notice 'Disable triggers [%]', strDdl;
    execute strDdl;

    -- Create temp table to hold ddl
    create temp table temp_catalogtablecolumnreorder
    (
        type text not null,
        name text not null,
        ddl text not null
    );

    -- Save index ddl in a temp table
    raise notice 'Save indexes';

    for rIndex in
        with index as
        (
            select _utility.catalog_index_list_get(strSchemaName, strTableName) as name
        ),
        index_ddl as
        (
            select index.name,
                   _utility.catalog_index_create_get(_utility.catalog_index_get(strSchemaName, index.name)) as ddl
              from index
        )
        select index.name,
               index_ddl.ddl
          from index
               left outer join index_ddl
                    on index_ddl.name = index.name
                   and index_ddl.ddl not like '%[function]%'
    loop
        raise notice 'Save %', rIndex.name;

        insert into temp_catalogtablecolumnreorder values ('index', rIndex.name, rIndex.ddl);
    end loop;

    -- Save constraint ddl in a temp table
    raise notice 'Save constraints';

    for rConstraint in
        with constraint_list as
        (
            select _utility.catalog_constraint_list_get(strSchemaName, strTableName, '{p,u,f,c}') as name
        ),
        constraint_ddl as
        (
            select constraint_list.name,
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
On 02/26/2015 01:59 PM, Grzegorz Parka wrote: Dear Hackers, I'm Grzegorz Parka, BSc Engineer of Technical Physics and student of Computer Science at WUT, Poland. Last year I've been a bit into evolutionary algorithms and during my research I found out about GEQO in Postgres. I also found out that there are plans to try a different attempt to find optimal query plans and thought it could be a good thing to use it as a project for GSoC. I'm interested in one of old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. I believe this would be potentially beneficial to Postgres to check if such a heuristic could really choose better query plans than GEQO does. Judging by the results that simulated annealing gives on the travelling salesman problem, it looks like a simpler and potentially more effective way of combinatorial optimization. As deliverables of such a project I would provide a simple implementation of basic simulated annealing optimizer and some form of quantitative comparison with GEQO. You might look at the earlier attempt to make the GEQO replacement pluggable. That project failed to complete sufficiently to be a feature though, but it did enough to show that our current GEQO implementation was suboptimal. I'm currently searching for this project ... it was a GSOC project, but I think before they required posting to Google Code. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
Josh Berkus j...@agliodbs.com writes: On 02/26/2015 01:59 PM, Grzegorz Parka wrote: I'm interested in one of old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. You might look at the earlier attempt to make the GEQO replacement pluggable. That project failed to complete sufficiently to be a feature though, but it did enough to show that our current GEQO implementation was suboptimal. I'm currently searching for this project ... it was a GSOC project, but I think before they required posting to Google Code. I seem to recall somebody demo'ing a simulated-annealing GEQO replacement at PGCon a couple years back. It never got to the point of being a submitted patch though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib/fuzzystrmatch/dmetaphone.c license
On Thu, Feb 26, 2015 at 8:56 AM, Stephen Frost sfr...@snowman.net wrote: * Jim Nasby (jim.na...@bluetreble.com) wrote: On 2/25/15 4:10 PM, Andrew Dunstan wrote: On 02/25/2015 11:59 AM, Joe Conway wrote: It's largely because of such uncertainties that I have been advised in the past (by those with appropriate letters after their names) to stop using the Artistic licence. This is why I spent nearly a year working on changing pgAdmin to the PostgreSQL licence.

I committed this (1 July 2004), but cannot remember any details about a license discussion. And I searched the list archives and curiously cannot find any email at all about it either. Maybe Andrew remembers something. I doubt we want to rip it out without some suitable replacement -- do we?

That's more than 10 years ago. I remember creating this for my then work at the North Carolina State Highway Patrol and sending it to Joe, but that's about the extent of my recollection.

If the Artistic License isn't acceptable, I guess we'd have to try to get the code relicensed, or reimplement the function ourselves. There are numerous implementations out there we could copy from or use as a basis for reimplementation, including several licensed under the Apache 2.0 license - is that compatible with ours?

Perhaps a company large enough to have in-house counsel (EnterpriseDB?) could get a quick legal opinion on the license before we start pursuing other things? Perhaps this is just a non-issue...

For my 2c (IANAL), I'm not convinced that it's an issue either.. I've certainly not heard of anyone complaining about it either, so.. That said, we could also go through SPI, which might get us a bit of pro-bono work, if we really wanted to pursue this. Just a hunch, but as they tend to be conservative (lawyers in general), I expect the answer we'd get is yes, it might conflict and if you want to avoid any issues you wouldn't include it.
Exactly :), and I just had a discussion with some legal folks about that because it has been an issue raised internally. So the boring stuff being... The Perl License has two meanings: GPL v2.0 or Artistic License 1.0, and there can be problems if fuzzystrmatch.so links with something that has portions licensed as Apache v2 (because Apache v2 and GPL v2.0 are incompatible), or with anything who-knows-what incompatible with Apache v2. By choosing the Artistic license there are visibly no problems, at least for the case I have been pinged about. To that end, I'd suggest -core simply formally ask the authors about it. Once we have that answer, we can figure out what to do. In my experience, at least, it's a lot better to go that route and figure out what the original authors really *intended* than to go get a lawyer to weigh in on it. One of those approaches is both free and gives a clear answer, while the other is expensive and doesn't provide any real certainty. :) I would go this way as well, aka ask the authors and see if it is possible to remove the license notice and keep everything licensed under the PostgreSQL license to avoid any future problems... -- Michael
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
On Thu, Feb 26, 2015 at 10:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-26 20:23:33 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 02/26/2015 01:59 PM, Grzegorz Parka wrote: I'm interested in one of old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. You might look at the earlier attempt to make the GEQO replacement pluggable. That project failed to complete sufficiently to be a feature though, but it did enough to show that our current GEQO implementation was suboptimal. I'm currently searching for this project ... it was a GSOC project, but I think before they required posting to Google Code. I seem to recall somebody demo'ing a simulated-annealing GEQO replacement at PGCon a couple years back. It never got to the point of being a submitted patch though. Yea, it was Jan Urbański (CCed). And the project link: https://github.com/wulczer/saio Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:01, Jim Nasby wrote: On 2/26/15 3:09 AM, Amit Langote wrote: Unless I'm missing something again, isn't allowing partitions to have heterogeneous rowtypes a problem in the long run? I'm afraid I'm confused as to your stand regarding inheritance vs. new partitioning. To be specific, children with heterogeneous schemas sounds much like what inheritance would be good for as you say. But then isn't that why we have to plan children individually which you said new partitioning should get away from? Apologies if I haven't been clear enough. What I'd like to see is the best of both worlds; fast partitioning when not using inheritance, and perhaps somewhat slower when using inheritance, but still with the features partitioning gives you. I get the distinction, thanks. Actually I wasn't quite thinking of altering the way any part of the current partitioning based on inheritance works nor am I proposing to get rid of it. It all stays as is. Not sure how we could say if it will support features of the new partitioning before those features actually begin to materialize. But my bigger concern from a project standpoint is that we not put ourselves in a position of supporting something that we really don't want to support (a partitioning system that's got inheritance mixed in). As much as I'd personally like to have both features together, I think it would be bad for the community to go down that road without careful thought. Yes, it does. In fact, I do intend to keep them separate the first attempt of which is to choose to NOT transform a PARTITION OF parent clause into INHERITS parent. Any code that may look like it's trying to do that is because the patch is not fully baked yet. Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN on a partition. If we release something without that being restricted it'll probably cause trouble later on. Yes, I agree. 
More generally, I think the patch/approach is in need of a clear separation of internal implementation concerns and user-facing notions even at this point. This may be one of them. For example, with the patch, a partition is defined as: CREATE TABLE name PARTITION OF parent ... Unless that turns into something like: CREATE PARTITION name OF parent ... we may not be able to put all the restrictions we'd want to put on a partition for the sake of what would be partitioning internals. Thanks, Amit
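To make the distinction concrete, here is a hypothetical sketch of the two shapes being weighed — neither is committed syntax, the table names are invented, and the partition-bound clauses are elided just as in the discussion above:

```sql
-- Shape 1 (the WIP patch): a partition is itself a table, so generic
-- table commands such as ALTER TABLE ... ADD COLUMN reach it by default
-- unless each one is individually restricted.
CREATE TABLE measurement_2015 PARTITION OF measurement ...

-- Shape 2: a partition as a distinct object kind, which would let the
-- system refuse table-level operations on partitions outright.
CREATE PARTITION measurement_2015 OF measurement ...
```

The design question is then where the restrictions live: in the ALTER TABLE machinery (Shape 1) or in the object model itself (Shape 2).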
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/26 11:38, Stephen Frost wrote: I've pushed an update for this to master and 9.4 and improved the comments and the commit message as discussed. Would be great if you could test and let me know if you run into any issues! OK, thanks! Best regards, Etsuro Fujita
[HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN
Hi all,

This simple patch adds CINE (IF NOT EXISTS) support for ALTER TABLE ... ADD COLUMN. So now we can:

ALTER TABLE foo ADD COLUMN IF NOT EXISTS c1 integer;

and/or ...

ALTER TABLE foo ADD COLUMN IF NOT EXISTS c1 integer, ADD COLUMN c2 integer;

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index b3a4970..aba7ec0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable>
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>

-ADD [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column_name</replaceable> [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ USING <replaceable class="PARAMETER">expression</replaceable> ]
 ALTER [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> SET DEFAULT <replaceable class="PARAMETER">expression</replaceable>
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable>
    <variablelist>
     <varlistentry>
-     <term><literal>ADD COLUMN</literal></term>
+     <term><literal>ADD COLUMN [ IF NOT EXISTS ]</literal></term>
      <listitem>
       <para>
        This form adds a new column to the table, using the same syntax as
-       <xref linkend="SQL-CREATETABLE">.
+       <xref linkend="SQL-CREATETABLE">. If <literal>IF NOT EXISTS</literal>
+       is specified and the column already exists, no error is thrown.
       </para>
      </listitem>
     </varlistentry>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f5d5b63..5ecb438 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,8 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
 				AlterTableCmd *cmd, LOCKMODE lockmode);
 static void ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 				Relation rel, ColumnDef *colDef, bool isOid,
-				bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+				bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname, bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2283,7 +2283,7 @@ renameatt_internal(Oid myrelid,
 				 oldattname)));

 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	check_for_column_name_collision(targetrelation, newattname, false);

 	/* apply the update */
 	namestrcpy(&(attform->attname), newattname);
@@ -3399,11 +3399,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:	/* add column via CREATE OR REPLACE VIEW */
 			ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-							false, false, false, lockmode);
+							false, false, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-							false, true, false, lockmode);
+							false, true, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_ColumnDefault:	/* ALTER COLUMN DEFAULT */
 			ATExecColumnDefault(rel, cmd->name, cmd->def, lockmode);
@@ -3500,13 +3500,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			/* Use the ADD COLUMN code, unless prep decided to do nothing */
 			if (cmd->def != NULL)
 				ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-								true, false, false, lockmode);
+								true, false, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_AddOidsRecurse:	/* SET WITH OIDS */
 			/* Use the ADD COLUMN code, unless prep decided to do nothing */
 			if (cmd->def != NULL)
 				ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-								true, true,
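A quick usage sketch of the behavior the patch aims for (hypothetical session; the exact server message wording may differ):

```sql
CREATE TABLE foo (id integer);

ALTER TABLE foo ADD COLUMN IF NOT EXISTS c1 integer;  -- c1 is added
ALTER TABLE foo ADD COLUMN IF NOT EXISTS c1 integer;  -- c1 exists: skipped, no error

ALTER TABLE foo ADD COLUMN c1 integer;
-- ERROR:  column "c1" of relation "foo" already exists
```

The second statement is the whole point: it makes schema-migration scripts idempotent, matching the existing DROP COLUMN IF EXISTS form.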
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
On 02/26/2015 05:50 PM, Fabrízio de Royes Mello wrote: On Thu, Feb 26, 2015 at 10:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-26 20:23:33 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 02/26/2015 01:59 PM, Grzegorz Parka wrote: I'm interested in one of old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. You might look at the earlier attempt to make the GEQO replacement pluggable. That project failed to complete sufficiently to be a feature though, but it did enough to show that our current GEQO implementation was suboptimal. I'm currently searching for this project ... it was a GSOC project, but I think before they required posting to Google Code. I seem to recall somebody demo'ing a simulated-annealing GEQO replacement at PGCon a couple years back. It never got to the point of being a submitted patch though. Yea, it was Jan Urbański (CCed). And the project link: https://github.com/wulczer/saio So what we're saying, Grzegorz, is that we would love to see someone pick this up and get it to the point of making it a feature as a GSOC project. I think if you can start from where Jan left off, you could actually complete it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com wrote: Even this patch doesn't work fine. The standby emits the following error messages. Yes this bug remains unsolved. I am still working on resolving this. Following chunk IDs have been added in the attached patch as suggested upthread. +#define XLR_CHUNK_BLOCK_REFERENCE 0x10 +#define XLR_CHUNK_BLOCK_HAS_IMAGE 0x04 +#define XLR_CHUNK_BLOCK_HAS_DATA 0x08 XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references. XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA. Before sending a new version, be sure that this gets fixed by, for example, building up a master with a standby replaying WAL, and running make installcheck-world or similar. If the standby does not complain at all, you have good chances to not have bugs. You could also build with WAL_DEBUG to check record consistency. It would be good to get those problems fixed first. Could you send an updated patch? I'll look into it in more details. For the time being I am switching this patch to Waiting on Author. -- Michael
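For reference, the validation loop Michael suggests might look roughly like this — paths, ports, and options are illustrative only, not a prescribed recipe:

```sh
# build and install the patched master (optionally with WAL_DEBUG)
./configure --prefix=$HOME/pgsql CFLAGS="-O2 -DWAL_DEBUG" && make install

# primary
initdb -D /tmp/primary
cat >> /tmp/primary/postgresql.conf <<EOF
wal_level = hot_standby
max_wal_senders = 2
EOF
echo "local replication $USER trust" >> /tmp/primary/pg_hba.conf
pg_ctl -D /tmp/primary -l /tmp/primary.log start

# standby replaying the primary's WAL
pg_basebackup -D /tmp/standby -R
echo "hot_standby = on" >> /tmp/standby/postgresql.conf
pg_ctl -D /tmp/standby -o "-p 5433" -l /tmp/standby.log start

# exercise WAL heavily, then check that the standby never complained
make installcheck-world
grep -iE "fatal|panic|incorrect" /tmp/standby.log
```

If the grep at the end comes back empty after the full regression run, the compressed-FPW records at least replay cleanly.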
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
On 2015-02-26 20:23:33 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 02/26/2015 01:59 PM, Grzegorz Parka wrote: I'm interested in one of old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. You might look at the earlier attempt to make the GEQO replacement pluggable. That project failed to complete sufficiently to be a feature though, but it did enough to show that our current GEQO implementation was suboptimal. I'm currently searching for this project ... it was a GSOC project, but I think before they required posting to Google Code. I seem to recall somebody demo'ing a simulated-annealing GEQO replacement at PGCon a couple years back. It never got to the point of being a submitted patch though. Yea, it was Jan Urbański (CCed). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical column ordering
On 27/02/15 14:08, David Steele wrote: [...] I agree with Jim's comments. I've generally followed column ordering that goes something like: 1) primary key 2) foreign keys 3) flags 4) other programmatic data fields (type, order, etc.) 5) non-programmatic data fields (name, description, etc.) The immediate practical benefit of this is that users are more likely to see fields that they need without scrolling right. Documentation is also clearer because fields tend to go from most to least important (left to right, top to bottom). Also, if you are consistent enough then users just *know* where to look. I wrote a function a while back that reorders columns in tables (it not only deals with reordering, but triggers, constraints, indexes, etc., though there are some caveats). It's painful and not very efficient, but easy to use. Most dimension tables that I've worked with are in the millions of rows, so reordering is not a problem. With fact tables, I assess on a case-by-case basis. It would be nice to not have to do that triage. The function is attached if anyone is interested. I've never formally written down the order of how I define fields^H^H^H^H^H^H columns in a table, but David's list is the same order I use. The only additional ordering I do is to put fields that are likely to be long text fields at the end of the record. So I would certainly appreciate having my logical ordering be the natural order in which they appear. Cheers, Gavin
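Spelled out on a hypothetical table (the table and column names are made up to illustrate the convention, not taken from the thread):

```sql
CREATE TABLE order_item (
    order_item_id bigint PRIMARY KEY,              -- 1) primary key
    order_id      bigint NOT NULL,                 -- 2) foreign keys
    is_gift       boolean NOT NULL DEFAULT false,  -- 3) flags
    item_type     text NOT NULL,                   -- 4) programmatic data
    display_name  text,                            -- 5) non-programmatic data
    description   text                             -- long text last
);
```

Logical column ordering would let this display order be maintained independently of the physical order chosen for alignment or migration reasons.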
Re: [HACKERS] MemoryContext reset/delete callbacks
Andres Freund and...@2ndquadrant.com writes: It's a bit sad to push AllocSetContextData onto four cachelines from the current three... That stuff is hot. But I don't really see a way around it right now. And it seems like it'd give us more ammunition to improve things than the small loss of speed it implies. Meh. I doubt it would make any difference, especially seeing that the struct isn't going to be aligned on any special boundary. regards, tom lane