Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 Alternately, has anyone tested whether Timsort would work well?

 I think that was proposed a few years ago and did not look so good
 in simple testing.

I tested it in 2012. I got as far as writing a patch.

Timsort is very good where comparisons are expensive -- that's why
it's especially compelling when your comparator is written in Python.
However, when testing it with text, even though there were
significantly fewer comparisons, it was still slower than quicksort.
Quicksort is cache oblivious, and that's an enormous advantage. This
was before abbreviated keys; these days, the difference must be
larger.
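
For anyone who wants to reproduce this kind of comparison, a quick way to get
a non-trivial external sort on text (table name and sizes are arbitrary;
numbers will vary with machine, collation and work_mem) is:

CREATE TABLE sort_test AS
    SELECT md5(i::text) AS val FROM generate_series(1, 1000000) AS i;
SET work_mem = '4MB';   -- small enough to force an external sort
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM sort_test ORDER BY val;

The Sort node in the ANALYZE output reports the method used (e.g. "external
merge") and the disk space consumed.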

-- 
Peter Geoghegan




Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Michael Paquier
On Thu, Aug 20, 2015 at 11:16 AM, Amit Langote 
langote_amit...@lab.ntt.co.jp wrote:

 On 2015-08-19 PM 09:23, Simon Riggs wrote:
  We'll need regression tests that cover each restriction and docs that
  match. This is not something we should leave until last. People read the
  docs to understand the feature, helping them to reach consensus. So it is
  for you to provide the docs before, not wait until later. I will begin a
  code review once you tell me docs and tests are present. We all want the
  feature, so its all about the details now.
 

 Sorry, should have added tests and docs already. I will add them in the
 next version of the patch.


Yes, those would be really good to have before any review, so that it is
possible to grasp what this patch does. I would like to look at it in more
depth as well.


 Thanks for being willing to review.


Thanks a lot for working on this! I am sure you are going to get a lot of
feedback.
Regards,
-- 
Michael


Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Pavel Stehule
2015-08-20 2:22 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Jim Nasby jim.na...@bluetreble.com writes:
  Don't say parse names for things other than tables.  Only a minority
  of the types of objects used in the database have names that meet this
  specification.

  Really? My impression is that almost everything that's not a shared
  object allows for a schema...

 Tables meet this naming spec.  Columns, functions, operators, operator
 classes/families, collations, constraints, and conversions do not (you
 need more data to name them).  Schemas, databases, languages, extensions,
 and some other things also do not, because you need *less* data to name
 them.  Types also don't really meet this naming spec, because you need to
 contend with special cases like int[] or timestamp with time zone.
 So this proposal doesn't seem very carefully thought-through to me,
 or at least the use case is much narrower than it could be.

 Also, if "object does not exist" isn't supposed to be an error case,
 what of "name is not correctly formatted"?  It seems a bit arbitrary
 to me to throw an error in one case but not the other.
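
(The type-name special cases are easy to see with a plain regtype cast, which
already has to cope with them:

SELECT 'int[]'::regtype, 'timestamp with time zone'::regtype;
-- integer[] | timestamp with time zone
)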


When I work with a live object, the behavior of the cast to regclass is
correct. But I may be working with an object that will only be created in the
future, and I still need to extract some information about that future object
- and in that case a regclass cast necessarily fails.

Regards

Pavel




 regards, tom lane



Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 10:10, Amit Langote langote_amit...@lab.ntt.co.jp
wrote:

 On 2015-08-20 AM 05:10, Josh Berkus wrote:
  On 08/19/2015 04:59 AM, Simon Riggs wrote:
  I like the idea of a regular partitioning step because it is how you
  design such tables - let's use monthly partitions.
 
  This gives sanely terse syntax, rather than specifying pages and pages
  of exact values in DDL
 
 PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
  START WITH value;
 
  Oh, I like that syntax!
 
  How would it work if there were multiple columns?  Maybe we don't want
  to allow that for this form?
 

 Yea, we could simply restrict it to the single column case, which does not
 sound like a major restriction.


PARTITION BY ...
SUBPARTITION BY ...

We should plan for that in the way we develop the internals, but full
support can wait until later patches.

My view has long been that the internals are the key aspect here, not the
syntax. We need to be able to have a very fast partition-selection
mechanism that can be used in the planner or executor for each tuple.
Working backwards, we need a relcache representation that allows that, and
a catalog representation that allows that and syntax to match.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-19 AM 02:59, Corey Huinker wrote:
 
 Quick thoughts borne of years of slugging it out with partitions on Oracle:
 
 - Finally!!!
 
 - Your range partitioning will need to express exclusive/inclusive bounds,
 or go to the Oracle model where every partition is a cascading "values less
 than" test, context-dependent on the partitions defined before it. I would
 suggest that leveraging existing range types (or allowing the user to
 specify a range type, like for a specific collation of a text range) would
 allow for the most flexible and postgres-ish range definition. You seem to
 do this with the [USING] opclass_name bit, but don't follow through on
 the START(...) and END(...).  Something like FOR VALUES <@
 '[''2014-01-01'',''2015-01-01'')'::daterange would cover most needs
 succinctly, though I admit the syntax for complex ranges could be
 cumbersome, though something like FOR VALUES <@
 '[(''a'',1),(''b'',1))'::letter_then_number_range is still readable.
 

It seems the way of specifying per-partition definition/constraint,
especially for range partitioning, would have a number of interesting
alternatives.

By the way, the [USING opclass_name] bit is just a way of telling that a
particular key column has user-defined notion of ordering in case of
range partitioning and equality for list partitioning. The opclass would
eventually determine which WHERE clauses (looking at operators, operand
types) are candidates to help prune partitions. If we use the range_op
range_literal::range_type notation to describe partition constraint for
each partition, it might not offer much beyond the readability. We are not
really going to detect range operators being applied in WHERE conditions
to trigger partition pruning, for example. Although I may be missing
something...
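
For reference, the containment notation Corey suggests can be tried directly
with the built-in range types (the values here are made up):

SELECT '2014-06-15'::date <@ '[2014-01-01,2015-01-01)'::daterange;  -- true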

 - No partitioning scheme survives first contact with reality. So you will
 need a facility for splitting and joining existing partitions. For
 splitting partitions, it's sufficient to require that the new partition
 share either an upper/lower bound (with the same inclusivity/exclusivity) of
 an existing partition, thus uniquely identifying the partition to be split,
 and require that the other bound be within the range of the partition to be
 split. Similarly, it's fair to require that the partitions to be joined be
 adjacent in range. In both cases, range operators make these tests simple.
 

SPLIT/MERGE can be done in later patches/release, I think.
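
If/when that happens, the sanity checks Corey describes map directly onto
existing range operators, for example (illustrative values only):

SELECT '[2014-01-01,2014-07-01)'::daterange -|- '[2014-07-01,2015-01-01)'::daterange;  -- adjacent
SELECT '[2014-03-01,2014-07-01)'::daterange <@ '[2014-01-01,2015-01-01)'::daterange;   -- contained in the range being split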

 - Your features 4 and 5 are implemented in Oracle with SWAP PARTITION,
 which is really neat for doing ETLs and index rebuilds offline in a copy
 table, and then swapping the data segment of that table with the partition
 specified. Which could be considered cheating because none of the partition
 metadata changed, just the pointers to the segments. We already do this
 with adding removing INHERIT. I'm not saying they can't be separate
 functionality, but keeping an atomic SWAP operation would be grand.
 

I think we can find ways to make the proposed ATTACH/DETACH similarly useful
and convenient. Thanks for reminding me of SWAP PARTITION.

Regards,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 AM 05:10, Josh Berkus wrote:
 On 08/19/2015 04:59 AM, Simon Riggs wrote:
 I like the idea of a regular partitioning step because it is how you
 design such tables - let's use monthly partitions.

 This gives sanely terse syntax, rather than specifying pages and pages
 of exact values in DDL

PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
 START WITH value;
 
 Oh, I like that syntax!
 
 How would it work if there were multiple columns?  Maybe we don't want
 to allow that for this form?
 

Yea, we could simply restrict it to the single column case, which does not
sound like a major restriction.

Thanks,
Amit





Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Simon Riggs
On 18 August 2015 at 21:03, Paul Ramsey pram...@cleverelephant.ca wrote:


 So I need a way to either (a) notice when I already have a (old) copy
 of the library loaded and avoid trying to setup the GUC in that case
 or (b) set-up the GUC in a somewhat less brittle way than
 DefineCustomStringVariable() allows, something that can overwrite
 things instead of just erroring out.


Are you trying to preserve the in-memory state across upgrade as well? It
sounds unlikely we can support that directly in the general case.

Sounds like we need RedefineCustomStringVariable()

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-08-20 Thread Michael Paquier
On Wed, Jul 22, 2015 at 4:28 PM, Alexander Korotkov 
a.korot...@postgrespro.ru wrote:

 On Wed, Jul 22, 2015 at 8:48 AM, Michael Paquier 
 michael.paqu...@gmail.com wrote

 On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
 a.korot...@postgrespro.ru wrote:
  attached patch allows pg_rewind to work when target timeline was
 switched.
  Actually, this patch fixes TODO from pg_rewind comments.
 
/*
 * Trace the history backwards, until we hit the target timeline.
 *
 * TODO: This assumes that there are no timeline switches on the
 target
 * cluster after the fork.
 */
 
  This patch allows pg_rewind to handle data directory synchronization in a
  much more general way. For instance, the user can return a promoted standby
  to the old master.


+   /*
+    * Since incomplete segments are copied into next timelines, find the
+    * latest timeline holding the required segment.
+    */
+   while (private->tliIndex < targetNentries - 1 &&
+          targetHistory[private->tliIndex].end < targetSegEnd)
+   {
+       private->tliIndex++;
+       tli_index++;
+   }
It seems to me that the patch is able to handle timeline switches onwards,
but not backwards and this is what would be required to return a promoted
standby, that got switched to let's say timeline 2 when promoted, to an old
master, that is still on timeline 1. This code actually fails when scanning
for the last checkpoint before WAL forked as it will be on the timeline 1
of the old master. Imagine for example that the WAL has forked at
0/30X which is saved in segment 00020003 (say 2/0/3) on
the promoted standby, and that the last checkpoint record is on 0/20X,
which is part of 00010002 (1/0/2). I think that we should
scan 2/0/3 (not the partial segment 1/0/3), and then 1/0/2 when looking for
the last checkpoint record. Hence the startup index TLI should be set to
the highest timeline and should be decremented depending on what is in the
history file.
The code above looks correct to me when scanning the WAL history onwards
though, which is what is done when extracting the page map, but not
backwards when we try to find the last common checkpoint record. This code
actually fails trying to open 2/0/2 that does not exist in the promoted
standby's pg_xlog in my test case.

Attached is a small script I have used to reproduce the failure.

I think that the documentation needs a brush-up as well, to outline the fact
that pg_rewind can also put a standby back into a cluster, for example after
an operator mistake when promoting a node that should not have been promoted.
Thoughts?
-- 
Michael


rewind_test.bash
Description: Binary data



Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 06:34, Simon Riggs wrote:
 On 20 August 2015 at 10:10, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:
 
 On 2015-08-20 AM 05:10, Josh Berkus wrote:
PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
 START WITH value;

 Oh, I like that syntax!

 How would it work if there were multiple columns?  Maybe we don't want
 to allow that for this form?


 Yea, we could simply restrict it to the single column case, which does not
 sound like a major restriction.

 
 PARTITION BY ...
 SUBPARTITION BY ...
 
 We should plan for that in the way we develop the internals, but full
 support can wait until later patches.
 

At the moment, a form of SUBPARTITION BY is to allow PARTITION BY in a
partition definition. But I can see that may not be what people would expect.

 My view has long been that the internals are the key aspect here, not the
 syntax. We need to be able to have a very fast partition-selection
 mechanism that can be used in the planner or executor for each tuple.
 Working backwards, we need a relcache representation that allows that, and
 a catalog representation that allows that and syntax to match.
 

Agreed.

Thanks,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-19 PM 09:52, David Fetter wrote:
 On Wed, Aug 19, 2015 at 04:30:39PM +0900, Amit Langote wrote:

 One small change to make this part more efficient:

 1. Take the access exclusive lock on table_name.
 2. Check for a matching constraint on it.
 3. If it's there, mark it as a valid partition.
 4. If not, check for values outside the boundaries as above.


 That's an interesting idea. Thanks!
 
 I hope I'm advancing this feature rather than bogging it down...
 

Definitely advancing.

 By a matching constraint, I guess you mean a 'valid' constraint from
 which the declared partition constraint can be proven to follow. For
 (a simple) example, from a CHECK (a >= 100 AND a < 150) on
 table_name, the partition constraint implied by FOR VALUES START
 (100) END (200) can be assumed to hold.
 
 Well, I was assuming an exact match, but a stricter match seems like a
 nice-to-have...possibly later.
 
  Should that be a *valid* constraint?  Perhaps that should be
 parameterized, as I'm not yet seeing a compelling argument either
 direction.  I'm picturing something like:

 ALTER TABLE table_name SET VALID PARTITION OF parent [TRUST]

 where TRUST would mean that an existing constraint need not be VALID.

 Hmm, I'd think this step must be able to assert the partition
 constraint beyond any doubt.  If the DBA added the constraint and
 marked it invalid, she should first VALIDATE the constraint to make
 it valid by performing whatever steps necessary before. IOW, a full
 heap scan at least once is inevitable (the reason why we might want
 to make this a two step process at all). Am I missing something?
 
 There are use cases where we need to warn people that their assertions
 need to be true, and if those assertions are not true, this will
 explode, leaving them to pick the resulting shrapnel out of their
 faces.  There are other parts of the system where this is true, as
 when people write UDFs in C.
 
 As I understand it, NOT VALID means, I assert that the tuples already
 here fit the constraint.  Any changes will be checked against the
 constraint.
 
 I've seen cases where a gigantic amount of data is coming out of some
 distributed system which holds the constraint as an invariant.  This
 let a DBA decide to add a NOT VALID constraint in order not to take
 the hit of a second full scan of the data, which might have made the
 import, and possibly the entire project, untenable.
 
 See above.
 

Ah, I understand the point of the parameterization (TRUST). It seems like it
would be good to have, with appropriate documentation. Perhaps it might as
well be a parameter to step 1 itself.

 5. Detach partition

 ALTER TABLE partitioned_table
 DETACH PARTITION partition_name [USING table_name]

 This removes partition_name as partition of partitioned_table.
 The table continues to exist with the same name or 'table_name',
 if specified.  pg_class.relispartition is set to false for the
 table, so it behaves like a normal table.

 Could this take anything short of an access exclusive lock on the
 parent?

 Yes, both the step 1 of ATTACH command and DETACH command take
 access exclusive lock on the parent. They are rather quick metadata
 changes, so should not stall others significantly, I think.
 
 So no.  Weakening required locks has been something of an ongoing
 project, project-wide, and need not be part of the first cut of this
 long-needed feature.
 

Do you mean ATTACH and DETACH, if they require access exclusive lock on
the parent, should not be in the first cut? Or am I misreading?

If so, there is no way to drop partitions. With the patch, it would be
achieved with detach and drop (if required).
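
In the proposed syntax that would look something like this (names are
illustrative; none of this exists yet):

ALTER TABLE measurements DETACH PARTITION measurements_y2014;
DROP TABLE measurements_y2014;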

 Thanks so much for working on this!
 

Thanks for the feedback and suggestions!

Regards,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 03:16, Amit Langote langote_amit...@lab.ntt.co.jp
wrote:


 Sorry, should have added tests and docs already. I will add them in the
 next version of the patch. Thanks for being willing to review.


Thanks for picking up this challenge. It's easier if you have someone
interested all the way.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Pavan Deolasee
On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:


 Hi,

 I would like propose $SUBJECT for this development cycle. Attached is a
 WIP patch that implements most if not all of what's described below. Some
 yet unaddressed parts are mentioned below, too. I'll add this to the
 CF-SEP.

 Syntax
 ==

 1. Creating a partitioned table

 CREATE TABLE table_name
 PARTITION BY {RANGE|LIST}
 ON (column_list);

 Where column_list consists of simple column names or expressions:

 PARTITION BY LIST ON (name)
 PARTITION BY RANGE ON (year, month)

 PARTITION BY LIST ON ((lower(left(name, 2))))
 PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))



How about HASH partitioning? Are there plans to support foreign tables as
partitions?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 06:27, Pavan Deolasee wrote:
 On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:

 PARTITION BY LIST ON (name)
 PARTITION BY RANGE ON (year, month)

 PARTITION BY LIST ON ((lower(left(name, 2))))
 PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))


 
 How about HASH partitioning? Are there plans to support foreign tables as
 partitions?
 

I've not given HASH partitioning a lot of thought yet. About the latter,
it seems it should not be too difficult to incorporate into the proposed
partitioning internals.

Thanks,
Amit





Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Pavel Stehule
2015-08-20 21:16 GMT+02:00 Jim Nasby jim.na...@bluetreble.com:

 On 8/19/15 7:22 PM, Tom Lane wrote:

 Jim Nasby jim.na...@bluetreble.com writes:

 Don't say parse names for things other than tables.  Only a minority
 of the types of objects used in the database have names that meet this
 specification.


 Really? My impression is that almost everything that's not a shared
 object allows for a schema...


 Tables meet this naming spec.  Columns, functions, operators, operator
 classes/families, collations, constraints, and conversions do not (you
 need more data to name them).  Schemas, databases, languages, extensions,
 and some other things also do not, because you need *less* data to name
 them.  Types also don't really meet this naming spec, because you need to
 contend with special cases like int[] or timestamp with time zone.
 So this proposal doesn't seem very carefully thought-through to me,
 or at least the use case is much narrower than it could be.

  Also, if "object does not exist" isn't supposed to be an error case,
  what of "name is not correctly formatted"?  It seems a bit arbitrary
  to me to throw an error in one case but not the other.


 I think the important point here is this is *parse*_ident(). It's not
 meant to guarantee an object actually exists, what kind of object it is, or
 whether it's syntactically correct. It's meant only to separate an
  identifier into its 3 (or in some cases 2) components. If this was as
 simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
 obviously a lot more complex than that.
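
  For instance, with a quoted identifier a naive split tears the name apart
  (example made up):

  SELECT string_to_array('myschema."some.table"', '.');
  -- splits inside the quoted identifier, yielding three pieces instead of two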


Parsing a composite identifier is pretty complex - and almost all of the work
is already done; it just needs an SQL wrapper.

Pavel



 --
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-08-20 Thread Pavel Stehule
Hi

2015-08-20 16:42 GMT+02:00 Alvaro Herrera alvhe...@2ndquadrant.com:

 Pavel Stehule wrote:

 Hi,

   Job schedulers are an important and sometimes very complex part of any
   piece of software, and PostgreSQL lacks one. I propose a new contrib
   module that can be used directly for simple tasks and that can serve as a
   base for other, richer schedulers. I prefer a minimalist design - but one
   strong enough to be enhanced when necessary. Some complex logic can be
   implemented better in PL than in C. Motto: simple to learn, simple to use,
   simple to customize.

 Have you made any progress on this?


I am working on a second iteration of the prototype - or rather, I was, a
month ago. I finished the basic design and the basic infrastructure. I
designed an architecture based on one coordinator and dynamically started
workers.

I found that some fair scheduling policy will probably have to be implemented
in the future.

I have to finish other work now, so I am planning to continue at the end of
autumn, but the sources are public:

https://github.com/okbob/generic-scheduler - I am not sure about the code
quality, as I have not had time to debug it, but the mental model (UI) is
almost complete:
https://github.com/okbob/generic-scheduler/blob/master/schedulerx--1.0.sql

I find it an interesting idea to handle not only time events but our
notifications too. It could be a perfect base for building complex workflow
systems. But I have done zero work on this topic so far.

Regards

Pavel







 --
 Álvaro Herrera   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 10:19, David Fetter wrote:
 On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:

 Ah, I understand the point of the parameterization (TRUST). It seems like it
 would be good to have, with appropriate documentation. Perhaps it might as
 well be a parameter to step 1 itself.
 
 So TRUST would obviate step 2?  Better still!
 

I really wish we could do this in one step, but...

As you know, primary intention behind keeping step 2 separate is to do the
heavy validation work with access exclusive lock on the other table
instead of on the master table.

Assuming we add TRUST to step 1 (with its implications clarified in the
docs), when the user does not use TRUST the default behavior would be to
perform the validation - would that be OK, given that it would have to take
the expensive lock on the master table?

Motivation for keeping the step 2 (and TRUST parameter with it) is really
to avoid this last concern.

All that said, we'd still need some way to identify rows that came from a
TRUSTed table but do not really belong to the partition, because we skipped
checking them altogether. Obviously one can always write a query on the
partition to find them, but some form of DDL may be what people would expect.


 Do you mean ATTACH and DETACH, if they require access exclusive lock on
 the parent, should not be in the first cut? Or am I misreading?
 
 Sorry I was unclear.
 
 ATTACH and DETACH should be in the first cut even if they require an
 access exclusive lock.
 

Ah, got it.

Thanks,
Amit





Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Peter Geoghegan
On Sat, Jun 13, 2015 at 3:53 PM, Peter Geoghegan p...@heroku.com wrote:
 I think we should really address this. Attached patch adds a new
 release note item for it. It also adds to the documentation that
 explains why users should prefer varchar(n)/text to character(n); the
 lack of abbreviated key support now becomes a huge disadvantage for
 character(n), whereas in previous versions the disadvantages were
 fairly minor.

 In passing, I updated the existing sort item to reflect that only
 varchar(n), text, and numeric benefit from the abbreviation
 optimization (not character types more generally + numeric), and added
 a note on the effectiveness of the abbreviation optimization alone.

A recent e-mail from Kaigai-san [1] reminded me of this item. I really
think this limitation of char(n) needs to be documented along the
lines I proposed here back in June. Benchmarks like TPC-H use char(n)
extensively, since it's faster in other systems. However, PostgreSQL
now has hugely inferior sort performance for that type as compared to
text/varchar(n). This needs to be highlighted.
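
A trivial way to see the gap on a 9.5 build (table names and sizes here are
arbitrary; timings depend on machine and collation):

CREATE TABLE t_char AS
    SELECT md5(i::text)::char(32) AS v FROM generate_series(1, 1000000) i;
CREATE TABLE t_text AS
    SELECT md5(i::text) AS v FROM generate_series(1, 1000000) i;
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t_char ORDER BY v;  -- no abbreviated keys
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t_text ORDER BY v;  -- abbreviated keys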

[1] 
http://www.postgresql.org/message-id/flat/CAM3SWZRRCs6KAyN-bDsh0_pG=8xm3fvcf1x9dlsvd3wvbt1...@mail.gmail.com#CAM3SWZRRCs6KAyN-bDsh0_pG=8xm3fvcf1x9dlsvd3wvbt1...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Jim Nasby

On 8/20/15 3:44 PM, Josh Berkus wrote:

What could be added as an extension?


A method for preventing duplicate object keys.

Since I'm in the minority here let's just drop it. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] statistics for array types

2015-08-20 Thread Tomas Vondra

Hi,

On 08/11/2015 04:38 PM, Jeff Janes wrote:

When reviewing some recent patches, I decided the statistics gathered
 for arrays had some pre-existing shortcomings.

The main one is that when the arrays contain rare elements there is
no histogram to fall back upon when the MCE array is empty, the way
there is for scalar stats.  So it has to punt completely and resort
to saying that it is 0.5% selectivity without recourse to any data at
all.

The rationale for applying the threshold before things are eligible
for inclusion in the MCE array seems to be that this puts some
theoretical bound on the amount of error we are likely to have in
that element.  But I think it is better to exceed that theoretical
bound than it is to have no data at all.

The attached patch forces there to be at least one element in MCE,
keeping the one element with the highest predicted frequency if the
MCE would otherwise be empty.  Then any other element queried for is
assumed to be no more common than this most common element.


We only really need the frequency, right? So do we really need to keep
the actual MCV element? I.e. most_common_elem_freqs does not have the
same number of values as most_common_elems anyway:

  A list of the frequencies of the most common element values, i.e., the
  fraction of rows containing at least one instance of the given value.
  Two or three additional values follow the per-element frequencies;
  these are the minimum and maximum of the preceding per-element
  frequencies, and optionally the frequency of null elements.
  (Null when most_common_elems is.)

So we might modify it so that it's always defined - either it tracks the
same values as today (when most_common_elems is defined), or the
frequency of the most common element (when most_common_elems is NULL).

This way we can keep the current theoretical error-bound on the MCE
frequencies, and if that's not possible we can have at least the new
value without confusing existing code.
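
(For reference, these values are visible for any analyzed array column via
pg_stats; the table and column names below are placeholders:

SELECT most_common_elems, most_common_elem_freqs
  FROM pg_stats
 WHERE tablename = 'some_table' AND attname = 'some_array_column';
)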


I'd also briefly considered just having the part of the code that
pulls the stats out of pg_stats interpret a MCE array as meaning
that nothing is more frequent than the threshold, but that would mean
that that part of the code needs to know about how the threshold is
chosen, which just seems wrong. And it would need to know the
difference between NULL MCE because no stats were gathered, versus
because stats were gathered but nothing met the threshold.


I'm not sure whether this is the same thing I just proposed ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Amit Kapila
On Thu, Aug 20, 2015 at 3:38 PM, Amit Kapila amit.kapil...@gmail.com
wrote:

 On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund and...@anarazel.de wrote:
 
 
  How hard did you try checking whether this causes regressions? This
  increases the number of atomics in the commit path a fair bit. I doubt
  it's really bad, but it seems like a good idea to benchmark something
  like a single full-throttle writer and a large number of readers.
 

 I think the case which you want to stress is when the patch doesn't
 have any benefit (like single writer case)


I mean to say single writer, multiple readers.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Andres Freund
On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
 On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund and...@anarazel.de wrote:
  I spent some time today reviewing the commited patch. So far my only
  major complaint is that I think the comments are only insufficiently
  documenting the approach taken:
  Stuff like avoiding ABA type problems by clearing the list entirely and
  it being impossible that entries end up on the list too early absolutely
  needs to be documented explicitly.
 
 
 I think more comments can be added to explain such behaviour if it is
 not clear via looking at current code and comments.

It's not mentioned at all, so yes.

 I think you are right and here we need to use something like what is
 suggested below by you.  Originally the code was similar to what you
 have written below, but it was using a different (new) variable to achieve
 what you have achieved with lwWaiting, and to avoid the use of a new
 variable the code was refactored into its current form.  I think we should
 do this change (I can write a patch) unless Robert feels otherwise.

I think we can just rename lwWaiting to something more generic.


 Consider what happens if such a follower enqueues in another
 transaction. It is not, as far as I could find out, guaranteed on all
 types of cpus that a third backend can actually see nextClearXidElem
 as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
 sockets. If the write to nextClearXidElem is entered into the local
 store buffer (leader #1) a hyper-threaded process (leader #2) can
 possibly see it (store forwarding) while another backend doesn't
 yet.
 
 I think this is very unlikely to be an actual problem due to
 independent barriers until enqueued again, but I don't want to rely
 on it undocumentedly. It seems safer to replace
  +        wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
  +        pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
 with a pg_atomic_exchange_u32().
 
 
 I didn't follow this point. If we ensure that a follower can never return
 before the leader wakes it up, then why would it be a problem to update
 nextClearXidElem like above?

Because it doesn't generally enforce that *other* backends have seen the
write as there's no memory barrier.


  +/*
  + * ProcArrayGroupClearXid -- group XID clearing
  + *
  + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
  + * commit time, add ourselves to a list of processes that need their XIDs
  + * cleared.  The first process to add itself to the list will acquire
  + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
  + * on behalf of all group members.  This avoids a great deal of context
  + * switching when many processes are trying to commit at once, since the lock
  + * only needs to be handed from the last share-locker to one process waiting
  + * for the exclusive lock, rather than to each one in turn.
  + */
  +static void
  +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
  +{
 
  This comment, in my opinion, is rather misleading. If you have a
  workload that primarily is slowed down due to transaction commits, this
  patch doesn't actually change the number of context switches very
  much. Previously all backends enqueued in the lwlock and got woken up
  one-by-one. Safe backends 'jumping' the queue while a lock has been
  released but the woken up backend doesn't yet run, there'll be exactly
  as many context switches as today.
 
 
  I think this is debatable, as in the previous mechanism all the backends
  clear their transaction ids one-by-one (which is nothing but context
  switching), which leads to contention not only with read lockers
  but with write lockers as well.

Huh? You can benchmark it, there's barely any change in the number of
context switches.

I am *not* saying that there's no benefit to the patch, I am saying that
context switches are the wrong explanation. Reduced contention (due to
shorter lock holding times, fewer cacheline moves etc.) is the reason
this is beneficial.


  I don't think it's a good idea to use the same variable name in PROC_HDR
  and PGPROC, it's confusing.

 What do you mean by this, are you not happy with variable name?

Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.

struct PGPROC
{
...
/* Support for group XID clearing. */
volatile pg_atomic_uint32   nextClearXidElem;
...
}

typedef struct PROC_HDR
{
...
/* First pgproc waiting for group XID clear */
volatile pg_atomic_uint32 nextClearXidElem;
...
}

PROC_HDR's variable imo isn't well named.

Greetings,

Andres Freund




Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Thu, Aug 20, 2015 at 12:55 PM, Marko Tiikkaja ma...@joh.to wrote:

 On 8/20/15 12:53 PM, Magnus Hagander wrote:

 We could change it to radius_servers and radius_ports, and deprecate but
 keep accepting the old parameters for a release or two.


 That's one option..

 To make it easy, we
 make sure that both parameter names accepts the same format parameter, so
 it's easy enough to just replace it once deprecated.


 I'm not sure I understand what you mean by this.


I mean that you could write radius_server=foo or radius_servers=foo as well
as radius_server=foo,bar and radius_servers=foo,bar. As long as you don't
specify both radius_server and radius_servers, either one of them should
accept either one server or multiple servers.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Amit Kapila
On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund and...@anarazel.de wrote:

 Hi,

  On Wed, Aug 5, 2015 at 10:59 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  OK, committed.

 I spent some time today reviewing the commited patch. So far my only
 major complaint is that I think the comments are only insufficiently
 documenting the approach taken:
 Stuff like avoiding ABA type problems by clearing the list entirely and
 it being impossible that entries end up on the list too early absolutely
 needs to be documented explicitly.


I think more comments can be added to explain such behaviour if it is
not clear via looking at current code and comments.

 I think I found a few minor correctness issues. Mostly around the fact
 that we, so far, tried to use semaphores in a way that copes with
 unrelated unlocks arriving. I actually think I removed all of the
 locations that caused that to happen, but I don't think we should rely
 on that fact. Looking at the following two pieces of code:
 /* If the list was not empty, the leader will clear our XID. */
 if (nextidx != INVALID_PGPROCNO)
 {
     /* Sleep until the leader clears our XID. */
     while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
     {
         extraWaits++;
         PGSemaphoreLock(&proc->sem);
     }

     /* Fix semaphore count for any absorbed wakeups */
     while (extraWaits-- > 0)
         PGSemaphoreUnlock(&proc->sem);
     return;
 }
 ...
 /*
  * Now that we've released the lock, go back and wake everybody up.  We
  * don't do this under the lock so as to keep lock hold times to a
  * minimum.  The system calls we need to perform to wake other processes
  * up are probably much slower than the simple memory writes we did while
  * holding the lock.
  */
 while (wakeidx != INVALID_PGPROCNO)
 {
     PGPROC     *proc = &allProcs[wakeidx];

     wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
     pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);

     if (proc != MyProc)
         PGSemaphoreUnlock(&proc->sem);
 }

 There's a bunch of issues with those two blocks afaics:

 1) The first block (in one backend) might read INVALID_PGPROCNO before
ever locking the semaphore if a second backend quickly enough writes
INVALID_PGPROCNO. That way the semaphore will be permanently out of
balance.


I think you are right and here we need to use something like what is
suggested below by you.  Originally the code was similar to what you
have written below, but it was using a different (new) variable to achieve
what you have achieved with lwWaiting, and to avoid the use of a new
variable the code was refactored into its current form.  I think we should
do this change (I can write a patch) unless Robert feels otherwise.

 2) There's no memory barriers around dealing with nextClearXidElem in
the first block. Afaics there needs to be a read barrier before
returning, otherwise it's e.g. not guaranteed that the woken up
backend sees its own xid set to InvalidTransactionId.

 3) If a follower returns before the leader has actually finished woken
that respective backend up we can get into trouble:


Surely that can lead to a problem, and I think the reason and fix
for this are the same as for the first point.

Consider what happens if such a follower enqueues in another
transaction. It is not, as far as I could find out, guaranteed on all
types of cpus that a third backend can actually see nextClearXidElem
as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
sockets. If the write to nextClearXidElem is entered into the local
store buffer (leader #1) a hyper-threaded process (leader #2) can
possibly see it (store forwarding) while another backend doesn't
yet.

I think this is very unlikely to be an actual problem due to
independent barriers until enqueued again, but I don't want to rely
on it undocumentedly. It seems safer to replace
+        wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
+        pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
with a pg_atomic_exchange_u32().


I didn't follow this point. If we ensure that a follower can never return
before the leader wakes it up, then why would it be a problem to update
nextClearXidElem like above?


 I think to fix these ProcArrayGroupClearXid() should use a protocol
 similar to lwlock.c. E.g. make the two blocks something like:
 while (wakeidx != INVALID_PGPROCNO)
 {
     PGPROC     *proc = &allProcs[wakeidx];

     wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
     pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);

 

Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 12:53 PM, Magnus Hagander wrote:

We could change it to radius_servers and radius_ports, and deprecate but
keep accepting the old parameters for a release or two.


That's one option..


To make it easy, we
make sure that both parameter names accepts the same format parameter, so
it's easy enough to just replace it once deprecated.


I'm not sure I understand what you mean by this.


.m




Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Paul Ramsey
On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

 On 18 August 2015 at 21:03, Paul Ramsey wrote:
  
  So I need a way to either (a) notice when I already have a (old) copy
  of the library loaded and avoid trying to setup the GUC in that case
  or (b) set-up the GUC in a somewhat less brittle way than
  DefineCustomStringVariable() allows, something that can overwrite
  things instead of just erroring out.
  
 Are you trying to preserve the in-memory state across upgrade as well? It 
 sounds unlikely we can support that directly in the general case. 

I’m not sure what you mean by this.

 Sounds like we need RedefineCustomStringVariable() 

Yes, if that had existed we would not have had any problems (as long as it 
delegated back to Define..() in the case where the variable hadn’t been created 
yet…, since one of the problems is knowing if/to-what-extent a custom variable 
has already been defined).

We do now have a fix, which involved copying about 100 lines of core code 
(find_option, guc_var_compare, guc_name_compare) up, that does a low level 
search to see if there is a config_generic for a particular variable name, and 
if so whether it’s a placeholder or not. The presence of a non-placeholding 
definition is used as a “uh oh, there’s already a library in here” warning 
which keeps us from re-defining the variable and causing trouble.

P.

  
  --
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja ma...@joh.to wrote:

 On 2015-08-20 02:29, Tom Lane wrote:

 Marko Tiikkaja ma...@joh.to writes:

 So I'm developing a patch to fix this issue, but I'm not
 exactly sure what the configuration should look like.  I see multiple
 options, but the one I like the best is the following:


 Add two new HBA configuration options: radiusfallbackservers and
  radiusfallbackports; both lists parsed with SplitIdentifierString (à la
 listen_addresses).


 Why add new GUCs for that?  Can't we just redefine radiusserver as a list
 of servers to try in sequence, and similarly split radiusport into a list?

 Or maybe better, rename it radius_servers.  But what you have here seems
 weird, and it certainly doesn't follow the precedent of what we did when
 we replaced listen_address with listen_addresses.


 If we were adding RADIUS support today, this would be the best option. But
 seeing that we still only support one server today, this seems like a
 backwards incompatibility which practically 100% of users won't benefit
 from.  But I don't care much either way.  If we think breaking
 compatibility here is acceptable, I'd say we go for radius_servers and
 radius_ports.


We could change it to radius_servers and radius_ports, and deprecate but
keep accepting the old parameters for a release or two. To make it easy, we
make sure that both parameter names accepts the same format parameter, so
it's easy enough to just replace it once deprecated.

And we could consider throwing a WARNING or at least a LOG when the old
name is used, indicating that it's deprecated.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 12:57 PM, Magnus Hagander wrote:

I mean that you could write radius_server=foo or radius_servers=foo as well
as radius_server=foo,bar and radius_servers=foo,bar. As long as you don't
specify both radius_server and radius_servers, either one of them should
accept either one server or multiple servers.


I think if we do it this way, I'd prefer if radiusserver was kept 
fully backwards compatible.  So either specify radiusserver or 
radius_servers, but only the second one accepts a list.



.m




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 5:02 PM, Greg Stark st...@mit.edu wrote:
 I haven't thought through the exponential
 growth carefully enough to tell if doubling the run size should
 decrease the number of passes linearly or by a constant number.

It seems that with 5 times the data that previously required ~30MB to
avoid a multi-pass sort (where ~2300MB is required for an internal
sort -- the benchmark query), it took ~60MB to avoid a multi-pass
sort. I guess I just didn't exactly determine either threshold due to
that taking too long, and that as predicted, every time the input size
quadruples, the required amount of work_mem to avoid multiple passes
only doubles. That will need to be verified more vigorously, but it
looks that way.

 But you're right that seems to be less and less a realistic scenario.
 Times when users are really processing data sets that large nowadays
  they'll just throw it into Hadoop or BigQuery or whatever to get the
 parallelism of many cpus. Or maybe Citus and the like.

I'm not sure that even that's generally true, simply because sorting a
huge amount of data is very expensive -- it's not really a big data
thing, so to speak. Look at recent results on this site:

http://sortbenchmark.org

Last year's winning Gray entrant, TritonSort, uses a huge parallel
cluster of 186 machines, but only sorts 100TB. That's just over 500GB
per node. Each node is a 32 core Intel Xeon EC2 instance with 244GB
memory, and lots of SSDs. It seems like the point of the 100TB minimum
rule in the Gray contest category is that that's practically
impossible to fit entirely in memory (to avoid merging).

Eventually, linearithmic growth becomes extremely painful, no matter
how much processing power you have. It takes a while, though.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 11:16 PM, Peter Geoghegan p...@heroku.com wrote:
 It could reduce seek time, which might be the dominant cost (but not
 I/O as such).

No I didn't quite follow the argument to completion. Increasing the
run size is a win if it reduces the number of passes. In the
single-pass case it has to read all the data once, write it all out to
tapes, then read it all back in again. So 3x the data. If it's still
not sorted it needs to write it all back out yet again and read it all back in
again. So 5x the data. If the tapes are larger it can avoid that 66%
increase in total I/O. In large data sets it can need 3, 4, or maybe
more passes through the data and saving one pass would be a smaller
incremental difference. I haven't thought through the exponential
growth carefully enough to tell if doubling the run size should
decrease the number of passes linearly or by a constant number.
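
A rough back-of-the-envelope: with r initial runs and a merge fan-in of F
(roughly work_mem divided by the per-tape buffer size), the merge needs about
ceil(log_F(r)) passes over the data. At a fixed work_mem, doubling the run
size halves r, which trims log_F(r) by log_F(2) -- a constant, and less than
one whole pass whenever F > 2. The single-pass condition r <= F works out to
work_mem on the order of sqrt(input_size * buffer_size), which is why
quadrupling the input only doubles the work_mem needed to stay single-pass.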

But you're right that seems to be less and less a realistic scenario.
Times when users are really processing data sets that large nowadays
they'll just throw it into Hadoop or BigQuery or whatever to get the
parallelism of many cpus. Or maybe Citus and the like.

The main case where I expect people actually run into this is in
building indexes, especially for larger data types (which come to
think of it might be exactly where the comparison is expensive enough
that quicksort's cache efficiency isn't helpful).

But to do fair tests I would suggest you configure work_mem smaller
(since running tests on multi-terabyte data sets is a pain) and sort
some slower data types that don't fit in memory. Maybe arrays of text
or json?

-- 
greg




Re: [HACKERS] Autonomous Transaction is back

2015-08-20 Thread Rajeev rastogi
On 18 August 2015 21:18, Robert Haas Wrote:

This footnote goes to my point.

It seems clear to me that having the autonomous transaction see the
effects of the outer uncommitted transaction is a recipe for trouble.
If the autonomous transaction updates a row and commits, and the outer
transaction later aborts, the resulting state is inconsistent with any
serial history.  I'm fairly certain that's going to leave us in an
unhappy place.

Even more obviously, ending up with two committed row versions that are
both updates of a single ancestor version is no good.

So, I agree that this scenario should be an error.  What I don't agree
with is the idea that it should be the deadlock detector's job to throw
that error.  Rather, I think that when we examine the xmax of the tuple
we can see - which is the original one, not the one updated by the outer
transaction - we should check whether that XID belongs to an outer
transaction.  If it does, we should throw an error instead of trying to
lock it.  That way (1) the error message will be clear and specific to
the situation and (2) we don't need a separate PGPROC for each
autonomous transaction.  The first of those benefits is agreeable; the
second one is, in my opinion, a key design goal for this feature.

Yes, I agree with this. I was in favor of an error all along, without involving
the deadlock detector.

Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-08-20 Thread Kouhei Kaigai
 On 08/19/2015 03:53 PM, Tom Lane wrote:
 
  I don't see anything very wrong with constraining the initial
  allocation to 1GB, or even less. That will prevent consuming insane
  amounts of work_mem when the planner's rows estimate is too high
  rather than too low. And we do have the ability to increase the hash
  table size on the fly.
 
 Perhaps. Limiting the initial allocation to 1GB might help with lowering
 memory consumed in case of over-estimation, and it's high enough so that
 regular queries don't run into that.
 
 My initial thought was if the memory consumption is a problem, lower
 the work_mem but after thinking about it for a while I don't see a
 reason to limit the memory consumption this way, if possible.
 
 But what is the impact on queries that actually need more than 1GB of
 buckets? I assume we'd only limit the initial allocation and still allow
 the resize based on the actual data (i.e. the 9.5 improvement), so the
 queries would start with 1GB and then resize once finding out the
 optimal size (as done in 9.5). The resize is not very expensive, but
 it's not free either, and with so many tuples (requiring more than 1GB
 of buckets, i.e. ~130M tuples) it's probably just a noise in the total
 query runtime. But I'd be nice to see some proofs of that ...

The problem here is that we cannot know the exact size until the Hash node
has read the entire inner relation. All we can do is rely on the planner's
estimation; however, it often computes a crazy number of rows.
I think resizing of hash buckets is a reasonable compromise.
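
FWIW, the estimate-versus-actual behaviour is easy to watch with EXPLAIN
ANALYZE on the DBT-3/TPC-H schema; the Hash node reports its bucket and batch
counts and peak memory usage. Something like:

SET work_mem = '64MB';
EXPLAIN (ANALYZE, COSTS OFF)
  SELECT count(*)
    FROM lineitem l JOIN orders o ON o.o_orderkey = l.l_orderkey;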

 Also, why 1GB and not 512MB (which is effectively the current limit on
 9.4, because we can't allocate 1GB there so we end up with 1/2 of that)?
 Why not some small percentage of work_mem, e.g. 5%?

No clear reason at this moment.

 Most importantly, this is mostly orthogonal to the original bug report.
 Even if we do this, we still have to fix the palloc after the resize.
 
 So I'd say let's do a minimal bugfix of the actual issue, and then
 perhaps improve the behavior in case of significant overestimates. The
 bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
 We already have patches for that.
 
 Limiting the initial allocation deserves more thorough discussion and
 testing, and I'd argue it's 9.6 material at this point.

Indeed, the original bug was caused by a normal MemoryContextAlloc().
The issue around memory over-consumption is a problem newly exposed by
this; I didn't notice it two months ago.

  The real problem here is the potential integer overflow in
  ExecChooseHashTableSize. Now that we know there is one, that should
  be fixed (and not only in HEAD/9.5).
 
 I don't think there's any integer overflow. The problem is that we end
 up with nbuckets so high that (nbuckets*8) overflows 1GB-1.
 
 There's a protection against integer overflow in place (it was there for
 a long time), but there never was any protection against the 1GB limit.
 Because we've been using much smaller work_mem values and
 NTUP_PER_BUCKET=10, so we could not actually reach it.
 
  But I believe Kaigai-san withdrew his initial proposed patch, and we
  don't have a replacementas far as I saw.
 
 I believe the patch proposed by KaiGai-san is the right one to fix the
 bug discussed in this thread. My understanding is KaiGai-san withdrew
 the patch as he wants to extend it to address the over-estimation issue.

 I don't think we should do that - IMHO that's an unrelated improvement
 and should be addressed in a separate patch.

OK, it is probably not something we should try to settle within a few days,
just before the beta release.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Tom Lane
Arthur Silva arthur...@gmail.com writes:
 Are we landing pg_trgm 1.2 in pg 9.5?

No, we aren't.

And please don't quote 70 lines of unrelated stuff before making
your point.

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Arthur Silva
On Sat, Aug 8, 2015 at 11:04 PM, Bruce Momjian br...@momjian.us wrote:

 On Sun, Aug  9, 2015 at 01:24:33AM +1200, David Rowley wrote:
 
  On 7 August 2015 at 14:24, Bruce Momjian br...@momjian.us wrote:
 
  On Tue, Jun 30, 2015 at 09:00:44PM +0200, Andres Freund wrote:
   * 2014-12-08 [519b075] Simon ..: Use GetSystemTimeAsFileTime directly in win32
     2014-12-08 [8001fe6] Simon ..: Windows: use GetSystemTimePreciseAsFileTime if ..
  Timer resolution isn't an unimportant thing for people using explain?
 
  This all seemed very internals-only, e.g.:
 
  On most Windows systems this change will actually have no
 significant
  effect on
  timestamp resolution as the system timer tick is typically
 between 1ms
  and 15ms
  depending on what timer resolution currently running
 applications have
  requested. You can check this with clockres.exe from
 sysinternals.
  Despite the
   platform limitation this change still permits capture of finer
  timestamps where
  the system is capable of producing them and it gets rid of an
  unnecessary
  syscall.
 
  Was I wrong?
 
 
 
  This does have a user-visible change. Timestamps are now likely to have 6
  digits after the decimal point, if they're on a version of Windows which
  supports GetSystemTimePreciseAsFileTime();
 
  Master:
 
  postgres=# select now();
now
  ---
   2015-08-09 01:14:01.959645+12
  (1 row)
 
  9.4.4
  postgres=# select now();
  now
  
   2015-08-09 01:15:09.783+12
  (1 row)

 Yes, this was already in the release notes:

 Allow higher-precision timestamp resolution on <systemitem
 class="osname">Windows 8</> or <systemitem class="osname">Windows
 Server 2012</> and later Windows systems (Craig Ringer)

 I am not sure why people were saying it was missing.

 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + Everyone has their own god. +


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Are we landing pg_trgm 1.2 in pg 9.5?

If yes (we should), improvements of up to an order of magnitude are a worthy
inclusion in the release notes.

--
Arthur Silva


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-08-20 Thread Michael Paquier
On Thu, Jul 23, 2015 at 11:48 PM, Paul Ramsey pram...@cleverelephant.ca
wrote:
 On Wed, Jul 22, 2015 at 12:19 PM, Paul Ramsey pram...@cleverelephant.ca
wrote:

 I’ll have a look at doing invalidation for the case of changes to the FDW
 wrappers and servers.

 Here's an updated patch that clears the cache on changes to foreign
 wrappers and servers.

Thanks. So I have finally looked at it and this is quite cool. Using for
example this setup:
CREATE EXTENSION seg;
CREATE EXTENSION postgres_fdw;
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(host 'localhost', port '5432', dbname 'postgres', extensions 'seg');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
CREATE FOREIGN TABLE aa_foreign (a seg) SERVER postgres_server OPTIONS
(table_name 'aa');
CREATE SERVER postgres_server_2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(host 'localhost', port '5432', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server_2 OPTIONS (password
'');
CREATE FOREIGN TABLE aa_foreign_2 (a seg) SERVER postgres_server_2 OPTIONS
(table_name 'aa');

I can get the following differences by shipping an expression or not:
=# explain verbose select * from aa_foreign where a = '1 ... 2'::seg;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.aa_foreign  (cost=100.00..138.66 rows=11 width=12)
   Output: a
   Remote SQL: SELECT a FROM public.aa WHERE ((a OPERATOR(public.=) '1 ..
2'::public.seg))
(3 rows)
=# explain verbose select * from aa_foreign_2 where a = '1 ... 2'::seg;
 QUERY PLAN
-
 Foreign Scan on public.aa_foreign_2  (cost=100.00..182.44 rows=11 width=12)
   Output: a
   Filter: (aa_foreign_2.a = '1 .. 2'::seg)
   Remote SQL: SELECT a FROM public.aa
(4 rows)

         if (needlabel)
             appendStringInfo(buf, "::%s",
-                             format_type_with_typemod(node->consttype,
-                                                      node->consttypmod));
+                             format_type_be_qualified(node->consttype));
Pondering more about this one, I think that we are going to need a new
routine in format_type.c to be able to call format_type_internal as
format_type_internal(type_oid, typemod, true/false, false, true). If
typemod is -1, then typemod_given (the third argument) is false, otherwise
typemod_given is true. That's close to what the C function format_type at
the SQL level can do except that we want it to be qualified. Regression
tests will need an update as well.
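
To illustrate, the new routine could be as thin as this (a sketch only: the
function name is hypothetical, and it would have to live in
src/backend/utils/adt/format_type.c next to format_type_be_qualified(), where
format_type_internal() is in scope):

/*
 * Sketch of the suggested helper: qualified type name, honouring the
 * typemod when one is supplied; typemod_given is derived from typemod
 * as described above.
 */
char *
format_type_with_typemod_qualified(Oid type_oid, int32 typemod)
{
	return format_type_internal(type_oid, typemod,
								typemod != -1,	/* typemod_given */
								false,			/* allow_invalid */
								true);			/* force_qualify */
}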

+   /* Option validation calls this function with NULL in the */
+   /* extensionOids parameter, to just do existence/syntax */
+   /* checking of the option */
Comment format is incorrect, this should be written like that:
+   /*
+* Option validation calls this function with NULL in the
+* extensionOids parameter, to just do existence/syntax
+* checking of the option.
+*/

+   if (!extension_list)
+   return false;
I would rather mark that as == NIL instead, NIL is what an empty list is.

+extern bool is_shippable(Oid procnumber, PgFdwRelationInfo *fpinfo);
There is no need to pass PgFdwRelationInfo to is_shippable. Only the list
of extension OIDs is fine.

+   bool shippable;/* extension the object appears within, or
InvalidOid if none */
That's nitpicking, but normally we avoid more than 80 characters per line
of code.

When attempting to create a server when an extension is not installed we
get an appropriate error:
=# CREATE SERVER postgres_server
 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5432', dbname 'postgres', extensions
'seg');
ERROR:  42601: the seg extension must be installed locally before it can
be used on a remote server
LOCATION:  extractExtensionList, shippable.c:245
However, it is possible to drop an extension listed in a postgres_fdw
server. Shouldn't we invalidate cache as well when pg_extension is updated?
Thoughts?

+   if (!SplitIdentifierString(extensionString, ',', extlist))
+   {
+   list_free(extlist);
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("unable to parse extension list \"%s\"",
+   extensionString)));
+   }
It does not matter much to call list_free here. The list is going to be
freed in the error context with ERROR.

+   foreach(ext, extension_list)
+   {
+   Oid extension_oid = (Oid) lfirst(ext);
+   if (foundDep->refobjid == extension_oid)
+   {
+   nresults++;
+   }
+   }
You could just use list_member_oid here. And why not 
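For example, assuming the variables from the quoted hunk, the whole loop
could collapse to something like:

	if (list_member_oid(extension_list, foundDep->refobjid))
		nresults++;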

Re: [HACKERS] TAP tests are badly named

2015-08-20 Thread Andrew Dunstan



On 08/16/2015 08:30 PM, Michael Paquier wrote:

On Mon, Aug 17, 2015 at 7:15 AM, Noah Misch n...@leadboat.com wrote:

On Sun, Aug 16, 2015 at 05:08:56PM -0400, Andrew Dunstan wrote:

On 08/16/2015 02:23 PM, Noah Misch wrote:

-sub tapcheck
+sub tap_check
  {
-   InstallTemp();
+   die "Tap tests not enabled in configuration"
+     unless $config->{tap_tests};

I endorse Heikki's argument for having omitted the configuration flag:
http://www.postgresql.org/message-id/55b90161.5090...@iki.fi


That argument is not correct.  None of the tap tests can be run via make if
--enable-tap-tests is not set. This doesn't just apply to the check-world
target as Heikki asserted. Have a look at the definitions of prove_check and
prove_installcheck in src/Makefile.global.in if you don't believe me.

While that one piece of Heikki's argument was in error, I didn't feel that it
affected the conclusion.  Please reply to the aforementioned email to dispute
the decision; some involved parties may not be following this thread.

FWIW, I agree with Andrew's approach here. We are going to need this
parameter switch once TAP routines are used in code paths other than
src/bin, to control whether a test can be run based on the presence
of t/ and the value of this parameter. See for example the patch aimed
at testing dumpable tables with pg_dump, which should be part of the
modulescheck target as argued upthread.
My 2c.



I spoke to Heikki about this the other day, and he's fine with using the 
test if there's a need for it. In addition to Michael's point, the 
buildfarm has a need for it - if the flag isn't set it won't run the 
checks, so the flag should be supported.  I'm therefore going to stick 
with the code above.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Josh Berkus
On 08/20/2015 06:19 AM, David Fetter wrote:
 On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
 Do you mean ATTACH and DETACH, if they require access exclusive lock on
 the parent, should not be in the first cut? Or am I misreading?
 
 Sorry I was unclear.
 
 ATTACH and DETACH should be in the first cut even if they require an
 access exclusive lock.
 
 Cheers,
 David.

I don't see a way for them to *ever* not require an access exclusive lock.

We could eventually implement:

DETACH PARTITION CONCURRENTLY

... but that's the only way I can see around it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote:


 The patch is ~3.25x faster than master


 I've tried to read this post twice and both times my work_mem overflowed.
 ;-)

 Can you summarize what this patch does? I understand clearly what it doesn't
 do...

The most important thing it does is always quicksort runs, which
are formed by simply filling work_mem with tuples in no particular
order, rather than trying to make runs that are twice as large as
work_mem on average. That's what the ~3.25x improvement concerned.
That's actually a significantly simpler algorithm than replacement
selection, and appears to be much faster. You might even say that it's
a dumb algorithm, because it is less sophisticated than replacement
selection. However, replacement selection tends to use CPU caches very
poorly, while its traditional advantages have become dramatically less
important due to large main memory sizes in particular. Also, it hurts
that we don't currently dump tuples in batches, for several reasons.
Better to do memory intense operations in batch, rather than having a
huge inner loop, in order to minimize or prevent instruction cache
misses. And we can better take advantage of asynchronous I/O.
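
To make the run formation part concrete, here is a toy, standalone sketch of
the loop being described; this is not tuplesort.c code, just the shape of
the algorithm, with integers standing in for tuples and stdout standing in
for the tapes (replacement selection would instead maintain a heap and emit
one tuple at a time to grow each run):

#include <stdio.h>
#include <stdlib.h>

#define WORK_MEM_SLOTS 4		/* pretend work_mem holds only 4 tuples */

static int
cmp_int(const void *a, const void *b)
{
	int		ia = *(const int *) a;
	int		ib = *(const int *) b;

	return (ia > ib) - (ia < ib);
}

static void
dump_run(int *buf, int n, int run)
{
	/* sort the whole in-memory batch at once, then spill it as one run */
	qsort(buf, n, sizeof(int), cmp_int);
	printf("run %d:", run);
	for (int i = 0; i < n; i++)
		printf(" %d", buf[i]);
	printf("\n");
}

int
main(void)
{
	int		buf[WORK_MEM_SLOTS];
	int		n = 0;
	int		run = 0;
	int		v;

	while (scanf("%d", &v) == 1)
	{
		buf[n++] = v;			/* fill memory in no particular order */
		if (n == WORK_MEM_SLOTS)
		{
			dump_run(buf, n, ++run);
			n = 0;
		}
	}
	if (n > 0)					/* final, possibly short, run */
		dump_run(buf, n, ++run);
	return 0;
}

Feeding it "5 3 9 1 7 2 8 4" produces two sorted runs (1 3 5 9 and 2 4 7 8)
that a later merge step would combine.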

The complicated aspect of considering the patch is whether or not it's
okay to not use replacement selection anymore -- is that an
appropriate trade-off?

The reason that the code has not actually been simplified by this
patch is that I still want to use replacement selection for one
specific case: when it is anticipated that a quicksort with
spillover can occur, which is only possible with incremental
spilling. That may avoid most I/O, by spilling just a few tuples using
a heap/priority queue, and quicksorting everything else. That's
compelling when you can manage it, but no reason to always use
replacement selection for the first run in the common case where there
will be several runs in total.

Is that any clearer? To borrow a phrase from the processor
architecture community, from a high level this is a Brainiac versus
Speed Demon [1] trade-off. (I wish that there was a widely accepted
name for this trade-off.)

[1] http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 Alternately, has anyone tested whether Timsort would work well?

I think that was proposed a few years ago and did not look so good
in simple testing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Wed, Jun 25, 2014 at 6:05 PM, John Klos j...@ziaspace.com wrote:
 While I wouldn't be surprised if you remove the VAX code because not many
 people are going to be running PostgreSQL, I'd disagree with the assessment
 that this port is broken. It compiles, it initializes databases, it runs, et
 cetera, albeit not with the default postgresql.conf.


So I've been playing with this a bit. I have simh running on my home
server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
intended to be a SAN and its cpu is woefully underpowered so the
resulting VAX is actually very very slow. So slow I wonder if there's
a bug in the emulator but anyways.

I'm coming to the conclusion that the port doesn't really work
practically speaking but the failures are more interesting than I
expected. They come in a few varieties:

1) Vax does not have IEEE fp. This manifests in a few ways, some of
which may be our own bugs or missing expected outputs. The numeric
data type maths often produce numbers rounded off differently, the
floating point tests print numbers one digit shorter than our expected
output files assume, and sometimes in scientific notation where we don't
expect it. The overflow tests generate floating point exceptions rather
than overflows. Infinity and NaN don't work. The Json code in
particular generates large numbers where +/- Infinity literals are
supplied.

There are some planner tests that fail with floating point exceptions
-- that's probably a bug on our part. And I've seen at least one
server crash (maybe two) apparently caused by one as well which I
don't believe is expected.

2) The initdb problem is actually not our fault. It looks like a
NetBSD kernel bug when allocating large shared memory blocks on a
machine without lots of memory. There's not much initdb can do with a
kernel panic...  BSD still has the problem of kern.maxfiles defaulting
to a value low enough that even two connections cause the regression
tests to run out of file descriptors. That's documented and it would
be a right pain for initdb to detect that case.

3) The tests take so long to run that autovacuum kicks in and the
tests start producing rows in inconsistent orderings. I assume that's
a problem we've run into on the CLOBBER_CACHE animals as well?

4) One of the tablesample tests seems to freeze indefinitely. I
haven't looked into why yet. That might indeed indicate that the
spinlock code isn't working?

So my conclusion tentatively is that while the port doesn't actually
work practically speaking it is nevertheless uncovering some
interesting bugs.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 On August 20, 2015 at 2:17:31 AM, Simon Riggs 
 (si...@2ndquadrant.com(mailto:si...@2ndquadrant.com)) wrote:
 Sounds like we need RedefineCustomStringVariable() 

 Yes, if that had existed we would not have had any problems (as long as it 
 delegated back to Define..() in the case where the variable hadn’t been 
 created yet…, since one of the problems is knowing if/to-what-extent a 
 custom variable has already been defined).

I'm not sure that the situation you describe can be expected to work
reliably; the problems are far wider than just GUC variables.  If two
different .so's are exposing broadly the same set of C function names,
how can we be sure which one will get called by the dynamic linker?
Isn't it likely that we'd end up continuing to call some functions
out of the old .so, probably leading to disaster?

I don't necessarily object to providing some solution for GUCs, but
I think first we need to question whether that isn't just the tip of
a large iceberg.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 4:25 PM, Magnus Hagander wrote:

On Aug 20, 2015 4:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Our expectations about forward compatibility for postgresql.conf entries
have always been pretty low; even more so for not-widely-used settings.

In any case, wouldn't what you describe simply put off the pain for awhile
(replacing it with confusion in the short term)?  Eventually we're going
to break it.



Well, that's true of any deprecation. And if we throw a log message then
people will know about it...

That said, I'm not against a clean break with compatibility either. As long
as it's clean - like renaming the parameters.


All right.  I'll submit a patch for that.

Thanks for the input, both.


.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Aug 20, 2015 4:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Magnus Hagander mag...@hagander.net writes:
  On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja ma...@joh.to wrote:
  On 2015-08-20 02:29, Tom Lane wrote:
  Why add new GUCs for that?  Can't we just redefine radiusserver as a
list
  of servers to try in sequence, and similarly split radiusport into a
list?

  We could change it to radius_servers and radius_ports, and deprecate but
  keep accepting the old parameters for a release or two. To make it
easy, we
  make sure that both parameter names accept the same format parameter,
so
  it's easy enough to just replace it once deprecated.

 Considering that we did no such thing for listen_address, which is of
 concern to probably two or three orders-of-magnitude more users than
 radiusserver, I don't really see why we'd do it for the RADIUS settings.

 Our expectations about forward compatibility for postgresql.conf entries
 have always been pretty low; even more so for not-widely-used settings.

 In any case, wouldn't what you describe simply put off the pain for awhile
 (replacing it with confusion in the short term)?  Eventually we're going
 to break it.


Well, that's true of any deprecation. And if we throw a log message then
people will know about it...

That said, I'm not against a clean break with compatibility either. As long
as it's clean - like renaming the parameters.

/Magnus


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 So I've been playing with this a bit. I have simh running on my home
 server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
 intended to be a SAN and its cpu is woefully underpowered so the
 resulting VAX is actually very very slow. So slow I wonder if there's
 a bug in the emulator but anyways.

Fun fun!

 There are some planner tests that fail with floating point exceptions
 -- that's probably a bug on our part. And I've seen at least one
 server crash (maybe two) apparently caused by one as well which I
 don't believe is expected.

That seems worth poking into.

 3) The tests take so long to run that autovacuum kicks in and the
 tests start producing rows in inconsistent orderings. I assume that's
 a problem we've run into on the CLOBBER_CACHE animals as well?

I'd tentatively bet that it's more like planner behavioral differences
due to different floating-point roundoff.

 4) One of the tablesample tests seems to freeze indefinitely. I
 haven't looked into why yet. That might indeed indicate that the
 spinlock code isn't working?

The tablesample tests seem like a not-very-likely first place for such a
thing to manifest.  What I'm thinking is that there are places in there
where we loop till we get an expected result.  Offhand I thought they were
all integer math; but if one was float and the VAX code wasn't doing what
was expected, maybe we could blame this on float discrepancies as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-08-20 Thread Alvaro Herrera
Pavel Stehule wrote:

Hi,

 Job schedulers are an important and sometimes very complex part of any
 software. PostgreSQL lacks one. I propose a new contrib module that can be
 used simply for some tasks, and that can be used as a base for other,
 richer schedulers. I prefer a minimalist design - but strong enough for
 enhancing when it is necessary. Some complex logic can be implemented in PL
 better than in C. Motto: Simple to learn, simple to use, simple to
 customize.

Have you made any progress on this?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-08-20 Thread Alvaro Herrera
Jim Nasby wrote:

 I think things like pageinspect are very different; I really can't see any
 use for those beyond debugging (and debugging by an expert at that).

I don't think that necessarily means it must continue to be in contrib.
Quite the contrary, I think it is a tool critical enough that it should
not be relegated to be a second-class citizen as it is now (let's face
it, being in contrib *is* second-class citizenship).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] exposing pg_controldata and pg_config as functions

2015-08-20 Thread Andrew Dunstan

Is there any significant interest in either of these?

Josh Berkus tells me that he would like pg_controldata information, and 
I was a bit interested in pg_config information, for this reason: I had 
a report of someone who had configured using --with-libxml but the xml 
tests actually returned the results that are expected without xml being 
configured. The regression tests thus passed, but should not have. It 
occurred to me that if we had a test like


select pg_config('configure') ~ '--with-libxml' as has_xml;

in the xml tests then this failure mode would be detected.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 2015-08-20 02:29, Tom Lane wrote:
 Why add new GUCs for that?  Can't we just redefine radiusserver as a list
 of servers to try in sequence, and similarly split radiusport into a list?

 We could change it to radius_servers and radius_ports, and deprecate but
 keep accepting the old parameters for a release or two. To make it easy, we
 make sure that both parameter names accept the same format parameter, so
 it's easy enough to just replace it once deprecated.

Considering that we did no such thing for listen_address, which is of
concern to probably two or three orders-of-magnitude more users than
radiusserver, I don't really see why we'd do it for the RADIUS settings.

Our expectations about forward compatibility for postgresql.conf entries
have always been pretty low; even more so for not-widely-used settings.

In any case, wouldn't what you describe simply put off the pain for awhile
(replacing it with confusion in the short term)?  Eventually we're going
to break it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Paul Ramsey

On August 20, 2015 at 7:21:10 AM, Tom Lane 
(t...@sss.pgh.pa.us(mailto:t...@sss.pgh.pa.us)) wrote:

 I'm not sure that the situation you describe can be expected to work
 reliably; the problems are far wider than just GUC variables. If two
 different .so's are exposing broadly the same set of C function names,
 how can we be sure which one will get called by the dynamic linker?
 Isn't it likely that we'd end up continuing to call some functions
 out of the old .so, probably leading to disaster?

Well, when you put it that way it sounds pretty scary :) 

For whatever it’s worth, we’ve had versioned .so’s for a very long time, and 
every time an upgrade happens there is a period during which a backend will 
have two versions loaded simultaneously. But we only noticed recently when we 
got the GUC collision (our first GUC was only introduced in PostGIS 2.1). 
Perhaps because after upgrading most people disconnect, and then the next time 
they connect they only get the new library henceforth. In some cases (start a 
fresh backend and then do the upgrade immediately) it’s even possible to do an 
upgrade without triggering the double-load situation.

 I don't necessarily object to providing some solution for GUCs, but
 I think first we need to question whether that isn't just the tip of
 a large iceberg.

There’s no doubt that the GUC issue is just a symptom of a potentially awful 
larger disease, but so far it’s the only symptom we’ve observed. Maybe because 
only a small % of functionality ever changes in a given release, combined with 
the relatively short lifespan of the double-loaded condition, and the fact the 
problem goes away immediately on re-connect, any problems have always just been 
ignored.

P.

 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 13:21, Paul Ramsey pram...@cleverelephant.ca wrote:

 On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com
 (mailto:si...@2ndquadrant.com)) wrote:

  On 18 August 2015 at 21:03, Paul Ramsey wrote:
 
   So I need a way to either (a) notice when I already have a (old) copy
   of the library loaded and avoid trying to setup the GUC in that case
   or (b) set-up the GUC in a somewhat less brittle way than
   DefineCustomStringVariable() allows, something that can overwrite
   things instead of just erroring out.
 
  Are you trying to preserve the in-memory state across upgrade as well?
 It sounds unlikely we can support that directly in the general case.

 I’m not sure what you mean by this.


The value of the global variable can't be maintained across upgrade.


  Sounds like we need RedefineCustomStringVariable()

 Yes, if that had existed we would not have had any problems (as long as it
 delegated back to Define..() in the case where the variable hadn’t been
 created yet…, since one of the problems is knowing if/to-what-extent a
 custom variable has already been defined).

 We do now have a fix, which involved copying about 100 lines of core code
 (find_option, guc_var_compare, guc_name_compare) up, that does a low level
 search to see if there is a config_generic for a particular variable name,
 and if so whether it’s a placeholder or not. The presence of a
 non-placeholding definition is used as a “uh oh, there’s already a library
 in here” warning which keeps us from re-defining the variable and causing
 trouble.
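
To show the shape of that check (a conceptual sketch, not the actual PostGIS
code; it assumes the GUC accessors declared in utils/guc.h and
utils/guc_tables.h):

#include "postgres.h"
#include "utils/guc.h"
#include "utils/guc_tables.h"

/*
 * Sketch: has a custom GUC already been fully defined by a previously
 * loaded library version, as opposed to merely existing as a placeholder
 * created before any library defined it?
 */
static bool
custom_guc_already_defined(const char *name)
{
	struct config_generic **vars = get_guc_variables();
	int			num = GetNumConfigOptions();
	int			i;

	for (i = 0; i < num; i++)
	{
		if (pg_strcasecmp(vars[i]->name, name) == 0)
			return (vars[i]->flags & GUC_CUSTOM_PLACEHOLDER) == 0;
	}
	return false;				/* not defined at all yet */
}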


I'm sure we all agree PostGIS is an important use case. Core is the right
place to put such code.

Please submit a patch that does that - better than someone else trying to
get it right for you. Thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan p...@heroku.com wrote:
 I believe, in general, that we should consider a multi-pass sort to be
 a kind of inherently suspect thing these days, in the same way that
 checkpoints occurring 5 seconds apart are: not actually abnormal, but
 something that we should regard suspiciously. Can you really not
 afford enough work_mem to only do one pass? Does it really make sense
 to add far more I/O and CPU costs to avoid that other tiny memory
 capacity cost?

I think this is the crux of the argument. And I think you're
basically, but not entirely, right.

The key metric there is not how cheap memory has gotten but rather
what the ratio is between the system's memory and disk storage. The
use case I think you're leaving out is the classic data warehouse
with huge disk arrays attached to a single host running massive
queries for hours. In that case reducing run size will reduce I/O
requirements directly and halving the amount of I/O sort takes will
halve the time it takes regardless of cpu efficiency. And I have a
suspicion typical data distributions get much better than a 2x
speedup.

But I think you're basically right that this is the wrong use case to
worry about for most users. Even those users that do have large batch
queries are probably not processing so much that they should be doing
multiple passes. The ones that do are probably more interested in
parallel query, federated databases, column stores, and so on rather
than worrying about just how many hours it takes to sort their
multiple terabytes on a single processor.

I am quite suspicious of quicksort though. It has O(n^2) worst case
and I think it's only a matter of time before people start worrying
about DOS attacks from users able to influence the data ordering. It's
also not very suitable for GPU processing. Quicksort gets most of its
advantage from cache efficiency; it isn't a super efficient algorithm
otherwise. Are there not other cache-efficient algorithms to consider?

Alternately, has anyone tested whether Timsort would work well?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Fetter
On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
 On 2015-08-19 PM 09:52, David Fetter wrote:
  On Wed, Aug 19, 2015 at 04:30:39PM +0900, Amit Langote wrote:
  
  There are use cases where we need to warn people that their assertions
  need to be true, and if those assertions are not true, this will
  explode, leaving them to pick the resulting shrapnel out of their
  faces.  There are other parts of the system where this is true, as
  when people write UDFs in C.
  
  As I understand it, NOT VALID means, I assert that the tuples already
  here fit the constraint.  Any changes will be checked against the
  constraint.
  
  I've seen cases where a gigantic amount of data is coming out of some
  distributed system which holds the constraint as an invariant.  This
  let a DBA decide to add a NOT VALID constraint in order not to take
  the hit of a second full scan of the data, which might have made the
  import, and possibly the entire project, untenable.
 
 Ah, I understand the point of parameterization (TRUST). Seems like it
 would be good to have, with appropriate documentation of the same. Perhaps
 it might as well be a parameter to step 1 itself.

So TRUST would obviate step 2?  Better still!

  5. Detach partition
 
  ALTER TABLE partitioned_table
  DETACH PARTITION partition_name [USING table_name]
 
  This removes partition_name as partition of partitioned_table.
  The table continues to exist with the same name or 'table_name',
  if specified.  pg_class.relispartition is set to false for the
  table, so it behaves like a normal table.
 
  Could this take anything short of an access exclusive lock on the
  parent?
 
  Yes, both the step 1 of ATTACH command and DETACH command take
  access exclusive lock on the parent. They are rather quick metadata
  changes, so should not stall others significantly, I think.
  
  So no.  Weakening required locks has been something of an ongoing
  project, project-wide, and need not be part of the first cut of this
  long-needed feature.
  
 
 Do you mean ATTACH and DETACH, if they require access exclusive lock on
 the parent, should not be in the first cut? Or am I misreading?

Sorry I was unclear.

ATTACH and DETACH should be in the first cut even if they require an
access exclusive lock.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Steele
On 8/20/15 5:45 AM, Amit Langote wrote:
 On 2015-08-20 PM 06:27, Pavan Deolasee wrote:
 On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:

 PARTITION BY LIST ON (name)
 PARTITION BY RANGE ON (year, month)

 PARTITION BY LIST ON ((lower(left(name, 2)))
 PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))



 How about HASH partitioning? Are there plans to support foreign tables as
 partitions?

 
 I've not given HASH partitioning a lot of thought yet. About the latter,
 it seems it should not be too difficult to incorporate into the proposed
 partitioning internals.

Hash partitioning seems like it could wait.  In fact, I've nearly always
implemented hash partitions as list partitions.  This gives me control
over the hash function and allows me to use properties of the key to my
advantage.  For instance - if your key is a sha1 hash there's no need to
rehash, just grab the required number of bits off the end of the key.

My experiences with Oracle's hash function were generally not good -
there's a reason many hash algorithms exist.  If/when we do hash
partitioning in Postgres I'd like to see the hash function be
user-definable.

Meanwhile, I think list and range are a good start.  I'd prefer to see
sub-partitioning before hash partitioning.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-21 AM 06:27, David Fetter wrote:

 By the last sentence, do you mean only UPDATEs to the partition key that
 cause rows to jump partitions or simply any UPDATEs to the partition key?
 
 I don't know what Simon had in mind, but it seems to me that we have
 the following in descending order of convenience to users, and I
 presume, descending order of difficulty of implementation:
 
 1.  Updates propagate transparently, moving rows between partitions if needed.
 
 2.  Updates fail only if the change to a partition key would cause the
 row to have to move to another partition.
 
 3.  All updates to the partition key fail.
 

I was thinking I'd implement 2.

There was some discussion last year[1] about how 1 could be realized.

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/20140829172216.gf10...@awork2.anarazel.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Josh Berkus
On 08/20/2015 12:24 PM, Jim Nasby wrote:
 On 8/17/15 4:25 PM, Josh Berkus wrote:
 On 08/17/2015 02:18 PM, Jim Nasby wrote:
 On 8/17/15 3:33 PM, Josh Berkus wrote:
 Again, how do we handle missing keys?  Just return NULL?  or
 ERROR?  I'd
 prefer the former, but there will be arguments the other way.
 
 I've been wondering if we should add some kind of strict JSON. My big
 concern is throwing an error if you try to provide duplicate keys, but
 it seems reasonable that json_strict would throw an error if you try to
 reference something that doesn't exist.
 Only if there's demand for it.  Is there?
 
 I'm certainly worried (paranoid?) about it. Postgres is very good about
 not silently dropping data and this seems a glaring departure from that.
 I haven't looked yet but I'm hoping this could at least be added as an
 extension without duplicating a bunch of the existing JSON stuff.

There's a big difference between silently dropping data and implicitly
creating it. As I said, the only reason I could see wanting a strict
mode is because AppDev users expect it to be consistent with their
programming languages.  Otherwise, from a user perspective, being able
to create a whole nested chain in one statement is a big win.

What could be added as an extension?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Fetter
On Thu, Aug 20, 2015 at 11:16:37AM +0900, Amit Langote wrote:
 On 2015-08-19 PM 09:23, Simon Riggs wrote:
  On 18 August 2015 at 11:30, Amit Langote langote_amit...@lab.ntt.co.jp
  wrote:

  You haven't specified what would happen if an UPDATE would change a row's
  partition. I'm happy to add this to the list of restrictions by saying that
  the partition key cannot be updated.
 
 UPDATEs that change a row's partition would cause error. I haven't
 implemented that yet but will that way in the next patch.
 
 By the last sentence, do you mean only UPDATEs to the partition key that
 cause rows to jump partitions or simply any UPDATEs to the partition key?

I don't know what Simon had in mind, but it seems to me that we have
the following in descending order of convenience to users, and I
presume, descending order of difficulty of implementation:

1.  Updates propagate transparently, moving rows between partitions if needed.

2.  Updates fail only if the change to a partition key would cause the
row to have to move to another partition.

3.  All updates to the partition key fail.

Whichever of these we choose, we should document it with great
clarity.

  We'll need regression tests that cover each restriction and docs
  that match. This is not something we should leave until last.
  People read the docs to understand the feature, helping them to
  reach consensus. So it is for you to provide the docs before, not
  wait until later. I will begin a code review once you tell me docs
  and tests are present. We all want the feature, so its all about
  the details now.
 
 Sorry, should have added tests and docs already. I will add them in
 the next version of the patch. Thanks for willing to review.

Docs and tests are crucial, and thanks again for taking this on.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Stefan Kaltenbrunner
On 08/20/2015 06:09 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I wonder if we should have a default of capping the dump to say 1k lines 
 or such and only optionally do a full one.
 
 -1.  It's worked like this for the last fifteen years or thereabouts,
 and you're the first one to complain.  I suspect some weirdness in
 your logging setup, rather than any systemic problem that we
 need to lobotomize our diagnostic output in order to prevent.

not sure what you consider weird in the logging setup here - the context
dump is imho borderline internal diagnostic output at a debug level
(rather than making sense to an average sysadmin) already, and there is no
way to control it. But having (like in our case) the backend dump 2 million
basically identical lines into a general logfile per event seems
excessive and rather abusive towards the rest of the system (just from
an IO perspective, for example, or from a log file post-processing tool
perspective).

 
 (The reason I say lobotomize is that there's no particularly good
 reason to assume that the first N lines will tell you what you need
 to know.  And the filter rule would have to be *very* stupid, because
 we can't risk trying to allocate any additional memory to track what
 we're doing here.)

I do understand that there might be challenges there, but in the last 15
years machines got way faster and pg got way more capable, and some of those
capabilities might need to get revisited in that regard - and while it
is very nice that pg survives multiple oom cases pretty nicely, I don't
think it is entitled to put an imho unreasonable burden on the rest of
the system by writing insane amounts of data.

Just from a sysadmin perspective it also means that it is trivial for a
misbehaving app to fill up the logfile on a system, because unlike almost
all other logging output there seems to be no way to
control/disable it on a per-role/database level.


Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Annotating pg_dump/pg_restore

2015-08-20 Thread Guillaume Lelarge
2015-08-20 18:43 GMT+02:00 Kevin Burke bu...@shyp.com:

 Hi,
 Normally I try to annotate incoming queries, to make it easier to diagnose
 slow ones. For example:

 -- Users.findByPhoneNumber
 SELECT * FROM 

 The pg_dump and pg_restore commands issue a COPY with no possibility of
 adding a comment. It would be useful to know who or what exactly is
 performing a COPY against a database - maybe a nightly backup script, maybe
 a developer copying a table.

 I was wondering if you could have a command line flag that let you attach
 a comment to the query?


You already have the application name. You just need to log it.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


[HACKERS] remove unused ExecGetScanType

2015-08-20 Thread Alvaro Herrera
I don't see any reason not to remove this.  It's been unused since
a191a169d6d0b9558da4519e66510c4540204a51, dated Jan 10 2007.

-- 
Álvaro Herrera   Peñalolén, Chile
Reality is made up of many dreams, all of them different,
but in a certain way, alike... (Me, speaking of erotic dreams)
commit 603323089ec6d86dbc9a504c8772b8a8699c00e8
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
AuthorDate: Thu Aug 20 18:06:33 2015 -0300
CommitDate: Thu Aug 20 18:06:33 2015 -0300

Remove unused function ExecGetScanType

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3c611b9..a735815 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -713,18 +713,6 @@ ExecFreeExprContext(PlanState *planstate)
  */
 
 /* 
- *		ExecGetScanType
- * 
- */
-TupleDesc
-ExecGetScanType(ScanState *scanstate)
-{
-	TupleTableSlot *slot = scanstate->ss_ScanTupleSlot;
-
-	return slot->tts_tupleDescriptor;
-}
-
-/* 
  *		ExecAssignScanType
  * 
  */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 193a654..226f905 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -344,7 +344,6 @@ extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 extern void ExecAssignProjectionInfo(PlanState *planstate,
 		 TupleDesc inputDesc);
 extern void ExecFreeExprContext(PlanState *planstate);
-extern TupleDesc ExecGetScanType(ScanState *scanstate);
 extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
 extern void ExecAssignScanTypeFromOuterPlan(ScanState *scanstate);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing wal_level change at run time

2015-08-20 Thread Peter Eisentraut
On 8/19/15 9:32 AM, Andres Freund wrote:
 I agree that we want both.  But requiring a restart is a hard stop,
 whereas making configuration easier is a soft feature.
 
 I don't think it makes that much of a difference for people new to
 postgres.

People new to postgres are not the only audience for this change.

 To deal with streaming replication, we automatically create slots for
 replicas, but, by default, *without* having them reserve WAL. The slot
 name would be determined in some automatic fashion (uuid or something)
 and stored in a new file in the data directory.  That allows us to
 increase the internal wal_level to hot_standby/archive whenever a
 replica has connected (and thus a physical slot exists), and to logical
 whenever a logical slot exists.

 That seems kind of weird.  So every time a replication client connects,
 we create a new replication slot?  Won't they accumulate?
 
 No, that's not what I was thinking of. The name of the slot would be
 stored somewhere in the data directory, and then be reused henceforth.

It seems to me that this would effectively replace the wal_level
parameter with the presence or absence of a magic replication slot.
That doesn't seem like a net improvement.  It just replaces one
well-known configuration mechanism with a new ad hoc one.

 Also note that this scheme and any similar one requires merging the
 archive and hot_standby levels, which was the core of your original
 proposal [1]
 
 It doesn't really, we could continue to keep them separate. I'm
 proposing to maintain wal_level automatically, so it'd not be
 configurable anymore, so it'd not matter much, as long as we're not less
 efficient.

But, under any scheme to set wal_level automatically, how will the
primary know whether it needs to use level archive or hot_standby?
There is no way to know.  So if we are going to keep these levels
separate, there would need to be a manual way to switch between them.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian ft...@vitessedata.com wrote:
 Just some quick anecdotal evidence.  I did a similar experiment about three years
 ago.   The conclusion was that if you have an SSD, just do quicksort and
 forget the longer runs, but if you are using hard drives, longer runs are the
 winner (and safer, to avoid cliffs). I did not experiment with RAID0/5 on
 many spindles though.

 Not limited to sort, more generally, SSD is different enough from HDD,
 therefore it may worth the effort for backend to guess what storage device
 it has, then choose the right thing to do.

The devil is in the details. I cannot really comment on such a general
statement.

I would be willing to believe that that's true under
unrealistic/unrepresentative conditions. Specifically, when multiple
passes are required with a sort-merge strategy where that isn't the
case with replacement selection. This could happen with a tiny
work_mem setting (tiny in an absolute sense more than a relative
sense). With an HDD, where sequential I/O is so much faster, this
could be enough to make replacement selection win, just as it would
have in the 1970s with magnetic tapes.

As I've said, the solution is to simply avoid multiple passes, which
should be possible in virtually all cases because of the quadratic
growth in a classic hybrid sort-merge strategy's capacity to avoid
multiple passes (growth relative to work_mem's growth). Once you
ensure that, then you probably have a mostly I/O bound workload, which
can be made faster by adding sequential I/O capacity (or, on the
Postgres internals side, adding asynchronous I/O, or with memory
prefetching). You cannot really buy a faster CPU to make a degenerate
heapsort faster.
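
To put rough numbers on the quadratic-growth claim (illustrative only, and
assuming something like 256kB of buffer per input tape during the merge):
with work_mem = 1GB you get runs of roughly 1GB each and can merge on the
order of 4,000 of them in a single pass, so around 4TB of input can be
sorted before a second merge pass is ever needed. Double work_mem and both
the run size and the merge fan-in double, so that ceiling roughly
quadruples, to something like 16TB.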

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-20 Thread Joe Conway

On 08/20/2015 06:59 AM, Andrew Dunstan wrote:
 Is there any significant interest in either of these?
 
 Josh Berkus tells me that he would like pg_controldata information,
 and I was a bit interested in pg_config information, for this
 reason: I had a report of someone who had configured using
 --with-libxml but the xml tests actually returned the results that
 are expected without xml being configured. The regression tests
 thus passed, but should not have. It occurred to me that if we had
 a test like
 
 select pg_config('configure') ~ '--with-libxml' as has_xml;
 
 in the xml tests then this failure mode would be detected.

I've found use for them both in the past. A fair amount of bit-rot has
set in no doubt, and these were quick and dirty to begin with, but I
have these hacks from a while back:

https://github.com/jconway/pg_config
https://github.com/jconway/pg_controldata

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training,  Open Source Development


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Corey Huinker


 It seems the way of specifying per-partition definition/constraint,
 especially for range partitioning, would have a number of interesting
 alternatives.

 By the way, the [USING opclass_name] bit is just a way of telling that a
 particular key column has user-defined notion of ordering in case of
 range partitioning and equality for list partitioning. The opclass would
 eventually determine which WHERE clauses (looking at operators, operand
 types) are candidates to help prune partitions. If we use the range_op
 range_literal::range_type notation to describe partition constraint for
 each partition, it might not offer much beyond the readability. We are not
 really going to detect range operators being applied in WHERE conditions
 to trigger partition pruning, for example. Although I may be missing
 something...


I don't think it's important that ranges be used internally for partition
pruning, though I don't see a reason why they couldn't. I was making the
case for range types being used at partition creation/declaration time,
because you were proposing new syntax to describe a range of values in text
when we already have that in range types. We should eat our own dog food.

But mostly, I wanted to make sure that range partitions could have
inclusive and exclusive bounds.



  - No partitioning scheme survives first contact with reality. So you will
  need a facility for splitting and joining existing partitions. For
  splitting partitions, it's sufficient to require that the new partition
  share either a upper/lower bound (with the same inclusivity/exclusivity)
 of
  an existing partition, thus uniquely identifying the partition to be
 split,
  and require that the other bound be within the range of the partition to
 be
  split. Similarly, it's fair to require that the partitions to be joined
 be
  adjacent in range. In both cases, range operators make these tests
 simple.
 

 SPLIT/MERGE can be done in later patches/release, I think.


Later patches for sure. When a partition fills up, it's a critical
performance issue, so the fewer steps needed to amend it the better.


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote:


 The patch is ~3.25x faster than master


I've tried to read this post twice and both times my work_mem overflowed.
;-)

Can you summarize what this patch does? I understand clearly what it
doesn't do...

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 4:13 PM, David Brownlee a...@absd.org wrote:
 2) The initdb problem is actually not our fault. It looks like a
 NetBSD kernel bug when allocating large shared memory blocks on a
 machine without lots of memory. There's not much initdb can do with a
 kernel panic...

 That should definitely be fixed...

cf
http://mail-index.netbsd.org/port-vax/2015/08/19/msg002524.html
http://comments.gmane.org/gmane.os.netbsd.ports.vax/5773

It's possible it's a simh bug, but it smells more like a simple overflow to me.

BSD still has the problem of kern.maxfiles defaulting
 to a value low enough that even two connections cause the regression
 tests to run out of file descriptors. That's documented and it would
 be a right pain for initdb to detect that case.

 Is initdb calling ulimit() to check/set open files? It's probably worth
 it as a sanity check if nothing else.

Yup, we do that.

 I think the VAX default open_max is 128. The 'bigger' ports have a
 default of 1024, and I think they should probably all be updated to
 that, though that is orthogonal to a ulimit() check.

That's the problem. initdb tests how many connections can start up
when writing the default config. But we assume that each process can
use up to the rlimit file descriptors without running into a
system-wide limit. Raising it to 1024 lets me get two processes
running which is how I'm running them currently.

Also I forgot to mention, I also have to raise the stack limit with
ulimit. The default is so small that Postgres calculates that the maximum
safe value for its max_stack_depth config is 0.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 4) One of the tablesample tests seems to freeze indefinitely. I
 haven't looked into why yet. That might indeed indicate that the
 spinlock code isn't working?

 The tablesample tests seem like a not-very-likely first place for such a
 thing to manifest.  What I'm thinking is that there are places in there
 where we loop till we get an expected result.  Offhand I thought they were
 all integer math; but if one was float and the VAX code wasn't doing what
 was expected, maybe we could blame this on float discrepancies as well.

Ah, I was wrong. It's not the tablesample test -- I think that was the
last one to complete. Annoyingly we don't seem to print test names
until they finish.

It was groupingsets. And it's stuck again on the same query:

regression=# select
pid,now()-query_start,now()-state_change,waiting,state,query from
pg_stat_activity where pid <> pg_backend_pid();
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+
| pid  |    ?column?     |    ?column?     | waiting | state  | query                                                  |
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+
| 9185 | 00:53:38.571552 | 00:53:38.571552 | f       | active | select a, b, grouping(a,b), sum(v), count(*), max(v)  +|
|      |                 |                 |         |        |   from gstest1 group by rollup (a,b);                  |
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+

It's only been stuck an hour so it's possible it's still running but
this morning it was the same query that was running for 7 hours so I'm
guessing not.

Unfortunately I appear to have built without debugging symbols so
it'll be a couple days before I can rebuild with symbols to get a back
trace. (I vaguely remember when builds took hours but I don't recall
ever having to wait 48 hours for a build even back then)

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Stefan Kaltenbrunner

Hi all!


We just had a case of a very long-running process of ours that creates a
lot of prepared statements through Perl's DBD::Pg and ran into:


https://rt.cpan.org/Public/Bug/Display.html?id=88827

This resulted in millions of prepared statements being created but never
removed in the affected backends over the course of 1-2 hours, until the
backends in question ran out of memory.
The out-of-memory condition resulted in one memory context dump being
generated per occurrence, each consisting of 2M lines(!) (basically a
line of CachedPlan/CachedPlanSource per statement/function).
In the 20 minutes or so it took monitoring to alert and operations to
react, this caused a followup incident, because repeated out-of-memory
conditions produced over 400M(!!) log lines amounting to some 15GB of
data, running the log partition dangerously close to full.
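
As an aside, the number of prepared statements piling up in a session can at
least be watched from inside that session with the standard
pg_prepared_statements view (it covers both SQL-level PREPARE and
protocol-level prepares such as the ones DBD::Pg issues), e.g.:

-- how many prepared statements this session is holding, and the oldest one
SELECT count(*) AS prepared_statements,
       min(prepare_time) AS oldest_prepare
FROM pg_prepared_statements;

-- SQL-level cleanup; normally DBD::Pg cleans these up itself, which is
-- apparently the part that broke in the bug linked above
DEALLOCATE ALL;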


an example memory context dump looks like this:


TopMemoryContext: 582728880 total in 71126 blocks; 6168 free (52 
chunks); 582722712 used
  TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 chunks); 
2096 used

ExecutorState: 8192 total in 1 blocks; 5392 free (0 chunks); 2800 used
  ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 5416 free (0 chunks); 2776 used
  PL/pgSQL function context: 8192 total in 1 blocks; 1152 free (1 
chunks); 7040 used
  PL/pgSQL function context: 24576 total in 2 blocks; 11400 free (1 
chunks); 13176 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 
chunks); 12688 used
  PL/pgSQL function context: 8192 total in 1 blocks; 1120 free (1 
chunks); 7072 used
  PL/pgSQL function context: 24576 total in 2 blocks; 10984 free (1 
chunks); 13592 used
  PL/pgSQL function context: 57344 total in 3 blocks; 29928 free (2 
chunks); 27416 used
  PL/pgSQL function context: 57344 total in 3 blocks; 28808 free (2 
chunks); 28536 used
  PL/pgSQL function context: 24576 total in 2 blocks; 5944 free (3 
chunks); 18632 used
  RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 
8592 used
  RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 
used
  PL/pgSQL function context: 57344 total in 3 blocks; 31832 free (2 
chunks); 25512 used
  PL/pgSQL function context: 57344 total in 3 blocks; 29600 free (2 
chunks); 27744 used
  PL/pgSQL function context: 57344 total in 3 blocks; 39688 free (5 
chunks); 17656 used

  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 
chunks); 6512 used
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 
chunks); 20776 used
  Prepared Queries: 125886512 total in 25 blocks; 4764208 free (91 
chunks); 121122304 used

  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 
chunks); 12688 used

  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 
6512 used
  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 
18880 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 
chunks); 32 used

  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
  ExecutorState: 57344 total in 3 blocks; 21856 free (2 chunks); 
35488 used

ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 
11744 used
  CacheMemoryContext: 42236592 total in 28 blocks; 7160904 free (298 
chunks); 35075688 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 15360 

Re: [HACKERS] allowing wal_level change at run time

2015-08-20 Thread Andres Freund
On 2015-08-20 15:11:02 -0400, Peter Eisentraut wrote:
 It seems to me that this would effectively replace the wal_level
 parameter with the presence or absence of a magic replication slot.
 That doesn't seem like a net improvement.  It just replaces one
 well-known configuration mechanism with a new ad hoc one.

Well, with the difference that it can be changed automatically. We could
alternatively automagically use ALTER SYSTEM, but that's not really
guaranteed to work and isn't available via the replication protocol
currently. But I could live with that as well.

  Also note that this scheme and any similar one requires merging the
  archive and hot_standby levels, which was the core of your original
  proposal [1]
  
  It doesn't really, we could continue to keep them separate. I'm
  proposing to maintain wal_level automatically, so it'd not be
  configurable anymore, so it'd not matter much, as long as we're not less
  efficient.
 
 But, under any scheme to set wal_level automatically, how will the
 primary know whether it needs to use level archive or hot_standby?

I'd have said archive_mode triggered archive and everything else
hot_standby.  I am still of the opinion though that the difference
between those two levels is meaningless and that we should remove
archive.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 1:28 PM, Feng Tian ft...@vitessedata.com wrote:
 Agree with everything in principle, except one thing -- no, random IO on HDD
 in the 2010s (relative to CPU/Memory/SSD) is not any faster than tape in the
 1970s. :-)

Sure. The advantage of replacement selection could be a deciding
factor in unrepresentative cases, as I mentioned, but even then it's
not going to be as dramatic a difference as it would have been in the
past.

By the way, please don't top-post.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Andrew Dunstan



On 08/20/2015 03:24 PM, Jim Nasby wrote:

On 8/17/15 4:25 PM, Josh Berkus wrote:

On 08/17/2015 02:18 PM, Jim Nasby wrote:

On 8/17/15 3:33 PM, Josh Berkus wrote:
Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
prefer the former, but there will be arguments the other way.


I've been wondering if we should add some kind of strict JSON. My big
concern is throwing an error if you try to provide duplicate keys, but
it seems reasonable that json_strict would throw an error if you try to
reference something that doesn't exist.

Only if there's demand for it.  Is there?


I'm certainly worried (paranoid?) about it. Postgres is very good 
about not silently dropping data and this seems a glaring departure 
from that. I haven't looked yet but I'm hoping this could at least be 
added as an extension without duplicating a bunch of the existing JSON 
stuff.



What on earth does this have to do with duplicate keys? The jsonb input 
rules on duplicate keys are very clear, incidentally, and follow the 
practice of most JSON processors (duplicates are discarded, last one 
wins.) But that has nothing at all to do with this feature, really.


This feature can NOT be added as an extension, since it requires grammar 
processing modification.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 1:16 PM, Peter Geoghegan p...@heroku.com wrote:

 On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian ft...@vitessedata.com wrote:
  Just a quick bit of anecdotal evidence.  I did a similar experiment about
  three years ago.  The conclusion was that if you have an SSD, just do
  quicksort and forget the longer runs, but if you are using hard drives,
  longer runs are the winner (and safer, to avoid cliffs).  I did not
  experiment with RAID0/5 on many spindles, though.

  Not limited to sorting: more generally, SSD is different enough from HDD
  that it may be worth the effort for the backend to guess what storage
  device it has, and then choose the right thing to do.

 The devil is in the details. I cannot really comment on such a general
 statement.

 I would be willing to believe that that's true under
 unrealistic/unrepresentative conditions. Specifically, when multiple
 passes are required with a sort-merge strategy where that isn't the
 case with replacement selection. This could happen with a tiny
 work_mem setting (tiny in an absolute sense more than a relative
 sense). With an HDD, where sequential I/O is so much faster, this
 could be enough to make replacement selection win, just as it would
 have in the 1970s with magnetic tapes.

 As I've said, the solution is to simply avoid multiple passes, which
 should be possible in virtually all cases because of the quadratic
 growth in a classic hybrid sort-merge strategy's capacity to avoid
 multiple passes (growth relative to work_mem's growth). Once you
 ensure that, then you probably have a mostly I/O bound workload, which
 can be made faster by adding sequential I/O capacity (or, on the
 Postgres internals side, adding asynchronous I/O, or with memory
 prefetching). You cannot really buy a faster CPU to make a degenerate
 heapsort faster.

 --
 Peter Geoghegan


Agree with everything in principle, except one thing -- no, random IO on HDD
in the 2010s (relative to CPU/Memory/SSD) is not any faster than tape in the
1970s. :-)


Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Jim Nasby

On 8/19/15 7:22 PM, Tom Lane wrote:

Jim Nasby jim.na...@bluetreble.com writes:

Don't say parse names for things other than tables.  Only a minority
of the types of objects used in the database have names that meet this
specification.



Really? My impression is that almost everything that's not a shared
object allows for a schema...


Tables meet this naming spec.  Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them).  Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them.  Types also don't really meet this naming spec, because you need to
contend with special cases like int[] or timestamp with time zone.
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if object does not exist isn't supposed to be an error case,
what of name is not correctly formatted?  It seems a bit arbitrary
to me to throw an error in one case but not the other.


I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is,
or whether it's syntactically correct. It's meant only to separate an
identifier into its 3 (or in some cases 2) components. If this were as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but
it's obviously a lot more complex than that.
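
To make that concrete, here is the sort of thing the proposed function has to
get right and the naive split gets wrong (parse_ident() itself is still only a
proposal, so the first call just illustrates the intent):

-- proposed: split on dots, but respect quoting rules
SELECT parse_ident('myschema."My.Table"');
-- would be expected to yield the two components  myschema  and  My.Table

-- naive: splits inside the quoted identifier and keeps the quote characters
SELECT string_to_array('myschema."My.Table"', '.');
-- yields {myschema,"\"My","Table\""}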

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Jim Nasby

On 8/17/15 4:25 PM, Josh Berkus wrote:

On 08/17/2015 02:18 PM, Jim Nasby wrote:

On 8/17/15 3:33 PM, Josh Berkus wrote:

Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
prefer the former, but there will be arguments the other way.


I've been wondering if we should add some kind of strict JSON. My big
concern is throwing an error if you try to provide duplicate keys, but
it seems reasonable that json_strict would throw an error if you try to
reference something that doesn't exist.

Only if there's demand for it.  Is there?


I'm certainly worried (paranoid?) about it. Postgres is very good about 
not silently dropping data and this seems a glaring departure from that. 
I haven't looked yet but I'm hoping this could at least be added as an 
extension without duplicating a bunch of the existing JSON stuff.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 10:41 AM, Peter Geoghegan p...@heroku.com wrote:

 On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote:
 
 
  The patch is ~3.25x faster than master
 
 
  I've tried to read this post twice and both times my work_mem overflowed.
  ;-)
 
  Can you summarize what this patch does? I understand clearly what it
 doesn't
  do...

 The most important thing that it does is always quicksort runs that
 are formed by simply filling work_mem with tuples in no particular
 order, rather than trying to make runs that are twice as large as
 work_mem on average. That's what the ~3.25x improvement concerned.
 That's actually a significantly simpler algorithm than replacement
 selection, and appears to be much faster. You might even say that it's
 a dumb algorithm, because it is less sophisticated than replacement
 selection. However, replacement selection tends to use CPU caches very
 poorly, while its traditional advantages have become dramatically less
 important due to large main memory sizes in particular. Also, it hurts
 that we don't currently dump tuples in batches, for several reasons.
 Better to do memory-intensive operations in batch, rather than having a
 huge inner loop, in order to minimize or prevent instruction cache
 misses. And we can better take advantage of asynchronous I/O.

 The complicated aspect of considering the patch is whether or not it's
 okay to not use replacement selection anymore -- is that an
 appropriate trade-off?

 The reason that the code has not actually been simplified by this
 patch is that I still want to use replacement selection for one
 specific case: when it is anticipated that a quicksort with
 spillover can occur, which is only possible with incremental
 spilling. That may avoid most I/O, by spilling just a few tuples using
 a heap/priority queue, and quicksorting everything else. That's
 compelling when you can manage it, but no reason to always use
 replacement selection for the first run in the common case where there
 will be several runs in total.

 Is that any clearer? To borrow a phrase from the processor
 architecture community, from a high level this is a Brainiac versus
 Speed Demon [1] trade-off. (I wish that there was a widely accepted
 name for this trade-off.)

 [1]
 http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
 --
 Peter Geoghegan


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Hi, Peter,

Just a quick bit of anecdotal evidence.  I did a similar experiment about
three years ago.  The conclusion was that if you have an SSD, just do
quicksort and forget the longer runs, but if you are using hard drives,
longer runs are the winner (and safer, to avoid cliffs).  I did not
experiment with RAID0/5 on many spindles, though.

Not limited to sorting: more generally, SSD is different enough from HDD
that it may be worth the effort for the backend to guess what storage
device it has, and then choose the right thing to do.
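
For anyone wanting to repeat that kind of experiment, the run and merge
behaviour is easy to observe on an unpatched server (a minimal sketch; assumes
a throwaway table t with a sortable column x and enough rows to spill):

SET trace_sort = on;               -- logs run formation and merge passes
SET client_min_messages = log;     -- show those log lines in this session
SET work_mem = '4MB';              -- small enough to force an external sort

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM t ORDER BY x;        -- look for "Sort Method: external merge"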

Cheers.


Re: [HACKERS] DBT-3 with SF=20 got failed

2015-08-20 Thread Tomas Vondra

Hi,

On 08/19/2015 03:53 PM, Tom Lane wrote:


I don't see anything very wrong with constraining the initial
allocation to 1GB, or even less. That will prevent consuming insane
amounts of work_mem when the planner's rows estimate is too high
rather than too low. And we do have the ability to increase the hash
table size on the fly.


Perhaps. Limiting the initial allocation to 1GB might help with lowering 
memory consumed in case of over-estimation, and it's high enough so that 
regular queries don't run into that.


My initial thought was "if the memory consumption is a problem, lower
the work_mem", but after thinking about it for a while I don't see a
reason to limit the memory consumption this way, if possible.


But what is the impact on queries that actually need more than 1GB of 
buckets? I assume we'd only limit the initial allocation and still allow 
the resize based on the actual data (i.e. the 9.5 improvement), so the 
queries would start with 1GB and then resize once finding out the 
optimal size (as done in 9.5). The resize is not very expensive, but
it's not free either, and with so many tuples (requiring more than 1GB
of buckets, i.e. ~130M tuples) it's probably just noise in the total
query runtime. But it'd be nice to see some proof of that ...


Also, why 1GB and not 512MB (which is effectively the current limit on 
9.4, because we can't allocate 1GB there so we end up with 1/2 of that)? 
Why not some small percentage of work_mem, e.g. 5%?


Most importantly, this is mostly orthogonal to the original bug report. 
Even if we do this, we still have to fix the palloc after the resize.


So I'd say let's do a minimal bugfix of the actual issue, and then 
perhaps improve the behavior in case of significant overestimates. The 
bugfix should happen ASAP so that it gets into 9.5.0 (and backported). 
We already have patches for that.


Limiting the initial allocation deserves more thorough discussion and 
testing, and I'd argue it's 9.6 material at this point.



The real problem here is the potential integer overflow in
ExecChooseHashTableSize. Now that we know there is one, that should
be fixed (and not only in HEAD/9.5).


I don't think there's any integer overflow. The problem is that we end 
up with nbuckets so high that (nbuckets*8) overflows 1GB-1.


There's a protection against integer overflow in place (it has been there
for a long time), but there never was any protection against the 1GB limit.
We've been using much smaller work_mem values and NTUP_PER_BUCKET=10, so we
could not actually reach it.



But I believe Kaigai-san withdrew his initial proposed patch, and we
don't have a replacement as far as I saw.


I believe the patch proposed by KaiGai-san is the right one to fix the 
bug discussed in this thread. My understanding is KaiGai-san withdrew 
the patch as he wants to extend it to address the over-estimation issue.


I don't think we should do that - IMHO that's an unrelated improvement 
and should be addressed in a separate patch.



kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:05 AM, Greg Stark st...@mit.edu wrote:
 On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan p...@heroku.com wrote:
 I believe, in general, that we should consider a multi-pass sort to be
 a kind of inherently suspect thing these days, in the same way that
 checkpoints occurring 5 seconds apart are: not actually abnormal, but
 something that we should regard suspiciously. Can you really not
 afford enough work_mem to only do one pass? Does it really make sense
 to add far more I/O and CPU costs to avoid that other tiny memory
 capacity cost?

 I think this is the crux of the argument. And I think you're
 basically, but not entirely, right.

I agree that that's the crux of my argument. I disagree about my not
being entirely right.  :-)

 The key metric there is not how cheap memory has gotten but rather
 what the ratio is between the system's memory and disk storage. The
 use case I think you're leaving out is the classic data warehouse
 with huge disk arrays attached to a single host running massive
 queries for hours. In that case reducing run size will reduce I/O
 requirements directly and halving the amount of I/O sort takes will
 halve the time it takes regardless of cpu efficiency. And I have a
 suspicion typical data distributions get much better than a 2x
 speedup.

It could reduce seek time, which might be the dominant cost (but not
I/O as such). I do accept that my argument did not really apply to
this case, but you seem to be making an additional non-conflicting
argument that certain data warehousing cases would be helped in
another way by my patch. My argument was only about multi-gigabyte
cases that I tested that were significantly improved, primarily due to
CPU caching effects. If this helps with extremely large sorts that do
require multiple passes by reducing seek time -- I think that they'd
have to be multi-terabyte sorts, which I am ill-equipped to test --
then so much the better, I suppose.

In any case, as I've said the way we allow run size to be dictated
only by available memory (plus whatever replacement selection can do
to make on-tape runs longer) is bogus. In the future there should be a
cost model for an optimal run size, too.

 But I think you're basically right that this is the wrong use case to
 worry about for most users. Even those users that do have large batch
 queries are probably not processing so much that they should be doing
 multiple passes. The ones that do are are probably more interested in
 parallel query, federated databases, column stores, and so on rather
 than worrying about just how many hours it takes to sort their
 multiple terabytes on a single processor.

I suppose so. If you can afford multiple terabytes of storage, you can
probably still afford gigabytes of memory to do a single pass. My
laptop is almost 3 years old, weighs about 1.5 Kg, and has 16 GiB of
memory. It's usually always that simple, and not really because we
assume that Postgres doesn't have to deal with multi-terabyte sorts.
Maybe I lack perspective, having never really dealt with a real data
warehouse. I didn't mean to imply that in no circumstances could
anyone profit from a multi-pass sort. If you're using Hadoop or
something, I imagine that it still makes sense.

In general, I think you'll agree that we should strongly leverage the
fact that a multi-pass sort just isn't going to be needed when things
are set up correctly under standard operating conditions nowadays.

 I am quite suspicious of quicksort though. It has O(n^2) worst case
 and I think it's only a matter of time before people start worrying
 about DOS attacks from users able to influence the data ordering. It's
 also not very suitable for GPU processing. Quicksort gets most of its
 advantage from cache efficiency, it isn't a super efficient algorithm
 otherwise, are there not other cache efficient algorithms to consider?

I think that high quality quicksort implementations [1] will continue
to be the way to go for sorting integers internally at the very least.
Practically speaking, problems with the worst case performance have
been completely ironed out since the early 1990s. I think it's
possible to DOS Postgres by artificially introducing a worst-case, but
it's very unlikely to be the easiest way of doing that in practice. I
admit that it's probably the coolest way, though.

I think that the benefits of offloading sorting to the GPU are not in
evidence today. This may be especially true of a street legal
implementation that takes into account all of the edge cases, as
opposed to a hand customized thing for sorting uniformly distributed
random integers. GPU sorts tend to use radix sort, and I just can't
see that catching on.

[1] https://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Tomas Vondra

Hi,

On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:

On 08/20/2015 06:09 PM, Tom Lane wrote:

Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:

I wonder if we should have a default of capping the dump to say 1k lines
or such and only optionally do a full one.


-1. It's worked like this for the last fifteen years or
thereabouts, and you're the first one to complain. I suspect some
weirdness in your logging setup, rather than any systemic problem
that we need to lobotomize our diagnostic output in order to
prevent.


not sure what you consider weird in the logging setup here - the
context dump is imho borderline on internal diagnostic output at a
debug level (rather than making sense to an average sysadmin) already
(and no way to control it). But having (like in our case) the backend
dumping 2 million basically identical lines into a general logfile
per event seems excessive and rather abusive towards the rest of the
system (just from an IO perspective for example or from a log file
post processing tool perspective)


I've seen similar issues too, on systems with perfectly common logging
setups. So even if your logging setup were strange, it's not the
main factor here.



(The reason I say lobotomize is that there's no particularly
good reason to assume that the first N lines will tell you what you
need to know. And the filter rule would have to be *very* stupid,
because we can't risk trying to allocate any additional memory to
track what we're doing here.)


I do understand that there might be challenges there, but in the last
15 years machines got way faster and pg got way more capable, and some of
those capabilities might need to get revisited in that regard - and
while it is very nice that pg survives multiple oom cases pretty
nicely, I don't think it is entitled to put an imho unreasonable burden
on the rest of the system by writing insane amounts of data.

Just from a sysadmin perspective it also means that it is trivial for
a misbehaving app to fill up the logfile on a system, because unlike
almost all other logging output there seems to be no way
to control/disable it on a per-role/database level.


IMHO the first thing we might do is provide a log_memory_stats GUC
controlling that. I'm not a big fan of adding GUCs for everything, but in
this case it seems appropriate, just like the other log_ options.


I also don't think logging just a subset of the stats is a lost cause.
Sure, we can't know which of the lines are important, but for example
logging just the top-level contexts with a summary of the child contexts
would be OK. So something like this:


TopMemoryContext: 582728880 total in 71126 blocks; 6168 free  ...
  TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 ...
3 child contexts, 10 blocks, 34873 free
  PL/pgSQL function context: 8192 total in 1 blocks; 1152 fre ...
  PL/pgSQL function context: 24576 total in 2 blocks; 11400 f ...
  Type information cache: 24576 total in 2 blocks; 11888 free ...
  PL/pgSQL function context: 8192 total in 1 blocks; 1120 fre ...
  PL/pgSQL function context: 24576 total in 2 blocks; 10984 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 29928 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 28808 f ...
  PL/pgSQL function context: 24576 total in 2 blocks; 5944 fr ...
  RI compare cache: 24576 total in 2 blocks; 15984 free (5 ch ...
  RI query cache: 24576 total in 2 blocks; 11888 free (5 chun ...
  PL/pgSQL function context: 57344 total in 3 blocks; 31832 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 29600 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 39688 f ...
  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 65 ...
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free ...
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free  ...
  Prepared Queries: 125886512 total in 25 blocks; 4764208 fre ...
  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chun ...
  Operator lookup cache: 24576 total in 2 blocks; 11888 free  ...
  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks ...
  Operator class cache: 8192 total in 1 blocks; 1680 free (0  ...
  smgr relation table: 24576 total in 2 blocks; 5696 free (4  ...
  TransactionAbortContext: 32768 total in 1 blocks; 32736 fre ...
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks);  ...
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); ...
4 child contexts, 34 blocks, 87283 free
  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chu ...
  CacheMemoryContext: 42236592 total in 28 blocks; 7160904 fr ...
1487438 child contexts, 14874382 blocks, 3498349 free

instead of the millions of lines with all the datails.

So either

log_memory_stats = (on|off)

or

log_memory_stats = (full|short|off)

might work.
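
And being an ordinary GUC, it could then also be scoped per database or per
role, which is the kind of control asked for above (all hypothetical, of
course -- neither the setting nor the names below exist):

ALTER SYSTEM SET log_memory_stats = 'short';          -- cluster-wide default
ALTER DATABASE appdb SET log_memory_stats = 'off';    -- per-database override
ALTER ROLE batch_user SET log_memory_stats = 'full';  -- per-role override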

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via 

Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread David Brownlee
On 20 August 2015 at 14:54, Greg Stark st...@mit.edu wrote:
 On Wed, Jun 25, 2014 at 6:05 PM, John Klos j...@ziaspace.com wrote:
 While I wouldn't be surprised if you remove the VAX code because not many
 people are going to be running PostgreSQL, I'd disagree with the assessment
 that this port is broken. It compiles, it initializes databases, it runs, et
 cetera, albeit not with the default postgresql.conf.

 So I've been playing with this a bit. I have simh running on my home
 server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
 intended to be a SAN and its cpu is woefully underpowered so the
 resulting VAX is actually very very slow. So slow I wonder if there's
 a bug in the emulator but anyways.

I've run NetBSD/vax in simh on a laptop with a 2.5GHz i5-2520M and it
feels reasonably fast for a VAX (make of that what you will :)

 I'm coming to the conclusion that the port doesn't really work
 practically speaking but the failures are more interesting than I
 expected. They come in a few varieties:

Mmm. edge cases and failing (probably reasonable :) assumptions.

 1) Vax does not have IEEE fp. This manifests in a few ways, some of
 which may be our own bugs or missing expected outputs. The numeric
 data type maths often produce numbers rounded off differently, the
 floating point tests print numbers one digit shorter than our expected
 results expect and sometimes in scientific notation where we don't
 expect. The overflow tests generate floating point exceptions rather
 than overflows. Infinity and NaN don't work. The Json code in
 particular generates large numbers where +/- Infinity literals are
 supplied.

 There are some planner tests that fail with floating point exceptions
 -- that's probably a bug on our part. And I've seen at least one
 server crash (maybe two) apparently caused by one as well which I
 don't believe is expected.

Sounds like some useful test cases there.

 2) The initdb problem is actually not our fault. It looks like a
 NetBSD kernel bug when allocating large shared memory blocks on a
 machine without lots of memory. There's not much initdb can do with a
 kernel panic...

That should definitely be fixed...

BSD still has the problem of kern.maxfiles defaulting
 to a value low enough that even two connections causes the regression
 tests to run out of file descriptors. That's documented and it would
 be a right pain for initdb to detect that case.

Is initdb calling ulimit() to check/set open files? It's probably worth
it as a sanity check if nothing else.

I think the VAX default open_max is 128. The 'bigger' ports have a
default of 1024, and I think they should probably all be updated to
that, though that is orthogonal to a ulimit() check.

 3) The tests take so long to run that autovacuum kicks in and the
 tests start producing rows in inconsistent orderings. I assume that's
 a problem we've run into on the CLOBBER_CACHE animals as well?

Can the autovacuum daemon settings be bumped/disabled while running the tests?
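
Something along these lines on the test cluster, perhaps (assuming superuser
access; the per-table storage parameter would be the surgical version for just
the regression tables):

ALTER SYSTEM SET autovacuum = off;   -- stop the launcher cluster-wide
SELECT pg_reload_conf();             -- autovacuum is reloadable, no restart needed

-- or per table, e.g. for one of the regression tables:
ALTER TABLE tenk1 SET (autovacuum_enabled = false);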

 4) One of the tablesample tests seems to freeze indefinitely. I
 haven't looked into why yet. That might indeed indicate that the
 spinlock code isn't working?

 So my conclusion tentatively is that while the port doesn't actually
 work practically speaking it is nevertheless uncovering some
 interesting bugs.

Good to hear. Looks like bugs on both the OS and the software side, so fun for all.

Thanks for taking the time to do this!

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Corey Huinker


 My experiences with Oracle's hash function were generally not good -
 there's a reason many hash algorithms exist.  If/when we do hash
 partitioning in Postgres I'd like to see the hash function be
 user-definable.


+1

In my experience, hash partitioning had one use: When you had run out of
ways to logically partition the data, AND you had two tables in a foreign
key relationship, AND that relationship was the most common use-case for
selecting the two tables. In which case, your one big join became 8 or 16
little ones, and all was well in the universe...

...until those little joins started to spill to disk, and now you need to
repartition by 2x hashes, and that basically means a reload of your primary
fact table, and a talk with a boss about why you painted yourself into a
corner when you first did that.





 Meanwhile, I think list and range are a good start.  I'd prefer to see
 sub-partitioning before hash partitioning.


+1


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Alvaro Herrera
Shulgin, Oleksandr wrote:

 Another quirk of ALTER TABLE is that due to multi-pass processing in
 ATRewriteCatalogs, the same command might be collected a number of times.
 For example, in src/test/regress/sql/inherit.sql:
 
 alter table a alter column aa type integer using bit_length(aa);
 
 the alter column type then appears 4 times in the deparsed output as
 identical subcommands of a single ALTER TABLE command.

Yeah, I had a hack somewhere in the collection code such that if the relation
ID was different from what was specified, the command was ignored.
I removed that before commit because it seemed possible that for some
cases you actually want the command reported separately for each child.

I think our best option in this case is to ignore commands that are
reported for different relations during JSON deparsing.  Not sure how
easy that is.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Annotating pg_dump/pg_restore

2015-08-20 Thread Kevin Burke
Hi,
Normally I try to annotate incoming queries, to make it easier to diagnose
slow ones. For example:

-- Users.findByPhoneNumber
SELECT * FROM 

The pg_dump and pg_restore commands issue a COPY with no possibility of
adding a comment. It would be useful to know who or what exactly is
performing a COPY against a database - maybe a nightly backup script, maybe
a developer copying a table.

I was wondering if you could have a command-line flag that lets you attach a
comment to the query?
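
The closest workaround seems to be tagging the connection rather than the
query -- pg_dump and pg_restore pick up PGAPPNAME from the environment --
which at least shows up in pg_stat_activity and in log_line_prefix via %a
(the names below are just placeholders):

-- with the backup run as:  PGAPPNAME=nightly-backup pg_dump mydb
SELECT pid, application_name, query
FROM pg_stat_activity
WHERE query ILIKE 'COPY%';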

Thanks!
Kevin

-- 
kevin


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Alvaro Herrera
Shulgin, Oleksandr wrote:

 A particularly nasty one is:
 
 ERROR:  index cwi_replaced_pkey does not exist
 
 The test statement that's causing it is this one:
 
 ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
 ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
 USING INDEX cwi_uniq2_idx;
 
 Which gets deparsed as:
 
 ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
 ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
 USING INDEX cwi_replaced_pkey;
 
 The problem is that during constraint transformation, the index is being
 renamed to match the constraint name and at the deparse stage the original
 index name appears to be lost completely...  I haven't figured out if
 there's a way around it unless we introduce a new field in IndexStmt
 (originalName?) to cover exactly this corner case.

Oh :-(  Hmm, modifying parse nodes should normally be considered a last
resort, and at the same time identifying objects based on names rather
than OIDs is frowned upon.  But I don't see any way to handle this
otherwise.  I'm not sure what's the best solution for this one.

 The updated versions of the core-support patch and the contrib modules are
 attached.

Please try to split things up, or at least don't mix things more than they
already are.  For instance, the tsconfig mapping stuff should be its own
patch; we can probably make a case for pushing that on its own.

Also I see you added one caller of EventTriggerInhibitCommandCollection.
I don't like that crap at all and I think we would be better off if we
were able to remove it completely.  Please see whether it's possible to
handle this case in some other way.

You seem to have squashed the patches?  Please keep them split out.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Joshua D. Drake


On 08/20/2015 08:51 AM, Stefan Kaltenbrunner wrote:


This is 9.1.14 on Debian Wheezy/amd64 fwiw - but I dont think we have
made relevant changes in more recent versions.


It seems we may also want to consider a way to drop those prepared
queries after a period of non-use.


JD





regards


Stefan





--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Andres Freund
On 2015-08-20 16:42:21 +0100, Greg Stark wrote:
 Ah, I was wrong. It's not the tablesample test -- I think that was the
 last one to complete. Annoyingly we don't seem to print test names
 until they finish.
 
 It was groupingsets. And it's stuck again on the same query:
 
 regression=# select
 pid,now()-query_start,now()-state_change,waiting,state,query from
 pg_stat_activity where pid <> pg_backend_pid();
 +------+-----------------+-----------------+---------+--------+-------------------------------------------------------+
 | pid  | ?column?        | ?column?        | waiting | state  | query                                                 |
 +------+-----------------+-----------------+---------+--------+-------------------------------------------------------+
 | 9185 | 00:53:38.571552 | 00:53:38.571552 | f       | active | select a, b, grouping(a,b), sum(v), count(*), max(v)  |
 |      |                 |                 |         |        |   from gstest1 group by rollup (a,b);                 |
 +------+-----------------+-----------------+---------+--------+-------------------------------------------------------+
 
 It's only been stuck an hour so it's possible it's still running but
 this morning it was the same query that was running for 7 hours so I'm
 guessing not.

Interesting.

 Unfortunately I appear to have built without debugging symbols so
 it'll be a couple days before I can rebuild with symbols to get a back
 trace. (I vaguely remember when builds took hours but I don't recall
 ever having to wait 48 hours for a build even back then)

Without any further clues I'd guess it's stuck somewhere in
bipartite_match.c. That's the only place where floating point problems
would possibly result in getting stuck.


I'm all for making sure these issues are indeed caused by platform
specific float oddities, and not a more fundamental problem. But to me
the state of this port, as evidenced in this thread, seems to be too bad
to be worthwhile keeping alive. Especially since there's really no
imaginable use case except for playing around.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I wonder if we should have a default of capping the dump to say 1k lines 
 or such and only optionally do a full one.

-1.  It's worked like this for the last fifteen years or thereabouts,
and you're the first one to complain.  I suspect some weirdness in
your logging setup, rather than any systemic problem that we
need to lobotomize our diagnostic output in order to prevent.

(The reason I say lobotomize is that there's no particularly good
reason to assume that the first N lines will tell you what you need
to know.  And the filter rule would have to be *very* stupid, because
we can't risk trying to allocate any additional memory to track what
we're doing here.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Shulgin, Oleksandr
On Thu, Aug 20, 2015 at 4:28 PM, Shulgin, Oleksandr 
oleksandr.shul...@zalando.de wrote:


 Which gets deparsed as:

 ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
 ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
 USING INDEX cwi_replaced_pkey;

 The problem is that during constraint transformation, the index is being
 renamed to match the constraint name and at the deparse stage the original
 index name appears to be lost completely...  I haven't figured out if
 there's a way around it unless we introduce a new field in IndexStmt
 (originalName?) to cover exactly this corner case.

 The updated versions of the core-support patch and the contrib modules are
 attached.


Another quirk of ALTER TABLE is that due to multi-pass processing in
ATRewriteCatalogs, the same command might be collected a number of times.
For example, in src/test/regress/sql/inherit.sql:

alter table a alter column aa type integer using bit_length(aa);

the alter column type then appears 4 times in the deparsed output as
identical subcommands of a single ALTER TABLE command.