Re: [HACKERS] better atomics - v0.5
On Thu, Jul 3, 2014 at 10:56 AM, Amit Kapila amit.kapil...@gmail.com wrote:
On Tue, Jul 1, 2014 at 4:10 PM, Andres Freund and...@2ndquadrant.com wrote:

Further review of patch:

1.
/*
 * pg_atomic_test_and_set_flag - TAS()
 *
 * Acquire/read barrier semantics.
 */
STATIC_IF_INLINE_DECLARE bool
pg_atomic_test_set_flag(volatile pg_atomic_flag *ptr);

a. I think Acquire and read barrier semantics aren't equal. With acquire
semantics, the results of the operation are available before the results of
any operation that appears after it in code, which means it applies to both
loads and stores. The definition of a read barrier just ensures loads. So is
it right to declare it as above in the comments?
b. I think it adheres to Acquire semantics on platforms where those
semantics are supported, and otherwise defaults to full memory ordering.
Example:
#define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
Is it right to declare that the generic version of the function adheres to
Acquire semantics?

2.
bool
pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
{
    return TAS((slock_t *) &ptr->sema);
}
#define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))

a. In the above usage (&ptr->sema), sema itself is not declared as volatile,
so is it right to use it this way with an API (InterlockedCompareExchange)
expecting volatile?
b. Also, shouldn't this implementation be moved to atomics-generic-msvc.h?

3.
/*
 * pg_atomic_unlocked_test_flag - TAS()
 *
 * No barrier semantics.
 */
STATIC_IF_INLINE_DECLARE bool
pg_atomic_unlocked_test_flag(volatile pg_atomic_flag *ptr);

a. There is no setting in this, so using TAS in the function comment seems
to be wrong.
b. BTW, I am not able to see this function in the C11 specs.

4.
#if !defined(PG_HAS_ATOMIC_TEST_SET_FLAG) && defined(PG_HAVE_ATOMIC_EXCHANGE_U32)
..
#define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    return pg_atomic_read_u32_impl(ptr) == 1;
}
#elif !defined(PG_HAS_ATOMIC_TEST_SET_FLAG) && defined(PG_HAVE_ATOMIC_COMPARE_EXCHANGE_U32)
..
#define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    return (bool) pg_atomic_read_u32_impl(ptr);
}

Is there any reason to keep these two implementations different?

5.
atomics-generic-gcc.h
#ifndef PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
#define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    return ptr->value == 0;
}
#endif

Don't we need to compare it with 1 instead of 0?

6.
atomics-fallback.h
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    /*
     * Can't do this in the semaphore based implementation - always return
     * true. Do this in the header so compilers can optimize the test away.
     */
    return true;
}

Why can't we implement this in the semaphore based implementation?

7.
/*
 * pg_atomic_clear_flag - release lock set by TAS()
 *
 * Release/write barrier semantics.
 */
STATIC_IF_INLINE_DECLARE void
pg_atomic_clear_flag(volatile pg_atomic_flag *ptr);

a. Are Release and write barriers equivalent? With release semantics, the
results of the operation are available after the results of any operation
that appears before it in code. A write barrier acts as a compiler barrier,
and orders stores. I think the two definitions are not the same.
b. IIUC the current code doesn't have release semantics for unlock/clear.
We are planning to introduce it in this patch; this also doesn't follow the
specs, which say that clear should have memory_order_seq_cst ordering
semantics. As per its current usage in the patch (S_UNLOCK), it seems
reasonable to have *release* semantics for this API; however, I think if we
use it for generic purposes then it might not be right to define it with
Release semantics.
8.
#define PG_HAS_ATOMIC_CLEAR_FLAG
static inline void
pg_atomic_clear_flag_impl(volatile pg_atomic_flag *ptr)
{
    /* XXX: release semantics suffice? */
    pg_memory_barrier_impl();
    pg_atomic_write_u32_impl(ptr, 0);
}

Are we sure that having a memory barrier before clearing the flag will
achieve *Release* semantics? As per my understanding, the definition of
Release semantics is as below, and the above doesn't seem to follow it:
with release semantics, the results of the operation are available after
the results of any operation that appears before it in code.

9.
static inline uint32
pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
{
    uint32 old;
    while (true)
    {
        old = pg_atomic_read_u32_impl(ptr);
        if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
            break;
    }
    return old;
}

What is the reason to implement exchange as compare-and-exchange? At least
for Windows I know a corresponding function (InterlockedExchange) exists. I
could not see any other implementation of the same. I think this at least
deserves a comment explaining why the implementation was done this way.

10.
STATIC_IF_INLINE_DECLARE uint32
pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr,
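The fallback questioned in point 9 - building exchange out of compare-and-exchange - can be sketched in isolation with C11 atomics. This is an illustrative stand-in, not the patch's actual pg_atomic_* code; note that C11's atomic_compare_exchange_weak reloads the expected value on failure, which the pg_atomic_* loop above does explicitly with pg_atomic_read_u32_impl:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

/* Exchange built from compare-and-exchange: retry until the CAS
 * succeeds, i.e. until nobody changed the value between our read
 * and our swap.  Returns the value that was replaced. */
static uint32_t
exchange_via_cas(_Atomic uint32_t *ptr, uint32_t newval)
{
    uint32_t old = atomic_load(ptr);

    /* On failure, atomic_compare_exchange_weak reloads *ptr into old,
     * so each retry uses the freshly observed value. */
    while (!atomic_compare_exchange_weak(ptr, &old, newval))
        ;
    return old;
}
```

The loop terminates only when the CAS succeeds, so the returned value is exactly the one that was atomically replaced - which is why a comment explaining this choice versus a native primitive such as InterlockedExchange, as suggested above, would be worthwhile.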
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Mon, Jul 7, 2014 at 11:46 PM, Greg Stark st...@mit.edu wrote:
On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I doubt it. The extra code isn't the problem so much, it's the extra
planning cycles spent trying to make proofs that will usually fail. What
you propose will create a combinatorial explosion in the number of proof
paths to be considered.

Well, not necessarily. You only need to consider constraints on matching
columns and only when there's a join column on those columns. So you could
imagine, for example, sorting all the constraints by the eclass for the
non-const side of their expression, then going through them linearly to see
where you have multiple constraints on the same eclass.

For what it's worth, I think there is a case where this is a common
optimization: when you have multiple tables partitioned the same way. Then
you ideally want to be able to turn that from a join of multiple appends
into an append of multiple joins. This is even more important when it comes
to a parallelizing executor, since it lets you do the joins in parallel.

Ah, right. Also, if the foreign tables come under the inheritance
hierarchy, and we want to push joins down to foreign servers.

However, to get from here to there I guess you would need to turn the join
of the appends into NxM joins of every pair of subscans and then figure out
which ones to exclude. That would be pretty nuts. To do it reasonably we
probably need the partitioning infrastructure we've been talking about,
where Postgres would know what the partitioning key is and the order and
range of the partitions, so it can directly generate the matching subjoins
in less than n^2 time.

-- greg

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Modifying update_attstats of analyze.c for C Strings
As a follow-up question, I found some of the varchar column types in which
the histogram_bounds are not being surrounded in double quotes ("") even in
the default implementation. Ex: c_name column of the Customer table. I also
found histogram_bounds in which only some strings are surrounded in double
quotes and some are not. Ex: c_address column of the Customer table. Why are
there such inconsistencies? How is this determined? Thank you.

On Tue, Jul 8, 2014 at 10:52 AM, Ashoke s.ash...@gmail.com wrote:

Hi,

I am trying to implement a functionality that is similar to ANALYZE, but
needs to have different values (the values will be valid and are stored in
inp->str[][]) for MCV/Histogram Bounds in case the column under
consideration is varchar (C Strings). I have written a function
dummy_update_attstats with the following changes. Other things remain the
same as in update_attstats of ~/src/backend/commands/analyze.c:

---
{
    ArrayType  *arry;

    if (strcmp(col_type, "varchar") == 0)
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               CSTRINGOID,
                               -2,
                               false,
                               'c');
    else
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               stats->statypid[k],
                               stats->statyplen[k],
                               stats->statypbyval[k],
                               stats->statypalign[k]);
    values[i++] = PointerGetDatum(arry);    /* stavaluesN */
}
---

and I update the hist_values in the appropriate function as:

---
if (strcmp(col_type, "varchar") == 0)
    hist_values[i] = datumCopy(CStringGetDatum(inp->str[i][j]),
                               false,
                               -2);
---

I tried this based on the following reference:
http://www.postgresql.org/message-id/attachment/20352/vacattrstats-extend.diff

My issue is: when I use my way for strings, the MCV/histogram_bounds in
pg_stats don't have double quotes ("") surrounding the strings.
That is, if the normal update_attstats is used, the histogram_bounds for
TPCH nation(n_name) are: "ALGERIA ","ARGENTINA",... If I use
dummy_update_attstats as above, the histogram_bounds for TPCH nation(n_name)
are: ALGERIA,ARGENTINA,... This becomes an issue if the string has ','
(commas), like for example in the n_comment column of the nation table.
Could someone point out the problem and suggest a solution? Thank you.

--
Regards,
Ashoke
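For what it's worth, the quoting inconsistency observed above comes from array output deciding per element whether quotes are needed. A rough sketch of that decision follows; this is a simplified approximation for illustration, not PostgreSQL's actual array_out logic:

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Simplified predicate: array output quotes an element if it is empty,
 * spells NULL, or contains characters that are special inside an array
 * literal (the delimiter, braces, quote, backslash, or whitespace).
 * Rough sketch of the rule only, not PostgreSQL's array_out code. */
static bool
element_needs_quotes(const char *val, char delim)
{
    const char *p;

    if (val[0] == '\0')
        return true;
    if (strcasecmp(val, "NULL") == 0)
        return true;
    for (p = val; *p; p++)
    {
        char c = *p;

        if (c == delim || c == '{' || c == '}' ||
            c == '"' || c == '\\' ||
            c == ' ' || c == '\t' || c == '\n')
            return true;
    }
    return false;
}
```

Under such a rule, ARGENTINA prints bare while an element with trailing padding spaces (as from a char(n) column) or one containing a comma gets quoted - consistent with the mixed output seen in pg_stats.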
Re: [HACKERS] RLS Design
Hi all

I was jotting notes about this last sleepless night, and was really glad to
see the suggestion of enabling RLS on a table being a requirement for
OR-style quals suggested in the thread when I woke.

The only sane way to do OR-ing of multiple rules is to require that tables
be switched to deny-by-default before RLS quals can be added to selectively
enable access.

The next step is DENY rules that override ALLOW rules, and are also ORed,
so any DENY rule overrides any ALLOW rule. Like in ACLs. But that can come
later - I just think room for it should be left in any catalog definition.

My concern with the talk of policies, etc, is with making it possible to
implement this for 9.5. I'd really like to see a robust declarative
row-security framework with access policies - but I'm not so sure it's a
good idea to try to assemble policies directly out of low-level row
security predicates. Tying things into a policy model that isn't tried or
tested might create more problems than it solves unless we implement
multiple real-world test cases on top of the model to show it works.

For how I think we should be pursuing this in the long run, take a look at
how Teradata does it, with hierarchical and non-hierarchical rules -
basically bitmaps or thresholds - that get grouped into access policies.
It's a very good way to abstract the low-level stuff. If we have low-level
table predicate filters, we can build this sort of thing on top.

For 9.5, unless the basics turn out to be way easier than they look and
it's all done soon in the release process, surely we should be sticking to
just getting the basics of row security in place? Leaving room for
enhancement, sure, but sticking to the core feature, which to my mind is:

- A row security on/off flag for a table;

- Room in the catalogs for multiple row security rules per table and a type
  flag for them. The initial type flag, for ALLOW rules, specifies that all
  ALLOW rules be ORed together.
- Syntax for creating and dropping row security predicates. If there can be
  multiple ones per table they'll need names, like we have with triggers,
  indexes, etc.

- psql support for listing row security predicates on a table if running as
  superuser or if you've been explicitly GRANTed access to the catalog
  table listing row security quals.

- The hooks for contribs to inject their own row security rules. The API
  will need a tweak - right now it assumes these rules are ANDed with any
  row security predicates in the catalogs, but we'd want the option of
  treating them as ALLOW or DENY rules to get ORed with the rest of the set
  *or* as a pre-filter predicate like currently.

- A row-security-exempt right, at the user level, to assuage the concerns
  about malicious predicates. I maintain that in the first rev this should
  be simple: superuser is row security exempt. I don't think I'm going to
  win that one though, so a user/role attribute that makes the role ignore
  row security seems like the next simplest option.

- A way to test whether the current user is row-security exempt, so pg_dump
  can complain unless explicitly told it's allowed to do a selective dump
  via a cmdline option.

Plus a number of fixes:

- Fixing the security barrier view issue with row level lock pushdown
  that's breaking the row security regression tests;

- Enhancing plan cache invalidation so that row-security exempt-ness of a
  user is part of the plancache key;

- Adding session state like current_user to portals, so security_barrier
  functions returning refcursor, and cursors created before SET SESSION
  AUTHORIZATION or SET ROLE, get the correct values when they use session
  information like current_user.

Note that this doesn't even consider the with-check-option style
write-filtering side of row security and the corresponding challenges
around the semantics of RETURNING. It's already a decent-sized amount of
work on top of the existing row security patch.
If we start adding policy groups, etc, this will never get done. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
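The combination rule sketched in the message above - ALLOW rules ORed together, any DENY overriding any ALLOW, on a deny-by-default table - can be illustrated with a toy evaluator. All names here are hypothetical and rows are reduced to a plain int for brevity; nothing below is from the actual row security patch:

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*rls_qual)(int row);

/* Example quals (purely illustrative) */
static bool allow_positive(int row) { return row > 0; }
static bool deny_even(int row)      { return row % 2 == 0; }

/* A row is visible iff at least one ALLOW qual passes (ALLOWs are ORed)
 * and no DENY qual matches (any DENY overrides all ALLOWs).  With no
 * ALLOW quals at all, nothing is visible: deny-by-default. */
static bool
row_visible(int row,
            rls_qual *allow, size_t nallow,
            rls_qual *deny, size_t ndeny)
{
    bool    allowed = false;
    size_t  i;

    for (i = 0; i < nallow; i++)
        if (allow[i](row))
        {
            allowed = true;
            break;
        }
    if (!allowed)
        return false;           /* deny-by-default */
    for (i = 0; i < ndeny; i++)
        if (deny[i](row))
            return false;       /* any DENY overrides all ALLOWs */
    return true;
}
```

The ACL-like layering is the point: adding a DENY rule later never widens access, and dropping all ALLOW rules closes the table completely.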
[HACKERS] Optimization for updating foreign tables in Postgres FDW
Attached is a WIP patch for the following:

/*
 * postgresPlanForeignModify
 *      Plan an insert/update/delete operation on a foreign table
 *
 * Note: currently, the plan tree generated for UPDATE/DELETE will always
 * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
 * and then the ModifyTable node will have to execute individual remote
 * UPDATE/DELETE commands.  If there are no local conditions or joins
 * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
 * and then do nothing at ModifyTable.  Room for future optimization ...
 */

In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the following are satisfied, then the
ForeignScan and ModifyTable node will work that way.

- There are no local BEFORE/AFTER triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
  columns on the remote server.

Here is a simple performance test.

On remote side:
postgres=# create table t (id serial primary key, inserted timestamp default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0, 9);
INSERT 0 10
postgres=# vacuum t;
VACUUM

On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data text) server myserver options (table_name 't');
CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose delete from ft where id > 1;
                                QUERY PLAN
---
 Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual time=1275.255..1275.255 rows=0 loops=1)
   Remote SQL: DELETE FROM public.t WHERE ctid = $1
   ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual time=1.180..52.095 rows= loops=1)
         Output: ctid
         Remote SQL: SELECT ctid FROM public.t WHERE ((id > 1)) FOR UPDATE
 Planning time: 0.112 ms
 Execution time: 1275.733 ms
(7 rows)

Patched (note that the DELETE command has been pushed down):
postgres=# explain analyze verbose delete from ft where id > 1;
                                QUERY PLAN
---
 Delete on
public.ft  (cost=100.00..162.32 rows=910 width=6) (actual time=0.006..0.006 rows=0 loops=1)
   ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual time=0.001..0.001 rows=0 loops=1)
         Output: ctid
         Remote SQL: DELETE FROM public.t WHERE ((id > 1))
 Planning time: 0.101 ms
 Execution time: 8.808 ms
(6 rows)

I'll add this to the next CF. Comments are welcome.

Thanks,

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 189,198 **** is_foreign_expr(PlannerInfo *root,
  	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
  		return false;

- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
-
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 189,194 ----
*** 928,933 **** deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 924,982 ----
  }

  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
+ 					   Index rtindex, Relation rel,
+ 					   List *remote_conds,
+ 					   List *targetlist,
+ 					   List *targetAttrs, List *returningList,
+ 					   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	List	   *params_list = NIL;
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = NULL;
+
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+
+ 		if (!first)
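The SET-clause deparsing loop shown (truncated) in the diff above appends "col = expr" pairs with comma separators. Stripped of the planner machinery, the shape of that loop is roughly as follows - a standalone sketch using plain C strings in place of StringInfo, with illustrative names only:

```c
#include <stdio.h>
#include <string.h>

/* Build "UPDATE rel SET col1 = expr1, col2 = expr2, ..." into buf.
 * Arrays of column names and already-deparsed expressions stand in
 * for the targetAttrs/targetlist walk in deparseDirectUpdateSql. */
static void
deparse_update_set(char *buf, size_t buflen,
                   const char *rel,
                   const char *const cols[],
                   const char *const exprs[],
                   int ncols)
{
    int n = snprintf(buf, buflen, "UPDATE %s SET ", rel);
    int i;

    for (i = 0; i < ncols; i++)
        n += snprintf(buf + n, buflen - n, "%s%s = %s",
                      i > 0 ? ", " : "",   /* comma before all but the first */
                      cols[i], exprs[i]);
}
```

The "first" flag in the patch plays the same role as the `i > 0` test here: emit the separator before every assignment except the first.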
Re: [HACKERS] Modifying update_attstats of analyze.c for C Strings
Ok, I was able to figure out that when strings contain 'spaces', PostgreSQL
surrounds them with double quotes.

On Tue, Jul 8, 2014 at 12:04 PM, Ashoke s.ash...@gmail.com wrote:
[earlier message quoted in full; see above]

--
Regards,
Ashoke
[HACKERS] query_is_distinct_for does not take into account set returning functions
Over here - http://www.postgresql.org/message-id/6351.1404663...@sss.pgh.pa.us
Tom noted that create_unique_path did not check for set returning functions.

Tom wrote:
I notice that create_unique_path is not paying attention to the question of
whether the subselect's tlist contains SRFs or volatile functions. It's
possible that that's a pre-existing bug.

I looked at this a bit and I can confirm that it does not behave as it
should. Take the following as an example:

create table x (id int primary key);
create table y (n int not null);
insert into x values(1);
insert into y values(1);

select * from x where (id,id) in (select n, generate_series(1,2) / 10 + 1 g from y);
 id
----
  1
(1 row)

select * from x where (id,id) in (select n, generate_series(1,2) / 10 + 1 g from y group by n);
 id
----
  1
  1
(2 rows)

The 2nd query does group by n, so query_is_distinct_for returns true;
therefore the outer query thinks it's OK to perform an INNER JOIN rather
than a SEMI join, which in this case produces an extra record. I think we
should probably include the logic to test for set returning functions in
query_is_distinct_for. The attached fixes the problem.

Regards

David Rowley

query_is_distinct_for_fix.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: On Mon, Jul 7, 2014 at 4:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: I poked around to see if we didn't have some code already for that, and soon found that not only do we have such code (equality_ops_are_compatible) but actually almost this entire patch duplicates logic that already exists in optimizer/util/pathnode.c, to wit create_unique_path's subroutines query_is_distinct_for et al. So I'm thinking what this needs to turn into is an exercise in refactoring to allow that logic to be used for both purposes. Well, it seems that might just reduce the patch size a little! I currently have this half hacked up to use query_is_distinct_for, but I see there's no code that allows Const's to exist in the join condition. I had allowed for this in groupinglist_is_unique_on_restrictinfo() and I tested it worked in a regression test (which now fails). I think to fix this, all it would take would be to modify query_is_distinct_for to take a list of Node's rather than a list of column numbers then just add some logic that skips if it's a Const and checks it as it does now if it's a Var Would you see a change of this kind a valid refactor for this patch? I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(), it seems simple enough so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. I've included an updated patch and a delta patch. Now a couple of things to note: 1. 
The fast path code that existed in join_is_removable() for subqueries, when
the subquery had no group or distinct clause, is now gone. I wasn't too sure
that I wanted to assume too much about what query_is_distinct_for may do in
the future, and I thought that if I included some fast-path logic in
join_is_removable(), one day it might fast path wrongly. Perhaps we could
protect against this with a small note in query_is_distinct_for(). 2. The
patch I submitted here
http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com
- if that gets accepted, then it makes the check for set returning functions
in join_is_removable unnecessary.

Regards

David Rowley

subquery_leftjoin_removal_v1.5.delta.patch
Description: Binary data

subquery_leftjoin_removal_v1.5.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
I don't know enough about XML/XPATH to know if this is a good idea or not,

Actually, currently, because of the namespace problem, xpath() returns a
wrong result (even worse: invalid XML!). So this patch corrects the
behavior of xpath() to the correct one.

but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is:
    extended: if 1 do a recursive copy (properties, namespaces and children
    when applicable) if 2 copy properties and namespaces (when applicable)
Do we really want it to copy children? Perhaps the value should be 2? (I
have no idea, I'm just raising the question.)

If we put 1 as the parameter, the resulting node will link to the existing
children. In this case, the namespace problem for the children might still
exist. If any of the children uses different namespace(s) than the parent,
the namespace definition will not be copied correctly. Just to be safe, 1
is passed as the second argument in the patch. Ideally, we could traverse
the node object and generate the XML accordingly, but that is a lot of
work, and a lot of duplicating libxml's code. Maybe we should push this
upstream to libxml?

I also notice that it says
    Returns: a new #xmlNodePtr, or NULL in case of error.
and the patch is failing to cover the error case. Most likely, that would
represent out-of-memory, so it definitely seems to be something we should
account for.

Ah, overlooked that. Skimming through libxml2's source, it looks like there
are some other cases besides out-of-memory. I will update the patch
according to these cases.

Thanks for the review.

--
Ali Akbar
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
From: Amit Kapila amit.kapil...@gmail.com
On Fri, Jul 4, 2014 at 7:29 PM, MauMau maumau...@gmail.com wrote:
[Hypothesis] Why does the connection processing emit WAL? Probably, it did
page-at-a-time vacuum during access to pg_database and pg_authid for client
authentication. src/backend/access/heap/README.HOT describes:

I agree with your analysis that it can happen during a connection attempt.

Thank you. I'm relieved the cause seems correct.

But the customer could not reproduce the problem when he performed the same
archive recovery from the same base backup again. Why? I guess the
autovacuum daemon vacuumed the system catalogs before he attempted to
connect to the database.

One way to confirm could be to perform the archive recovery with autovacuum
disabled.

Yeah, I thought of that too. Unfortunately, the customer deleted the base
backup used for testing.

Another thing which I am wondering about is: can't the same happen even for
a read-only transaction (in case someone does a SELECT which prunes a
page)?

I'm afraid about that, too.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
From: Rajeev rastogi rajeev.rast...@huawei.com As of now there is no solution for this in PostgreSQL but I had submitted a patch Standalone synchronous master in 9.4 2014-01 CommitFest, which was rejected because of some issues. This patch was meant to degrade the synchronous level of master, if all synchronous standbys are down. I plan to resubmit this with better design sometime in 9.5. Although I only read some mails of that thread, I'm sure your proposal is what many people would appreciate. Your new operation mode is equivalent to the maximum availability mode of Oracle Data Guard, isn't it? I'm looking forward to it. Good luck. == Maximum availability This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. Transactions do not commit until all redo data needed to recover those transactions has been written to the online redo log and to at least one standby database. If the primary database cannot write its redo stream to at least one standby database, it effectively switches to maximum performance mode to preserve primary database availability and operates in that mode until it is again able to write its redo stream to a standby database. This protection mode ensures zero data loss except in the case of certain double faults, such as failure of a primary database after failure of the standby database. Maximum performance This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database. This is accomplished by allowing transactions to commit as soon as all redo data generated by those transactions has been written to the online log. Redo data is also written to one or more standby databases, but this is done asynchronously with respect to transaction commitment, so primary database performance is unaffected by delays in writing redo data to the standby database(s). 
This protection mode offers slightly less data protection than maximum availability mode and has minimal impact on primary database performance. == Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
From: Tom Lane t...@sss.pgh.pa.us
problem, the user might not realize he's got one until he starts to wonder
why autovac/autoanalyze aren't working.

In autovacuum.c, autovacuum workers avoid waiting for the standby by doing:

/*
 * Force synchronous replication off to allow regular maintenance even if
 * we are waiting for standbys to connect. This is important to ensure we
 * aren't blocked from performing anti-wraparound tasks.
 */
if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
    SetConfigOption("synchronous_commit", "local",
                    PGC_SUSET, PGC_S_OVERRIDE);

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
From: Tom Lane t...@sss.pgh.pa.us Andres Freund and...@2ndquadrant.com writes: On 2014-07-07 09:57:20 -0400, Tom Lane wrote: Well, see the comment that explains why the logic is like this now: I think we should 'simply' make sequences assign a toplevel xid - then we can get rid of that special case in RecordTransactionCommit(). And I think the performance benefit of not having to wait on XLogFlush() for readonly xacts due to hot prunes far outweighs the decrease due to the xid assignment/commit record. I don't think that nextval()s are called overly much without a later xid assigning statement. Yeah, that could well be true. I'm not sure if there are any other cases where we have non-xid-assigning operations that are considered part of what has to be flushed before reporting commit; if there are not, I'd be okay with changing nextval() this way. Thank you all for letting me know your thoughts. I understood and agree that read-only transactions, including the connection establishment one, should not wait for the standby nor the XLOG flush at commit, and the current documentation/specification should not be changed. I'll consider how to fix this problem, learning the code, then I'll ask for review. I'd like to submit the patch for next CF if possible. From: Fujii Masao masao.fu...@gmail.com Sounds good direction. One question is: Can RecordTransactionCommit() avoid waiting for not only replication but also local WAL flush safely in such read-only transaction case? I'd appreciate any opinion on this, too. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this possible / rather simple. Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we run out of memory, increase NTUP_PER_BUCKET. I'd like to think that the common case is that work_mem will be large enough that everything fits; and if you do it that way, then you save yourself the trouble of rehashing later, which as you point out might lose if there are only a few probes. If it turns out that you run short of memory, you can merge pairs of buckets up to three times, effectively doubling NTUP_PER_BUCKET each time. Yet another idea is to stick with your scheme, but do the dynamic bucket splits lazily. Set a flag on each bucket indicating whether or not it needs a lazy split. When someone actually probes the hash table, if the flag is set for a particular bucket, move any tuples that don't belong as we scan the bucket. If we reach the end of the bucket chain, clear the flag. Not sure either of these are better (though I kind of like the first one) but I thought I'd throw them out there... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
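As a sketch of the bucket-merging idea above — merging pairs of buckets halves nbuckets and doubles the effective NTUP_PER_BUCKET — something like the following could work, assuming a simple chained hash table indexed by hash % nbuckets (the names Tuple, HashTable and merge_bucket_pairs are hypothetical illustrations, not PostgreSQL's actual hash join code):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Tuple
{
    unsigned      hash;
    struct Tuple *next;
} Tuple;

typedef struct
{
    Tuple  **buckets;
    unsigned nbuckets;
} HashTable;

/* Halve the number of buckets, doubling the effective NTUP_PER_BUCKET.
 * Since bucket index = hash % nbuckets, every tuple in bucket b + nbuckets/2
 * belongs in bucket b once nbuckets is halved. */
static void
merge_bucket_pairs(HashTable *ht)
{
    unsigned half = ht->nbuckets / 2;

    for (unsigned b = 0; b < half; b++)
    {
        Tuple **tail = &ht->buckets[b];

        while (*tail)
            tail = &(*tail)->next;          /* walk to end of chain */
        *tail = ht->buckets[b + half];      /* append the merged chain */
        ht->buckets[b + half] = NULL;
    }
    ht->nbuckets = half;                    /* callers now use hash % half */
}

/* Linear probe of one bucket chain. */
static int
table_contains(const HashTable *ht, unsigned hash)
{
    for (const Tuple *t = ht->buckets[hash % ht->nbuckets]; t; t = t->next)
        if (t->hash == hash)
            return 1;
    return 0;
}
```

Each merge can be repeated (up to the three times Robert mentions) without moving any tuple more than once per merge, since the append only splices two existing chains together.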
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
From: Asif Naeem anaeem...@gmail.com Other than my previous comments, patch looks good to me. Thanks. Thanks for reviewing. I understood that your previous comment was to suggest copying the desired DLLs directly from the Release/Debug folder to bin. But I'm afraid it cannot be done cleanly... CopySolutionOutput() copies all DLLs to the lib folder with no exception as follows: if ($1 == 1) { $dir = "bin"; $ext = "exe"; } elsif ($1 == 2) { $dir = "lib"; $ext = "dll"; } It seems like I have to do something like this, listing the relevant DLL names anyway. I don't think this is cleaner. if ($1 == 1) { $dir = "bin"; $ext = "exe"; } elsif ($1 == 2 /* the project is libpq, libecpg, etc. */) { $dir = "bin"; $ext = "dll"; } elsif ($1 == 2) { $dir = "lib"; $ext = "dll"; } What do you think? Am I misunderstanding your suggestion? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 8 Červenec 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this possible / rather simple. Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we run out of memory, increase NTUP_PER_BUCKET. I'd like to think that the common case is that work_mem will be large enough that everything fits; and if you do it that way, then you save yourself the trouble of rehashing later, which as you point out might lose if there are only a few probes. If it turns out that you run short of memory, you can merge pairs of buckets up to three times, effectively doubling NTUP_PER_BUCKET each time. Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. With large work_mem values the amount of memory for buckets may be quite significant. E.g. 800MB work_mem may easily give ~120MB of memory taken by buckets with NTUP_PER_BUCKET=1. With NTUP_PER_BUCKET=4 it's just ~30MB. Yet another idea is to stick with your scheme, but do the dynamic bucket splits lazily. Set a flag on each bucket indicating whether or not it needs a lazy split. When someone actually probes the hash table, if the flag is set for a particular bucket, move any tuples that don't belong as we scan the bucket. If we reach the end of the bucket chain, clear the flag. I don't think a lazy scheme makes much sense here. There are two clearly separated phases - loading the table and search. Also, it seems to me it can't work as described. Say we build a table and then find out we need a table 4x the size. 
If I understand your proposal correctly, you'd just resize the buckets array, copy the existing buckets (the first 1/4 of the buckets) and set 'needs_split' flags. Correct? Well, the problem is that while scanning a bucket you already need to know which tuples belong there. But with the lazy approach, the tuples might still be in the old (not yet split) buckets. So you'd have to search the current bucket and all buckets that were not split yet. Or did I miss something? Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Sat, Jul 5, 2014 at 12:46 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-07-02 15:51:08 -0700, p...@heroku.com wrote: Attached revision factors in everyone's concerns here, I think. Is anyone planning to review Peter's revised patch? I have been doing some functional tests, and looked quickly at the code to understand what it does: 1) Compiles without warnings, passes regression tests 2) The checking process goes through all the existing columns of a relation even if a difference of 1 with some other column(s) has already been found. As we try to limit the number of hints returned, this seems like a waste of resources. 3) distanceName could be improved, for example by having some checks on the string lengths of the target and source columns, and immediately rejecting the match if, for example, the length of the source string is double/half the length of the target. 4) This is not nice, could it be possible to remove the stuff from varlena.c? +/* Expand each Levenshtein distance variant */ +#include "levenshtein.c" +#define LEVENSHTEIN_LESS_EQUAL +#include "levenshtein.c" +#undef LEVENSHTEIN_LESS_EQUAL Part of the same comment: only varstr_leven_less_equal is used to calculate the distance, so should we really move varstr_leven to core? This clearly needs to be reworked as not just a copy-paste of the things in fuzzystrmatch. The flag LEVENSHTEIN_LESS_EQUAL should be left within fuzzystrmatch, I think. 5) Do we want hints on system columns as well? For example here we could get tableoid as a column hint: =# select tablepid from foo; ERROR: 42703: column tablepid does not exist LINE 1: select tablepid from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 Time: 0.425 ms 6) Sometimes no hints are returned... 
Even in simple cases like this one: =# create table foo (aa int, bb int); CREATE TABLE =# select ab from foo; ERROR: 42703: column ab does not exist LINE 1: select ab from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 7) Performance penalty with a table with 1600 columns: =# CREATE FUNCTION create_long_table(tabname text, columns int) RETURNS void LANGUAGE plpgsql AS $$ declare first_col bool = true; count int; query text; begin query := 'CREATE TABLE ' || tabname || ' ('; for count in 0..columns loop query := query || 'col' || count || ' int'; if count < columns then query := query || ', '; end if; end loop; query := query || ')'; execute query; end; $$; =# SELECT create_long_table('aa', 1599); create_long_table --- (1 row) Then tested queries like that: SELECT col888a FROM aa; Patched version: 2.100ms~2.200ms master branch (6048896): 0.956 ms~0.990 ms So the performance impact seems limited. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
2014-07-06 14:19 GMT+09:00 Stephen Frost sfr...@snowman.net: Kaigai, * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Can you clarify where this is coming from..? It sounds like you're referring to an existing implementation and, if so, it'd be good to get more information on how that works exactly. Oracle VPD - Multiple Policies for Each Table, View, or Synonym http://docs.oracle.com/cd/B19306_01/network.102/b14266/apdvpoli.htm#i1008351 It says - Note that all policies applied to a table are enforced with AND syntax. While I'm not against using this as an example to consider, it's much more complex than what we're talking about here- and it supports application contexts which allow groups of RLS rights to be applied or not applied; essentially it allows both AND and OR for sets of RLS policies, along with default policies which are applied no matter what. It's not only Oracle VPD; it also fits the attitude of defense in depth. Consider a system with a network firewall, unix permissions and selinux all installed. If somebody wants to reference an information asset within a file, he has to connect to the server from a network address allowed by the firewall configuration, AND both DAC and MAC have to allow his access. These are not independent systems and your argument would apply to our GRANT system also, which I hope it's agreed would make it far less useful. Note also that SELinux brings in another complexity- it needs to make system calls out to check the access. Usually, we have to pass all of the access controls to reference the target information, not just one of the installed access control mechanisms. This is true in some cases, and not in others. Only one role you are a member of needs to have access to a relation, not all of them. There are other examples of 'OR'-style security policies, this is merely one. I'm simply not convinced that it applies in the specific case we're talking about. 
In the end, I expect that either way people will be upset because they won't be able to specify fully which should be AND vs. which should be OR with the kind of flexibility other systems provide. What I'm trying to get to is an initial implementation which is generally useful and is able to add such support later. This is similar to how roles work- your overall access includes all access granted to any roles you are a member of. You don't need SELECT rights granted to every role you are a member of to select from the table. Additionally, if an admin wants to AND the quals together then they can simply create a policy which does that rather than have 2 policies. It seems to me a pain for database administration if we have to take care that RLS policies do not conflict with each other. This notion of 'conflict' doesn't make much sense to me. What is 'conflicting' here? Each policy would simply need to stand on its own for the role which it's being applied to. That's very simple and straight-forward. I expect 90% of RLS policies will be configured for the PUBLIC user, to apply the same access rules to everybody. In this case, the DBA has to ensure that the target table has no policy, or that the existing policy does not conflict with the new policy to be set. I don't think it is a good idea to force these checks on the DBA. If the DBA only uses PUBLIC then they have to ensure that each policy they set up for PUBLIC can stand on its own- though, really, I expect if they go that route they'd end up with just one policy that calls a stored procedure... You are suggesting instead that if application 2 sets up policies on the table and then application 1 adds another policy that it should reduce what application 2's users can see? That doesn't make any sense to me. I'd actually expect these applications to at least use different roles anyway, which means they could each have a single role specific policy which only returns what that application is allowed to see. I don't think this assumption is reasonable. 
Consider two applications: app-X, a database security product that applies access control based on the remote ip-address of the client for any table access by any database role, and app-Y, a usual enterprise package for daily business data, with its own RLS policy. What is the expected behavior in this case? That the DBA manage the rights on the tables. I expect that will be required for quite a while with PG. It's nice to think of these application products that will manage all access for users by setting up their own policies, but we have yet to even discuss how they would have appropriate rights on the table to be able to do so (and to not interfere with each other..). Let's at least get something which is generally useful in. I'm all for trying to plan out how to get there and would welcome suggestions you have which are specific to PG on what we could do here (I'm not keen on just trying to mimic another product completely...), but at the level
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: On 8 Červenec 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this possible / rather simple. Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we run out of memory, increase NTUP_PER_BUCKET. I'd like to think that the common case is that work_mem will be large enough that everything fits; and if you do it that way, then you save yourself the trouble of rehashing later, which as you point out might lose if there are only a few probes. If it turns out that you run short of memory, you can merge pairs of buckets up to three times, effectively doubling NTUP_PER_BUCKET each time. Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. Right, I think that's clear. I'm just pointing out that you get to decide: you can either start with a larger NTUP_PER_BUCKET and then reduce it if you have enough memory left, or you can start with a smaller NTUP_PER_BUCKET and then increase it if you run short of memory. Yet another idea is to stick with your scheme, but do the dynamic bucket splits lazily. Set a flag on each bucket indicating whether or not it needs a lazy split. When someone actually probes the hash table, if the flag is set for a particular bucket, move any tuples that don't belong as we scan the bucket. If we reach the end of the bucket chain, clear the flag. I don't think a lazy scheme makes much sense here. There are two clearly separated phases - loading the table and search. 
Also, it seems to me it can't work as described. Say we build a table and then find out we need a table 4x the size. If I understand your proposal correctly, you'd just resize buckets array, copy the existing buckets (first 1/4 buckets) and set 'needs_split' flags. Correct? Well, the problem is that while scanning a bucket you already need to know which tuples belong there. But with the lazy approach, the tuples might still be in the old (not yet split) buckets. So you'd have to search the current bucket and all buckets that were not split yet. Or did I miss something? If you have, say, bucket 0..63 and you decide to change it to buckets 0..255, then any tuple that isn't in bucket N has to be in bucket N % 64. More generally, any time you expand or contract by a power of two it's pretty easy to figure out where tuples have to go. But even if you do something that's not a power of two, like say a 10x compaction of the buckets, there's still only two buckets that can contain any particular hash value. If the hash value is H, the old number of buckets is O, and the new number of buckets is N, then the tuple has got to be in bucket H % O or bucket H % N. If bucket H % O doesn't have the needs-split flag set, then it must be in H % N (if it exists at all). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
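The probe-time relocation Robert describes might look roughly like this, under the simplifying assumptions of a chained table indexed by hash % nbuckets and a power-of-two expansion; all names here (LazyTable, lazy_probe, needs_split) are hypothetical, not the real nodeHash.c structures:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Tuple
{
    unsigned      hash;
    struct Tuple *next;
} Tuple;

typedef struct
{
    Tuple  **buckets;       /* nnew slots; only the first nold were used
                             * before the resize */
    char    *needs_split;   /* one flag per pre-resize bucket */
    unsigned nold;          /* bucket count before the resize */
    unsigned nnew;          /* bucket count after (power-of-two multiple) */
} LazyTable;

/* Probe for `hash`, splitting its pre-resize bucket lazily on first visit.
 * A tuple whose post-resize home is hash % nnew either already sits there
 * or still sits in its pre-resize bucket hash % nold. */
static Tuple *
lazy_probe(LazyTable *ht, unsigned hash)
{
    unsigned oldb = hash % ht->nold;

    if (ht->needs_split[oldb])
    {
        /* redistribute the old chain into post-resize buckets */
        Tuple *t = ht->buckets[oldb];

        ht->buckets[oldb] = NULL;
        while (t)
        {
            Tuple   *next = t->next;
            unsigned b = t->hash % ht->nnew;

            t->next = ht->buckets[b];
            ht->buckets[b] = t;
            t = next;
        }
        ht->needs_split[oldb] = 0;
    }

    for (Tuple *t = ht->buckets[hash % ht->nnew]; t; t = t->next)
        if (t->hash == hash)
            return t;
    return NULL;
}
```

The invariant matches the argument in the message above: each probe touches at most two bucket chains, and once a bucket's flag is cleared every tuple that hashes there is in its final position.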
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: I doubt it. The extra code isn't the problem so much, it's the extra planning cycles spent trying to make proofs that will usually fail. What you propose will create a combinatorial explosion in the number of proof paths to be considered. I can understand that it will create a combinatorial explosion in the number of predicates that need to be examined by the constraint exclusion. I do not understand where the paths get involved here. The constraint exclusion kicks in before paths are created. Perhaps I should not have used the term path here, because I wasn't referring to what the planner calls Paths. I just meant that there will be many more ways to perhaps prove a constraint-exclusion result, and the planner will have to go through them all. (Usually to no avail, because how often do people write queries that are guaranteed to produce no rows?) An example of what I meant by combinatorial explosion is that if a clause mentions K variables, and each of those variables has been equated to M other variables, there are (M+1)^K possible derived clauses (e.g., K = 3 variables each equated to M = 4 others already gives 125 candidates), and it looks to me like we'd have to consider them all to catch the sort of situation you're talking about. I think this is going to require a *lot* of additional planner cycles, and TBH you've not provided any very compelling example of why it's worthwhile. Yeah, if you can exclude a large partition it's a win, but how often is that going to happen in real-world queries? The one example you gave seemed pretty artificial to me, ie unrelated to typical partitioning constraints. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query_is_distinct_for does not take into account set returning functions
David Rowley dgrowle...@gmail.com writes: I think we should probably include the logic to test for set returning functions into query_is_distinct_for. It strikes me that there's only a problem if the SRF is in a tlist entry that is not one of the DISTINCT or GROUP BY columns, respectively. It may not be worth the extra complexity to figure that out, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to use Makefile - pgxs without gcc -O2 ?
hi, is there any hint to tell pgxs to compile with gcc -O0 (CFLAGS) ? regards geohas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] LEFT JOINs not optimized away when not needed
On Tue, Jul 08, 2014 at 11:19:31AM -0400, Tom Lane wrote: Moshe Jacobson mo...@neadwerx.com writes: Seeing that there is only one output column, and that the results are grouped by this output column, it seems to me that the optimizer should not even look at the rest of the tables. The GROUP BY has nothing to do with it, but if all the other tables' join keys are primary keys (or at least unique), I'd expect the planner to get rid of the joins. However, I'm not sure whether it works completely when there are more than join_collapse_limit relations to worry about. Eliminating JOINs seems orthogonal, at least in theory, to join_collapse_limit. What have I missed here, and how might they have dependencies? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to use Makefile - pgxs without gcc -O2 ?
geohas li...@hasibether.at writes: is there any hint to tell pgxs to compile with gcc -O0 (CFLAGS) ? I tend to use make PROFILE=-O0 which relies on knowing that PG's make rules append $(PROFILE) to CFLAGS. Alternatively you could just override CFLAGS: make CFLAGS=whatever but this requires knowing exactly what configure put into CFLAGS so that you don't remove any flags that you still want. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to use Makefile - pgxs without gcc -O2 ?
Thank you Tom! make CFLAGS=-O0 was it. now gdb doesn't print Value optimized out. regards On 08/07/14 17:45, Tom Lane wrote: geohas li...@hasibether.at writes: is there any hint to tell pgxs to compile with gcc -O0 (CFLAGS) ? I tend to use make PROFILE=-O0 which relies on knowing that PG's make rules append $(PROFILE) to CFLAGS. Alternatively you could just override CFLAGS: make CFLAGS=whatever but this requires knowing exactly what configure put into CFLAGS so that you don't remove any flags that you still want. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 8 Červenec 2014, 16:16, Robert Haas wrote: On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. Right, I think that's clear. I'm just pointing out that you get to decide: you can either start with a larger NTUP_PER_BUCKET and then reduce it if you have enough memory left, or you can start with a smaller NTUP_PER_BUCKET and then increase it if you run short of memory. I don't think those two approaches are equal. With the approach I took, I can use a compromise value (NTUP=4) initially, and only resize the hash table once at the end (while keeping the amount of memory under work_mem all the time). With NTUP=1 and an increase in case of memory pressure, you have to shrink the table immediately (to fit into work_mem), and if the hash table gets split into multiple batches you're suddenly in a situation with lower memory pressure and you may need to increase it again. Yet another idea is to stick with your scheme, but do the dynamic bucket splits lazily. Set a flag on each bucket indicating whether or not it needs a lazy split. When someone actually probes the hash table, if the flag is set for a particular bucket, move any tuples that don't belong as we scan the bucket. If we reach the end of the bucket chain, clear the flag. I don't think a lazy scheme makes much sense here. There are two clearly separated phases - loading the table and search. Also, it seems to me it can't work as described. Say we build a table and then find out we need a table 4x the size. If I understand your proposal correctly, you'd just resize the buckets array, copy the existing buckets (the first 1/4 of the buckets) and set 'needs_split' flags. Correct? 
Well, the problem is that while scanning a bucket you already need to know which tuples belong there. But with the lazy approach, the tuples might still be in the old (not yet split) buckets. So you'd have to search the current bucket and all buckets that were not split yet. Or did I miss something? If you have, say, buckets 0..63 and you decide to change it to buckets 0..255, then any tuple that isn't in bucket N has to be in bucket N % 64. More generally, any time you expand or contract by a power of two it's pretty easy to figure out where tuples have to go. OK. But even if you do something that's not a power of two, like say a 10x compaction of the buckets, there's still only two buckets that can contain any particular hash value. If the hash value is H, the old number of buckets is O, and the new number of buckets is N, then the tuple has got to be in bucket H % O or bucket H % N. If bucket H % O doesn't have the needs-split flag set, then it must be in H % N (if it exists at all). I wonder if this is really worth the effort - my guess is it's efficient only if a large portion of the buckets is not visited (and thus does not need to be split) at all. Not sure how common that is (our workloads certainly are not like that). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to use Makefile - pgxs without gcc -O2 ?
On 07/08/2014 17:53, geohas wrote: make CFLAGS=-O0 was it. now gdb doesn't print Value optimized out. If you're using GCC 4.8 or later, consider using it with -Og for that kind of build. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Hi, On Tue, Jul 8, 2014 at 6:58 AM, Michael Paquier michael.paqu...@gmail.com wrote: 2) Checking process goes through all the existing columns of a relation even a difference of 1 with some other column(s) has already been found. As we try to limit the number of hints returned, this seems like a waste of resources. In general it's possible that an exact match will later be found within the RTE, and exact matches don't have to pay the wrong alias penalty, and are immediately returned. It is therefore not a waste of resources, but even if it was that would be pretty inconsequential as your benchmark shows. 3) distanceName could be improved, by for example having some checks on the string lengths of target and source columns, and immediately reject the match if for example the length of the source string is the double/half of the length of target. I don't think it's a good idea to tie distanceName() to the ultimate behavior of errorMissingColumn() hinting, since there may be other callers in the future. Besides, that isn't going to help much. 4) This is not nice, could it be possible to remove the stuff from varlena.c? +/* Expand each Levenshtein distance variant */ +#include levenshtein.c +#define LEVENSHTEIN_LESS_EQUAL +#include levenshtein.c +#undef LEVENSHTEIN_LESS_EQUAL Part of the same comment: only varstr_leven_less_equal is used to calculate the distance, should we really move varstr_leven to core? This clearly needs to be reworked as not just a copy-paste of the things in fuzzystrmatch. The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think. So there'd be one variant within core and one within contrib/fuzzystrmatch? I don't think that's an improvement. 5) Do we want hints on system columns as well? I think it's obvious that the answer must be no. That's going to frequently result in suggestions of columns that users will complain aren't even there. If you know about the system columns, you can just get it right. 
They're supposed to be hidden for most purposes. 6) Sometimes no hints are returned... Even in simple cases like this one: =# create table foo (aa int, bb int); CREATE TABLE =# select ab from foo; ERROR: 42703: column ab does not exist LINE 1: select ab from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 That's because those two candidates come from a single RTE and have an equal distance -- you'd see both suggestions if you joined two tables with each candidate, assuming that each table being joined didn't individually have the same issue. I think that that's probably considered the correct behavior by most. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
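For reference, the Levenshtein distance the thread keeps referring to can be computed with the classic two-row dynamic program. The sketch below is the textbook algorithm with unit costs, not the varstr_leven()/varstr_leven_less_equal() code under discussion (which adds configurable costs and an early-exit "less-equal" variant):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Plain O(m*n) Levenshtein distance with unit insert/delete/substitute
 * costs, using two rolling rows of the DP matrix. */
static int
levenshtein(const char *s, const char *t)
{
    int  m = (int) strlen(s);
    int  n = (int) strlen(t);
    int *prev = malloc((n + 1) * sizeof(int));
    int *curr = malloc((n + 1) * sizeof(int));
    int  d;

    for (int j = 0; j <= n; j++)
        prev[j] = j;                    /* distance from empty prefix */

    for (int i = 1; i <= m; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= n; j++)
        {
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;
            int sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
            int best = del < ins ? del : ins;

            curr[j] = best < sub ? best : sub;
        }
        /* roll the rows */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    d = prev[n];
    free(prev);
    free(curr);
    return d;
}
```

The "less-equal" variant discussed above bounds the answer and bails out once every cell in the current row exceeds the bound, which is why it is the cheaper choice for hinting where only small distances matter.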
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On Tue, Jul 8, 2014 at 12:06 PM, Tomas Vondra t...@fuzzy.cz wrote: On 8 Červenec 2014, 16:16, Robert Haas wrote: On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. Right, I think that's clear. I'm just pointing out that you get to decide: you can either start with a larger NTUP_PER_BUCKET and then reduce it if you have enough memory left, or you can start with a smaller NTUP_PER_BUCKET and then increase it if you run short of memory. I don't think those two approaches are equal. With the approach I took, I can use a compromise value (NTUP=4) initially, and only resize the hash table once at the end (while keeping the amount of memory under work_mem all the time). With NTUP=1 and an increase in case of memory pressure, you have to shrink the table immediately (to fit into work_mem), and if the hash table gets split into multiple batches you're suddenly in a situation with lower memory pressure and you may need to increase it again. True. On the other hand, this really only comes into play when the estimates are wrong. If you know at the start how many tuples you're going to need to store and how big they are, you determine whether NTUP_PER_BUCKET=1 is going to work before you even start building the hash table. If it isn't, then you use fewer buckets right from the start. If we start by estimating a small value for NTUP_PER_BUCKET and then let it float upward if we turn out to have more tuples than expected, we're optimizing for the case where our statistics are right. 
If we start by estimating a larger value for NTUP_PER_BUCKET than what we think we need to fit within work_mem, we're basically guessing that our statistics are more likely to be wrong than to be right. I think. I wonder if this is really worth the effort - my guess is it's efficient only if a large portion of the buckets is not visited (and thus does not need to be split) at all. Not sure how common that is (our workloads certainly are not like that). Yeah. It may be a bad idea. I threw it out there as a possible way of trying to mitigate the worst case, which is when you go to the trouble of building the hash table and then make very few probes. But that may not be worth the complexity that this would introduce. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Securing make check (CVE-2014-0067)
Re: Noah Misch 2014-06-08 20140608135713.ga525...@tornado.leadboat.com Here's an update that places the socket in a temporary subdirectory of /tmp. The first attached patch adds NetBSD mkdtemp() to libpgport. The second, principal, patch uses mkdtemp() to implement this design in pg_regress. The corresponding change to contrib/pg_upgrade/test.sh is based on the configure script's arrangements for its temporary directory. Hi, I believe pg_upgrade itself still needs a fix. While it's not a security problem to put the socket in $CWD while upgrading (it is using -c unix_socket_permissions=0700), this behavior is pretty unexpected, and does fail if your $CWD is > 107 bytes. In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl perl tests to avoid that problem, so imho it would make even more sense to fix pg_upgrade which could also fail in production. This has been discussed here and elsewhere [1] before, but was rejected as not being in line with what the other utilities do, but now pg_upgrade is the lone outlier. Noah's changes let Debian drop 4 out of 5 pg_regress-sockdir patches; having this fixed would also let us get rid of the last one [2]. [1] http://lists.debian.org/debian-wb-team/2013/05/msg00015.html [2] https://alioth.debian.org/scm/loggerhead/pkg-postgresql/postgresql-9.5/trunk/view/head:/debian/patches/64-pg_upgrade-sockdir Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] postgresql.auto.conf and reload
On Sat, Jul 5, 2014 at 10:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: Please find the patch attached to address the above concern. I have updated docs, so that users can be aware of such behaviour. I'm in the camp that says that we need to do something about this, not just claim that it's operating as designed. AFAICS, the *entire* argument for having ALTER SYSTEM at all is ease of use; but this behavior is not what I would call facilitating ease of use. In particular, if you are conveniently able to edit postgresql.conf, what the heck do you need ALTER SYSTEM for? One possible answer is to modify guc-file.l so that only the last value supplied for any one GUC gets processed. The naive way of doing that would be O(N^2) in the number of uncommented settings, which might or might not be a problem; if it is, we could no doubt devise a smarter data structure. I've been really annoyed by the current behavior even on older releases because I have a script, which I use frequently, that does this: initdb cat >> $PGDATA/postgresql.conf <<EOM shared_buffers=8GB another setting another setting another setting EOM Now, sometimes while experimenting, I will change a setting in postgresql.conf and reload the config. At which point it complains that it can't change shared_buffers at runtime. It does this because the line from initdb is in the file, and so is the one I added. But I'm *not* trying to change shared_buffers. Sure, the first value in the config file doesn't match the current value, but the *last* line in the config file is the one that's supposed to control, so why is it complaining about the earlier line? I haven't looked at the code in this area too carefully, but it seems to me like the flow ought to be: 1. Read all of the config files and determine what the final value present in each config file is. 2. *Then*, in a second pass, enforce requirements like can't be changed except at server start. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
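The two-pass flow proposed above can be illustrated with a toy parser (Python for brevity; names such as RESTART_ONLY are hypothetical stand-ins, not the real GUC machinery):

```python
# Toy sketch of the proposed two-pass reload: pass 1 collects only the
# *final* value of each setting across all config files, pass 2 applies
# "can't change at runtime" checks to those final values alone.

RESTART_ONLY = {"shared_buffers"}        # hypothetical restart-only set

def parse_lines(lines):
    """Yield (name, value) pairs from simple name=value lines."""
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments/blank lines
        if not line:
            continue
        name, _, value = line.partition("=")
        yield name.strip(), value.strip()

def reload_config(config_files, current_values):
    # Pass 1: later lines override earlier ones, so a dict naturally keeps
    # only the last value supplied for each GUC.
    final = {}
    for lines in config_files:
        for name, value in parse_lines(lines):
            final[name] = value
    # Pass 2: enforce restart-only rules against the final values only.
    errors = [name for name, value in final.items()
              if name in RESTART_ONLY and current_values.get(name) != value]
    return final, errors

# The initdb-written line no longer triggers a spurious complaint, because
# a later line restores the running value:
files = [["shared_buffers=128MB", "shared_buffers=8GB", "work_mem=64MB"]]
final, errors = reload_config(files, {"shared_buffers": "8GB"})
```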
Re: [HACKERS] postgresql.auto.conf and reload
On 07/08/2014 10:07 AM, Robert Haas wrote: I haven't looked at the code in this area too carefully, but it seems to me like the flow ought to be: 1. Read all of the config files and determine what the final value present in each config file is. 2. *Then*, in a second pass, enforce requirements like can't be changed except at server start. +1 This would also make conf.d much more useful; I wouldn't have to worry as much about overlapping config settings. Sounds like a 9.5 feature, though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Thu, Jun 26, 2014 at 4:26 PM, Andreas Karlsson andr...@proxel.se wrote: On 06/24/2014 03:20 AM, Jeff Janes wrote: I've tried your 0001 patch, reflecting this refactoring, on Linux and it caused 'make check' to hang at 'starting postmaster'. I found the bug in the code, and I have attached a patch which you can apply on top of the patch. The regression tests pass now on my Debian machine. Your fix works for me as well. Thanks. Is there some recipe for testing the 0002 patch? Can it be tested in a MinGW environment, or does it need to use the Microsoft-supplied compilers? Thanks, Jeff
Re: [HACKERS] postgresql.auto.conf and reload
On 07/06/2014 01:27 AM, Christoph Berg wrote: Another could be that during initdb all the uncommented settings be written to the postgresql.auto.conf file rather than to postgresql.conf. I think we can do this by changing code in initdb.c->setup_config(). This will ensure that unless the user is changing settings in a mixed way (some by ALTER SYSTEM and some manually by editing postgresql.conf), there will be no such problem. There is no reasonable way for us to prevent issues for users who are manually changing both pg.conf and pg.auto.conf. Users should stick to one or the other method of management (or, thirdly, using conf.d); if they mix methods, we can't prevent confusion at restart time and we shouldn't try. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Securing make check (CVE-2014-0067)
On Tue, Jul 08, 2014 at 07:02:04PM +0200, Christoph Berg wrote: I believe pg_upgrade itself still needs a fix. While it's not a security problem to put the socket in $CWD while upgrading (it is using -c unix_socket_permissions=0700), this behavior is pretty unexpected, and does fail if your $CWD is > 107 bytes. In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl perl tests to avoid that problem, so imho it would make even more sense to fix pg_upgrade which could also fail in production. +1. Does writing that patch interest you? -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Securing make check (CVE-2014-0067)
Re: Noah Misch 2014-07-08 20140708174125.ga1884...@tornado.leadboat.com +1. Does writing that patch interest you? I'll give it a try once I've finished this CF review. Christoph -- c...@df7cb.de | http://www.df7cb.de/
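The 107-byte figure discussed in this subthread comes from the size of sockaddr_un.sun_path (commonly 108 bytes including the terminating NUL), which is why a short mkdtemp() directory under /tmp avoids the failures a long $CWD causes. A rough illustration (Python's tempfile.mkdtemp() wraps the same libc facility the patch adds to libpgport; the socket filename is the conventional PostgreSQL pattern, used here for illustration):

```python
# Create a private (mode 0700) temp directory under /tmp and build a
# socket path inside it, rejecting anything that wouldn't fit in
# sockaddr_un.sun_path on common platforms.
import os
import tempfile

SUN_PATH_MAX = 107  # usable bytes in sun_path, leaving room for the NUL

def make_socket_dir(prefix="pg_regress."):
    d = tempfile.mkdtemp(prefix=prefix, dir="/tmp")  # private, 0700
    sock = os.path.join(d, ".s.PGSQL.5432")
    if len(sock.encode()) > SUN_PATH_MAX:
        raise ValueError("socket path too long for sun_path")
    return d, sock

d, sock = make_socket_dir()
```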
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 8.7.2014 19:00, Robert Haas wrote: On Tue, Jul 8, 2014 at 12:06 PM, Tomas Vondra t...@fuzzy.cz wrote: On 8 Červenec 2014, 16:16, Robert Haas wrote: Right, I think that's clear. I'm just pointing out that you get to decide: you can either start with a larger NTUP_PER_BUCKET and then reduce it if you have enough memory left, or you can start with a smaller NTUP_PER_BUCKET and then increase it if you run short of memory. I don't think those two approaches are equal. With the approach I took, I can use a compromise value (NTUP=4) initially, and only resize the hash table once at the end (while keeping the amount of memory under work_mem all the time). With NTUP=1 and an increase in case of memory pressure, you have to shrink the table immediately (to fit into work_mem), and if the hash table gets split into multiple batches you're suddenly in a situation with lower memory pressure and you may need to increase it again. True. On the other hand, this really only comes into play when the estimates are wrong. If you know at the start how many tuples you're going to need to store and how big they are, you determine whether NTUP_PER_BUCKET=1 is going to work before you even start building the hash table. If it isn't, then you use fewer buckets right from the start. If we start by estimating a small value for NTUP_PER_BUCKET and then let it float upward if we turn out to have more tuples than expected, we're optimizing for the case where our statistics are right. If we start by estimating a larger value for NTUP_PER_BUCKET than what we think we need to fit within work_mem, we're basically guessing that our statistics are more likely to be wrong than to be right. I think. Good point. The first patch was targeted exactly at the wrongly estimated queries. This patch attempts to apply the rehash to all plans, and maybe there's a better way. 
If the estimates are correct / not too off, we can use this information to do the sizing 'right' at the beginning (without facing rehashes later). Over-estimates are not a problem, because it won't make the hash table slower (it'll be sized for more tuples) and we can't change the number of batches anyway. With under-estimates we have to decide whether to resize the hash or increase the number of batches. In both cases that matter (correct estimates and under-estimates) we have to decide whether to increase the number of buckets or batches. I'm not sure how to do that. I wonder if this is really worth the effort - my guess is it's efficient only if a large portion of the buckets is not visited (and thus does not need to be split) at all. Not sure how common that is (our workloads certainly are not like that). Yeah. It may be a bad idea. I threw it out there as a possible way of trying to mitigate the worst case, which is when you go to the trouble of building the hash table and then make very few probes. But that may not be worth the complexity that this would introduce. Let's keep it simple for now. I think the sizing question (explained above) is more important and needs to be solved first. regards Tomas
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Hi, here's my review for this patch. Generally, the patch addresses a real need, tables often only turn out to be desired as unlogged if there are performance problems in practice, and the other way round changing an unlogged table to logged is way more involved manually than it could be with this patch. So yes, we want the feature. I've tried the patch, and it works as desired, including tab completion in psql. There are a few issues to be resolved, though. Re: Fabrízio de Royes Mello 2014-06-26 cafcns+pyv0pbjlo97osyqv1yqoc7s+jgvwxc8uny5jsrdwy...@mail.gmail.com As part of GSoC2014 and with agreement of my mentor and reviewer (Stephen Frost) I've send a complement of the first patch to add the capability to change a regular table to unlogged. (Fwiw, I believe this direction is the more interesting one.) With this patch we finish the main goals of the GSoC2014 and now we'll work in the additional goals. ... that being the non-WAL-logging with wal_level=minimal, or more? diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 69a1e14..424f2e9 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable ENABLE REPLICA RULE replaceable class=PARAMETERrewrite_rule_name/replaceable ENABLE ALWAYS RULE replaceable class=PARAMETERrewrite_rule_name/replaceable CLUSTER ON replaceable class=PARAMETERindex_name/replaceable +SET {LOGGED | UNLOGGED} SET WITHOUT CLUSTER SET WITH OIDS SET WITHOUT OIDS This must not be between the two CLUSTER lines. I think the best spot would just be one line down, before SET WITH OIDS. (While we are at it, SET TABLESPACE should probably be moved up to the other SET options...) 
@@ -432,6 +433,20 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable /varlistentry varlistentry +termliteralSET {LOGGED | UNLOGGED}/literal/term +listitem + para + This form change the table persistence type from unlogged to permanent or This grammar bug pops up consistently: This form *changes*... There's trailing whitespace. + from unlogged to permanent by rewriting the entire contents of the table + and associated indexes into new disk files. + /para + para + Changing the table persistence type acquires an literalACCESS EXCLUSIVE/literal lock. + /para I'd rewrite that and add a reference: ... from unlogged to permanent (see xref linkend=SQL-CREATETABLE-UNLOGGED). /para para Changing the table persistence type acquires an literalACCESS EXCLUSIVE/literal lock and rewrites the table contents and associated indexes into new disk files. /para diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 60d387a..9dfdca2 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -125,18 +125,19 @@ static List *on_commits = NIL; * a pass determined by subcommand type. 
*/ -#define AT_PASS_UNSET-1 /* UNSET will cause ERROR */ -#define AT_PASS_DROP 0 /* DROP (all flavors) */ -#define AT_PASS_ALTER_TYPE 1 /* ALTER COLUMN TYPE */ -#define AT_PASS_OLD_INDEX2 /* re-add existing indexes */ -#define AT_PASS_OLD_CONSTR 3 /* re-add existing constraints */ -#define AT_PASS_COL_ATTRS4 /* set other column attributes */ +#define AT_PASS_UNSET-1 /* UNSET will cause ERROR */ +#define AT_PASS_DROP 0 /* DROP (all flavors) */ +#define AT_PASS_ALTER_TYPE 1 /* ALTER COLUMN TYPE */ +#define AT_PASS_OLD_INDEX2 /* re-add existing indexes */ +#define AT_PASS_OLD_CONSTR 3 /* re-add existing constraints */ +#define AT_PASS_COL_ATTRS4 /* set other column attributes */ /* We could support a RENAME COLUMN pass here, but not currently used */ -#define AT_PASS_ADD_COL 5 /* ADD COLUMN */ -#define AT_PASS_ADD_INDEX6 /* ADD indexes */ -#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */ -#define AT_PASS_MISC 8 /* other stuff */ -#define AT_NUM_PASSES9 +#define AT_PASS_ADD_COL 5 /* ADD COLUMN */ +#define AT_PASS_ADD_INDEX6 /* ADD indexes */ +#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */ +#define AT_PASS_MISC 8 /* other stuff */
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On Tue, Jul 8, 2014 at 6:35 AM, Tomas Vondra t...@fuzzy.cz wrote: On 8 Červenec 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this possible / rather simple. Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we run out of memory, increase NTUP_PER_BUCKET. I'd like to think that the common case is that work_mem will be large enough that everything fits; and if you do it that way, then you save yourself the trouble of rehashing later, which as you point out might lose if there are only a few probes. If it turns out that you run short of memory, you can merge pairs of buckets up to three times, effectively doubling NTUP_PER_BUCKET each time. Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. With large work_mem values the amount of memory for buckets may be quite significant. E.g. 800MB work_mem may easily give ~120MB of memory taken by buckets with NTUP_PER_BUCKET=1. With NTUP_PER_BUCKET=4 it's just ~30MB. That extra 90MB is memory well spent, in my book. Versus having to walk a 4-deep linked list which is scattered all over main memory just to find one entry we care about in it. It might cause some things that were very close to the edge to tip over into multi-pass hash joins, but even that is not necessarily a bad thing. 
(When I test with work_mem in the 20 to 50 MB range, I often find batches=2 to be ~30% faster than batches=1, I think because partitioning into main memory using sequential writes is not much of a burden, and building and walking two hash tables that both fit in L3 cache is much faster than building 1 hash table in main memory, and more than makes up for the work of partitioning. Presumably that situation doesn't apply to work_mem 900MB, but isn't NUMA about the same thing as L4 cache, in effect?). And if someone had a whole bunch of hash joins which were right in that anti-sweet spot, all they have to do is increase work_mem by (at most) 15% to get out of it. work_mem is basically impossible to tune, so I doubt anyone exists who has a reasonable setting for it which can't be increased by 15% and still be reasonable. And if someone does have it tuned so tightly, they probably won't be upgrading to new major versions without expecting to do some re-tuning. Cheers, Jeff
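Robert's fallback upthread - merging pairs of buckets to double NTUP_PER_BUCKET when memory runs short - works because with a power-of-two bucket count the bucket number is just the hash masked by nbuckets - 1, so halving the count only concatenates pairs of existing buckets and rehashes nothing. A toy illustration (Python for brevity, nothing like the executor's real data structures):

```python
# With bucket = hash & (nbuckets - 1) and nbuckets a power of two, the
# entries of old buckets i and i + nbuckets/2 land in the same new bucket
# when nbuckets is halved, so each halving doubles the effective
# NTUP_PER_BUCKET without touching tuple hashes.

def build(hashes, nbuckets):
    buckets = [[] for _ in range(nbuckets)]
    for h in hashes:
        buckets[h & (nbuckets - 1)].append(h)
    return buckets

def halve(buckets):
    n = len(buckets) // 2
    # bucket i under mask n-1 receives entries from old buckets i and i+n
    return [buckets[i] + buckets[i + n] for i in range(n)]

hashes = list(range(100))
b8 = build(hashes, 8)   # NTUP_PER_BUCKET ~ 12.5 here
b4 = halve(b8)          # one merge step: double the tuples per bucket
```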
Re: [HACKERS] Securing make check (CVE-2014-0067)
On Tue, Jul 8, 2014 at 08:21:48PM +0200, Christoph Berg wrote: +1. Does writing that patch interest you? I'll give it a try once I've finished this CF review. OK. Let me know if you need help. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Peter Geoghegan wrote: 6) Sometimes no hints are returned... Even in simple cases like this one: =# create table foo (aa int, bb int); CREATE TABLE =# select ab from foo; ERROR: 42703: column ab does not exist LINE 1: select ab from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 That's because those two candidates come from a single RTE and have an equal distance -- you'd see both suggestions if you joined two tables with each candidate, assuming that each table being joined didn't individually have the same issue. I think that that's probably considered the correct behavior by most. It seems pretty silly to me actually. Was this designed by a committee? I agree with the general principle that showing a large number of candidates (a la bash) is a bad idea, but failing to show two of them ... Words fail me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Jul 8, 2014 at 1:42 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: That's because those two candidates come from a single RTE and have an equal distance -- you'd see both suggestions if you joined two tables with each candidate, assuming that each table being joined didn't individually have the same issue. I think that that's probably considered the correct behavior by most. It seems pretty silly to me actually. Was this designed by a committee? I agree with the general principle that showing a large number of candidates (a la bash) is a bad idea, but failing to show two of them ... I guess it was designed by a committee. But we don't fail to show both because they're equally distant. Rather, it's because they're equally distant and from the same RTE. This is a contrived example, but typically showing equally distant columns is useful when they're in a foreign-key relationship - I was worried about the common case where a column name is misspelled that would otherwise be ambiguous, which is why that shows a HINT while the single RTE case doesn't. I think that in most realistic cases it wouldn't be all that useful to show two columns from the same table when they're equally distant. It's easy to imagine that reflecting that no match is good in absolute terms, and we're somewhat conservative about showing any match. While I think this general behavior is defensible, I must admit that it did suit me to write it that way because to do otherwise would have necessitated more invasive code in the existing general purpose scanRTEForColumn() function. -- Peter Geoghegan
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Jul 8, 2014 at 1:55 PM, Peter Geoghegan p...@heroku.com wrote: I was worried about the common case where a column name is misspelled that would otherwise be ambiguous, which is why that shows a HINT while the single RTE case doesn't To be clear - I mean a HINT with two suggestions rather than just one. If there are 3 or more equally distant suggestions (even if they're all from different RTEs) we also give no HINT in the proposed patch. -- Peter Geoghegan
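The tie-breaking rule described here is easy to prototype. A toy version follows (Python; difflib's similarity ratio stands in for the real Levenshtein machinery, and the per-RTE distinction from the patch is ignored): suggest the closest candidates, but give no hint when three or more tie for best.

```python
# Toy column-hint selection: rank candidates by string similarity and
# suppress the hint entirely when 3+ candidates tie for the best score,
# mirroring the "no HINT on a 3-way tie" behavior described above.
import difflib

def hint_candidates(typo, columns, max_ties=2):
    scored = [(difflib.SequenceMatcher(None, typo, c).ratio(), c)
              for c in columns]
    best = max(s for s, _ in scored)
    ties = [c for s, c in scored if s == best]
    return ties if len(ties) <= max_ties else []
```

For the "select ab from foo (aa int, bb int)" example, both columns tie, so up to two suggestions come back; add a third equally distant candidate and the hint disappears.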
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 8.7.2014 21:53, Jeff Janes wrote: On Tue, Jul 8, 2014 at 6:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches (resulting in repeated scans of the outer table). I think it's important (but difficult) to keep these things somehow balanced. With large work_mem values the amount of memory for buckets may be quite significant. E.g. 800MB work_mem may easily give ~120MB of memory taken by buckets with NTUP_PER_BUCKET=1. With NTUP_PER_BUCKET=4 it's just ~30MB. That extra 90MB is memory well spent, in my book. Versus having to walk a 4-deep linked list which is scattered all over main memory just to find one entry we care about in it. Yes, I share this view, although it really depends on how expensive it is to rescan the outer relation (due to more batches) vs. lookup in a deeper hash table. The other thing is that this memory is currently unaccounted for, i.e. the space for buckets is not counted within the work_mem limit (unless I'm missing something in the code). I'm not sure why, and I believe it should be, so I changed this in the patch. It might cause some things that were very close to the edge to tip over into multi-pass hash joins, but even that is not necessarily a bad thing. (When I test with work_mem in the 20 to 50 MB range, I often find batches=2 to be ~30% faster than batches=1, I think because partitioning into main memory using sequential writes is not much of a burden, and building and walking two hash tables that both fit in L3 cache is much faster than building 1 hash table in main memory, and more than makes up for the work of partitioning. Presumably that situation doesn't apply to work_mem 900MB, but isn't NUMA about the same thing as L4 cache, in effect?). Yes, I've seen cases where plans with (nbatch > 1) were faster than plans with (nbatch=1). 
I'm not entirely sure why :-( but I have two hypotheses so far: (a) Caching within CPU (current CPUs have multiple MBs of L2 cache), which may make a difference, especially in the size range you've mentioned. (b) Lower tuple/bucket ratio - this may easily happen for example if the estimates are slightly lower than reality (either row count or row width) and narrowly exceed work_mem, thus forcing batching. The resulting hash table has ~50% tuple/bucket on average, and thus is faster. And if someone had a whole bunch of hash joins which were right in that anti-sweet spot, all they have to do is increase work_mem by (at most) 15% to get out of it. work_mem is basically impossible to tune, so I doubt anyone exists who has a reasonable setting for it which can't be increased by 15% and still be reasonable. And if someone does have it tuned so tightly, they probably won't be upgrading to new major versions without expecting to do some re-tuning. Right. Still, it's not really nice to get slower hash joins after upgrading to a new version - I somehow expect to get the same or better performance, if possible. So I'd like to make it as efficient as possible, within the given memory limit. Sadly, the increase may be needed anyway because of counting the bucket memory into work_mem, as mentioned above. If committed, this should probably be mentioned in release notes or something. regards Tomas
Re: [HACKERS] tweaking NTUP_PER_BUCKET
Hi, Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list - don't build the hash table yet (2) building the hash table - we have all the tuples in a simple list, batching is done - we know the exact row count, can size the table properly - build the table Also, maybe we could use a regular linear hash table [1], instead of using the current implementation with NTUP_PER_BUCKET=1. (Although, that'd be absolutely awful with duplicates.) regards Tomas [1] http://en.wikipedia.org/wiki/Linear_probing
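For reference, the linear-probing scheme mentioned in [1] looks roughly like the toy below (Python for brevity; as the message notes, long runs of duplicate keys make the probe sequences degenerate, and open addressing also requires knowing the row count up front to size the slot array - which is exactly what the two-pass build above would provide):

```python
# Minimal open-addressing hash table with linear probing: on collision,
# walk forward one slot at a time. The table must be kept below capacity
# or insert() would loop forever.

class LinearProbeTable:
    def __init__(self, nslots):
        self.slots = [None] * nslots        # each slot: one (key, value)

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)   # linear step, wrap around

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return

    def lookup(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:       # hit an empty slot: not present
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]

t = LinearProbeTable(16)
for k in range(10):
    t.insert(k, k * k)
```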
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(), it seems simple enough so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. Meh. "I wrote a regression test that expects it" is a pretty poor rationale for adding logic. If you can't point to a real-world case where this is important, I'm inclined to take it out. If we were actually serious about exploiting such cases, looking for bare Consts would be a poor implementation anyhow, not least because const-folding has not yet been applied to the sub-select. I think we'd want to do it for any pseudoconstant expression (no Vars, no volatile functions); which is a substantially more expensive test. 1. The fast path code that existed in join_is_removable() for subqueries when the subquery had no group or distinct clause is now gone. I wasn't too sure that I wanted to assume too much about what query_is_distinct_for may do in the future and I thought if I included some logic in join_is_removable() to fast path, that one day it may fast path wrongly. Or put a quick-check subroutine next to query_is_distinct_for(). The code we're skipping here is not so cheap that I want to blow off skipping it. On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. 2. 
The patch I submitted here http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com if that gets accepted then it makes the check for set returning functions in join_is_removable void. Right (and done, if you didn't notice already). TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. We have never considered those things to prevent pushdown of quals into a subquery (cf subquery_is_pushdown_safe). I think what we're talking about here is pretty much equivalent to pushing an always-false qual into the subquery; if people haven't complained about that, why should they complain about this? Or to put it in slightly more principled terms, we've attempted to prevent subquery optimization from causing volatile expressions to be evaluated *more* times than the naive reading of the query would suggest, but we have generally not felt that we needed to prevent them from happening *fewer* times. regards, tom lane
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Yes. Can you please take a look at Win_lib_bin.patch I shared earlier? I think it or a similar approach will be appropriate. Thanks. Regards, Muhammad Asif Naeem On Tue, Jul 8, 2014 at 5:53 PM, MauMau maumau...@gmail.com wrote: From: Asif Naeem anaeem...@gmail.com Other than my previous comments, patch looks good to me. Thanks. Thanks for reviewing. I understood that your previous comment was to suggest copying the desired DLLs directly from the Release/Debug folder to bin. But I'm afraid it cannot be done cleanly... CopySolutionOutput() copies all DLLs to the lib folder with no exception, as follows: if ($1 == 1) { $dir = "bin"; $ext = "exe"; } elsif ($1 == 2) { $dir = "lib"; $ext = "dll"; } It seems like I have to do something like this, listing the relevant DLL names anyway. I don't think this is cleaner. if ($1 == 1) { $dir = "bin"; $ext = "exe"; } elsif ($1 == 2 /* the project is libpq, libecpg, etc. */) { $dir = "bin"; $ext = "dll"; } elsif ($1 == 2) { $dir = "lib"; $ext = "dll"; } What do you think? Am I misunderstanding your suggestion? Regards MauMau
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Asif Naeem wrote: Yes. Can you please take a look at Win_lib_bin.patch I shared earlier? I think it, or a similar approach, will be appropriate. Thanks. I think the suggestion by Peter Eisentraut upthread was pretty reasonable -- the Makefiles are already aware that they are building a shared library, so scrape the info off them. The less hardcoded stuff in the msvc framework, the better. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Allowing join removals for more join types
On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(); it seems simple enough, so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. Meh. "I wrote a regression test that expects it" is a pretty poor rationale for adding logic. If you can't point to a real-world case where this is important, I'm inclined to take it out. Ok, I'll pull that logic back out when I get home tonight. If we were actually serious about exploiting such cases, looking for bare Consts would be a poor implementation anyhow, not least because const-folding has not yet been applied to the sub-select. I think we'd want to do it for any pseudoconstant expression (no Vars, no volatile functions); which is a substantially more expensive test. 1. The fast-path code that existed in join_is_removable() for subqueries, when the subquery had no group or distinct clause, is now gone. I wasn't too sure that I wanted to assume too much about what query_is_distinct_for may do in the future, and I thought if I included some logic in join_is_removable() to fast path, that one day it may fast path wrongly. Or put a quick-check subroutine next to query_is_distinct_for(). The code we're skipping here is not so cheap that I want to blow off skipping it. Ok, good idea. I'll craft something up tonight along those lines. 
On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. 2. The patch I submitted here http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com if that gets accepted then it makes the check for set-returning functions in join_is_removable void. Right (and done, if you didn't notice already). Thanks, I noticed that this morning. I'll remove the (now) duplicate check from the patch. TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. We have never considered those things to prevent pushdown of quals into a subquery (cf subquery_is_pushdown_safe). I think what we're talking about here is pretty much equivalent to pushing an always-false qual into the subquery; if people haven't complained about that, why should they complain about this? Or to put it in slightly more principled terms, we've attempted to prevent subquery optimization from causing volatile expressions to be evaluated *more* times than the naive reading of the query would suggest, but we have generally not felt that we needed to prevent them from happening *fewer* times. Well, that's a real tough one for me as I only added that based on what you told me here: On 20 May 2014 23:22, Tom Lane t...@sss.pgh.pa.us wrote: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. regards, tom lane As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row; neither of these will allow FOR UPDATE anyway. 
I really just added the check to protect the code from possible future additions to query_is_distinct_for() which may add logic to determine uniqueness by some other means. So the effort here should probably be more focused on whether we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now, as I'm still planning on working on INNER JOIN removals at some point, and I'm thinking we should likely be consistent between the two types of join when it comes to FOR UPDATE and volatile functions. I'm thinking right now that for INNER JOINs, since they're INNER, we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Saying that... off the top of my head I can't remember what we'd do in a case like: create view v_a as select a, volatilefunc(a) AS funcresult from a; select a from v_a; Since we didn't select funcresult, do we execute the function?
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrow...@gmail.com writes: On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. Oh, sorry, that was a typo: I meant to write pathnode.c. Specifically, we could skip the translate_sub_tlist step. Admittedly that's not hugely expensive, but as long as we have the infrastructure for a quick check it might be worth doing. TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. Well, that's a real tough one for me as I only added that based on what you told me here: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. Hah ;-). But the analogy to qual pushdown hadn't occurred to me at the time. As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. True. So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Color me dubious. In exactly what circumstances would it be valid to suppress an inner join involving a sub-select? 
regards, tom lane
Re: [HACKERS] postgresql.auto.conf and reload
On 09/07/14 05:13, Josh Berkus wrote: On 07/06/2014 01:27 AM, Christoph Berg wrote: Another could be that during initdb all the uncommented settings be written to the postgresql.auto.conf file rather than to postgresql.conf. I think we can do this by changing code in setup_config() in initdb.c. This will ensure that unless the user is changing settings in a mixed way (some by ALTER SYSTEM and some manually by editing postgresql.conf), there will be no such problem. There is no reasonable way for us to prevent issues for users who are manually changing both pg.conf and pg.auto.conf. Users should stick to one or the other method of management (or, thirdly, using conf.d); if they mix methods, we can't prevent confusion at restart time and we shouldn't try. Yes, but even well behaved users will see this type of error, because initdb uncomments certain values (ones that are dead certs for being changed via ALTER SYSTEM subsequently, like shared_buffers), and then - bang! your next reload gets that "your postgresql.conf contains errors" message. Regards Mark
[HACKERS] 9.4 pg_control corruption
I've encountered a corrupt pg_control file on my 9.4 development cluster. I've mostly been using the cluster for changeset extraction / slony testing. This is a 9.4 (currently commit 6ad903d70a440e + a walsender change discussed in another thread) but would have had the initdb done with an earlier 9.4 snapshot.

/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file. Either the file is corrupt, or it has a different layout than this program is expecting. The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014096177254975326
Database cluster state:               in production
pg_control last modified:             Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location:           5/44DC5FC8
Prior checkpoint location:            5/44C58B88
Latest checkpoint's REDO location:    5/44DC5FC8
Latest checkpoint's REDO WAL file:    000100050044
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1558590
Latest checkpoint's NextOID:          505898
Latest checkpoint's NextMultiXactId:  3285
Latest checkpoint's NextMultiOffset:  6569
Latest checkpoint's oldestXID:        1281
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            logical
Current wal_log_hints setting:        off
Current max_connections setting:      200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

Before this postgres crashed, and seemed to have problems recovering. I might have hit CTRL-C but I didn't do anything drastic like issue a kill -9.

test1 [unknown] 2014-07-08 18:15:18.986 EDT FATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDT WARNING: terminating connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-07-08 18:15:20.483 EDT LOG: all server processes terminated; reinitializing
2014-07-08 18:15:20.720 EDT LOG: database system was interrupted; last known up at 2014-07-08 18:15:15 EDT
2014-07-08 18:15:20.865 EDT LOG: database system was not properly shut down; automatic recovery in progress
2014-07-08 18:15:20.954 EDT LOG: redo starts at 5/41023848
2014-07-08 18:15:23.153 EDT LOG: unexpected pageaddr 4/D8DC6000 in log segment 000100050044, offset 14442496
2014-07-08 18:15:23.153 EDT LOG: redo done at 5/44DC5F60
2014-07-08 18:15:23.153 EDT LOG: last completed transaction was at log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDT FATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDT FATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDT FATAL: the database system is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDT FATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDT FATAL: the database system is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDT FATAL: the database system is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDT FATAL: the database system is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDT FATAL: the database system is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDT FATAL: the database system is in
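The CRC warning above comes from the reader recomputing a checksum over the on-disk struct contents and comparing it with the value stored at the end; if the reader was built for a different pg_control layout, it checksums the wrong bytes and the comparison fails even when the file is internally consistent, which matches Tom's diagnosis in the replies. A toy Python sketch of that validation pattern (zlib's CRC-32 is a stand-in; PostgreSQL uses its own CRC routines, and the real field layout is ControlFileData in pg_control.h):

```python
import struct
import zlib

# Toy model: a "control file" is some fixed fields followed by a CRC of
# those fields.  Not the real pg_control format.
def make_control_file(version, payload):
    body = struct.pack("<I", version) + payload      # fields covered by the CRC
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack("<I", crc)

def check_control_file(data):
    body, tail = data[:-4], data[-4:]
    (stored,) = struct.unpack("<I", tail)
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored

good = make_control_file(937, b"checkpoint data")
assert check_control_file(good)
# Any disagreement between the bytes checksummed and the bytes written
# (corruption, or a reader expecting a different layout) trips the
# "calculated CRC checksum does not match" path:
assert not check_control_file(b"\x00" + good[1:])
```

The key property is that the check cannot distinguish "file is corrupt" from "reader expects a different layout", which is why the warning text hedges between the two.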
Re: [HACKERS] 9.4 pg_control corruption
Steve Singer st...@ssinger.info writes: I've encountered a corrupt pg_control file on my 9.4 development cluster. I've mostly been using the cluster for changeset extraction / slony testing. This is a 9.4 (currently commit 6ad903d70a440e + a walsender change discussed in another thread) but would have had the initdb done with an earlier 9.4 snapshot. Somehow or other you missed the update to pg_control version number 942. There's no obvious reason to think that this pg_control file is corrupt on its own terms, but the pg_controldata version you're using expects the 942 layout. The fact that the server wasn't complaining about this suggests that you've not recompiled the server, or at least not xlog.c. Possibly the odd failure to restart indicates that you have a partially updated server executable? regards, tom lane
Re: [HACKERS] 9.4 pg_control corruption
On 07/08/2014 10:14 PM, Tom Lane wrote: Steve Singer st...@ssinger.info writes: I've encountered a corrupt pg_control file on my 9.4 development cluster. I've mostly been using the cluster for changeset extraction / slony testing. This is a 9.4 (currently commit 6ad903d70a440e + a walsender change discussed in another thread) but would have had the initdb done with an earlier 9.4 snapshot. Somehow or other you missed the update to pg_control version number 942. There's no obvious reason to think that this pg_control file is corrupt on its own terms, but the pg_controldata version you're using expects the 942 layout. The fact that the server wasn't complaining about this suggests that you've not recompiled the server, or at least not xlog.c. Possibly the odd failure to restart indicates that you have a partially updated server executable? The server is complaining about that, it started to after the crash (which is why I ran pg_controldata):
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
2014-07-08 22:28:57.796 EDT FATAL: database files are incompatible with server
2014-07-08 22:28:57.796 EDT DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 937, but the server was compiled with PG_CONTROL_VERSION 942.
2014-07-08 22:28:57.796 EDT HINT: It looks like you need to initdb.
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$
The server seemed fine (and it was 9.4 because I was using 9.4 features). The server crashed. The server performed crash recovery. The server wouldn't start, and pg_controldata shows the attached output. I wasn't recompiling or reinstalling around this time either.
Re: [HACKERS] 9.4 pg_control corruption
Steve Singer st...@ssinger.info writes: On 07/08/2014 10:14 PM, Tom Lane wrote: There's no obvious reason to think that this pg_control file is corrupt on its own terms, but the pg_controldata version you're using expects the 942 layout. The fact that the server wasn't complaining about this suggests that you've not recompiled the server, or at least not xlog.c. The server is complaining about that, it started to after the crash Then you updated your sources, recompiled and reinstalled, but failed to restart the server when you did that. Else it would have complained on the spot. If you had any valuable data in the installation, we could talk about how to get it out; but since you didn't I'd suggest just re-initdb and move on. I don't see anything unexpected here. regards, tom lane
Re: [HACKERS] postgresql.auto.conf and reload
On Tue, Jul 8, 2014 at 11:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, ALTER SYSTEM is there now and it needs to work right in its first release. I will go fix this if nobody else does. I am planning to provide an initial patch for this issue in a day or so; hope that is not too late. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgresql.auto.conf and reload
On Wed, Jul 9, 2014 at 6:40 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 09/07/14 05:13, Josh Berkus wrote: Another could be that during initdb all the uncommented settings be written to the postgresql.auto.conf file rather than to postgresql.conf. I think we can do this by changing code in setup_config() in initdb.c. This will ensure that unless the user is changing settings in a mixed way (some by ALTER SYSTEM and some manually by editing postgresql.conf), there will be no such problem. There is no reasonable way for us to prevent issues for users who are manually changing both pg.conf and pg.auto.conf. Users should stick to one or the other method of management (or, thirdly, using conf.d); if they mix methods, we can't prevent confusion at restart time and we shouldn't try. Yes, but even well behaved users will see this type of error, because initdb uncomments certain values (ones that are dead certs for being changed via ALTER SYSTEM subsequently, like shared_buffers), and then - bang! your next reload gets that "your postgresql.conf contains errors" message. That is the reason why I have suggested up-thread that uncommented values should go to postgresql.auto.conf; that will avoid any such observations for a well-behaved user. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgresql.auto.conf and reload
Amit Kapila amit.kapil...@gmail.com writes: On Wed, Jul 9, 2014 at 6:40 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: Yes, but even well behaved users will see this type of error, because initdb uncomments certain values (ones that are dead certs for being changed via ALTER SYSTEM subsequently, like shared_buffers), and then - bang! your next reload gets that "your postgresql.conf contains errors" message. That is the reason why I have suggested up-thread that uncommented values should go to postgresql.auto.conf; that will avoid any such observations for a well-behaved user. Uh, what? That sounds like you are proposing that postgresql.conf itself is a dead letter. Which is not going to fly. We had that conversation already. The right way to fix this is just to avoid processing entries that get overridden later in the configuration file scan. That won't cause anyone to get upset about how their old habits no longer work. regards, tom lane
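Tom's suggested fix amounts to "last value wins": collect every name = value entry from postgresql.conf and then postgresql.auto.conf, and only apply or validate the final setting seen for each name, so a stale value that ALTER SYSTEM has superseded can never trigger a spurious reload error. A minimal Python sketch of that idea (not the actual guc-file.c logic; the parsing here is deliberately naive):

```python
# "Last value wins" config processing: an entry that is overridden later
# in the scan is never applied, so a stale postgresql.conf value that
# postgresql.auto.conf supersedes cannot cause a reload-time error.
# Illustrative sketch only, not PostgreSQL's real GUC parser.
def parse_entries(text):
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments
        if "=" in line:
            name, value = line.split("=", 1)
            entries.append((name.strip(), value.strip()))
    return entries

def effective_settings(postgresql_conf, auto_conf):
    settings = {}
    # auto.conf is scanned last, so its entries override postgresql.conf
    for name, value in parse_entries(postgresql_conf) + parse_entries(auto_conf):
        settings[name] = value
    return settings

conf = "shared_buffers = 128MB\nport = 5432\n"
auto = "# Do not edit this file manually!\nshared_buffers = '4GB'\n"
assert effective_settings(conf, auto)["shared_buffers"] == "'4GB'"
```

In this model the value initdb uncommented in postgresql.conf is simply never looked at once ALTER SYSTEM has written a replacement, which is the behavior Tom argues for.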
Re: [HACKERS] psql: show only failed queries
On Mon, Jun 30, 2014 at 8:33 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-06-30 13:01 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com: At 2014-06-30 12:48:30 +0200, pavel.steh...@gmail.com wrote:

+ <para>
+ Print a failed SQL commands to standard error output. This is
+ equivalent to setting the variable <varname>ECHO</varname> to
+ <literal>errors</literal>.

No "a", just "Print failed SQL commands …".

-<option>-e</option>.
+<option>-e</option>. If set to <literal>error</literal> then only
+failed queries are displayed.

Should be "errors" here, not "error".

 printf(_("  -a, --echo-all           echo all input from script\n"));
+printf(_("  -b --echo-errors         echo failed commands sent to server\n"));
 printf(_("  -e, --echo-queries       echo commands sent to server\n"));

Should have a comma after -b to match other options. Also I would remove "sent to server" from the description: "echo failed commands" is fine.

fixed

$ psql -b
bin/psql: invalid option -- 'b'
Try "psql --help" for more information.

I got this error. ISTM you forgot to add 'b' into the third argument of getopt_long in startup.c.

 <application>psql</application> merely prints all queries as they are sent to the server. The switch for this is
-<option>-e</option>.
+<option>-e</option>. If set to <literal>errors</literal> then only
+failed queries are displayed.

I think that where failed queries are output should be documented here. Otherwise users might misunderstand they are output to standard output like ECHO=all and queries do. It's better to add "The switch for this is <option>-b</option>." into the doc.

+else if (strcmp(prev2_wd, "\\set") == 0)
+{
+    if (strcmp(prev_wd, "ECHO") == 0)
+    {
+        static const char *const my_list[] =
+        {"none", "errors", "queries", "all", NULL};
+
+        COMPLETE_WITH_LIST_CS(my_list);
+    }
+    else if (strcmp(prev_wd, "ECHO_HIDDEN") == 0)
+    {
+        static const char *const my_list[] =
+        {"noexec", "off", "on", NULL};
+
+        COMPLETE_WITH_LIST_CS(my_list);
+    }
+}

I think that adding tab-completions of psql variables is good, but adding those of only ECHO and ECHO_HIDDEN seems half-baked. 
Probably this part should be split into a separate patch. Regards, -- Fujii Masao
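Fujii's "invalid option -- 'b'" failure is the classic getopt pitfall: adding the long option and its switch case is not enough, the new letter must also appear in the option string passed to getopt_long() (its third argument in psql's startup.c). psql itself is C, but the same behavior can be illustrated with Python's standard getopt module as a stand-in:

```python
# Illustration of the reported bug: an option letter missing from the
# option string is rejected by the parser even if code elsewhere is
# ready to handle it.  Python's getopt stands in for C getopt_long().
import getopt

def parse(argv, shortopts):
    try:
        opts, _args = getopt.gnu_getopt(argv, shortopts, ["echo-errors"])
        return dict(opts)
    except getopt.GetoptError:
        return None   # the "invalid option" path psql took

# 'b' absent from the option string: -b is rejected (the reported bug)
assert parse(["-b"], "aes") is None
# 'b' added to the option string: -b now parses
assert parse(["-b"], "abes") == {"-b": ""}
# the long form works either way because it is declared separately
assert parse(["--echo-errors"], "aes") == {"--echo-errors": ""}
```

The fix Fujii points at is exactly the middle case: add 'b' to the short-option string so both `-b` and `--echo-errors` reach the handler.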
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Tue, Jul 8, 2014 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: I doubt it. The extra code isn't the problem so much, it's the extra planning cycles spent trying to make proofs that will usually fail. What you propose will create a combinatorial explosion in the number of proof paths to be considered. I can understand that it will create a combinatorial explosion in the number of predicates that need to be examined by constraint exclusion. I do not understand where the paths get involved here. Constraint exclusion kicks in before paths are created. Perhaps I should not have used the term path here, because I wasn't referring to what the planner calls Paths. I just meant that there will be many more ways to perhaps prove a constraint-exclusion result, and the planner will have to go through them all. (Usually to no avail, because how often do people write queries that are guaranteed to produce no rows?) An example of what I meant by combinatorial explosion is that if a clause mentions K variables, and each of those variables has been equated to M other variables, there are (M+1)^K possible derived clauses, and it looks to me like we'd have to consider them all to catch the sort of situation you're talking about. I agree. I think this is going to require a *lot* of additional planner cycles, and TBH you've not provided any very compelling example of why it's worthwhile. Yeah, if you can exclude a large partition it's a win, but how often is that going to happen in real-world queries? The one example you gave seemed pretty artificial to me, ie unrelated to typical partitioning constraints. Yes, the example is a cooked one to show the problem. But the case can be encountered easily in partitioned tables. 
A partitioned table (having constraints on each partition) with a few table-level constraints can undergo queries with some clauses in the WHERE clause which yield empty results for one or more partitions. Saving some table scans would be worth the time spent in bringing up (M+1)^K derived clauses and examining those. Greg's example about parallel joins or joins between partitioned foreign tables would yield much better results if we have this optimization. But, I think, I will wait till parallel joins or partitioned foreign tables are a reality in PostgreSQL. regards, tom lane -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
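Tom's (M+1)^K count can be made concrete: if a clause mentions K variables and each of them has been equated to M other variables, every variable can independently stay as-is or be replaced by any of its M equals, so the prover faces every combination of substitutions. A small Python sketch (the clause representation is made up purely for counting; the real planner works on expression trees):

```python
from itertools import product

# Each of the K variables in a clause can be left alone or replaced by
# any of the M variables it has been equated to, giving (M+1)**K
# candidate derived clauses the prover would have to examine.
def derived_clauses(clause_vars, equivalences):
    choices = [[v] + list(equivalences.get(v, [])) for v in clause_vars]
    return list(product(*choices))

eq = {"a": ["b", "c"], "x": ["y", "z"]}      # M = 2 for each variable
candidates = derived_clauses(["a", "x"], eq)  # K = 2
assert len(candidates) == (2 + 1) ** 2        # (M+1)**K = 9
```

Even modest M and K blow up quickly (M = 4, K = 3 already gives 125 candidates), which is the cost Tom weighs against the rarity of provably empty partitions in real-world queries.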