Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Fabien COELHO


Hello Tatsuo,


I think I'm starting to understand what's going on.  Suppose there are
n transactions issued by pgbench, and it decides each schedule d(0),
d(1)... d(n). Actually the schedule d(i) (which is stored in
st->until) is decided by the following code:

int64 wait = (int64)
throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
thread->throttle_trigger += wait;
st->until = thread->throttle_trigger;


Yep. Let us say d(i) is the target starting time for transaction i, that 
is throttle_trigger above.



st->until represents the time by which a transaction is expected to be
finished. Now transaction i finishes at t(i).


No, it is the time for the **start** of the transaction. The client is 
sleeping until this time. We can only try to control the beginning of 
the transaction. It ends when it ends!



So the lag l(i) = t(i) - d(i) if the transaction is behind.


Transaction i lags behind if it *starts* later than d(i). If it starts 
effectively at t(i), with t(i) >= d(i), the lag is l(i) = t(i) - d(i). When 
it completes is not the problem of the scheduler.


Then the next transaction i+1 begins. The lag l(i+1) = t(i+1) - d(i+1) and 
so on. At the end of pgbench, it shows the average lag as 
sum(l(0)...l(n))/n.


Yes.


Now suppose we have 3 transactions and each has following values:

d(0) = 10
d(1) = 20
d(2) = 30

t(0) = 100
t(1) = 110
t(2) = 120

That says pgbench expects the duration 10 for each transaction.


pgbench does not expect any duration, but your proposed scheduling d(i) 
cannot be followed if the duration is more than 10.


With your figures above, with d(i) the expected start time and t(i) the 
actual start time, then for some reason pgbench was not around to start 
the transaction before time 100 (maybe the OS switched the process off to 
attend to other stuff) although it should have started at 10, so l(0) = 
90. Then the second transaction starts readily at 110, but was expected at 
20, hence a lag of 90 again. Same for the last one. All transactions 
started 90 units after their scheduled time, the cumulative lag is 270, 
and the average lag is 90.


If I take another example.

 - Scheduled start time d(0 .. 3) = 0 20 40 60
 - Durations D(0 .. 3) = 15 25 50 10
 - Actual start time for transactions
   t(0) = 3 (it is late by 3 for some reason), completes by 18
   t(1) = t(0)+D(0) + some more lag for some reason = 21, completes by 46
   t(2) = t(1)+D(1) + no additional lag here = 46, completes by 96
   t(3) = t(2)+D(2) + some more lag for some reason = 97, completes by 107

The l(0 .. 3) = 3-0, 21-20, 46-40, 97-60

Total lag is 3 + 1 + 6 + 37 = 47

Average lag = 47/4 = 11.75

In this example, some lag is due to the process being late (3 at the 
beginning, 1 on the second transaction), and some other is due to 
transaction durations which impact the following transactions.



However actually pgbench calculates like this:

average lag = (t(0)-d(0) + t(1)-d(1) + t(2)-d(2))/3
   = (100-10 + 110-20 + 120-30)/3
   = (90 + 90 + 90)/3
   = 90


Yes, this is correct.


Looks like too much lag is calculated. The difference between the lag
which pgbench calculates and the expected one will grow if a lag
happens earlier. I guess the reason my Linux box shows bigger lag than Mac OS
X is that the first transaction or early transactions run more slowly than the
ones run later.


Possibly.


Of course this conclusion depends on the definition of the average
rate limit lag of pgbench. So you might have another opinion. However
the way pgbench calculates the average lag is not what I expected, at
least.


Indeed, I think that it really depends on your definition of lag. The lag 
I defined is the time between the scheduled transaction start time and the 
actual transaction start time. This is a measure of how well pgbench is 
able to follow the stochastic process, and if pgbench is constantly late 
then it accumulates a lot, but that basically means that there are not 
enough (CPU) resources to run pgbench cleanly.


What you seem to expect is the average transaction latency. This is also a 
useful measure, and I'm planning to add a clean measure of that when under 
throttling, and also with --progress, as the current computation based on 
tps is not significant under throttling.


But that is a plan for the next commitfest!

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgindent behavior we could do without

2013-07-18 Thread Bruce Momjian
On Thu, Jul 18, 2013 at 12:27:21AM -0400, Tom Lane wrote:
 It's always annoyed me that pgindent insists on adjusting vertical
 whitespace around #else and related commands.  This has, for example,
 rendered src/include/storage/barrier.h nigh illegible: you get things
 like
 
 /*
  * lwsync orders loads with respect to each other, and similarly with stores.
  * But a load can be performed before a subsequent store, so sync must be used
  * for a full memory barrier.
  */
 #define pg_memory_barrier() __asm__ __volatile__ ("sync" : : : "memory")
 #define pg_read_barrier()   __asm__ __volatile__ ("lwsync" : : : "memory")
 #define pg_write_barrier()  __asm__ __volatile__ ("lwsync" : : : "memory")
 #elif defined(__alpha) || defined(__alpha__)/* Alpha */
 
 which makes it look like this block of code has something to do with
 Alpha.
 
 By chance, I noticed today that this misbehavior comes from a discretely
 identifiable spot, to wit lines 289-290 in src/tools/pgindent/pgindent:
 
   # Remove blank line(s) before #else, #elif, and #endif
   $source =~ s!\n\n+(\#else|\#elif|\#endif)!\n$1!g;
 
 This seems pretty broken to me: why exactly is whitespace there such a
 bad idea?  Not only that, but the next action is concerned with undoing
 some of the damage this rule causes:
 
   # Add blank line before #endif if it is the last line in the file
   $source =~ s!\n(#endif.*)\n\z!\n\n$1\n!;
 
 I assert that we should simply remove both of these bits of code, as
 just about every committer on the project is smarter about when to use
 vertical whitespace than this program is.
 
 Thoughts?

Interesting.  I can't remember fully but the problem might be that BSD
indent was adding extra spacing around those, and I needed to remove it.
Would you like me to test a run and show you the output?  I can do that
next week when I return from vacation, or you can give it a try.  I am
fine with removing that code.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Fabien COELHO


Hello Greg,

The lag computation was not the interesting part of this feature to me.  As I 
said before, I considered it more of a debugging level thing than a number 
people would analyze as much as you did.  I understand why you don't like it 
though.  If the reference time was moved forward to match the transaction end 
each time, I think that would give the lag definition you're looking for. 
That's fine to me too, if Fabien doesn't have a good reason to reject the 
idea.  We would need to make sure that doesn't break some part of the design 
too.


I really think that the information currently computed is useful. First, 
as you note, for debug: not really debugging the throttling feature, which 
works fine, but being able to debug performance if something goes wrong 
while running a bench. Another reason why it is useful is that from a 
client perspective it measures whether the database system is coping with 
the load without incurring additional delays by processing client 
requests (say from the web server) far behind their actual (i.e. 
scheduled) occurrences.


So my recommendation is: please keep this measure as it is, and if you want 
the other lag measure, why not add it as well next to the previous one?


--
Fabien.




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Tatsuo Ishii
Fabien,

 Hello again Tatsuo,
 
 For your information, included is the patch against git master head to
 implement the lag in a way what I proposed. With the patch, I get more
 consistent number on Linux (and Mac OS X).
 
 I must disagree with your proposal: At least, it does not provide the
 information I want, but another one.
 
 ISTM that this patch measures the lag which is due to pgbench thread
 coming around to deal with a transaction after sleeping. I would
 expect that to be quite small most of the time, so I agree that it
 must be reassuringly consistent.
 
 However it does not provide the information I want, which is the
 measure of the health of pgbench with respect to the stochastic
 process scheduled transactions.
 
 Basically you propose a partial lag measure, which will be smaller,
 but which does not tell whether pgbench is able to follow the
 schedule, which is the information I find useful in this context to
 appreciate if the throttling is going well.

I don't care what kind of measurement is provided by pgbench. However,
as a committer, I *do* care that what the measurement means is clearly
described in the doc. I think the current measurement method will cause
enough confusion if it's not provided with a detailed explanation. Could
you please provide a doc update?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Fabien COELHO


Hello Tatsuo

I think the current measurement method will cause enough confusion if it's 
not provided with a detailed explanation. Could you please provide a doc 
update?


Please find a v17 proposition with an updated and extended documentation, 
focussed on clarifying the lag measure and its implications, and taking 
into account the recent discussion on the list with you & Greg.


However I'm not a native English speaker, so if you find that some parts are 
not clear enough, please tell me what can be improved further.


--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2ad8f0b..752119d 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * When threads are throttled to a given rate limit, this is the target delay
+ * to reach that rate in usec.  0 is the default and means no throttling.
+ */
+int64		throttle_delay = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -202,11 +208,13 @@ typedef struct
 	int			listen;			/* 0 indicates that an async query has been
  * sent */
 	int			sleeping;		/* 1 indicates that the client is napping */
+	bool		throttling;		/* whether nap is for throttling */
 	int64		until;			/* napping until (usec) */
 	Variable   *variables;		/* array of variable definitions */
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	bool		is_throttled;	/* whether transaction throttling is done */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -224,6 +232,9 @@ typedef struct
 	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
 	int		   *exec_count;		/* number of cmd executions (per Command) */
 	unsigned short random_state[3];		/* separate randomness for each thread */
+	int64   throttle_trigger; 	/* previous/next throttling (us) */
+	int64   throttle_lag; 		/* total transaction lag behind throttling */
+	int64   throttle_lag_max; 	/* max transaction lag */
 } TState;
 
 #define INVALID_THREAD		((pthread_t) 0)
@@ -232,6 +243,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64   throttle_lag;
+	int64   throttle_lag_max;
 } TResult;
 
 /*
@@ -356,6 +369,7 @@ usage(void)
 		 "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
 		 "  -P, --progress=NUM   show thread progress report every NUM seconds\n"
 		 "  -r, --report-latencies   report average latency per command\n"
+		 "  -R, --rate SPEC  target rate in transactions per second\n"
 		 "  -s, --scale=NUM  report this scale factor in output\n"
 		 "  -S, --select-only    perform SELECT-only transactions\n"
 		 "  -t, --transactions   number of transactions each client runs "
@@ -898,17 +912,62 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 {
 	PGresult   *res;
 	Command   **commands;
+	bool		trans_needs_throttle = false;
 
 top:
 	commands = sql_files[st->use_file];
 
+	/*
+	 * Handle throttling once per transaction by sleeping.  It is simpler
+	 * to do this here rather than at the end, because so much complicated
+	 * logic happens below when statements finish.
+	 */
+	if (throttle_delay && !st->is_throttled)
+	{
+		/*
+		 * Use inverse transform sampling to randomly generate a delay, such
+		 * that the series of delays will approximate a Poisson distribution
+		 * centered on the throttle_delay time.
+		 *
+		 * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier.
+		 *
+		 * If transactions are too slow or a given wait is shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64 wait = (int64)
+			throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
+
+		thread->throttle_trigger += wait;
+
+		st->until = thread->throttle_trigger;
+		st->sleeping = 1;
+		st->throttling = true;
+		st->is_throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
+					st->id, wait);
+	}
+
 	if (st->sleeping)
 	{			/* are we sleeping? */
 		instr_time	now;
+		int64 now_us;
 
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (st->throttling)
+			{
+				/* Measure lag of throttled transaction relative to target */
+				int64 lag = now_us - st->until;
+				thread->throttle_lag += lag;
+				if (lag > thread->throttle_lag_max)
+					thread->throttle_lag_max = lag;
+				st->throttling = false;
+			}
+		}
 		else
 			return true;		/* Still sleeping, nothing to do here */
 	}
@@ -1095,6 +1154,15 @@ top:
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = 

Re: [HACKERS] Add visibility map information to pg_freespace.

2013-07-18 Thread Kyotaro HORIGUCHI
Thank you for the worthwhile additions.

At Tue, 16 Jul 2013 16:04:43 +0900, Satoshi Nagayasu sn...@uptime.jp wrote in 
51e4f08b.3030...@uptime.jp
  | postgres=# select * from pg_freespace_with_vminfo('t'::regclass) limit
  | 10;
..
 I think we can simply add an is_all_visible column to the existing
 pg_freespace(), because adding a column would not break
 backward-compatibility in general. Any other thoughts?

I agree with you. I cannot think of any 'ordinary' application which
uses this function, nor of someone's craft critically affected by
this change. That decision was merely a safe bet.

I'll remerge _with_vminfo function to pg_freespace() in the next
patch if no objection is raised.

  pgstattuple_vm_v1.patch:
...
 It seems working fine.
 
 And I added a regression test for pg_freespacemap and additional
 test cases for pgstattuple. Please take a look.

Thank you. This seems fine. I felt a bit uneasy with the absence
of regtests in pg_freespacemap, but I took advantage of that
absence not to add new ones.

I have simply merged the two regtests separately into two
original patches.  You will find the two attached files.

pg_freespace_vm_v3.patch :
   new patch for pg_freespace with regtests and _with_vminfo

pgstattuple_vm_v2.patch  :
   new patch for pgstattuple with regtests


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/pg_freespacemap/Makefile b/contrib/pg_freespacemap/Makefile
index b2e3ba3..09d6ff8 100644
--- a/contrib/pg_freespacemap/Makefile
+++ b/contrib/pg_freespacemap/Makefile
@@ -4,7 +4,9 @@ MODULE_big = pg_freespacemap
 OBJS = pg_freespacemap.o
 
 EXTENSION = pg_freespacemap
-DATA = pg_freespacemap--1.0.sql pg_freespacemap--unpackaged--1.0.sql
+DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.0--1.1.sql pg_freespacemap--unpackaged--1.0.sql
+
+REGRESS = pg_freespacemap
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pg_freespacemap/expected/pg_freespacemap.out b/contrib/pg_freespacemap/expected/pg_freespacemap.out
new file mode 100644
index 000..cde954d
--- /dev/null
+++ b/contrib/pg_freespacemap/expected/pg_freespacemap.out
@@ -0,0 +1,100 @@
+create extension pg_freespacemap;
+create table t1 ( uid integer primary key, uname text not null );
+select * from pg_freespace('t1');
+ blkno | avail 
+---+---
+(0 rows)
+
+select * from pg_freespace('t1'::regclass);
+ blkno | avail 
+---+---
+(0 rows)
+
+select * from pg_freespace('t1', 1);
+ pg_freespace 
+--
+0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible 
+---+---+
+(0 rows)
+
+select * from pg_freespace_with_vminfo('t1'::regclass);
+ blkno | avail | is_all_visible 
+---+---+
+(0 rows)
+
+insert into t1 values ( 100, 'postgresql' );
+select * from pg_freespace('t1');
+ blkno | avail 
+---+---
+ 0 | 0
+(1 row)
+
+select * from pg_freespace('t1', 1);
+ pg_freespace 
+--
+0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible 
+---+---+
+ 0 | 0 | f
+(1 row)
+
+select * from pg_freespace('t1_pkey');
+ blkno | avail 
+---+---
+ 0 | 0
+ 1 | 0
+(2 rows)
+
+select * from pg_freespace('t1_pkey', 1);
+ pg_freespace 
+--
+0
+(1 row)
+
+select * from pg_freespace('t1_pkey', 2);
+ pg_freespace 
+--
+0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1_pkey');
+ blkno | avail | is_all_visible 
+---+---+
+ 0 | 0 | f
+ 1 | 0 | f
+(2 rows)
+
+vacuum t1;
+select * from pg_freespace('t1');
+ blkno | avail 
+---+---
+ 0 |  8096
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible 
+---+---+
+ 0 |  8096 | t
+(1 row)
+
+select * from pg_freespace('t1_pkey');
+ blkno | avail 
+---+---
+ 0 | 0
+ 1 | 0
+(2 rows)
+
+select * from pg_freespace_with_vminfo('t1_pkey');
+ blkno | avail | is_all_visible 
+---+---+
+ 0 | 0 | f
+ 1 | 0 | f
+(2 rows)
+
diff --git a/contrib/pg_freespacemap/pg_freespacemap--1.0--1.1.sql b/contrib/pg_freespacemap/pg_freespacemap--1.0--1.1.sql
new file mode 100644
index 000..e7b25bd
--- /dev/null
+++ b/contrib/pg_freespacemap/pg_freespacemap--1.0--1.1.sql
@@ -0,0 +1,21 @@
+/* contrib/pg_freespacemap/pg_freespacemap--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION pg_freespacemap UPDATE TO '1.1'" to load this file. \quit
+
+CREATE FUNCTION pg_is_all_visible(regclass, bigint)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_is_all_visible'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION
+  pg_freespace_with_vminfo(rel regclass, blkno OUT bigint, avail OUT int2, is_all_visible OUT boolean)
+RETURNS SETOF RECORD
+AS $$
+  SELECT blkno, pg_freespace($1, 

Re: [HACKERS] Add visibility map information to pg_freespace.

2013-07-18 Thread Kyotaro HORIGUCHI
Hmm. I'm sorry to find that this patch was already marked as
'Returned with Feedback' on the CF page around the same time the
previous review comment was sent. Did they cross?

Anyway, I'll take a rain check on this.

 I have simply merged the two regtests separately into two
 original patches.  You will find the two attached files.
 
 pg_freespace_vm_v3.patch :
new patch for pg_freespace with regtests and _with_vminfo
 
 pgstattuple_vm_v2.patch  :
    new patch for pgstattuple with regtests

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Amit Kapila
On Thursday, July 18, 2013 12:38 AM Josh Berkus wrote:
 On 07/17/2013 12:01 PM, Alvaro Herrera wrote:
  Both of these seem like they would make troubleshooters' lives more
  difficult.  I think we should just parse the auto file automatically
  after parsing postgresql.conf, without requiring the include
 directive
  to be there.
 
 Wait, I thought the auto file was going into the conf.d directory?

The auto file is going into the config directory, but will that make any 
difference if we have to parse it automatically after postgresql.conf?

With Regards,
Amit Kapila.






Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Amit Kapila
On Wednesday, July 17, 2013 6:08 PM Ants Aasma wrote:
 On Wed, Jul 17, 2013 at 2:54 PM, Amit Kapila amit.kap...@huawei.com
 wrote:
  I think Oracle also use similar concept for making writes efficient,
 and
  they have patent also for this technology which you can find at below
 link:
 
 http://www.google.com/patents/US7194589?dq=645987&hl=en&sa=X&ei=kn7mUZ-PIsWqrAe99oDgBw&sqi=2&pjf=1&ved=0CEcQ6AEwAw
 
  Although Oracle has different concept for performing checkpoint
 writes, but
  I thought of sharing the above link with you, so that unknowingly we
 should
  not go into wrong path.
 
  AFAIK instead of depending on OS buffers, they use direct I/O and
 in fact in
  the patent above they are using temporary buffer (Claim 3) to sort
 the
  writes which is not the same idea as far as I can understand by
 reading
  above thread.
 
 They are not even sorting anything, the patent is for
 opportunistically looking for adjacent dirty blocks when writing out a
 dirty buffer to disk. While a useful technique, this has nothing to do
 with sorting checkpoints. 

It is not sorting; rather it finds consecutive blocks before writing to disk,
using hashing in the buffer cache.
I think the patch is different from it in multiple ways.
I had read this patent some time back and thought that you were also trying
to achieve something similar (reduce random I/O),
so I shared it with you.

With Regards,
Amit Kapila.





Re: [HACKERS] Add some regression tests for SEQUENCE

2013-07-18 Thread Fabien COELHO


Hello Robins,


Thanks Fabien. This was a wrong attachment to the email.


This patch works for me (applied, tested).

However, some remarks:

seq4: should it check something? How do you know that OWNED BY did 
anything?


regress_role_seq2: should it check that the sequence owner is the table 
owner?


seq12/seq14: are these the same tests twice??

seq13/seq15: idem??

I still do not know what asdf means... Is it about the qwerty keyboard?
What about something explicit, like regress_seq_undefined?

seq22: remove the syntax error check at the end; pg people do not want 
syntax error checks.



Also, here is a proposal for testing that CACHE is working:

  -- check CACHE operation by resetting the sequence cache size
  CREATE SEQUENCE seq31 CACHE 10;
  -- 1 to 10 are preallocated
  SELECT NEXTVAL('seq31');
  -- reset cache, 2..10 are lost, should start again from 11
  ALTER SEQUENCE seq31 CACHE 1;
  SELECT NEXTVAL('seq31');
  DROP SEQUENCE seq31;

--
Fabien.




[HACKERS] getting rid of SnapshotNow

2013-07-18 Thread Robert Haas
There seems to be a consensus that we should try to get rid of
SnapshotNow entirely now that we have MVCC catalog scans, so I'm
attaching two patches that together come close to achieving that goal:

1. snapshot-error-v1.patch introduces a new special snapshot, called
SnapshotError.  In the cases where we set SnapshotNow as a sort of
default snapshot, this patch changes the code to use SnapshotError
instead.  This affects scan->xs_snapshot in genam.c and
estate->es_snapshot in execUtils.c.  This passes make check-world, so
apparently there is no code in the core distribution that does this.
However, this is safer for third-party code, which will ERROR instead
of seg faulting.  The alternative approach would be to use
InvalidSnapshot, which I think would be OK too if people dislike this
approach.

2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow
to use SnapshotSelf instead.  These include pgrowlocks(),
pgstat_heap(), and get_actual_variable_range().  In all of those
cases, only an approximately-correct answer is needed, so the change
should be fine.  I'd also generally expect that it's very unlikely for
any of these things to get called in a context where the table being
scanned has been updated by the current transaction after the most
recent command-counter increment, so in practice the change in
semantics will probably not be noticeable at all.

Barring objections, I'll commit both of these next week.

With that done, the only remaining uses of SnapshotNow in our code
base will be in currtid_byreloid() and currtid_byrelname().  So far no
one on this list has been able to understand clearly what the purpose
of those functions is, so I'm copying this email to pgsql-odbc in case
someone there can provide more insight.  If I were a betting man, I'd
bet that they are used in contexts where the difference between
SnapshotNow and SnapshotSelf wouldn't matter there, either.  For
example, if those functions are always invoked in a query that does
nothing but call those functions, the difference wouldn't be visible.
If we don't want to risk any change to the semantics, we can (1) grit
our teeth and keep SnapshotNow around or (2) use an instantaneous MVCC
snapshot there, and accept that people who have very large connection
counts and extremely heavy use of those functions may see a
performance regression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


snapshot-error-v1.patch
Description: Binary data


snapshot-self-not-now-v1.patch
Description: Binary data



Re: [HACKERS] Return of can't paste into psql issue

2013-07-18 Thread Merlin Moncure
On Wed, Jul 17, 2013 at 6:30 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Josh Berkus j...@agliodbs.com writes:
 So, an even more practical workaround (I've been using cat | psql), but
 still a mysterious issue.

 As a workaround you might try \e with EDITOR=emacs or some of the other
 solutions you've been pasting, maybe even cat, so that you can switch
 that readline-completion-bug-free environment for just that paste?

Well, I use \e=vim and it works fine.  But, sometimes you forget and
the results can be destructive to the database. I messed around for a
little bit with the tab completion callback routine yesterday and I
now believe that's not the issue.  It only fires when you tab AFAICT
-- so the problem is in the readline function itself which reduces the
set of solutions somewhat -- either you run it or you don't.

One thing that could solve a lot of issues would be to disable
readline when inside a dollar quote etc.

merlin




Re: [HACKERS] Return of can't paste into psql issue

2013-07-18 Thread Merlin Moncure
On Thu, Jul 18, 2013 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote:
 One thing that could solve a lot of issues would be to disable
 readline when inside a dollar quote etc.

actually, that's dumb (pre-coffee).

merlin




Re: [HACKERS] review: Non-recursive processing of AND/OR lists

2013-07-18 Thread Robert Haas
On Wed, Jul 17, 2013 at 2:03 PM, Gurjeet Singh gurj...@singh.im wrote:
 In v6 of the  patch, I have deferred the 'pending' list initialization to
 until we actually hit a candidate right-branch. So in the common case the
 pending list will never be populated, and if we find a bushy or right-deep
 tree (for some reason an ORM/tool may choose to build AND/OR lists that may
 end being right-deep when in Postgres), then the pending list will be used
 to process them iteratively.

 Does that alleviate your concern about 'pending' list management causing an
 overhead.

 Agreed that bushy/right-deep trees are a remote corner case, but we are
 addressing a remote corner case in the first place (insanely long AND lists)
 and why not handle another remote corner case right now if it doesn't cause
 an overhead for common case.

Because simpler code is less likely to have bugs and is easier to
maintain.   It's worth noting that the change you're proposing is in
fact user-visible, as demonstrated by the fact that you had to update
the regression test output:

- |   WHERE (((rsl.sl_color =
rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm)) AND
(rsl.sl_len_cm <= rsh.slmaxlen_cm));
+ |   WHERE ((rsl.sl_color =
rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm) AND (rsl.sl_len_cm
<= rsh.slmaxlen_cm));

Now, I think that change is actually an improvement, because here's
what that WHERE clause looked like when it was entered:

 WHERE rsl.sl_color = rsh.slcolor
   AND rsl.sl_len_cm >= rsh.slminlen_cm
   AND rsl.sl_len_cm <= rsh.slmaxlen_cm;

But flattening a = 1 AND (b = 1 AND c = 1 AND d = 1) AND e = 1 to a
flat list doesn't have any of the same advantages.

At the end of the day, this is a judgement call, and I'm giving you
mine.  If somebody else wants to weigh in, that's fine.  I can't think
of anything that would actually be outright broken under your proposed
approach, but my personal feeling is that it's better to only add the
amount of code that we know is needed to solve the problem actually
observed in practice, and no more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Return of can't paste into psql issue

2013-07-18 Thread Andrew Dunstan


On 07/18/2013 08:59 AM, Merlin Moncure wrote:

On Thu, Jul 18, 2013 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote:

One thing that could solve a lot of issues would be to disable
readline when inside a dollar quote etc.

actually, that's dumb (pre-coffee).




Yeah, but what would be useful would be a way to disable and re-enable 
readline on the fly.


I looked a while back and didn't find an obvious way to do that, but I 
might well have missed something.


cheers

andrew




Re: [HACKERS] review: Non-recursive processing of AND/OR lists

2013-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 17, 2013 at 2:03 PM, Gurjeet Singh gurj...@singh.im wrote:
 Agreed that bushy/right-deep trees are a remote corner case, but we are
 addressing a remote corner case in the first place (insanely long AND lists)
 and why not handle another remote corner case right now if it doesn't cause
 an overhead for common case.

 Because simpler code is less likely to have bugs and is easier to
 maintain.

I agree with that point, but one should also remember Polya's Inventor's
Paradox: the more general problem may be easier to solve.  That is, if
done right, code that fully flattens an AND tree might actually be
simpler than code that does just a subset of the transformation.  The
current patch fails to meet this expectation, but maybe you just haven't
thought about it the right way.

My concerns about this patch have little to do with that, though, and
much more to do with the likelihood that it breaks some other piece of
code that is expecting AND/OR to be strictly binary operators, which
is what they've always been in parsetrees that haven't reached the
planner.  It doesn't appear to me that you've done any research on that
point whatsoever --- you have not even updated the comment for BoolExpr
(in primnodes.h) that this patch falsifies.

 It's worth noting that the change you're proposing is in
 fact user-visible, as demonstrated by the fact that you had to update
 the regression test output:

The point to worry about here is whether rule dump and reload is still
safe.  In particular, the logic in ruleutils.c for deciding whether it's
safe to omit parentheses has only really been thought about/tested for
the binary AND/OR case.  Although that code can dump N-way AND/OR
because it's also used to print post-planner expression trees in EXPLAIN,
that case has never been held to the standard of is the parser
guaranteed to interpret this expression the same as before?.  Perhaps
it's fine, but has anyone looked at that issue?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Fabien COELHO


Hello again Tatsuo,


For your information, included is the patch against git master head to
implement the lag in a way what I proposed. With the patch, I get more
consistent number on Linux (and Mac OS X).


I must disagree with your proposal: at least, it does not provide the
information I want, but a different one.


ISTM that this patch measures the lag which is due to the pgbench thread 
coming around to deal with a transaction after sleeping. I would expect 
that to be quite small most of the time, so I agree that it must be 
reassuringly consistent.


However it does not provide the information I want, which is a measure
of the health of pgbench with respect to the transactions scheduled by
the stochastic process.


Basically you propose a partial lag measure, which will be smaller, but
which does not tell whether pgbench is able to follow the schedule; that
is the information I find useful in this context for judging whether the
throttling is going well.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 1. snapshot-error-v1.patch introduces a new special snapshot, called
 SnapshotError.  In the cases where we set SnapshotNow as a sort of
 default snapshot, this patch changes the code to use SnapshotError
 instead.  This affects scan->xs_snapshot in genam.c and
 estate->es_snapshot in execUtils.c.  This passes make check-world, so
 apparently there is no code in the core distribution that does this.
 However, this is safer for third-party code, which will ERROR instead
 of seg faulting.  The alternative approach would be to use
 InvalidSnapshot, which I think would be OK too if people dislike this
 approach.

FWIW, I think using InvalidSnapshot would be preferable to introducing
a new concept for what's pretty much the same thing.

 With that done, the only remaining uses of SnapshotNow in our code
 base will be in currtid_byreloid() and currtid_byrelname().  So far no
 one on this list has been able to understand clearly what the purpose
 of those functions is, so I'm copying this email to pgsql-odbc in case
 someone there can provide more insight.

I had the idea they were used for a client-side implementation of WHERE
CURRENT OF.  Perhaps that's dead code and could be removed entirely?

 If we don't want to risk any change to the semantics, we can (1) grit
 our teeth and keep SnapshotNow around or (2) use an instantaneous MVCC
 snapshot there, and accept that people who have very large connection
 counts and extremely heavy use of those functions may see a
 performance regression.

Of those I'd go for (2); it's unlikely that an extra snapshot would be
visible compared to the query roundtrip overhead.  But another attractive
possibility is to make these functions use GetActiveSnapshot(), which
would avoid taking any new snapshot.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Robert Haas
On Thu, Jul 18, 2013 at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 1. snapshot-error-v1.patch introduces a new special snapshot, called
 SnapshotError.  In the cases where we set SnapshotNow as a sort of
 default snapshot, this patch changes the code to use SnapshotError
 instead.  This affects scan->xs_snapshot in genam.c and
 estate->es_snapshot in execUtils.c.  This passes make check-world, so
 apparently there is no code in the core distribution that does this.
 However, this is safer for third-party code, which will ERROR instead
 of seg faulting.  The alternative approach would be to use
 InvalidSnapshot, which I think would be OK too if people dislike this
 approach.

 FWIW, I think using InvalidSnapshot would be preferable to introducing
 a new concept for what's pretty much the same thing.

Andres voted the other way on the previous thread.  I'll wait and see
if there are any other opinions.  The SnapshotError concept seemed
attractive to me initially, but I'm not as excited about it after
seeing how it turned out, so maybe it's best to do it as you suggest.

 With that done, the only remaining uses of SnapshotNow in our code
 base will be in currtid_byreloid() and currtid_byrelname().  So far no
 one on this list has been able to understand clearly what the purpose
 of those functions is, so I'm copying this email to pgsql-odbc in case
 someone there can provide more insight.

 I had the idea they were used for a client-side implementation of WHERE
 CURRENT OF.  Perhaps that's dead code and could be removed entirely?

It's been reported that ODBC still uses them.

 If we don't want to risk any change to the semantics, we can (1) grit
 our teeth and keep SnapshotNow around or (2) use an instantaneous MVCC
 snapshot there, and accept that people who have very large connection
 counts and extremely heavy use of those functions may see a
 performance regression.

 Of those I'd go for (2); it's unlikely that an extra snapshot would be
 visible compared to the query roundtrip overhead.  But another attractive
 possibility is to make these functions use GetActiveSnapshot(), which
 would avoid taking any new snapshot.

You could probably construct a case where the overhead is visible, if
you ran the functions many times in a single query, but arguably no
one does that.  Also, Andres's test case that involves running BEGIN;
SELECT txid_current(); very short sleep; COMMIT; in several hundred
sessions at once is pretty brutal on PGXACT and makes the overhead of
taking extra snapshots a lot more visible.

I'm not too familiar with GetActiveSnapshot(), but wouldn't that
change the user-visible semantics?  If, for example, someone's using
that function to test whether the row has been updated since their
snapshot was taken, that use case would get broken.  SnapshotSelf
would be a change from the current behavior in many fewer cases than
using an older snapshot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Robert Haas
On Sun, Jul 14, 2013 at 3:13 PM, Greg Smith g...@2ndquadrant.com wrote:
 Accordingly, the current behavior--no delay--is already the best possible
 throughput.  If you apply a write timing change and it seems to increase
 TPS, that's almost certainly because it executed less checkpoint writes.
 It's not a fair comparison.  You have to adjust any delaying to still hit
 the same end point on the checkpoint schedule. That's what my later
 submissions did, and under that sort of controlled condition most of the
 improvements went away.

This is all valid logic, but I don't think it makes the patch a bad
idea.  What KONDO Mitsumasa is proposing (or proposed at one point,
upthread), is that when an fsync takes a long time, we should wait
before issuing the next fsync, and the delay should be proportional to
how long the previous fsync took.  On a system that's behaving well,
where fsyncs are always fast, that's going to make very little
difference.  On a system where fsync is sometimes very very slow, that
might result in the checkpoint overrunning its time budget - but SO
WHAT?

I mean, yes, we want checkpoints to complete in the time specified,
but if the I/O system is completely flogged, I suspect most people
would prefer to overrun the checkpoint's time budget rather than have
all foreground activity grind to a halt until the checkpoint finishes.
 As I'm pretty sure you've pointed out in the past, when this
situation develops, the checkpoint may be doomed to overrun whether we
like it or not.  We should view this as an emergency pressure release
valve; if we think not everyone will want it, then make it a GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Greg Smith
Please stop all this discussion of patents in this area.  Bringing up a 
US patent here makes US list members more likely to be treated as 
willful infringers of that patent: 
http://www.ipwatchdog.com/patent/advanced-patent/willful-patent-infringement/ 
if the PostgreSQL code duplicates that method one day.


The idea of surveying patents in some area so that their methods can be 
avoided in code you develop, that is a reasonable private stance to 
take.  But don't do that on the lists.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Greg Smith

On 7/18/13 11:04 AM, Robert Haas wrote:

On a system where fsync is sometimes very very slow, that
might result in the checkpoint overrunning its time budget - but SO
WHAT?


Checkpoints provide a boundary on recovery time.  That is their only 
purpose.  You can always do better by postponing them, but you've now 
changed the agreement with the user about how long recovery might take.


And if you don't respect the checkpoint boundary, what you can't do is 
then claim better execution performance than something that did.  It's 
always possible to improve throughput by postponing I/O.  SO WHAT? 
If that's OK, you don't need complicated logic to do that.  Increase 
checkpoint_timeout.  The system with checkpoint_timeout at 6 minutes 
will always outperform one where it's 5.


You don't need to introduce a feedback loop--something that has 
significant schedule stability implications if it gets out of 
control--just to spread I/O out further.  I'd like to wander down the 
road of load-sensitive feedback for database operations, especially for 
maintenance work.  But if I build something that works mainly because it 
shifts the right edge of the I/O deadline forward, I am not fooled into 
thinking I did something awesome.  That's cheating, getting better 
performance mainly by throwing out the implied contract with the 
user--the one over their expected recovery time after a crash.  And I'm 
not excited about complicating the PostgreSQL code to add a new way to 
do that, not when checkpoint_timeout is already there with a direct, 
simple control on the exact same trade-off.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Alvaro Herrera
Robert Haas escribió:
 On Thu, Jul 18, 2013 at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  1. snapshot-error-v1.patch introduces a new special snapshot, called
  SnapshotError.  In the cases where we set SnapshotNow as a sort of
  default snapshot, this patch changes the code to use SnapshotError
  instead.  This affects scan->xs_snapshot in genam.c and
  estate->es_snapshot in execUtils.c.  This passes make check-world, so
  apparently there is no code in the core distribution that does this.
  However, this is safer for third-party code, which will ERROR instead
  of seg faulting.  The alternative approach would be to use
  InvalidSnapshot, which I think would be OK too if people dislike this
  approach.
 
  FWIW, I think using InvalidSnapshot would be preferable to introducing
  a new concept for what's pretty much the same thing.
 
 Andres voted the other way on the previous thread.  I'll wait and see
 if there are any other opinions.  The SnapshotError concept seemed
 attractive to me initially, but I'm not as excited about it after
 seeing how it turned out, so maybe it's best to do it as you suggest.

Yeah ... SnapshotError is a way to ensure the server doesn't crash when
an extension that hasn't been updated uses the APIs incorrectly.
However, there are many other ways for a C-language extension to cause
crashes, so I don't think this is buying us much.

  With that done, the only remaining uses of SnapshotNow in our code
  base will be in currtid_byreloid() and currtid_byrelname().  So far no
  one on this list has been able to understand clearly what the purpose
  of those functions is, so I'm copying this email to pgsql-odbc in case
  someone there can provide more insight.
 
  I had the idea they were used for a client-side implementation of WHERE
  CURRENT OF.  Perhaps that's dead code and could be removed entirely?
 
 It's been reported that ODBC still uses them.

They don't show up in a quick grep of psqlodbc's source code, FWIW.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Robert Haas
On Thu, Jul 18, 2013 at 11:41 AM, Greg Smith g...@2ndquadrant.com wrote:
 On 7/18/13 11:04 AM, Robert Haas wrote:
 On a system where fsync is sometimes very very slow, that
 might result in the checkpoint overrunning its time budget - but SO
 WHAT?

 Checkpoints provide a boundary on recovery time.  That is their only
 purpose.  You can always do better by postponing them, but you've now
 changed the agreement with the user about how long recovery might take.

 And if you don't respect the checkpoint boundary, what you can't do is then
 claim better execution performance than something that did.  It's always
 possible to improve throughput by postponing I/O.  SO WHAT? If that's
 OK, you don't need complicated logic to do that.  Increase
 checkpoint_timeout.  The system with checkpoint_timeout at 6 minutes will
 always outperform one where it's 5.

 You don't need to introduce a feedback loop--something that has significant
 schedule stability implications if it gets out of control--just to spread
 I/O out further.  I'd like to wander down the road of load-sensitive
 feedback for database operations, especially for maintenance work.  But if I
 build something that works mainly because it shifts the right edge of the
 I/O deadline forward, I am not fooled into thinking I did something awesome.
 That's cheating, getting better performance mainly by throwing out the
 implied contract with the user--the one over their expected recovery time
 after a crash.  And I'm not excited about complicating the PostgreSQL code
 to add a new way to do that, not when checkpoint_timeout is already there
 with a direct, simple control on the exact same trade-off.

That's not the same trade-off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Alvaro Herrera
Greg Smith escribió:
 On 7/18/13 11:04 AM, Robert Haas wrote:
 On a system where fsync is sometimes very very slow, that
 might result in the checkpoint overrunning its time budget - but SO
 WHAT?
 
 Checkpoints provide a boundary on recovery time.  That is their only
 purpose.  You can always do better by postponing them, but you've
 now changed the agreement with the user about how long recovery
 might take.
 
 And if you don't respect the checkpoint boundary, what you can't do
 is then claim better execution performance than something that did.
  It's always possible to improve throughput by postponing I/O.
 SO WHAT? If that's OK, you don't need complicated logic to do that.
 Increase checkpoint_timeout.  The system with checkpoint_timeout at
 6 minutes will always outperform one where it's 5.

I think the idea is to have a system in which most of the time the
recovery time will be that for checkpoint_timeout=5, but in those
(hopefully rare) cases where checkpoints take a bit longer, the recovery
time will be that for checkpoint_timeout=6.

In any case, if the system crashes past minute 5 after the previous
checkpoint (the worst possible timing), the current checkpoint will have
already started, so recovery will take slightly less time because some
flush work had already been done.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Robert Haas
On Thu, Jul 18, 2013 at 11:54 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 They don't show up in a quick grep of psqlodbc's source code, FWIW.

Hmm.  Maybe we should just remove them and see if anyone complains.
We could always put them back (or make them available via contrib) if
it's functionality someone actually needs.  The last discussion of
those functions was in 2007 and nobody seemed too sure back then
either, so maybe the rumor that anyone is actually using this is no
more than rumor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Andres Freund
On 2013-07-18 12:01:39 -0400, Robert Haas wrote:
 On Thu, Jul 18, 2013 at 11:54 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  They don't show up in a quick grep of psqlodbc's source code, FWIW.
 
 Hmm.  Maybe we should just remove them and see if anyone complains.
 We could always put them back (or make them available via contrib) if
 it's functionality someone actually needs.  The last discussion of
 those functions was in 2007 and nobody seemed too sure back then
 either, so maybe the rumor that anyone is actually using this is no
 more than rumor.

I am pretty sure they are still used. A quick grep on a not-too-old
checkout proves that... Note that the SQL-accessible functions are
named currtid and currtid2 (yes, really)...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New statistics for WAL buffer dirty writes

2013-07-18 Thread Alvaro Herrera

What happened to this patch?  We were waiting on an updated version from
you.


Satoshi Nagayasu wrote:
 (2012/12/10 3:06), Tomas Vondra wrote:
 On 29.10.2012 04:58, Satoshi Nagayasu wrote:
 2012/10/24 1:12, Alvaro Herrera wrote:
 Satoshi Nagayasu escribió:
 
 With this patch, walwriter process and each backend process
 would sum up dirty writes, and send it to the stat collector.
 So, the value could be saved in the stat file, and could be
 kept on restarting.
 
 The statistics could be retrieved using the
 pg_stat_get_xlog_dirty_writes() function, and could be reset
 by calling pg_stat_reset_shared('walwriter').
 
 Now, I have one concern.
 
 The reset time could be captured in globalStats.stat_reset_timestamp,
 but this value is the same with the bgwriter one.
 
 So, once pg_stat_reset_shared('walwriter') is called,
 stats_reset column in pg_stat_bgwriter does represent
 the reset time for walwriter, not for bgwriter.
 
 How should we handle this?  Should we split this value?
 And should we have new system view for walwriter?
 
 I think the answer to the two last questions is yes.  It doesn't seem to
 make sense, to me, to have a single reset timing for what are
 effectively two separate things.
 
 Please submit an updated patch to next CF.  I'm marking this one
 returned with feedback.  Thanks.
 
 
 I attached the latest one, which splits the reset_time
 for bgwriter and walwriter, and provides new system view,
 called pg_stat_walwriter, to show the dirty write counter
 and the reset time.
 
 I've done a quick review of the v4 patch:
 
 Thanks for the review, and sorry for my delayed response.
 
 1) applies fine on HEAD, compiles fine
 
 2) make installcheck fails because of a difference in the 'rules'
 test suite (there's a new view pg_stat_walwriter - see the
 attached patch for a fixed version of expected/rules.out)
 
 Ah, I forgot about the regression test. I will fix it. Thanks.
 
 3) I do agree with Alvaro that using the same struct for two separate
 components (bgwriter and walwriter) seems a bit awkward. For example
 you need to have two separate stat_reset fields, the reset code
 becomes much more verbose (because you need to list individual
 fields) etc.
 
 So I'd vote to either split this into two structures or keeping it
 as a single structure (although with two views on top of it).
 
 Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and
 PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to
 hold those two structs in the stat collector.
 
 4) Are there any other fields that might be interesting? Right now
 there's just dirty_writes but I guess there are other values. E.g.
 how much data was actually written etc.?
 
 AFAIK, I think those numbers can be obtained by calling
 pg_current_xlog_insert_location() or pg_current_xlog_location(),
 but if we need it, I will add it.
 
 Regards,


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] is a special cost for external sort?

2013-07-18 Thread Pavel Stehule
Hello

I found a slow query with a large external sort. I would expect
external sorts to be penalized by the cost model. Are they?

Regards

Pavel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Alvaro Herrera
Andres Freund escribió:
 On 2013-07-18 12:01:39 -0400, Robert Haas wrote:
  On Thu, Jul 18, 2013 at 11:54 AM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
   They don't show up in a quick grep of psqlodbc's source code, FWIW.
  
  Hmm.  Maybe we should just remove them and see if anyone complains.
  We could always put them back (or make them available via contrib) if
  it's functionality someone actually needs.  The last discussion of
  those functions was in 2007 and nobody seemed too sure back then
  either, so maybe the rumor that anyone is actually using this is no
  more than rumor.
 
 I am pretty sure they are still used. A quick grep on a not too old
 checkout prooves that... Note that the sql accessible functions are
 named currtid and currtid2 (yes, really)...

Ah, yeah, that does show up.  I had grepped for 'currtid_'.  Sorry.
They're all in positioned_load() in results.c.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Robert Haas
On Thu, Jul 18, 2013 at 12:25 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Ah, yeah, that does show up.  I had grepped for 'currtid_'.  Sorry.
 They're all in positioned_load() in results.c.

Well, in that case, we'll have to keep it around.  I still wish we
could get a clear answer to the question of how it's being used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Greg Smith

On 7/18/13 12:00 PM, Alvaro Herrera wrote:

I think the idea is to have a system in which most of the time the
recovery time will be that for checkpoint_timeout=5, but in those
(hopefully rare) cases where checkpoints take a bit longer, the recovery
time will be that for checkpoint_timeout=6.


I understand the implementation.  My point is that if you do that, the 
fair comparison is to benchmark it against a current system where 
checkpoint_timeout=6 minutes.  That is a) simpler, b) requires no code 
change, and c) makes the looser standards the server is now settling for 
transparent to the administrator.  Also, my expectation is that it would 
perform better all of the time, not just during the periods this new 
behavior kicks in.


Right now we have checkpoint_completion_target as a GUC for controlling 
what's called the spread of a checkpoint over time.  That sometimes goes 
over, but that's happening against the best attempts of the server to do 
better.


The first word that comes to mind for just disregarding the end time 
is that it's a sloppy checkpoint.  There are all sorts of sloppy behavior 
you might do here, but I've worked under the assumption that ignoring 
the contract with the administrator was frowned on by this project.  If 
people want this sort of behavior in the server, I'm satisfied my 
distaste for the idea and the reasoning behind it is clear now.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Josh Berkus
On 07/18/2013 04:25 AM, Amit Kapila wrote:
 On Thursday, July 18, 2013 12:38 AM Josh Berkus wrote:
 On 07/17/2013 12:01 PM, Alvaro Herrera wrote:
 Both of these seem like they would make troubleshooters' lives more
 difficult.  I think we should just parse the auto file automatically
 after parsing postgresql.conf, without requiring the include
 directive
 to be there.

 Wait, I thought the auto file was going into the conf.d directory?
 
 The auto file is going into the config directory, but will that make any
 difference if we have to parse it automatically after postgresql.conf?

Well, I thought that the whole conf.d directory automatically got parsed
after postgresql.conf.  No?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Non-recursive processing of AND/OR lists

2013-07-18 Thread Gurjeet Singh
On Thu, Jul 18, 2013 at 10:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:


  Because simpler code is less likely to have bugs and is easier to
  maintain.

 I agree with that point, but one should also remember Polya's Inventor's
 Paradox: the more general problem may be easier to solve.  That is, if
 done right, code that fully flattens an AND tree might actually be
 simpler than code that does just a subset of the transformation.  The
 current patch fails to meet this expectation,


The current patch does completely flatten any type of tree (left/right-deep
or bushy) without recursing; right-deep and bushy tree processing is what
Robert is recommending we defer to recursive processing. Maybe I haven't
considered a case where it doesn't flatten the tree; do you have an
example in mind?


 but maybe you just haven't
 thought about it the right way.

 My concerns about this patch have little to do with that, though, and
 much more to do with the likelihood that it breaks some other piece of
 code that is expecting AND/OR to be strictly binary operators, which
 is what they've always been in parsetrees that haven't reached the
 planner.  It doesn't appear to me that you've done any research on that
 point whatsoever


No, I haven't, and I might not be able to research it for a few more weeks.


 you have not even updated the comment for BoolExpr
 (in primnodes.h) that this patch falsifies.


I will fix that.

Best regards,
-- 
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.


[HACKERS] Simple documentation typo patch

2013-07-18 Thread David Christensen
Hey folks, this corrects typos going back to 
75c6519ff68dbb97f73b13e9976fb8075bbde7b8 where EUC_JIS_2004 and SHIFT_JIS_2004 
were first added.

These typos are present in all supported major versions of PostgreSQL, back 
through 8.3; I didn't look any further than that. :-)




0001-Doc-patch-for-typo-in-built-in-conversion-type-names.patch
Description: Binary data



Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com
785-727-1171






Re: [HACKERS] is a special cost for external sort?

2013-07-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 I found a slow query with a large external sort. I expected that the
 external sort would be penalized. Is it?

See cost_sort() in src/backend/optimizer/path/costsize.c

regards, tom lane




Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument

2013-07-18 Thread Fujii Masao
On Thu, Jul 18, 2013 at 1:49 PM, Rushabh Lathia
rushabh.lat...@gmail.com wrote:



 On Thu, Jul 18, 2013 at 9:40 AM, Satoshi Nagayasu sn...@uptime.jp wrote:

 (2013/07/18 2:31), Fujii Masao wrote:

 On Tue, Jul 16, 2013 at 3:00 PM, Satoshi Nagayasu sn...@uptime.jp
 wrote:

 (2013/07/04 3:58), Fujii Masao wrote:

 For the test, I just implemented the regclass-version of pg_relpages()
 (patch attached) and tested some cases. But I could not reproduce that
 problem.

   SELECT pg_relpages('hoge');-- OK
   SELECT pg_relpages(oid) FROM pg_class WHERE relname = 'hoge';
 -- OK
   SELECT pg_relpages(relname) FROM pg_class WHERE relname = 'hoge';
 -- OK


 In the attached patch, I cleaned up three functions to have
 two types of arguments for each, text and regclass.

pgstattuple(text)
pgstattuple(regclass)
pgstatindex(text)
pgstatindex(regclass)
pg_relpages(text)
pg_relpages(regclass)

 I still think a regclass argument is more appropriate than text for
 passing a relation/index name to a function, but having both argument
 types for each function seems to be a good choice at this moment, in
 terms of backward compatibility.

 Docs need to be updated if this change is going to be applied.


 Yes, please.


 Updated docs and code comments, etc. PFA.

Thanks for updating the patch. Committed.

 'make installcheck' failed in my machine.


 Hmm, it works on my box...


 Works for me too.

Hmm... make installcheck still failed on my box. That's because
you added several SELECT queries into sql/pgstattuple.sql, but
added only two results into expected/pgstattuple.out.
I corrected the regression test code of pgstattuple.

Regards,

-- 
Fujii Masao




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Fujii Masao
On Fri, Jul 19, 2013 at 2:45 AM, Josh Berkus j...@agliodbs.com wrote:
 On 07/18/2013 04:25 AM, Amit Kapila wrote:
 On Thursday, July 18, 2013 12:38 AM Josh Berkus wrote:
 On 07/17/2013 12:01 PM, Alvaro Herrera wrote:
 Both of these seem like they would make troubleshooters' lives more
 difficult.  I think we should just parse the auto file automatically
 after parsing postgresql.conf, without requiring the include
 directive
 to be there.

 Wait, I thought the auto file was going into the conf.d directory?

 The auto file is going into the config directory, but will that make any
 difference if we have to parse it automatically after postgresql.conf?

 Well, I thought that the whole conf.d directory automatically got parsed
 after postgresql.conf.  No?

No. In the previous patch, we needed to set include_dir to 'config' (though
that's the default).

Regards,

-- 
Fujii Masao




Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-18 Thread Stephen Frost
* Greg Smith (g...@2ndquadrant.com) wrote:
 The first word that comes to mind for just disregarding the end
 time is that it's a sloppy checkpoint.  There are all sorts of sloppy
 behavior you might do here, but I've worked under the assumption
 that ignoring the contract with the administrator was frowned on by
 this project.  If people want this sort of behavior in the server,
 I'm satisfied my distaste for the idea and the reasoning behind it
 is clear now.

For my part, I agree with Greg on this.  While we might want to provide
an option to go ahead and run past checkpoint_timeout if the server gets
too busy to keep up, I don't think it should be the default.

To be honest, I'm also not convinced that this approach is better than
the existing mechanism where the user can adjust checkpoint_timeout to
be higher if they're ok with recovery taking longer, and I share Greg's
concern about this backoff potentially running away and causing
checkpoints which never complete, or which complete far outside the
configured time.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Josh Berkus
On 07/17/2013 08:15 PM, Andrew Gierth wrote:
 The spec defines two types of aggregate function classed as ordered set
 function, as follows:
  
 1. An inverse distribution function taking one argument (which must be
a grouped column or otherwise constant within groups) plus a sorted
group with exactly one column:
  
=# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ...
  
 The motivating examples for this (and the only ones in the spec) are
percentile_cont and percentile_disc, to return a percentile result
from a continuous or discrete distribution. (Thus
percentile_cont(0.5) within group (order by x) is the spec's version
of a median(x) function.)

One question is how this relates to the existing

   SELECT agg_func(x order by y)

... syntax.  Clearly there's some extra functionality here, but the two
are very similar conceptually.

 2. A hypothetical set function taking N arguments of arbitrary types
(a la VARIADIC any, rather than a fixed list) plus a sorted group
with N columns of matching types:
  
=# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ...
  
(where typeof(p1)==typeof(q1) and so on, at least up to trivial
conversions)
  
The motivating example here is to be able to do rank(p1,p2,...) to
return the rank that the specified values would have had if they were
added to the group.

Wow, I can't possibly grasp the purpose of this.  Maybe a practical example?

 We've also had an expression of interest in extending this to allow
 percentile_disc(float8[]) and percentile_cont(float8[]) returning
 arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an
 array containing the bounds, median and quartiles in one go. This is an
 extension to the spec but it seems sufficiently obviously useful to be
 worth supporting.

To be specific, I asked for this because it's already something I do
using PL/R, although in PL/R it's pretty much limited to floats.

Anyway, for anyone who isn't following why we want this: statistical
summary reports.  For example, I'd love to be able to do a quartile
distribution of query execution times without resorting to R.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Settings of SSL context for PGserver and for libpq

2013-07-18 Thread Dmitrij K
Dear Developers.
Could you do the things written in this message?
/// +
/* Target audience of this doc: developers of PostgreSQL, developers of
C/C++ apps, paranoiacs.  A hosting (dedicated/virtual) is not a safe place
for storing the private key/cert.  Presented here is a concept for a
secured way to initialize the SSL context for the PG server.  This doc has
three parts: 1) a module that will be loaded dynamically by the PG server
for initializing the SSL context, and imported into the PG server address
space; 2) some server actions (checking cert & pkey), and adding a new
variable into `pg_config'; 3) some recommendations for adding new procs
into the libpq API.  PS: this way is not a panacea, but it protects from
fools.  PPS: sorry for my English.  Thanks for your attention.  -- Best
regards. */

/// +
/* 1) Part one: module side: the module is a shared library written in `C'
   -- the PG server gets the symbol `PGimportSSLCTX' via `dlsym'. */
// [SOURCEFILE=pg_import_sslctx.c]
// to compile it with GCC:
//   gcc -shared -fPIC -c pg_import_sslctx.c && \
//   ld -shared -export-dynamic pg_import_sslctx.o -o libpg_import_sslctx.so -ldl -lc
#include <stdint.h>
#include <sys/types.h>
#include <openssl/ssl.h>
#include <openssl/x509.h>

// function to export an SSL_CTX into the PG server address space
extern int PGimportSSLCTX ( SSL_CTX **pp_ctx );

void *h_ssl; // handle of `libssl'

// cert and private key in an encrypted form (simulation)
uint8_t pkey [] = {'s', 'u', 'p', 'e', 'r', ' ', 'e', 'n', 'c', 'r', 'y',
                   'p', 't', 'e', 'd', ' ', 'k', 'e', 'y', 0};
uint8_t cert [] = {'s', 'u', 'p', 'e', 'r', ' ', 'e', 'n', 'c', 'r', 'y',
                   'p', 't', 'e', 'd', ' ', 'c', 'e', 'r', 't', 0};

// blank proc decrypting the cert/pkey (simulation)
void decrypt (uint8_t *data) { return; }

// simulation of trusted certs
uint8_t trusted_cert1 [] = {0, 5, 7, 12, 76, 4, 9};
uint8_t trusted_cert2 [] = {89, 5, 4, 12, 56, 11, 0};

// NOTE: for Windows, you can use `dlfcn-win32', or the code below:
#ifdef _WIN32
#include <windows.h>
#define RTLD_LAZY         0x1
#define RTLD_NOW          0x2
#define RTLD_BINDING_MASK 0x3
#define RTLD_NOLOAD       0x4
#define RTLD_GLOBAL       0x00100
#define RTLD_LOCAL        0
#define RTLD_NODELETE     0x01000

inline void* dlopen ( const char *f, int m )
{
    HMODULE *hdll;
    DWORD flags;
    flags = LOAD_WITH_ALTERED_SEARCH_PATH;
    if ( m & RTLD_LAZY ) { flags |= DONT_RESOLVE_DLL_REFERENCES; }
    if ( f == NULL ) { hdll = (HMODULE *) GetModuleHandle ( NULL ); }
    else { hdll = (HMODULE *) LoadLibraryEx ( f, NULL, flags ); }
    return (void *) hdll;
}
inline int dlclose ( void *h )
{
    if ( !FreeLibrary ((HMODULE)h) ) { return -1; }
    return 0;
}
inline void* dlsym ( void *h, const char *name )
{
    return (void *) GetProcAddress ( (HMODULE)h, name );
}
#else
#include <dlfcn.h>
#include <sys/mman.h>
#endif

// this procedure will be called after loading the shared lib
void _init ()
{
#if defined (_WIN32)
    // I don't know how to turn off swapping on Windows, sorry :(...
    const char *filedll = "libssl.dll";
#else
    mlockall ( MCL_CURRENT ); // swap off
    const char *filedll = "libssl.so"; // usually a link to `libssl.so.1.0.0'
#endif
    // try to load `libssl'
    h_ssl = dlopen ( filedll, RTLD_NOW | RTLD_NOLOAD );
    if ( !h_ssl ) { h_ssl = dlopen ( filedll, RTLD_NOW | RTLD_LOCAL ); }
}

// this procedure will be called before unloading the shared lib
void _fini ()
{
    if ( h_ssl ) { dlclose ( h_ssl ); } // unload `libssl'
#ifndef _WIN32
    munlockall ();
#endif
}

// pointers to SSL procedures, imported from the shared `libssl'
const SSL_METHOD* (*p_TLSv1_server_method)(void);
SSL_CTX* (*p_SSL_CTX_new)(const SSL_METHOD*);
void (*p_SSL_CTX_free)(SSL_CTX*);
int (*p_SSL_CTX_use_PrivateKey)(SSL_CTX*, EVP_PKEY*);
int (*p_SSL_CTX_use_certificate)(SSL_CTX*, X509*);
void (*p_SSL_CTX_set_verify)(SSL_CTX*, int, int(*)(int, X509_STORE_CTX*));
long (*p_SSL_CTX_ctrl)(SSL_CTX*, int, long, void*);

// implementation:
int PGimportSSLCTX ( SSL_CTX **pp_ctx )
{
    int ret;
    SSL_CTX *ctx;
    const SSL_METHOD *meth;

    ret = 0;
    if ( !h_ssl ) { goto end; } // libssl was not loaded
    if ( !( p_TLSv1_server_method = dlsym ( h_ssl, "TLSv1_server_method" ) ) )
        { goto end; } // failed to get symbol
    if ( !( meth = p_TLSv1_server_method () ) )
        { goto end; } // failed to get TLSv1_server_method
    if ( !( p_SSL_CTX_new = dlsym ( h_ssl, "SSL_CTX_new" ) ) )
        { goto end; } // failed to get symbol
    if ( !( ctx = p_SSL_CTX_new ( meth ) ) )
        { goto end; } // failed to create new context
    // decrypt our private key and cert
    decrypt ( pkey );
    decrypt ( cert );
    if ( !( p_SSL_CTX_use_PrivateKey = dlsym ( h_ssl, "SSL_CTX_use_PrivateKey" ) ) )
        { goto end; } // failed to get symbol
    if ( !( p_SSL_CTX_use_certificate = dlsym ( h_ssl, "SSL_CTX_use_certificate" ) ) )
        { goto end; } // failed to get symbol
    // maybe we need to do `d2i_PrivateKey' before setting pkey/cert into the context?..
    if ( p_SSL_CTX_use_PrivateKey ( ctx,

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Alvaro Herrera
Fujii Masao escribió:
 On Fri, Jul 19, 2013 at 2:45 AM, Josh Berkus j...@agliodbs.com wrote:
  On 07/18/2013 04:25 AM, Amit Kapila wrote:
  On Thursday, July 18, 2013 12:38 AM Josh Berkus wrote:
  On 07/17/2013 12:01 PM, Alvaro Herrera wrote:
  Both of these seem like they would make troubleshooters' lives more
  difficult.  I think we should just parse the auto file automatically
  after parsing postgresql.conf, without requiring the include
  directive
  to be there.
 
  Wait, I thought the auto file was going into the conf.d directory?
 
  The auto file is going into the config directory, but will that make any
  difference if we have to parse it automatically after postgresql.conf?
 
  Well, I thought that the whole conf.d directory automatically got parsed
  after postgresql.conf.  No?
 
 No. In the previous patch, we needed to set include_dir to 'config' (though
 that's the default).

I know this has been discussed already, but I want to do a quick summary
of existing proposals, point out their drawbacks, and present a proposal
of my own.  Note I ignore the whole file-per-setting vs.
one-file-to-rule-them-all, because the latter has already been shot
down.  So live ideas floated here are:

1. have a config/postgresql.auto.conf file, require include_dir or
   include in postgresql.conf
   This breaks if user removes the config/ directory, or if the user
   removes the include_dir directive from postgresql.conf.  ALTER SYSTEM
   is in the business of doing mkdir() or failing altogether if the dir
   is not present.  Doesn't seem friendly.

2. have a conf.d/ directory, put the file therein; no include or
   include_dir directive is necessary.
   I think this is a separate patch and merits its own discussion.  This
   might be a good idea, but I don't think that this is the
   way to implement ALTER SYSTEM.  If users don't want to allow conf.d
   they can remove it, but that would break ALTER SYSTEM unnecessarily.
   Since they are separate features it seems to me that they should
   function independently.

I think we should just put the config directives of ALTER SYSTEM into a
file, not dir, alongside postgresql.conf; I would suggest
postgresql.auto.conf.  This file is parsed automatically after
postgresql.conf, without requiring an include directive in
postgresql.conf.  If the user does not want that file, they can remove
it; but a future ALTER SYSTEM will create the file again.  No need to
parse stuff to find out whether the directory exists, or the file
exists, or such nonsense.
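
The parse order proposed here boils down to last-file-wins for duplicate
settings. A toy sketch of that precedence (deliberately naive parsing, not
how the server's GUC file parser actually works):

```python
# Toy model of the proposed parse order: postgresql.auto.conf is read
# after postgresql.conf, so a setting repeated there overrides the
# hand-edited value.  The "parser" here is deliberately naive.
def parse_conf(text):
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if "=" in line:
            name, _, value = line.partition("=")
            settings[name.strip()] = value.strip().strip("'")
    return settings

def effective_settings(postgresql_conf, postgresql_auto_conf):
    merged = parse_conf(postgresql_conf)             # parsed first
    merged.update(parse_conf(postgresql_auto_conf))  # parsed last: wins
    return merged

conf = "shared_buffers = '128MB'\nwork_mem = '4MB'"
auto = "work_mem = '64MB'  # written by ALTER SYSTEM"
assert effective_settings(conf, auto)["work_mem"] == "64MB"
assert effective_settings(conf, auto)["shared_buffers"] == "128MB"
```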

I think the only drawback of this is that there's no way to disable
ALTER SYSTEM (i.e. there's no directory you can remove to prevent the
thing from working).  But is this a desirable property?  I mean, if we
want to disallow ALTER SYSTEM I think we should provide a direct way to
do that, perhaps allow_alter_system = off in postgresql.conf; but I
don't think we need to provide this, really, at least not in the first
implementation.

Note that the Debian packaging uses postgres-writable directories for
its configuration files, so the daemon is always able to create the file
in the config dir.

This seems the simplest way; config tools such as Puppet know they
always need to consider the postgresql.auto.conf file.

I think the whole business of parsing the file, and then verifying
whether the auto file has been parsed, is nonsensical and should be
removed from the patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fatal error after starting postgres : sys identifiers must be different

2013-07-18 Thread Indrajit Roychoudhury
Hi,

Could you please let me know what the error "system identifiers are the
same" means? Below is a snapshot from one of the masters.
I am setting up BDR as per the wiki
  http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
and source @
 git://git.postgresql.org/git/users/andresfreund/postgres.git

irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./postgres -D
~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
LOG:  bgworkers, connection: dbname=testdb2
LOG:  option: dbname, val: testdb2
LOG:  registering background worker: bdr apply: ubuntuirc2
LOG:  loaded library bdr
LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
LOG:  doing logical startup from 0/17B6410
LOG:  starting up replication identifier with ckpt at 0/17B6410
LOG:  autovacuum launcher started
LOG:  starting background worker process bdr apply: ubuntuirc2
LOG:  database system is ready to accept connections
LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote dbname=testdb2
replication=true fallback_application_name=bdr
FATAL:  system identifiers must differ between the nodes
DETAIL:  Both system identifiers are 5856368744762683487.
LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with exit
code 1
^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down

pg_controldata outputs different database system ids for the two nodes, so I
don't understand why it says the identifiers are the same.
postgresql.conf content in one of the masters is like this-

/
shared_preload_libraries = 'bdr'
bdr.connections = 'ubuntuirc2'
bdr.ubuntuirc2.dsn = 'dbname=testdb2'
/

Two nodes are ubuntuirc and ubuntuirc2. Above is the output of the
postgresql.conf from ubuntuirc.

Any help on this will be appreciated.

Thanks.


On Thu, Jul 18, 2013 at 9:48 AM, Indrajit Roychoudhury 
indrajit.roychoudh...@gmail.com wrote:

 Hi Alvaro,

 Could you please let me know what the error "system identifiers are the
 same" means? Below is a snapshot from one of the masters.
 I am setting up BDR as per the wiki
   http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
 and source @
  git://git.postgresql.org/git/users/andresfreund/postgres.git

 irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./postgres
 -D ~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
 LOG:  bgworkers, connection: dbname=testdb2
 LOG:  option: dbname, val: testdb2
 LOG:  registering background worker: bdr apply: ubuntuirc2
 LOG:  loaded library bdr
 LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
 LOG:  doing logical startup from 0/17B6410
 LOG:  starting up replication identifier with ckpt at 0/17B6410
 LOG:  autovacuum launcher started
 LOG:  starting background worker process bdr apply: ubuntuirc2
 LOG:  database system is ready to accept connections
 LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote dbname=testdb2
 replication=true fallback_application_name=bdr
 FATAL:  system identifiers must differ between the nodes
 DETAIL:  Both system identifiers are 5856368744762683487.
 LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with exit
 code 1
 ^CLOG:  received fast shutdown request
 LOG:  aborting any active transactions
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down

 Thanks.



Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Josh Berkus
Alvaro,

 I think the only drawback of this is that there's no way to disable
 ALTER SYSTEM (i.e. there's no directory you can remove to prevent the
 thing from working).  But is this a desirable property?  I mean, if we
 want to disallow ALTER SYSTEM I think we should provide a direct way to
 do that, perhaps allow_alter_system = off in postgresql.conf; but I
 don't think we need to provide this, really, at least not in the first
 implementation.

Agreed that turning ALTER SYSTEM off by deleting the directory is NOT a
desirable feature.  I'm also unclear on the desire to disable ALTER
SYSTEM; if someone has superuser access, then they can just use
pg_write_file to add config directives anyway, no?  So there's not any
security value in disabling it.  Maybe there's a case I'm not thinking of.

Of course, people *can* disable it by creating a blank
postgresql.auto.conf file in the right place and making it
non-writeable, if they want.

We are missing one feature, which is the ability to relocate the
postgresql.auto.conf file if relocating it is desirable according to
some sysadmin spec.  This kind of ties into another patch which was
discussed on this list -- the ability to relocate the recovery.conf
file.  Personally, I think that it would be better for the users if we
provided a way to relocate *all* conf files to the same master
directory, but that we don't need a way to relocate each config file
individually, but that's a longer discussion.

In other words, let's accept an ALTER SYSTEM patch which doesn't support
relocating, and then argue whether a second patch which supports
relocating is needed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Andrew Gierth
Josh Berkus wrote:
 On 07/17/2013 08:15 PM, Andrew Gierth wrote:
  The spec defines two types of aggregate function classed as ordered set
  function, as follows:
   
  1. An inverse distribution function taking one argument (which must be
 a grouped column or otherwise constant within groups) plus a sorted
 group with exactly one column:
   
 =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ...
   
  The motivating examples for this (and the only ones in the spec) are
 percentile_cont and percentile_disc, to return a percentile result
 from a continuous or discrete distribution. (Thus
 percentile_cont(0.5) within group (order by x) is the spec's version
 of a median(x) function.)
 
 One question is how this relates to the existing
 
SELECT agg_func(x order by y)
 
 ... syntax.  Clearly there's some extra functionality here, but the two
 are very similar conceptually.

Well, as you probably know, the spec is a whole pile of random
special-case syntax and any similarities are probably more accidental
than anything else.

A major difference is that in agg(x order by y), the values of y are
not passed to the aggregate function - they serve no purpose other
than controlling the order of the x values. Whereas in WITHIN GROUP,
the values in the ORDER BY ... clause are in some sense the primary
input to the aggregate, and the p argument is secondary and can't
vary between rows of the group.

Our implementation does heavily reuse the existing executor mechanics
for ORDER BY in aggregates, and it also reuses a fair chunk of the
parser code for it, but there are significant differences.

[of hypothetical set functions]
 Wow, I can't possibly grasp the purpose of this.  Maybe a practical
 example?

=# select rank(123) within group (order by x)
 from (values (10),(50),(100),(200),(500)) v(x);

would return 1 row containing the value 4, because if you added the
value 123 to the grouped values, it would have been ranked 4th.

Any time you want to calculate what the rank, dense_rank or cume_dist
would be of a specific row within a group without actually adding the
row to the group, this is how it's done.

I don't have any practical examples to hand, but this beast seems to
be implemented in at least Oracle and MSSQL so I guess it has uses.
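
For the record, the definition itself is simple enough to sketch: the
hypothetical rank of a value is one plus the number of group members that
sort strictly before it. (A Python sketch of the definition only, not of
the proposed executor support:)

```python
def hypothetical_rank(value, group):
    # rank(v) WITHIN GROUP (ORDER BY x): one plus the number of group
    # values strictly before v -- the rank v would get if it were
    # added to the sorted group.
    return 1 + sum(1 for x in group if x < value)

# Andrew's example: rank(123) within group (order by x) over the
# values (10),(50),(100),(200),(500) yields 4.
assert hypothetical_rank(123, [10, 50, 100, 200, 500]) == 4
```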

[on supporting arrays of percentiles]
 To be specific, I asked for this because it's already something I do
 using PL/R, although in PL/R it's pretty much limited to floats.

percentile_cont is limited to floats and intervals in the spec; to be
precise, it's limited to taking args of either interval or any numeric
type, and returns interval for interval args, and approximate numeric
with implementation-defined precision, i.e. some form of float, for
numeric-type args. The definition requires interpolation between values,
so it's not clear that there's any point in trying to allow other types.

percentile_disc is also limited to floats and intervals in the spec, but
I see absolutely no reason whatsoever for this, since the definition
given is valid for any type with ordering operators; there is no reason
not to make it fully polymorphic. (The requirement for ordering will be
enforced in parse-analysis anyway, by the ORDER BY transformations, and
the function simply returns one of the input values unaltered.)
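
The contrast can be sketched from the spec's definitions (simplified
Python; ascending order only, no NULL or interval handling):

```python
import math

def percentile_cont(fraction, values):
    # Interpolates between the two adjacent sorted values, so the result
    # need not be a member of the input -- hence the float/interval limit.
    v = sorted(values)
    rn = fraction * (len(v) - 1)
    lo, hi = math.floor(rn), math.ceil(rn)
    return v[lo] + (rn - lo) * (v[hi] - v[lo])

def percentile_disc(fraction, values):
    # Returns the first input value whose cumulative distribution
    # reaches the fraction -- well-defined for any ordered type.
    v = sorted(values)
    return v[max(math.ceil(fraction * len(v)), 1) - 1]

data = [1, 2, 3, 4]
assert percentile_cont(0.5, data) == 2.5   # interpolated: not an input
assert percentile_disc(0.5, data) == 2     # always an actual input value
```

percentile_cont must interpolate, hence its numeric/interval restriction;
percentile_disc only ever returns one of its inputs, which is why full
polymorphism is safe there.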

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Josh Berkus
Andrew,

 Well, as you probably know, the spec is a whole pile of random
 special-case syntax and any similarities are probably more accidental
 than anything else.

Hah, I didn't realize that our ordered aggregate syntax even *was* spec.

 A major difference is that in agg(x order by y), the values of y are
 not passed to the aggregate function - they serve no purpose other
 than controlling the order of the x values. Whereas in WITHIN GROUP,
 the values in the ORDER BY ... clause are in some sense the primary
 input to the aggregate, and the p argument is secondary and can't
 vary between rows of the group.
 
 Our implementation does heavily reuse the existing executor mechanics
 for ORDER BY in aggregates, and it also reuses a fair chunk of the
 parser code for it, but there are significant differences.

Well, seems like it would work the same as

 agg_func(constx,coly,colz ORDER BY coly, colz)

... which means you could reuse a LOT of the internal plumbing.  Or am I
missing something?

Also, what would a CREATE AGGREGATE and state function definition for
custom WITHIN GROUP aggregates look like?

 Any time you want to calculate what the rank, dense_rank or cume_dist
 would be of a specific row within a group without actually adding the
 row to the group, this is how it's done.
 
 I don't have any practical examples to hand, but this beast seems to
 be implemented in at least Oracle and MSSQL so I guess it has uses.

Well, I still can't imagine a practical use for it, at least based on
RANK.  I certainly have no objections if you have the code, though.

I'll also point out that mode() requires ordered input as well, so add
that to the set of functions we'll want to eventually support.

One thing I find myself wanting with ordered aggregates is the ability
to exclude NULLs.  Thoughts?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] WITH CHECK OPTION for auto-updatable views

2013-07-18 Thread Stephen Frost
Dean,


* Stephen Frost (sfr...@snowman.net) wrote:
 Thanks!  This is really looking quite good, but it's a bit late and I'm
 going on vacation tomorrow, so I didn't quite want to commit it yet. :)

Apologies on this taking a bit longer than I expected, but it's been
committed and pushed now.  Please take a look and let me know of any
issues you see with the changes that I made.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] [9.4 CF 1] Patches which desperately need code review

2013-07-18 Thread Josh Berkus
Hackers,

The following three patches really really need some code review love.
Until they get a code review, we can't close out the CommitFest:

Row-Level Security:
https://commitfest.postgresql.org/action/patch_view?id=874

Revive Line Type:
https://commitfest.postgresql.org/action/patch_view?id=1154

Performance Improvement by reducing WAL for Update Operation:
https://commitfest.postgresql.org/action/patch_view?id=984

Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Alvaro Herrera
Josh Berkus escribió:

 We are missing one feature, which is the ability to relocate the
 postgresql.auto.conf file if relocating it is desirable according to
 some sysadmin spec.  This kind of ties into another patch which was
 discussed on this list -- the ability to relocate the recovery.conf
 file.

Well, postgresql.conf is already relocatable.  Is there any advantage in
being able to move postgresql.auto.conf to a different location than
postgresql.conf?  I don't see any.  I didn't follow the recovery.conf
discussion, but I imagine that the reason for wanting to be able to
relocate it is related to standby vs. master distinction.  This doesn't
apply to postgresql.auto.conf, I think, does it?

If people really want to drill down, they can always make
postgresql.auto.conf a symlink. (In this light, we would ship an
empty postgresql.auto.conf in a freshly initdb'd system.  IIRC the
current patch already does that.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.4 CF 1] Patches which desperately need code review

2013-07-18 Thread Alvaro Herrera
Josh Berkus wrote:

 Revive Line Type:
 https://commitfest.postgresql.org/action/patch_view?id=1154

This one's easy -- we're waiting on a decision on whether to use the
{A,B,C} text representation.  Honestly, it seems a no-brainer to me that
this is what it should use; the other representation seems to have too
many problems.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] is a special cost for external sort?

2013-07-18 Thread Jeff Janes
On Thu, Jul 18, 2013 at 9:14 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 I found a slow query with a large external sort. I expected that the
 external sort would be penalized. Is it?

It tries to, but it doesn't seem to be much good at it.  In
particular, I think it does a poor job of estimating the CPU cost of
an external sort relative to an in-memory sort of a slightly smaller
data set.  In my experience adding the single tuple that actually
pushes you over the work_mem limit costs about 3x in CPU.
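
The discontinuity Jeff describes can be made concrete with a toy cost
model. This only loosely mirrors the shape of cost_sort() -- the constants
and the merge-pass formula are illustrative, not the planner's actual
arithmetic:

```python
import math

# Loosely mirrors the shape of cost_sort(): the comparison cost grows as
# N*log2(N) whether or not the sort spills; an external sort only adds an
# I/O term for writing and re-reading the runs.  Constants are
# illustrative, not the planner's actual arithmetic.
CPU_OPERATOR_COST = 0.0025
SEQ_PAGE_COST = 1.0

def sort_cost(tuples, pages, work_mem_pages):
    cpu = 2.0 * CPU_OPERATOR_COST * tuples * math.log2(max(tuples, 2))
    io = 0.0
    if pages > work_mem_pages:          # spills: external merge sort
        # one write plus one read of the whole input per merge pass
        passes = 1 + math.ceil(math.log2(pages / work_mem_pages))
        io = 2.0 * SEQ_PAGE_COST * pages * passes
    return cpu + io

# One extra page over work_mem adds I/O cost, but the CPU term is
# continuous -- nothing here models a ~3x jump in comparison cost.
in_mem  = sort_cost(100_000, 1_000, 1_000)
spilled = sort_cost(100_100, 1_001, 1_000)
assert in_mem < spilled < 3 * in_mem
```

In a model of this shape, crossing the work_mem boundary adds an I/O term
while the CPU term stays continuous, which would be consistent with the
observed ~3x CPU jump going unaccounted for.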

It is one of those things I started looking into a few times, but
never got anywhere before getting distracted.

Cheers,

Jeff




Re: [HACKERS] Fatal error after starting postgres : sys identifiers must be different

2013-07-18 Thread Michael Paquier
On Fri, Jul 19, 2013 at 5:02 AM, Indrajit Roychoudhury
indrajit.roychoudh...@gmail.com wrote:
 Could you please let me know what the error "system identifiers are the
 same" means? Below is a snapshot from one of the masters.
 I am setting up BDR as per the wiki
   http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
 and source @
  git://git.postgresql.org/git/users/andresfreund/postgres.git

 irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./postgres -D
 ~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
 LOG:  bgworkers, connection: dbname=testdb2
 LOG:  option: dbname, val: testdb2
 LOG:  registering background worker: bdr apply: ubuntuirc2
 LOG:  loaded library bdr
 LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
 LOG:  doing logical startup from 0/17B6410
 LOG:  starting up replication identifier with ckpt at 0/17B6410
 LOG:  autovacuum launcher started
 LOG:  starting background worker process bdr apply: ubuntuirc2
 LOG:  database system is ready to accept connections
 LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote dbname=testdb2
 replication=true fallback_application_name=bdr
 FATAL:  system identifiers must differ between the nodes
 DETAIL:  Both system identifiers are 5856368744762683487.
I am not an expert on logical replication, but as it looks to be a
requirement that the two nodes have different system identifiers, you
shouldn't link one node to another node whose data folder was created
from a base backup of the former (for example, creating the 2nd node
from a base backup of the 1st node). That is what the error returned
here suggests.

 LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with exit
 code 1
 ^CLOG:  received fast shutdown request
 LOG:  aborting any active transactions
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down

 pg_controldata outputs different database system ids for the two nodes, so
 I don't understand why it says the identifiers are the same.
Are you sure? Please re-check.

 postgresql.conf content in one of the masters is like this-

 /
 shared_preload_libraries = 'bdr'
 bdr.connections = 'ubuntuirc2'
 bdr.ubuntuirc2.dsn = 'dbname=testdb2'
 /

 Two nodes are ubuntuirc and ubuntuirc2. Above is the output of the
 postgresql.conf from ubuntuirc.
If this behavior is confirmed, I think that this bug should be
reported directly to Andres and not to this mailing list, simply because
logical replication is not integrated into Postgres core as of now.
--
Michael




Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Andrew Gierth
Josh Berkus wrote:
 Hah, I didn't realize that our ordered aggregate syntax even *was* spec.

The spec defines agg(x order by y) only for array_agg and xmlagg; the
generalization to arbitrary other aggregates is our extension. (But
kind of obvious really.)

  Our implementation does heavily reuse the existing executor mechanics
  for ORDER BY in aggregates, and it also reuses a fair chunk of the
  parser code for it, but there are significant differences.

 Well, seems like it would work the same as

  agg_func(constx,coly,colz ORDER BY coly, colz)

 ... which means you could reuse a LOT of the internal plumbing.  Or am I
 missing something?

You missed the part above which said ...does heavily reuse... :-)

i.e. we are in fact reusing a lot of the internal plumbing.

 Also, what would a CREATE AGGREGATE and state function definition for
 custom WITHIN GROUP aggregates look like?

Now this is exactly the part we haven't nailed down yet and want ideas
for.

The problem is, given that the parser is looking at:

  foo(p1,p2,...) within group (order by q1,q2,...)

how do we best represent the possible matching functions in pg_proc
and pg_aggregate? Our partial solution so far does not allow
polymorphism to work properly, so we need a better way; I'm hoping for
some independent suggestions before I post my own ideas.

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Josh Berkus

 The problem is, given that the parser is looking at:
 
   foo(p1,p2,...) within group (order by q1,q2,...)
 
 how do we best represent the possible matching functions in pg_proc
 and pg_aggregate? Our partial solution so far does not allow
 polymorphism to work properly, so we need a better way; I'm hoping for
 some independent suggestions before I post my own ideas.

Yeah, you'd need to extend VARIADIC somehow.  That is, I should be able
to define a function as:

percentile_state (
pctl float,
ordercols VARIADIC ANY )
returns VARIADIC ANY

... so that it can handle the sorting.  Another way to look at it would be:

percentile_state (
pctl float,
orderedset ANONYMOUS ROW )
returns ANONYMOUS ROW as ...

... because really, what you're handing the state function is an
anonymous row type constructed of the order by phrase.  Of course, then
we have to have some way to manipulate the anonymous row from within the
function; at the very least, an equality operator.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-07-18 Thread Greg Smith

On 7/9/13 12:09 AM, Amit Kapila wrote:

   I think the first thing to verify is whether the results posted can be 
validated in some other environment set up by another person.
   The testcase used is posted at below link:
   http://www.postgresql.org/message-id/51366323.8070...@vmware.com


That seems easy enough to do here; Heikki's test script is excellent. 
The latest patch Hari posted on July 2 has one hunk that no longer 
applies.  Inside src/backend/utils/adt/pg_lzcompress.c the patch 
tries to change this code:


-   if (hent)
+   if (hentno != INVALID_ENTRY)

But that line looks like this now:

if (hent != INVALID_ENTRY_PTR)

Definitions of those:

#define INVALID_ENTRY   0
#define INVALID_ENTRY_PTR   (&hist_entries[INVALID_ENTRY])

I'm not sure if different error handling may be needed here now due to 
the commit that changed this, or if the patch wasn't referring to the 
right type of error originally.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Tatsuo Ishii
 Hello Tatsuo
 
 I think the current measurement method will cause confusion if it's
 not accompanied by a detailed explanation. Could you please provide a
 doc update?
 
 Please find a v17 proposition with updated and extended
 documentation, focused on clarifying the lag measure and its
 implications, and taking into account the recent discussion on the
 list with you & Greg.

Thanks!

 However, I'm not a native English speaker; if you find that some parts
 are not clear enough, please tell me what can be improved further.

I'm not a native English speaker either... Greg, could you please
review the document?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] New statistics for WAL buffer dirty writes

2013-07-18 Thread Satoshi Nagayasu

Will revise and resubmit for the next CF.

Regards,

2013/07/19 1:06, Alvaro Herrera wrote:


What happened to this patch?  We were waiting on an updated version from
you.


Satoshi Nagayasu wrote:

(2012/12/10 3:06), Tomas Vondra wrote:

On 29.10.2012 04:58, Satoshi Nagayasu wrote:

2012/10/24 1:12, Alvaro Herrera wrote:

Satoshi Nagayasu escribió:


With this patch, walwriter process and each backend process
would sum up dirty writes, and send it to the stat collector.
So, the value could be saved in the stat file, and could be
kept on restarting.

The statistics could be retrieved using the
pg_stat_get_xlog_dirty_writes() function, and could be reset
by calling pg_stat_reset_shared('walwriter').

Now, I have one concern.

The reset time could be captured in globalStats.stat_reset_timestamp,
but this value is the same with the bgwriter one.

So, once pg_stat_reset_shared('walwriter') is called,
the stats_reset column in pg_stat_bgwriter represents
the reset time for the walwriter, not for the bgwriter.

How should we handle this?  Should we split this value?
And should we have new system view for walwriter?


I think the answer to the two last questions is yes.  It doesn't seem to
make sense, to me, to have a single reset timing for what are
effectively two separate things.

Please submit an updated patch to next CF.  I'm marking this one
returned with feedback.  Thanks.



I attached the latest one, which splits the reset_time
for bgwriter and walwriter, and provides new system view,
called pg_stat_walwriter, to show the dirty write counter
and the reset time.


I've done a quick review of the v4 patch:


Thanks for the review, and sorry for my delayed response.


1) applies fine on HEAD, compiles fine

2) make installcheck fails because of a difference in the 'rules'
test suite (there's a new view pg_stat_walwriter - see the
attached patch for a fixed version or expected/rules.out)


Ah, I forgot about the regression test. I will fix it. Thanks.


3) I do agree with Alvaro that using the same struct for two separate
components (bgwriter and walwriter) seems a bit awkward. For example
you need to have two separate stat_reset fields, the reset code
becomes much more verbose (because you need to list individual
fields) etc.

So I'd vote to either split this into two structures or keeping it
as a single structure (although with two views on top of it).


Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and
PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to
hold those two structs in the stat collector.


4) Are there any other fields that might be interesting? Right now
there's just dirty_writes but I guess there are other values. E.g.
how much data was actually written etc.?


AFAIK those numbers can be obtained by calling
pg_current_xlog_insert_location() or pg_current_xlog_location(),
but if we need them, I will add them.

Regards,






--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] hardware donation

2013-07-18 Thread Greg Smith

On 7/10/13 12:53 PM, Benedikt Grundmann wrote:

The server will probably be most interesting for the disks in it.  That
is where we spend the largest amount of time optimizing (for sequential
scan speed in particular):
22x600GB disks in a Raid6+0 (Raid0 of 2x 10disk raid 6 arrays) + 2 spare
disks. Overall size 8.7 TB in that configuration.


What is the RAID controller used in the server?  That doesn't impact the 
donation, I'm just trying to fit this one into my goals for finding 
useful community performance testing equipment.


There are a good number of systems floating around the community with HP 
controllers--I even have one myself now--but we could use more LSI Logic 
and Adaptec based systems.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Greg Smith

On 7/18/13 6:45 PM, Tatsuo Ishii wrote:

I'm not a native English speaker either... Greg, could you please
review the document?


Yes, I already took a look at it briefly.  The updates move in the 
right direction, but I can edit them usefully before commit.  I'll have 
that done by tomorrow and send out a new version.  I'm hopeful that v18 
will finally be the one that everyone likes.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Regex pattern with shorter back reference does NOT work as expected

2013-07-18 Thread Tom Lane
Jeevan Chalke jeevan.cha...@enterprisedb.com writes:
 Following example does not work as expected:
 
 -- Should return TRUE but returning FALSE
 SELECT 'Programmer' ~ '(\w).*?\1' as t;

For the archives' sake --- I've filed a report about this with the Tcl
crew.  They seem to have moved their bugtracker recently; it's no longer
at sourceforge but in their own ticket system.  This bug is at

https://core.tcl.tk/tcl/tktview/6585b21ca8fa6f3678d442b97241fdd43dba2ec0

regards, tom lane




Re: [HACKERS] [v9.4] row level security

2013-07-18 Thread Karol Trzcionka
Against the current head (4cbe3ac3e86790d05c569de4585e5075a62a9b41) the
patch applies correctly (the only change needed is in parallel_schedule).
However, it fails its own regression tests (the other tests pass).
Regards,
Karol




Re: [HACKERS] [ODBC] getting rid of SnapshotNow

2013-07-18 Thread Inoue, Hiroshi

(2013/07/18 23:54), Robert Haas wrote:

On Thu, Jul 18, 2013 at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

1. snapshot-error-v1.patch introduces a new special snapshot, called
SnapshotError.  In the cases where we set SnapshotNow as a sort of
default snapshot, this patch changes the code to use SnapshotError
instead.  This affects scan-xs_snapshot in genam.c and
estate-es_snapshot in execUtils.c.  This passes make check-world, so
apparently there is no code in the core distribution that does this.
However, this is safer for third-party code, which will ERROR instead
of seg faulting.  The alternative approach would be to use
InvalidSnapshot, which I think would be OK too if people dislike this
approach.


FWIW, I think using InvalidSnapshot would be preferable to introducing
a new concept for what's pretty much the same thing.


Andres voted the other way on the previous thread.  I'll wait and see
if there are any other opinions.  The SnapshotError concept seemed
attractive to me initially, but I'm not as excited about it after
seeing how it turned out, so maybe it's best to do it as you suggest.


With that done, the only remaining uses of SnapshotNow in our code
base will be in currtid_byreloid() and currtid_byrelname().  So far no
one on this list has been able to understand clearly what the purpose
of those functions is, so I'm copying this email to pgsql-odbc in case
someone there can provide more insight.


I had the idea they were used for a client-side implementation of WHERE
CURRENT OF.  Perhaps that's dead code and could be removed entirely?


It's been reported that ODBC still uses them.


Though PostgreSQL's TID is similar to Oracle's ROWID, it is transient
and, unfortunately, changes after update operations. I implemented
the currtid_xx functions to compensate for the difference. For example

currtid(relname, original tid)

(hopefully) returns the current tid of the original row after it has
been updated.

regards,
Hiroshi Inoue





[HACKERS] compiler warning in UtfToLocal and LocalToUtf (conv.c)

2013-07-18 Thread Karol Trzcionka
Hello,
in the current master head (4cbe3ac3e86790d05c569de4585e5075a62a9b41),
I've noticed the compiler warnings in src/backend/utils/mb/conv.c
conv.c: In function ‘UtfToLocal’:
conv.c:252:23: error: ‘iutf’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
...
conv.c: In function ‘LocalToUtf’:
conv.c:301:23: error: ‘iiso’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
...
The compiler doesn't know that 'l' can only vary between 1 and 4. A
quick fix could be to:
1. pre-initialize the variable
2. delete the last if statement (change the else-if to a plain else)
3. change it to a switch-case with a default branch
Regards,
Karol




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Greg Smith

On 7/18/13 4:02 PM, Alvaro Herrera wrote:


I think we should just put the config directives of ALTER SYSTEM into a
file, not dir, alongside postgresql.conf; I would suggest
postgresql.auto.conf.  This file is parsed automatically after
postgresql.conf, without requiring an include directive in
postgresql.conf.


It is possible to build ALTER SYSTEM SET this way.  One of the goals of 
the implementation style used here wasn't just to accomplish that narrow 
feature though.  We keep running into situations where a standard, 
directory based configuration system would make things easier.  Each 
time that happens, someone comes up with another way to avoid doing it. 
 I think that approach contributes to stagnation in the terrible status 
quo of PostgreSQL configuration management.


I thought this was a good spot to try and re-draw this line because I 
don't want just one program that is able to create new configuration 
entries easily.  I want to see a whole universe of them.  ALTER SYSTEM 
SET, tuning helpers, replication helpers, logging helpers, vacuum 
schedulers.  All of them *could* just dump a simple file into a config 
directory with code anyone can write.  And having ALTER SYSTEM SET do 
that provides a strong precedent for how it can be done.  (I'd like to 
see initdb do that instead of hacking the system postgresql.conf as if 
sed-style edits were still the new hotness, but that's a future change)


My claim, and that's one informed by writing pgtune, is that by far the 
hardest part of writing a configuration addition tool is parsing a 
postgresql.conf file.  The expectation right now is that all changes 
must happen there.  Creating a new snippet of configuration settings is 
easy, but no tools know where to put one right now.  Instead we just 
keep coming up with single, hard-coded file names that people have to 
know in order to manage their installations.



This seems the simplest way; config tools such as Puppet know they
always need to consider the postgresql.auto.conf file.


Like this idea.  What administrators really want, whether they realize 
it or not, is to point Puppet at a configuration directory.  Then the 
problem of "what are the config files in the new version of Postgres?" 
happens once more and then it's over.  Exactly what the files in there 
are named should be completely under control of the administrator.


We're never going to reach that unless we lead by example though.  The 
database's configuration pushes people toward using a small number of 
files with magic names--postgresql.conf, recovery.conf, and now 
postgresql.auto.conf in your proposal.  Meanwhile, all sensible 
UNIX-style projects with complicated configurations in text files have 
moved toward a configuration directory full of them.  Here are some 
directories on the last RHEL6 system I was editing configuration on this 
week:


/etc/httpd/conf.d/
/etc/munin/conf.d/
/etc/ld.so.conf.d/
/etc/munin/conf.d/
/etc/dovecot/conf.d/
/etc/yum/pluginconf.d/

Some of them didn't get the memo that the right standard name is conf.d 
now, but they're the minority.


It's fine to have a postgresql.conf file that you *can* make all your 
changes to, for people who want to stay with the old monolithic approach. 
 But if there were also a conf.d directory under there, many classes of 
administrator would breathe a sigh of relief.  Given all the precedents 
people have already run into elsewhere, the right naming can make it 
discoverable and familiar to a lot of administrators.


Telling them instead that there's this new postgresql.auto.conf file 
that suddenly they have to worry about--I'd say don't even bother if 
that's how you want to do it.  That's making the problem I've been 
trying to simplify for five years now even worse.



I think the whole business of parsing the file, and then verifying
whether the auto file has been parsed, is nonsensical and should be
removed from the patch.


That was just trying to keep people from screwing up their configuration 
while they get used to things.  That and some of the other sanity 
checking isn't strictly necessary; it was just trying to make the 
transition to the new configuration approach less error-prone.
would disagree that the current patch is doing enough of that error 
checking work that the error checking itself is the most likely thing to 
break.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Fatal error after starting postgres : sys identifiers must be different

2013-07-18 Thread Andres Freund
Hi!

On 2013-07-19 07:31:07 +0900, Michael Paquier wrote:
 If this behavior is confirmed, I think that this bug should be
 reported directly to Andres and not this mailing list, just because
 logical replication is not integrated into Postgres core as of now.

I think I agree, although so far I don't know where it should be
reported other than to me personally. -hackers seems to be the wrong
crowd for it, given that most of the people on it haven't even heard of
BDR, much less read its code.
We're definitely planning to propose it for community inclusion in some
form, but there are several rather large preliminary steps (like getting
changeset extraction in) that need to be done first.

Not sure what's best.

On 2013-07-19 07:31:07 +0900, Michael Paquier wrote:
 On Fri, Jul 19, 2013 at 5:02 AM, Indrajit Roychoudhury
 indrajit.roychoudh...@gmail.com wrote:
  Could you please let me know what does the error system identifiers are
  same mean? Below is the snapshot from one of the masters.
  I am setting up BDR as per the wiki
http://wiki.postgresql.org/wiki/BDR_User_Guide#Initial_setup
  and source @
   git://git.postgresql.org/git/users/andresfreund/postgres.git
 
  irc1@ubuntuirc:~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin$ ./postgres -D
  ~/bdr2/postgres-d6ed89e/postgres-bdr-bin/bin/data2/
  LOG:  bgworkers, connection: dbname=testdb2
  LOG:  option: dbname, val: testdb2
  LOG:  registering background worker: bdr apply: ubuntuirc2
  LOG:  loaded library bdr
  LOG:  database system was shut down at 2013-03-17 10:56:52 PDT
  LOG:  doing logical startup from 0/17B6410
  LOG:  starting up replication identifier with ckpt at 0/17B6410
  LOG:  autovacuum launcher started
  LOG:  starting background worker process bdr apply: ubuntuirc2
  LOG:  database system is ready to accept connections
  LOG:  bdr apply: ubuntuirc2 initialized on testdb2, remote dbname=testdb2
  replication=true fallback_application_name=bdr
  FATAL:  system identifiers must differ between the nodes
  DETAIL:  Both system identifiers are 5856368744762683487.

 I am not the best specialist about logical replication, but as it
 looks to be a requirement to have 2 nodes with different system
 identifiers, you shouldn't link 1 node to another node whose data
 folder has been created from the base backup of the former (for
 example create the 2nd node based on a base backup of the 1st node).
 The error returned here would mean so.

Yes, that's correct.

  LOG:  worker process: bdr apply: ubuntuirc2 (PID 16712) exited with exit
  code 1
  ^CLOG:  received fast shutdown request
  LOG:  aborting any active transactions
  LOG:  autovacuum launcher shutting down
  LOG:  shutting down
 
  pgcontrol_data outputs different database system ids for the two nodes. So
  don't understand why it says identifiers are same.
 Are you sure? Please re-check.
 
  postgresql.conf content in one of the masters is like this-
 
  /
  shared_preload_libraries = 'bdr'
  bdr.connections = 'ubuntuirc2'
  bdr.ubuntuirc2.dsn = 'dbname=testdb2'
  /
 
  Two nodes are ubuntuirc and ubuntuirc2. Above is the output of the
  postgresql.conf from ubuntuirc.

The problem seems to be that your dsn doesn't include a hostname but
only a database name. So what probably happens is that both your hosts
try to connect to themselves, since connecting to the local host is the
default when no hostname is specified. This is one of the primary
reasons the requirement for differing system identifiers exists...
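In other words, each node's dsn needs to name the remote host explicitly. A sketch of a corrected postgresql.conf fragment for the ubuntuirc node (the hostname and port here are hypothetical placeholders, substitute your own):

```
# On node ubuntuirc, point the dsn at the *other* node explicitly:
shared_preload_libraries = 'bdr'
bdr.connections = 'ubuntuirc2'
bdr.ubuntuirc2.dsn = 'host=ubuntuirc2.example.com port=5432 dbname=testdb2'
```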

Greetings,

Andres Freund




Re: [HACKERS] compiler warning in UtfToLocal and LocalToUtf (conv.c)

2013-07-18 Thread Tom Lane
Karol Trzcionka karl...@gmail.com writes:
 in the current master head (4cbe3ac3e86790d05c569de4585e5075a62a9b41),
 I've noticed the compiler warnings in src/backend/utils/mb/conv.c

Hm, what compiler version are you using?  I've never seen such a warning
(and that code hasn't changed in some time).

I agree the code could stand to be fixed, but I'm just curious as to
why this is only being seen now.

regards, tom lane




Re: [HACKERS] compiler warning in UtfToLocal and LocalToUtf (conv.c)

2013-07-18 Thread Karol Trzcionka
On 19.07.2013 02:42, Tom Lane wrote:
 Hm, what compiler version are you using? I've never seen such a
 warning (and that code hasn't changed in some time). 
gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC)
Regards,
Karol




[HACKERS] Foreign Tables as Partitions

2013-07-18 Thread David Fetter
Folks,

Please find attached a PoC patch to implement $subject.

So far, with a lot of help from Andrew Gierth, I've roughed out an
implementation which allows you to ALTER FOREIGN TABLE so it inherits
a local table.

TBD: CREATE FOREIGN TABLE ... INHERITS ..., docs, regression testing,
etc., etc.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/ref/alter_foreign_table.sgml b/doc/src/sgml/ref/alter_foreign_table.sgml
index 723aa07..984b2e2 100644
--- a/doc/src/sgml/ref/alter_foreign_table.sgml
+++ b/doc/src/sgml/ref/alter_foreign_table.sgml
@@ -42,6 +42,7 @@ ALTER FOREIGN TABLE [ IF EXISTS ] replaceable 
class=PARAMETERname/replaceab
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable 
SET ( replaceable class=PARAMETERattribute_option/replaceable = 
replaceable class=PARAMETERvalue/replaceable [, ... ] )
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable 
RESET ( replaceable class=PARAMETERattribute_option/replaceable [, ... ] )
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable 
OPTIONS ( [ ADD | SET | DROP ] replaceable 
class=PARAMETERoption/replaceable ['replaceable 
class=PARAMETERvalue/replaceable'] [, ... ])
+INHERIT replaceable class=PARAMETERtable_name/replaceable[, ... ]
 OWNER TO replaceable class=PARAMETERnew_owner/replaceable
 OPTIONS ( [ ADD | SET | DROP ] replaceable 
class=PARAMETERoption/replaceable ['replaceable 
class=PARAMETERvalue/replaceable'] [, ... ])
 /synopsis
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cb87d90..a434f39 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3123,7 +3123,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_MISC;
break;
case AT_AddInherit: /* INHERIT */
-   ATSimplePermissions(rel, ATT_TABLE);
+   ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
/* This command never recurses */
ATPrepAddInherit(rel);
pass = AT_PASS_MISC;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..eda21fd 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1342,6 +1342,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		 */
 		childrte = copyObject(rte);
 		childrte->relid = childOID;
+		childrte->relkind = newrelation->rd_rel->relkind;
 		childrte->inh = false;
 		childrte->requiredPerms = 0;
 		parse->rtable = lappend(parse->rtable, childrte);



[HACKERS] confusing typedefs in jsonfuncs.c

2013-07-18 Thread Peter Eisentraut
The new jsonfuncs.c has some confusing typedef scheme.  For example, it
has a bunch of definitions like this:

typedef struct getState
{
...
} getState, *GetState;

So GetState is a pointer to getState.  I have never seen that kind of
convention before.

This then leads to code like

GetState    state;

state = palloc0(sizeof(getState));

which has useless mental overhead.

But the fact that GetState is really a pointer isn't hidden at all,
because state is then dereferenced with -> or cast from or to void *.  So
whatever abstraction might have been intended isn't there at all.  And
all of this is an intra-file interface anyway.

And to make this even more confusing, other types such as ColumnIOData
and JsonLexContext are not pointers but structs directly.

I think a more typical PostgreSQL code convention is to use capitalized
camelcase for structs, and use explicit pointers for pointers.  I have
attached a patch that cleans this up, in my opinion.

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a231736..ecfe063 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -51,11 +51,11 @@
 static inline void json_lex(JsonLexContext *lex);
 static inline void json_lex_string(JsonLexContext *lex);
 static inline void json_lex_number(JsonLexContext *lex, char *s);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction sem);
+static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
 static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static void report_invalid_token(JsonLexContext *lex);
 static int	report_json_context(JsonLexContext *lex);
@@ -70,12 +70,11 @@ static void array_to_json_internal(Datum array, StringInfo result,
 	   bool use_line_feeds);
 
 /* the null action object used for pure validation */
-static jsonSemAction nullSemAction =
+static JsonSemAction nullSemAction =
 {
 	NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL
 };
-static JsonSemAction NullSemAction = &nullSemAction;
 
 /* Recursive Descent parser support routines */
 
@@ -170,7 +169,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 
 	/* validate it */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json(lex, NullSemAction);
+	pg_parse_json(lex, nullSemAction);
 
 	/* Internal representation is the same as text, for now */
 	PG_RETURN_TEXT_P(result);
@@ -222,7 +221,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 
 	/* Validate it. */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json(lex, NullSemAction);
+	pg_parse_json(lex, nullSemAction);
 
 	PG_RETURN_TEXT_P(result);
 }
@@ -260,7 +259,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
  * pointer to a state object to be passed to those routines.
  */
 void
-pg_parse_json(JsonLexContext *lex, JsonSemAction sem)
+pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 {
 	JsonTokenType tok;
 
@@ -296,7 +295,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
  *	  - object field
  */
 static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction sem)
+parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
@@ -332,7 +331,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 }
 
 static void
-parse_object_field(JsonLexContext *lex, JsonSemAction sem)
+parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
 	 * an object field is fieldname : value where value can be a scalar,
@@ -380,7 +379,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 }
 
 static void
-parse_object(JsonLexContext *lex, JsonSemAction sem)
+parse_object(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
 	 * an object is a possibly empty sequence of object fields, separated by
@@ -428,7 +427,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 }
 
 static void
-parse_array_element(JsonLexContext *lex, JsonSemAction sem)
+parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
@@ -459,7 +458,7 @@ static void array_to_json_internal(Datum array, StringInfo result,
 }
 
 static void
-parse_array(JsonLexContext *lex, JsonSemAction sem)
+parse_array(JsonLexContext *lex, JsonSemAction *sem)
 {
 	

Re: [HACKERS] confusing typedefs in jsonfuncs.c

2013-07-18 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 The new jsonfuncs.c has some confusing typedef scheme.  For example, it
 has a bunch of definitions like this:

 typedef struct getState
 {
 ...
 } getState, *GetState;

 So GetState is a pointer to getState.  I have never seen that kind of
 convention before.

Yeah, this is randomly different from everywhere else in PG.  The more
usual convention if you want typedefs for both the struct and the
pointer type is that the pointer type is FooBar and the struct type is
FooBarData.  This way seems seriously typo-prone.

 I think a more typical PostgreSQL code convention is to use capitalized
 camelcase for structs, and use explicit pointers for pointers.  I have
 attached a patch that cleans this up, in my opinion.

That way is fine with me too.

If you commit this, please hit 9.3 as well, so that we don't have
back-patching issues.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-07-18 Thread Stephen Frost
Greg,

* Greg Smith (g...@2ndquadrant.com) wrote:
 That seems easy enough to do here, Heikki's test script is
 excellent. The latest patch Hari posted on July 2 has one hunk that
 doesn't apply anymore now.  Inside
 src/backend/utils/adt/pg_lzcompress.c the patch tries to change this
 code:
 
 -   if (hent)
 +   if (hentno != INVALID_ENTRY)

hentno certainly doesn't make much sense here- it's only used at the top
of the function to keep things a bit cleaner when extracting the address
into hent from hist_entries:

hentno = hstart[pglz_hist_idx(input, end, mask)];
hent = &hist_entries[hentno];

Indeed, as the referenced conditional is inside the following loop:

while (hent != INVALID_ENTRY_PTR)

and, since hentno == 0 implies hent == INVALID_ENTRY_PTR, the
conditional would never fail (which is what was happening prior to
Heikki committing the fix for this, changing the conditional to what is
below).

 But that line looks like this now:
 
 if (hent != INVALID_ENTRY_PTR)

Right, this is correct- it's useful to check the new value for hent
after it's been updated by:

hent = hent->next;

and see if it's possible to drop out early.

 I'm not sure if different error handling may be needed here now due
 the commit that changed this, or if the patch wasn't referring to
 the right type of error originally.

I've not looked at anything regarding this beyond this email, but I'm
pretty confident that the change Heikki committed was the correct one.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Noah Misch
On Thu, Jul 18, 2013 at 10:33:15PM +, Andrew Gierth wrote:
 Josh Berkus wrote:
  Well, seems like it would work the same as
 
   agg_func(constx,coly,colz ORDER BY coly, colz)

I'd try transforming WITHIN GROUP into the above during parse analysis.  The
default would be the transformation for hypothetical set functions:

agg(x,y,z) WITHIN GROUP (ORDER BY a,b,c) -> agg(x,y,z ORDER BY a,b,c)

Add a CREATE AGGREGATE option, say SQL_INVERSE_DISTRIBUTION_FUNCTION =
{true|false} or SQLIDF, that chooses the IDF transformation:

agg(x) WITHIN GROUP (ORDER BY y) -> agg(x, y ORDER BY y)

Then there's perhaps no new core aggregation or function candidacy machinery.
(I don't know whether VARIADIC transition functions work today, but that would
become an orthogonal project.)  Compare how we handle standard interval typmod
syntax; only the parser and deparser know about it.

Atri's description upthread sounded pretty similar to that.

  Also, what would a CREATE AGGREGATE and state function definition for
  custom WITHIN GROUP aggregates look like?
 
 Now this is exactly the part we haven't nailed down yet and want ideas
 for.

PERCENTILE_DISC would be declared as (float8, anyelement) with that SQLIDF
option.  To diagnose nonsensical calls made through nonstandard syntax, it
could dig into its AggState to verify that its second argument is equal() to
its first ORDER BY expression.

There would be a question of whether to accept the WITHIN GROUP syntax for any
aggregate or just for those for which the standard indicates it.  Then follows
the question of when to deparse as WITHIN GROUP and when to deparse as the
internal syntax.  I'd lean toward accepting WITHIN GROUP for any aggregate but
deparsing that way only SQLIDF aggregates and aggregates with names matching
the standard hypothetical set function names.  Or you could add a second
CREATE AGGREGATE option requesting hypothetical-set-function deparse style.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-07-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 (I don't know whether VARIADIC transition functions work today, but that would
 become an orthogonal project.)

Coincidentally enough, some Salesforce folk were asking me about allowing
VARIADIC aggregates just a few days ago.  I experimented enough to find
out that if you make an array-accepting transition function, and then
force the aggregate's pg_proc entry to look like it's variadic (by
manually setting provariadic and some other fields), then everything
seems to Just Work: the parser and executor are both fine with it.
So I think all that's needed here is to add some syntax support to
CREATE AGGREGATE, and probably make some tweaks in pg_dump.  I was
planning to go work on that sometime soon.

Having said that, though, what Andrew seemed to want was VARIADIC ANY,
which is a *completely* different kettle of fish, since the actual
parameters can't be converted to an array.  I'm not sure if that's
as easy to support.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Differences in WHERE clause of SELECT

2013-07-18 Thread Prabakaran, Vaishnavi
Hi,

Thanks for your responses.

The specific use case which I am interested in is 

 "Numeric LIKE Pattern_string".

I'm willing to attempt a patch to support the specific use case above by adding 
implicit casts, without modifying the entire casting rules.

Is this something that is likely to be included in the code ? 

Thanks & Regards,
Vaishnavi

-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kevin Grittner
Sent: Wednesday, 17 July 2013 6:23 AM
To: Robert Haas; Merlin Moncure
Cc: Tom Lane; Josh Berkus; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Differences in WHERE clause of SELECT

Robert Haas robertmh...@gmail.com wrote:

 We can certainly continue to play whack-a-mole and dream up a new 
 solution every time a really intolerable variant of this problem comes 
 up.  But that doesn't seem good to me.  It means that every case 
 behaves a little different from every other case, and the whole thing 
 is kinda arcane and hard to understand, even for hackers.

If you're building up a list of things that generate errors in PostgreSQL but 
not other DBMS products, make sure you have this:

test=# create table t(d date);
CREATE TABLE
test=# insert into t values (NULL);
INSERT 0 1
test=# insert into t values (COALESCE(NULL, NULL));
ERROR:  column "d" is of type date but expression is of type text
LINE 1: insert into t values (COALESCE(NULL, NULL));
                              ^
HINT:  You will need to rewrite or cast the expression.

From a user perspective, it's hard to explain why COALESCE(NULL,
NULL) fails in a location that a bare NULL works.  From the perspective of 
those working on the code, and looking at the problem from the inside out, it 
seems sane; but that's the only perspective from which it does.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make 
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Differences in WHERE clause of SELECT

2013-07-18 Thread Tom Lane
Prabakaran, Vaishnavi vaishna...@fast.au.fujitsu.com writes:
 The specific use case which I am interested in is 

  "Numeric LIKE Pattern_string".

 I'm willing to attempt a patch to support the specific use case above by 
 adding implicit casts, without modifying the entire casting rules.

 Is this something that is likely to be included in the code ? 

No, especially not if you do it by adding implicit casts.  That would
have unfortunate side-effects in all sorts of contexts.

If you're dead set on having this sort of behavior, you can do it today
with a custom operator, for instance:

regression=# select 1.4 like 'foo';
ERROR:  operator does not exist: numeric ~~ unknown
LINE 1: select 1.4 like 'foo';
   ^
HINT:  No operator matches the given name and argument type(s). You might need 
to add explicit type casts.
regression=# create function numericlike(numeric, text) returns bool as
regression-# 'select $1::text like $2' language sql;
CREATE FUNCTION
regression=# create operator ~~ (leftarg = numeric, rightarg = text,
regression(# procedure = numericlike);
CREATE OPERATOR
regression=# select 1.4 like 'foo';
 ?column? 
----------
 f
(1 row)

I'd suggest doing that rather than making changes that are likely to
have side-effects on behavior entirely unrelated to LIKE.  In addition,
you won't have to sell the community on whether this behavior is
actually a good idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] AGG_PLAIN thinks sorts are free

2013-07-18 Thread Jeff Janes
AGG_PLAIN sometimes does sorts, but it thinks they are free.  Also, under
explain analyze it does not explicitly report whether the sort was external
or not, nor report the disk or memory usage, the way other sorts do.  I
don't know if those two things are related or not.

This behavior seems to be ancient, at least back to 8.4.

Does someone more familiar with this part of the code know if this is a
simple oversight or a fundamental design issue?

Here is a test case, in which adding a distinct increases the run time
500% but doesn't change the estimate at all:

create table foo as select (random()*100)::int as val from
generate_series(1,2000);

analyze foo;


explain (analyze,buffers) select count(distinct val) from foo;
   QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=338497.20..338497.21 rows=1 width=4) (actual
time=28185.597..28185.598 rows=1 loops=1)
   Buffers: shared hit=192 read=88304, temp read=112326 written=112326
   I/O Timings: read=200.810
   ->  Seq Scan on foo  (cost=0.00..288496.96 rows=2096 width=4)
(actual time=0.040..2192.281 rows=2000 loops=1)
 Buffers: shared hit=192 read=88304
 I/O Timings: read=200.810
 Total runtime: 28185.628 ms

explain (analyze,buffers) select count(val) from foo;
   QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=338497.20..338497.21 rows=1 width=4) (actual
time=4230.892..4230.892 rows=1 loops=1)
   Buffers: shared hit=224 read=88272
   I/O Timings: read=145.003
   ->  Seq Scan on foo  (cost=0.00..288496.96 rows=2096 width=4)
(actual time=0.098..2002.396 rows=2000 loops=1)
 Buffers: shared hit=224 read=88272
 I/O Timings: read=145.003
 Total runtime: 4230.948 ms

Cheers,

Jeff


Re: [HACKERS] [v9.4] row level security

2013-07-18 Thread Stephen Frost
* Greg Smith (g...@2ndquadrant.com) wrote:
 On 7/18/13 7:57 PM, Karol Trzcionka wrote:
 Current head 4cbe3ac3e86790d05c569de4585e5075a62a9b41 - patch applies
 correct (only change needed in parallel_schedule).
 However it fails on own regression tests (other tests pass).
 
 I got a rejected hunk in src/backend/nodes/nodeFuncs.c as well as
 that parallel_schedule issue.  Maybe you didn't get the nodeFuncs
 change but didn't notice that?  That might explain why the tests
 didn't work for you either.

The nodeFuncs.c hunk seems likely to have been impacted by the patch I
committed today (WITH CHECK OPTION), so I doubt that was the issue.

 Attached is an updated patch where I tried to only fix the two small
 hunks of bit rot.  I get "All 140 tests passed" here, on a Mac no
 less.

Thanks for updating the patch, I ran into the failed hunks too and
expected to have to deal with them. :)

 I did a brief code scan through the patch just to get a feel for how
 the feature is put together, and what you'd need to know for a
 deeper review.  

That would be extremely helpful..  Wasn't there a wiki page about this
feature which might also help?  Bigger question- is it correct for the
latest version of the patch?

 (I'm trying to get customer time approved to work on
 this a lot more)  The code was easier to follow than I expected.
 The way it completely avoids even getting into the security label
 integration yet seems like a successful design partitioning.  This
 isn't nearly as scary as the SEPostgres patches.  There are some
 useful looking utility functions that dump information about what's
 going on too.
 
 The bulk of the complexity is how the feature modifies query nodes
 to restrict what rows come through them.  Some familiarity with that
 part of the code is what you'd need to take on reviewing this in
 detail. That and a week of time to spend trudging through it.  If
 anyone is looking for an educational challenge on query execution,
 marching through all of these changes to validate they work as
 expected would do that.

I'm hoping to find time this weekend to look into this patch myself, but
the weekend is also filling up with other activities, so we'll see.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] AGG_PLAIN thinks sorts are free

2013-07-18 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 AGG_PLAIN sometimes does sorts, but it thinks they are free.  Also, under
 explain analyze it does not explicitly report whether the sort was external
 or not, nor report the disk or memory usage, the way other sorts do.  I
 don't know if those two things are related or not.

DISTINCT (and also ORDER BY) properties of aggregates are implemented
at runtime; the planner doesn't really do anything about them, except
suppress the choice it might otherwise make of using hashed aggregation.
Since the behavior is entirely local to the Agg plan node, it's also
not visible to the EXPLAIN ANALYZE machinery.

Arguably we should have the planner add on some cost factor for such
aggregates, but that would have no effect whatever on the current level
of plan, and could only be useful if this was a subquery whose cost
would affect choices in an outer query level.  Which is a case that's
pretty few and far between AFAIK (do you have a real-world example where
it matters?).  So it's something that hasn't gotten to the top of
anybody's to-do list.

An arguably more useful thing to do would be to integrate this behavior
into the planner more completely, so that (for instance) if only one
aggregate had ORDER BY then we would make the underlying query produce
that order instead of implementing a sort locally in the Agg node.
That hasn't risen to the top of the to-do list either, as yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.4] row level security

2013-07-18 Thread Greg Smith

On 7/18/13 11:03 PM, Stephen Frost wrote:

Wasn't there a wiki page about this
feature which might also help?  Bigger question- is it correct for the
latest version of the patch?


https://wiki.postgresql.org/wiki/RLS has accumulated a lot of older 
debris that may or may not be useful here.


This useful piece was just presented at PGCon: 
http://www.pgcon.org/2013/schedule/attachments/273_PGcon2013-kaigai-row-level-security.pdf

That is very up to date intro to the big picture issues.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgindent behavior we could do without

2013-07-18 Thread Noah Misch
On Thu, Jul 18, 2013 at 12:27:21AM -0400, Tom Lane wrote:
 It's always annoyed me that pgindent insists on adjusting vertical
 whitespace around #else and related commands.  This has, for example,
 rendered src/include/storage/barrier.h nigh illegible: you get things
 like
 
 /*
  * lwsync orders loads with respect to each other, and similarly with stores.
  * But a load can be performed before a subsequent store, so sync must be used
  * for a full memory barrier.
  */
 #define pg_memory_barrier() __asm__ __volatile__ ("sync" : : : "memory")
 #define pg_read_barrier()   __asm__ __volatile__ ("lwsync" : : : "memory")
 #define pg_write_barrier()  __asm__ __volatile__ ("lwsync" : : : "memory")
 #elif defined(__alpha) || defined(__alpha__)	/* Alpha */
 
 which makes it look like this block of code has something to do with
 Alpha.

Agreed.  I've similarly disliked how pgindent adds a blank line between an #if
and a multi-line comment, like at the top of get_restricted_token().

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgindent behavior we could do without

2013-07-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Thu, Jul 18, 2013 at 12:27:21AM -0400, Tom Lane wrote:
 It's always annoyed me that pgindent insists on adjusting vertical
 whitespace around #else and related commands.  This has, for example,
 rendered src/include/storage/barrier.h nigh illegible: you get things
 like

 Agreed.  I've similarly disliked how pgindent adds a blank line between an #if
 and a multi-line comment, like at the top of get_restricted_token().

AFAICT it forces a blank line before a multiline comment in most
contexts; #if isn't special (though it does seem to avoid doing that
just after a left brace).  I too don't much care for that behavior,
although it's not as detrimental to readability as removing blank lines
can be.

In general, pgindent isn't nearly as good about managing vertical
whitespace as it is about horizontal spacing ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Josh Berkus
Greg,

 I thought this was a good spot to try and re-draw this line because I
 don't want just one program that is able to create new configuration
 entries easily.  I want to see a whole universe of them.  ALTER SYSTEM
 SET, tuning helpers, replication helpers, logging helpers, vacuum
 schedulers.  All of them *could* just dump a simple file into a config
 directory with code anyone can write.  And having ALTER SYSTEM SET do
 that provides a strong precedent for how it can be done.  (I'd like to
 see initdb do that instead of hacking the system postgresql.conf as if
 sed-style edits were still the new hotness, but that's a future change)

Thank you.  I wanted to say this, but I couldn't find a way to express it.

 Some of them didn't get the memo that the right standard name is conf.d
 now, but they're the minority.

Apparently we didn't get the memo either.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] getting rid of SnapshotNow

2013-07-18 Thread Noah Misch
On Thu, Jul 18, 2013 at 08:46:48AM -0400, Robert Haas wrote:
 1. snapshot-error-v1.patch introduces a new special snapshot, called
 SnapshotError.  In the cases where we set SnapshotNow as a sort of
 default snapshot, this patch changes the code to use SnapshotError
 instead.  This affects scan-xs_snapshot in genam.c and
 estate-es_snapshot in execUtils.c.  This passes make check-world, so
 apparently there is no code in the core distribution that does this.
 However, this is safer for third-party code, which will ERROR instead
 of seg faulting.  The alternative approach would be to use
 InvalidSnapshot, which I think would be OK too if people dislike this
 approach.

I don't have a strong opinion.  Anything it diagnoses is a code bug, probably
one that makes the affected extension useless until it's fixed.  But the patch
is small and self-contained.  I think the benefit, more than making things
safer in production, would be reducing the amount of time the developer needs
to zero in on the problem.  It wouldn't be the first time we've done that;
compare AtEOXact_Buffers().  Does this particular class of bug deserve that
aid?  I don't know.

 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow
 to use SnapshotSelf instead.  These include pgrowlocks(),
 pgstat_heap(), and get_actual_variable_range().  In all of those
 cases, only an approximately-correct answer is needed, so the change
 should be fine.  I'd also generally expect that it's very unlikely for
 any of these things to get called in a context where the table being
 scanned has been updated by the current transaction after the most
 recent command-counter increment, so in practice the change in
 semantics will probably not be noticeable at all.

SnapshotSelf is awfully special; currently, you can grep for all uses of it
and find a collection of callers with highly-technical needs.  Diluting that
with a handful of callers that legitimately preferred SnapshotNow but don't
care enough to mind SnapshotSelf in its place brings a minor loss of clarity.

From an accuracy perspective, GetActiveSnapshot() does seem ideal for
get_actual_variable_range().  That's independent of any hurry to remove
SnapshotNow.  A possible disadvantage is that older snapshots could waste time
scanning back through newer index entries, when a more-accessible value would
be good enough for estimation purposes.

To me, the major advantage of removing SnapshotNow is to force all third-party
code to reevaluate.  But that could be just as well achieved by renaming it
to, say, SnapshotImmediate.  If there are borderline-legitimate SnapshotNow
uses in our code base, I'd lean toward a rename instead.  Even if we decide to
remove every core use, third-party code might legitimately reach a different
conclusion on similar borderline cases.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-18 Thread Amit Kapila
On Friday, July 19, 2013 1:33 AM Alvaro Herrera wrote:
 Fujii Masao escribió:
  On Fri, Jul 19, 2013 at 2:45 AM, Josh Berkus j...@agliodbs.com
 wrote:
   On 07/18/2013 04:25 AM, Amit Kapila wrote:
   On Thursday, July 18, 2013 12:38 AM Josh Berkus wrote:
   On 07/17/2013 12:01 PM, Alvaro Herrera wrote:
   Both of these seem like they would make troubleshooters' lives
 more
   difficult.  I think we should just parse the auto file
 automatically
   after parsing postgresql.conf, without requiring the include
   directive
   to be there.
  
   Wait, I thought the auto file was going into the conf.d
 directory?
  
   Auto file is going into config directory, but will that make any
 difference
   if we have to parse it automatically after postgresql.conf.
  
   Well, I thought that the whole conf.d directory automatically got
 parsed
   after postgresql.conf.  No?
 
  No, in the previous patch. We needed to set include_dir to 'config'
 (though
  that's the default).
 
 I know this has been discussed already, but I want to do a quick
 summary
 of existing proposals, point out their drawbacks, and present a
 proposal
 of my own.  Note I ignore the whole file-per-setting vs.
 one-file-to-rule-them-all, because the latter has already been shot
 down.  So live ideas floated here are:
 
 1. have a config/postgresql.auto.conf file, require include_dir or
include in postgresql.conf
This breaks if user removes the config/ directory, or if the user
removes the include_dir directive from postgresql.conf.  ALTER
 SYSTEM
is in the business of doing mkdir() or failing altogether if the dir
is not present.  Doesn't seem friendly.
 
 2. have a conf.d/ directory, put the file therein; no include or
include_dir directive is necessary.
I think this is a separate patch and merits its own discussion.
 This
might be a good idea, but I don't think that this is the
way to implement ALTER SYSTEM.  If users don't want to allow conf.d
they can remove it, but that would break ALTER SYSTEM unnecessarily.
Since they are separate features it seems to me that they should
function independently.
 
 I think we should just put the config directives of ALTER SYSTEM into a
 file, not dir, alongside postgresql.conf; I would suggest
 postgresql.auto.conf.  This file is parsed automatically after
 postgresql.conf, without requiring an include directive in
 postgresql.conf.  If the user does not want that file, they can remove
 it; but a future ALTER SYSTEM will create the file again.  

 No need to
 parse stuff to find out whether the directory exists, or the file
 exists, or such nonsense.

  I think this will be removed from patch, once we implement automatic
parsing of config/postgresql.auto.conf
  after parsing postgresql.conf
 

 I think the only drawback of this is that there's no way to disable
 ALTER SYSTEM (i.e. there's no directory you can remove to prevent the
 thing from working).  But is this a desirable property?  I mean, if we
 want to disallow ALTER SYSTEM I think we should provide a direct way to
 do that, perhaps allow_alter_system = off in postgresql.conf; but I
 don't think we need to provide this, really, at least not in the first
 implementation.
 
 Note that the Debian packaging uses postgres-writable directories for
 its configuration files, so the daemon is always able to create the
 file
 in the config dir.
 
 This seems the simplest way; config tools such as Puppet know they
 always need to consider the postgresql.auto.conf file.
 


 I think the whole business of parsing the file, and then verifying
 whether the auto file has been parsed, is nonsensical and should be
 removed from the patch.

  Okay, I understood your concern about the check during parsing that
  reports an error when the directory/file does not exist.
  In the next version, I will remove this error.

With Regards,
Amit Kapila.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-07-18 Thread Hari Babu

On Friday, July 19, 2013 4:11 AM Greg Smith wrote:
On 7/9/13 12:09 AM, Amit Kapila wrote:
I think the first thing to verify is whether the results posted can be 
 validated in some other environment setup by another person.
The testcase used is posted at below link:
http://www.postgresql.org/message-id/51366323.8070...@vmware.com

That seems easy enough to do here; Heikki's test script is excellent. 
The latest patch Hari posted on July 2 has one hunk that no longer 
applies.

The HEAD code change from Heikki is correct.
During the rebase of the patch onto the latest PGLZ optimization code, that
change was missed.

Apart from the above, some more changes were made in the patch:

1. Corrected some comments in the code.
2. Added a validity check: the source and history lengths combined cannot be
8192 or more.

Thanks for the review; please find the latest patch attached to this mail.

Regards,
Hari babu.



pglz-with-micro-optimization-compress-using-newdata-3.patch
Description: Binary data



Re: [HACKERS] getting rid of SnapshotNow

2013-07-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 To me, the major advantage of removing SnapshotNow is to force all
 third-party code to reevaluate.  But that could be just as well
 achieved by renaming it to, say, SnapshotImmediate.  If there are
 borderline-legitimate SnapshotNow uses in our code base, I'd lean
 toward a rename instead.  Even if we decide to remove every core use,
 third-party code might legitimately reach a different conclusion on
 similar borderline cases.

Meh.  If there is third-party code with a legitimate need for
SnapshotNow, all we'll have done is to create an annoying version
dependency for them.  So if we think that's actually a likely scenario,
we shouldn't rename it.  But the entire point of this change IMO is that
we *don't* think there is a legitimate use-case for SnapshotNow.

Indeed, I'm thinking I don't believe in SnapshotSelf anymore either.
It's got all the same consistency issues as SnapshotNow.  In fact, it
has *more* issues, because it's also vulnerable to weirdnesses caused by
inconsistent ordering of tuple updates among multiple tuples updated by
the same command.

Why not tell people to use SnapshotDirty if they need a
not-guaranteed-consistent result?  At least then it's pretty obvious
that you're getting some randomness in with your news.

regards, tom lane




Re: [HACKERS] [RFC] Minmax indexes

2013-07-18 Thread Alvaro Herrera
Alvaro Herrera wrote:

 This is a preliminary proposal for Minmax indexes.  I'm experimenting
 with the code, but it's too crude to post yet, so here's a document
 explaining what they are and how they work, so that reviewers can poke
 holes to have the design improved.

After going further with the implementation, I have added a new concept,
the reverse range map.

Reverse Range Map
-----------------

The reverse range map is a separate fork that each Minmax index has.  Its
purpose is to provide a way to quickly find the index tuple corresponding to a
page range: for a given heap page number, there's an O(1) way to obtain the
TID of the corresponding index tuple.
 
To scan the index, we first obtain the TID of the index tuple for page 0.  If
this returns a valid TID, we read that tuple to determine the min/max bounds
for this page range.  If it returns an invalid TID, then the range is
unsummarized, and the scan must return the whole range as needing a full scan.
After this index entry has been processed, we obtain the TID of the index
tuple for page 0+pagesPerRange (currently this is a compile-time constant, but
there's no reason this cannot be a per-index property).  Continue adding
pagesPerRange until we reach the end of the heap.
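
This scan loop can be sketched as a toy model; the names (revmap,
index_tuples, PAGES_PER_RANGE) are illustrative stand-ins, not the actual
PostgreSQL structures:

```python
# Toy model of a Minmax index scan driven by the reverse range map.
# revmap maps a page-range number to a TID (or None for InvalidTid);
# index_tuples maps a TID to the (min, max) bounds of its range.

PAGES_PER_RANGE = 16  # a compile-time constant in the draft patch

def scan_ranges(revmap, index_tuples, heap_npages):
    """Yield (first_page, summary) per range; summary is None when the
    range is unsummarized and must be scanned in full."""
    page = 0
    while page < heap_npages:
        range_no = page // PAGES_PER_RANGE
        tid = revmap[range_no] if range_no < len(revmap) else None
        if tid is None:                    # InvalidTid: unsummarized
            yield page, None
        else:
            yield page, index_tuples[tid]  # (min, max) bounds
        page += PAGES_PER_RANGE

# Example: range 0 summarized as (1, 99), range 1 unsummarized.
revmap = [7, None]
index_tuples = {7: (1, 99)}
result = list(scan_ranges(revmap, index_tuples, 32))
# result == [(0, (1, 99)), (16, None)]
```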

To read the TID during an index scan, we follow this protocol:

* read revmap page
* obtain share lock on the buffer
* read the TID
* LockTuple that TID (using the index as relation).  A shared lock is
  sufficient.  We need the LockTuple to prevent VACUUM from recycling
  the index tuple; see below.
* release buffer lock
* read the index tuple
* release the tuple lock
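
A sequential sketch of the reader-side protocol above; the "lock" calls are
stand-ins that only record the order of operations, to make the required
interleaving explicit:

```python
# The key invariant: the tuple lock is acquired while the buffer share
# lock is still held, so there is no window in which VACUUM could
# recycle the index tuple between reading the TID and reading the tuple.

def read_index_tuple(log):
    log.append("read revmap page")
    log.append("share-lock revmap buffer")
    log.append("read TID")
    log.append("LockTuple(tid, SHARE)")  # keeps VACUUM from recycling it
    log.append("release buffer lock")    # revmap page no longer needed
    log.append("read index tuple")       # safe under the tuple lock
    log.append("release tuple lock")

log = []
read_index_tuple(log)
# The tuple lock must be taken before the buffer lock is released.
assert log.index("LockTuple(tid, SHARE)") < log.index("release buffer lock")
```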

On heap insert/update, it is normally cheaper to update the summary tuple
(grab the old tuple, expand the min/max range according to the new value
being inserted, insert the new index tuple, update the reverse range map)
than letting the range be unsummarized, which would require scanning the
pages in the range.

[However, when many updates on a range are going to be done, it'd be
better to mark it as unsummarized (i.e. set the revmap TID to Invalid)
and do a resummarization later, to prevent the index from bloating.
Also, it'd be sensible to allow postponing of the index update, if many
tuples are inserted; something like GIN's pending list.  We would need
to keep track of the TIDs of the inserted heap tuples, so that we can
figure out the new min/max values, without having to scan the whole
range.]

To update the summary tuple for a page range, we use this protocol:

* insert a new index tuple anywhere; note its TID
* read revmap page
* obtain exclusive lock on buffer
* write the TID
* release lock

This ensures no concurrent reader can obtain a partially-written TID.
Note we don't need a tuple lock here.  Concurrent scans don't have to
worry about whether they got the old or new index tuple: if they get the
old one, the tighter values are okay from a correctness standpoint because
due to MVCC they can't possibly see the just-inserted heap tuples anyway.
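
A toy model of the summary-update protocol, with the revmap write as the
single atomic switch point (a plain lock stands in for the buffer lock; the
class and names are illustrative, not the patch's actual code):

```python
import threading

class ToyMinmax:
    def __init__(self):
        self.index_tuples = {1: (10, 20)}    # tid -> (min, max)
        self.revmap = {0: 1}                 # page range -> tid
        self.buffer_lock = threading.Lock()  # models the revmap buffer lock

    def update_summary(self, range_no, new_bounds):
        # 1. insert the new index tuple anywhere; note its TID
        new_tid = max(self.index_tuples) + 1
        self.index_tuples[new_tid] = new_bounds
        # 2-4. lock the revmap buffer exclusively and switch the TID
        with self.buffer_lock:
            self.revmap[range_no] = new_tid

    def read_summary(self, range_no):
        with self.buffer_lock:               # share lock in the real code
            tid = self.revmap[range_no]
        return self.index_tuples[tid]        # old or new tuple: both valid

mm = ToyMinmax()
before = mm.read_summary(0)
mm.update_summary(0, (5, 30))                # widen bounds for an insert
after = mm.read_summary(0)
# before == (10, 20); after == (5, 30)
```

Because the new index tuple is inserted before the revmap is switched, a
concurrent reader always finds a complete tuple whichever TID it reads.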

For vacuuming, we need to figure out which index tuples are no longer
referenced from the reverse range map.  This requires some brute force,
but is simple:

1) scan the complete index, store each existing TID in a dynahash.
   Hash key is the TID, hash value is a boolean initially set to false.
2) scan the complete revmap sequentially, read the TIDs on each page.  Share
   lock on each page is sufficient.  For each TID so obtained, grab the
   element from the hash and update the boolean to true.
3) Scan the index again; for each tuple found, search the hash table.
   If the tuple is not present, it must have been added after our initial
   scan; ignore it.  If the hash flag is true, then the tuple is referenced;
   ignore it.  If the hash flag is false, then the index tuple is no longer
   referenced by the revmap; but it could be about to be accessed by a
   concurrent scan.  Do ConditionalLockTuple.  If this fails, ignore the
   tuple, it will be deleted by a future vacuum.  If lock is acquired,
   then we can safely remove the index tuple.
4) Index pages with free space can be detected by this second scan.  Register
   those with the FSM.
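
The three passes above can be sketched as a toy mark-and-sweep, with a dict
standing in for the dynahash and integers for TIDs (the function and
parameter names are illustrative):

```python
def minmax_vacuum(index_tids, revmap, try_lock):
    """Return the set of index TIDs that can safely be removed.
    try_lock(tid) models ConditionalLockTuple: False means a concurrent
    scan may hold the tuple, so we leave it for a future vacuum."""
    # 1) first index scan: every TID starts out marked unreferenced
    seen = {tid: False for tid in index_tids}
    # 2) revmap scan: mark the TIDs that are still referenced
    for tid in revmap.values():
        if tid is not None and tid in seen:
            seen[tid] = True
    # 3) second index scan: remove unreferenced, lockable tuples
    removable = set()
    for tid in index_tids:
        if tid not in seen:   # added after the first scan: ignore
            continue
        if seen[tid]:         # still referenced by the revmap: keep
            continue
        if try_lock(tid):     # no concurrent reader: safe to remove
            removable.add(tid)
    return removable

# TID 2 was orphaned by a summary update; TID 3 is orphaned but a
# concurrent scan still holds its tuple lock.
removable = minmax_vacuum(
    index_tids=[1, 2, 3],
    revmap={0: 1, 1: None},
    try_lock=lambda tid: tid != 3)
# removable == {2}
```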

Note this doesn't require scanning the heap at all, or being involved in
the heap's cleanup procedure.  (In particular, tables with only minmax
indexes do not require the removed tuples' TIDs to be collected during
the heap cleanup pass.)   XXX I think this is wrong in detail; we
probably need to be using LockBufferForCleanup.

With the reverse range map it is very easy to see which page ranges in
the heap need resummarization; it's the ones marked with InvalidTid.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-07-18 Thread Fabien COELHO


I'm not a native English speaker either... Greg, could you please 
review the document?


Yes, I already took a look at it briefly.  The updates move in the right 
direction, but I can edit them usefully before commit.


Great, thanks for your help!

--
Fabien.

