Re: [HACKERS] pg_stat_transaction patch

2010-05-25 Thread Takahiro Itagaki

Joel Jacobson j...@gluefinance.com wrote:

 I applied all the changes on 9.0beta manually and then it compiled without
 any assertion failures.
 
 I also changed the oids to a different unused range, since the ones I used
 before had been taken in 9.0beta1.

Thanks, but you still need to test your patch:

 - You need to check your patch with "make check", because it requires
   adjustments in the rules test; your pg_stat_transaction_function is the
   longest name in the system catalog.

 - You need to configure postgres with --enable-cassert to enable internal
   validations. The attached test case failed with the following TRAPs:

TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File: "pgstat.c", Line: 715)
TRAP: FailedAssertion("!(tabstat->trans == trans)", File: "pgstat.c", Line: 1758)

 I suspect it is because get_tabstat_entry() for some reason returns NULL in,
 for example, pg_stat_get_transaction_tuples_inserted(PG_FUNCTION_ARGS).
 
 Does the function look valid? If you can find the error in it, the other
 functions probably have the same problem.

For the above trap, we can see the comment:
/* Shouldn't have any pending transaction-dependent counts */
We don't expect to read stats entries during transactions. I'm not sure
whether accessing in-transaction stats during a transaction is safe or not.
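
For illustration only (this is not the attached test case, and the function
name comes from the proposed patch), the kind of sequence that trips this
assertion is reading the pending per-transaction counters from inside the
very transaction that produced them, roughly:

BEGIN;
INSERT INTO t VALUES (1);  -- creates a pending transaction-level stats entry
SELECT pg_stat_get_transaction_tuples_inserted('t'::regclass);
                           -- reads the entry while entry->trans is still set
COMMIT;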

We might need to go in other directions, for example:
  - Use session stats instead of transaction stats. You can see the same
information in the difference of counters between before and after the
transaction (a rough illustration follows below).
  - Export pgBufferUsage instead of relation counters. They are
buffer counters for all relations, but we can obviously export
them because they are just plain variables.
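
To sketch the first idea with the existing cumulative statistics functions
(an approximation only: these counters are cluster-wide and the collector
applies them with some delay; the table name here is made up):

SELECT pg_stat_get_tuples_inserted('mytab'::regclass);  -- e.g. returns 100

BEGIN;
INSERT INTO mytab VALUES (1), (2), (3);
COMMIT;

SELECT pg_stat_get_tuples_inserted('mytab'::regclass);  -- e.g. returns 103;
-- the difference, 3, is what the transaction inserted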

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Heikki Linnakangas

On 24/05/10 22:49, Alvaro Herrera wrote:

Excerpts from Josh Berkus's message of Fri May 21 17:57:35 -0400 2010:


Problem: currently, if your database has a large amount of cold data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map.  However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages.  This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.


I think this is nonsense.  If you have 3-year-old sales transactions,
and your database has any interesting churn, tuples in those pages have
been frozen for a very long time *already*.  The problem is vacuum
reading them in so that it can verify there's nothing to do.  If we want
to avoid *reading* those pages, this solution is useless:


Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count.


because you still have to read the page.


What's missing from the suggestion is that relfrozenxid and datfrozenxid 
also need to be expanded to 8 bytes. That way you effectively have 
8-byte XIDs, which means that you never need to vacuum to avoid XID 
wraparound.


You still need to freeze to truncate clog, but if you have the 
disk space, you can now do that, for example, every 100 billion 
transactions if you wish.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 12:19), Robert Haas wrote:
 On Mon, May 24, 2010 at 9:27 PM, Stephen Frost sfr...@snowman.net wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 We have two options; if the checker function takes a list of
 RangeTblEntry,
 it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
 if the checker function takes the arguments as in my patch, it will be
 convenient for DoCopy(), but not for ExecCheckRTPerms().

 In my patch, it takes 6 arguments, but we can reference all of them from
 the given RangeTblEntry. On the other hand, if DoCopy() has to set up
 a pseudo RangeTblEntry to call the checker function, it needs to set up
 a similar or slightly larger number of variables.

 I don't know that it's really all that difficult to set up an RT in
 DoCopy or RI_Initial_Check().  In my opinion, those are the strange or
 corner cases- not the Executor code, through which all 'regular' DML is
 done.  It makes me wonder if COPY shouldn't have been implemented using
 the Executor instead, but that's, again, a completely separate topic.
 It wasn't, but it wants to play like it operates in the same kind of way
 as INSERT, so it needs to pick up the slack.
 
 I think this approach is definitely worth investigating.  KaiGai, can
 you please work up what the patch would look like if we do it this
 way?

OK, the attached patch reworks it that way.

* ExecCheckRTEPerms() now takes a second argument that lets the caller choose
  the behavior on access violation. If the 'abort' argument is true, it raises
  an error using aclcheck_error() or ereport(). Otherwise, it returns
  false immediately without performing the rest of the checks.

* DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
  with a locally built RangeTblEntry.

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 21,26 
--- 21,27 
  #include <arpa/inet.h>
  
  #include "access/heapam.h"
+ #include "access/sysattr.h"
  #include "access/xact.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
***
*** 37,43 
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
- #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
--- 38,43 
***
*** 725,733  DoCopy(const CopyStmt *stmt, const char *queryString)
  	List	   *force_notnull = NIL;
  	bool		force_quote_all = false;
  	bool		format_specified = false;
- 	AclMode		required_access = (is_from ? ACL_INSERT : ACL_SELECT);
- 	AclMode		relPerms;
- 	AclMode		remainingPerms;
  	ListCell   *option;
  	TupleDesc	tupDesc;
  	int			num_phys_attrs;
--- 725,730 
***
*** 988,993  DoCopy(const CopyStmt *stmt, const char *queryString)
--- 985,995 
  
  	if (stmt->relation)
  	{
+ 		RangeTblEntry	rte;
+ 		Bitmapset	   *columnsSet = NULL;
+ 		List		   *attnums;
+ 		ListCell	   *cur;
+ 
  		Assert(!stmt->query);
  		cstate->queryDesc = NULL;
  
***
*** 998,1026  DoCopy(const CopyStmt *stmt, const char *queryString)
  		tupDesc = RelationGetDescr(cstate->rel);
  
  		/* Check relation permissions. */
! 		relPerms = pg_class_aclmask(RelationGetRelid(cstate->rel), GetUserId(),
! 	required_access, ACLMASK_ALL);
! 		remainingPerms = required_access & ~relPerms;
! 		if (remainingPerms != 0)
  		{
! 			/* We don't have table permissions, check per-column permissions */
! 			List	   *attnums;
! 			ListCell   *cur;
! 
! 			attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 			foreach(cur, attnums)
! 			{
! 				int			attnum = lfirst_int(cur);
  
! 				if (pg_attribute_aclcheck(RelationGetRelid(cstate->rel),
! 						  attnum,
! 						  GetUserId(),
! 						  remainingPerms) != ACLCHECK_OK)
! 					aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS,
! 						   RelationGetRelationName(cstate->rel));
! 			}
  		}
  
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
--- 1000,1025 
  		tupDesc = RelationGetDescr(cstate->rel);
  
  		/* Check relation permissions. */
! 		attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 		foreach(cur, attnums)
  		{
! 			int	attnum = lfirst_int(cur) - FirstLowInvalidHeapAttributeNumber;
  
! 			columnsSet = bms_add_member(columnsSet, attnum);
  		}
  
+ 		memset(&rte, 0, sizeof(rte));
+ 		rte.type = T_RangeTblEntry;
+ 		rte.rtekind = RTE_RELATION;
+ 		rte.relid = RelationGetRelid(cstate->rel);
+ 		rte.requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
+ 		if (is_from)
+ 			rte.modifiedCols = columnsSet;
+ 		else
+ 			rte.selectedCols = columnsSet;
+ 
+ 		ExecCheckRTEPerms(&rte, true);
+ 
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***
*** 63,68  

Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Joseph Adams
I started a wiki article for brainstorming the JSON API:
http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
substantial changes to the draft of the API based on discussion here
and on the #postgresql IRC channel.

Is it alright to use the wiki for brainstorming, or should it stay on
the mailing list or go somewhere else?

I'll try not to spend too much time quibbling over the specifics as I
tend to do.  While the brainstorming is going on, I plan to start
implementing the datatype by itself so I can establish an initial
working codebase.



Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 6:08 , Sam Vilain wrote:
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html
 
 Lead us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.
 
 Here is the section:
 
 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.

I'm confused. The sentence in brackets "Releasing the newer savepoint will 
cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
RELEASE SAVEPOINT" implies that you *will* walk backwards through all the 
savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If 
that is not how it actually works, then this whole paragraph is wrong, I'd say.

best regards,
Florian Pflug




Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Fujii Masao
On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, OK, I think that makes sense.  Would you care to propose a patch?

 Yep. Here is the patch.

 This patch distinguishes a normal shutdown from an unexpected exit while the
 server is in recovery. That is, when smart or fast shutdown is requested
 during recovery, the bgwriter sets ControlFile->state to the newly introduced
 DB_SHUTDOWNED_IN_RECOVERY state.

Is this patch worth applying for 9.0? If not, I'll add it to
the next CF for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] Hot Standby performance and deadlocking

2010-05-25 Thread Simon Riggs

Some performance problems have been reported on HS from two users: Erik
and Stefan.

The characteristics of those issues have been that performance is
* sporadically reduced, though mostly runs at full speed
* context switch storms reported as being associated

So we're looking for something that doesn't always happen, but when it
does it involves lots of processes and context switching.

Unfortunately neither test reporter has been able to re-run tests,
leaving me not much to go on. Though since I know the code well, I can
focus in on likely suspects fairly easily; in this case I think I have a
root cause.

Earlier this year I added deadlock detection into Startup process when
it waits for a buffer pin. The deadlock detection was simplified since
it doesn't wait for deadlock_timeout before acting, it just immediately
sends a signal to all active processes to resolve the deadlock, even if
the buffer pin is released very soon afterwards. Heikki questioned this
implementation at the time, though I said it was easier to start simple
and add more code if problems arose and time allowed. It's clear that
with 100+ connections and reasonably frequent buffer pin waits, as would
occur when accessing the same data blocks on both primary and standby,
the current too-simple coding would cause performance issues, as Heikki
implied. Certainly actual deadlocks are much rarer than buffer pin
waits, so the current coding is wasteful.

The following patch adds some simple logic to make the Startup process
wait for deadlock_timeout before it sends the deadlock resolution
signals. It does that by refactoring the API to
enable_standby_sigalrm(), though doesn't change other behaviour or add
new features.

Viewpoints?

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***
*** 388,399  ResolveRecoveryConflictWithBufferPin(void)
  	}
  	else if (MaxStandbyDelay < 0)
  	{
  		/*
! 		 * Send out a request to check for buffer pin deadlocks before we
! 		 * wait. This is fairly cheap, so no need to wait for deadlock timeout
! 		 * before trying to send it out.
  		 */
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
  	}
  	else
  	{
--- 388,402 
  	}
  	else if (MaxStandbyDelay < 0)
  	{
+ 		TimestampTz now = GetCurrentTimestamp();
+ 
  		/*
! 		 * Set timeout for deadlock check (only)
  		 */
! 		if (enable_standby_sig_alarm(now, now, true))
! 			sig_alarm_enabled = true;
! 		else
! 			elog(FATAL, "could not set timer for process wakeup");
  	}
  	else
  	{
***
*** 410,443  ResolveRecoveryConflictWithBufferPin(void)
  		}
  		else
  		{
! 			TimestampTz fin_time;		/* Expected wake-up time by timer */
! 			long		timer_delay_secs;		/* Amount of time we set timer
!  * for */
! 			int			timer_delay_usecs;
! 
! 			/*
! 			 * Send out a request to check for buffer pin deadlocks before we
! 			 * wait. This is fairly cheap, so no need to wait for deadlock
! 			 * timeout before trying to send it out.
! 			 */
! 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
  
  			/*
! 			 * How much longer we should wait?
  			 */
! 			fin_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
! 
! 			TimestampDifference(now, fin_time,
! 								&timer_delay_secs, &timer_delay_usecs);
  
  			/*
! 			 * It's possible that the difference is less than a microsecond;
! 			 * ensure we don't cancel, rather than set, the interrupt.
  			 */
! 			if (timer_delay_secs == 0 && timer_delay_usecs == 0)
! timer_delay_usecs = 1;
! 
! 			if (enable_standby_sig_alarm(timer_delay_secs, timer_delay_usecs, fin_time))
  sig_alarm_enabled = true;
  			else
  				elog(FATAL, "could not set timer for process wakeup");
--- 413,431 
  		}
  		else
  		{
! 			TimestampTz max_standby_time;
  
  			/*
! 			 * At what point in the future do we hit MaxStandbyDelay?
  			 */
! 			max_standby_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
! 			Assert(max_standby_time > now);
  
  			/*
! 			 * Wake up at MaxStandby delay, and check for deadlocks as well
! 			 * if we will be waiting longer than deadlock_timeout
  			 */
! 			if (enable_standby_sig_alarm(now, max_standby_time, false))
  sig_alarm_enabled = true;
  			else
  				elog(FATAL, "could not set timer for process wakeup");
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***
*** 85,90  static TimestampTz timeout_start_time;
--- 85,91 
  
  /* statement_fin_time is valid only if statement_timeout_active is true */
  static TimestampTz statement_fin_time;
+ static TimestampTz statement_fin_time2; /* valid only in recovery */
  
  
  static void RemoveProcFromArray(int code, Datum arg);
***
*** 1619,1641  handle_sig_alarm(SIGNAL_ARGS)
   * To avoid various edge cases, we must be careful to do nothing
   * when there is nothing to be done.  We also need to 

Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Heikki Linnakangas

On 25/05/10 13:03, Florian Pflug wrote:

On May 25, 2010, at 6:08 , Sam Vilain wrote:

http://www.postgresql.org/docs/8.4/static/sql-savepoint.html

Lead us to believe that if you roll back to the same savepoint name
twice in a row, that you might start walking back through the
savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
that is not how it works.

Here is the section:

SQL requires a savepoint to be destroyed automatically when another
savepoint with the same name is established. In PostgreSQL, the old
savepoint is kept, though only the more recent one will be used when
rolling back or releasing. (Releasing the newer savepoint will cause the
older one to again become accessible to ROLLBACK TO SAVEPOINT and
RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.


I'm confused. The sentence in brackets "Releasing the newer savepoint will cause the older one to again 
become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT" implies that you *will* walk backwards 
through all the savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, 
no? If that is not how it actually works, then this whole paragraph is wrong, I'd say.


Releasing the newer savepoint will cause the older one to again become 
accessible, as the doc says, but rolling back to a savepoint does not 
implicitly release it. You'll have to use RELEASE SAVEPOINT for that.
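
A concrete transcript may help (the table and savepoint contents here are
made up for illustration):

BEGIN;
CREATE TABLE t (i int);
SAVEPOINT a;              -- the older savepoint "a"
INSERT INTO t VALUES (1);
SAVEPOINT a;              -- the newer savepoint "a"; the older one is hidden
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT a;  -- undoes only the second INSERT
ROLLBACK TO SAVEPOINT a;  -- no further effect: still rolls back to the newer "a"
RELEASE SAVEPOINT a;      -- destroys the newer "a"; the older one is accessible again
ROLLBACK TO SAVEPOINT a;  -- now undoes the first INSERT as well
COMMIT;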


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-25 Thread Fujii Masao
On Tue, May 18, 2010 at 3:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 (2)
 pg_ctl -ms stop emits the following warning whenever there is the
 backup_label file in $PGDATA.

       WARNING: online backup mode is active
       Shutdown will not complete until pg_stop_backup() is called.

 This warning doesn't fit the shutdown-during-recovery case.
 Since a smart shutdown might be requested by something other than pg_ctl, the
 warning should be emitted on the server side rather than the client, I think.
 How about moving the warning to the server side?

 Though I'm not sure if this should be fixed for 9.0, I attached the
 patch (move_bkp_cancel_warning_v1.patch).

Is this patch worth applying for 9.0? If not, I'll add it to
the next CF for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Regression testing for psql

2010-05-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
  Of course, if people want to suggest tests that just shouldn't be
  included, I can go through and strip things out.
 
 Well...  I'm a little reluctant to believe that we should have 3.3M of
 tests for the entire backend and 5M of tests just for psql.  Then,
 too, there's the fact that many of these tests fail on my machine
 because my username is not sfrost, and/or because of row-ordering
 differences on backslash commands without enough ORDER BY to fully
 determine the output order.

Yeah, you know, I had fully intended to go grepping through the output
last night to check for things like that, but my wife decided I needed
sleep instead. :)  Sorry about that.  Still, it's more of a general
proposal than something I think should be committed as-is.  Should we
try to deal with those kinds of differences, or just eliminate the tests
which are dependent on username, etc?  It definitely strikes me that
there's a fair bit of code in psql we're not exercising in some fashion
in the regression suite... :/

Thanks,

Stephen




Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it that way.

I haven't looked at it yet, but the hook was added to ExecCheckRTPerms(),
not RTE.  This was for two main reasons- it seemed simpler to us and it
meant that any security module implemented would have access to
essentially everything we know the query is going to use all at once
(instead of on a per-range-table basis).  That could be particularly
useful if you wanted to, say, enforce a constraint that says no two
tables of different labels shall ever be used in the same query at the
same time (perhaps with some caveats on that, etc).

Could you change this patch to use ExecCheckRTPerms() instead?

 * ExecCheckRTEPerms() now takes a second argument that lets the caller choose
   the behavior on access violation. If the 'abort' argument is true, it raises
   an error using aclcheck_error() or ereport(). Otherwise, it returns
   false immediately without performing the rest of the checks.
 
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
   with a locally built RangeTblEntry.

Does this change fix the issue you had in RI_Initial_Check()?

Thanks,

Stephen




Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:
 On 25/05/10 13:03, Florian Pflug wrote:
 On May 25, 2010, at 6:08 , Sam Vilain wrote:
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html
 
 Lead us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.
 
 Here is the section:
 
 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
 
 I'm confused. The sentence in brackets "Releasing the newer savepoint will 
 cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
 RELEASE SAVEPOINT" implies that you *will* walk backwards through all the 
 savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? 
 If that is not how it actually works, then this whole paragraph is wrong, 
 I'd say.
 
 Releasing the newer savepoint will cause the older one to again become 
 accessible, as the doc says, but rolling back to a savepoint does not 
 implicitly release it. You'll have to use RELEASE SAVEPOINT for that.

Ah, now I get it. Thanks.

Would changing "Releasing the newer savepoint will cause ..." to "Explicitly 
releasing the newer savepoint" or maybe even "Explicitly releasing the newer 
savepoint with RELEASE SAVEPOINT will cause ..." make things clearer?

best regards,
Florian Pflug






Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I started a wiki article for brainstorming the JSON API:
 http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
 substantial changes to the draft of the API based on discussion here
 and on the #postgresql IRC channel.

 Is it alright to use the wiki for brainstorming, or should it stay on
 the mailing list or go somewhere else?

Well, I think it's fine to use the wiki for brainstorming, but before
you change the design you probably need to talk about it here.  You
can't rely on everyone on -hackers to follow changes on a wiki page
somewhere.  It looks like the API has been overhauled pretty heavily
since the last version we talked about here, and I'm not sure I
understand it.

 I'll try not to spend too much time quibbling over the specifics as I
 tend to do.  While the brainstorming is going on, I plan to start
 implementing the datatype by itself so I can establish an initial
 working codebase.

Sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 3:21 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 The subtle point here is whether you consider the view from the outside 
 (in the sense of what a read-only transaction started at an arbitrary time 
 can or cannot observe), or from the inside (what updating transactions can 
 observe and might base their updates on).
 
 The former case is completely determined by the commit ordering of the 
 transactions, while the latter is not - otherwise serializability wouldn't 
 be such a hard problem.
 
 BTW, doesn't all this logic fall in a heap as soon as you consider
 read-committed transactions?


Why would it? There's still a well-defined point in time at which the 
transaction's effects become visible, and every other transaction commits 
either before that time or after that time. An observer started between two 
transactions sees the first's changes but not the second's. One can replace 
observing read-committed transactions with a series of smaller repeatable-read 
transactions, since the observers are read-only anyway.

This of course says nothing about what state the updating transactions 
themselves see as the current state. For replication, e.g., that is adequate, 
since you'd not replay the original commands but rather the effects they had in 
terms of physical tuple updates. On replay, the effects of a transaction 
therefore do not depend on the state the transaction sees.

best regards,
Florian Pflug




Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Alex Goncharov
,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?

Thanks,

-- Alex -- alex-goncha...@comcast.net --




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:

 Replicating or recreating the whole predicate locking and conflict
 detection on slaves is not feasible for performance reasons. (I
 won't elaborate unless someone feels that's not intuitively
 obvious.) The only sane way I can see to have a slave database allow
 serializable behavior is to WAL-log the acquisition of a snapshot by
 a serializable transaction, and the rollback or commit, on the
 master, and to have the serializable snapshot build on a slave
 exclude any serializable transactions for which there are still
 concurrent serializable transactions. Yes, that does mean WAL-
 logging the snapshot acquisition even if the transaction doesn't yet
 have an xid, and WAL-logging the commit or rollback even if it never
 acquires an xid.

 One important observation is that any anomaly that occurs on the slave
 can be resolved by aborting a local read-only transaction. This is a
 good thing, because the alternatives are too horrible to consider.

 You could possibly cut the costs of predicate locking by having the
 master ship with each transaction the list of predicate locks it
 acquired. But you'd still have to track locks for read-only
 transactions, so maybe that's not a significant cost improvement. On
 the other hand, if you're willing to pay the price of serializability
 on the master, why not the slaves too?

I don't understand the problem. As I see it, in the context of
SSI, a read-only slave can just map SERIALIZABLE to the technical
implementation of REPEATABLE READ (i.e., the currently-existing
SERIALIZABLE). The union of the transactions on the master and the
slave(s) will still exhibit SERIALIZABLE behavior because the
transactions on the slave cannot write anything and are therefore
irrelevant.

Is anything wrong with that reasoning?

Nicolas



Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Yeb Havinga

Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call

| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?
  
The straightforward answer is that the libpq frontend C library does not 
support something like the JDBC client's setFetchSize.


The GBs of data are gathered at the site of the libpq client (the PGresult 
object is gathered/allocated while consuming result input from the backend).


regards,
Yeb Havinga





Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Abhijit Menon-Sen
At 2010-05-25 07:35:34 -0400, alex-goncha...@comcast.net wrote:

 | Where does the result set (GBs of data) reside after I call
 | PQexecPrepared?  On BE, I hope?

Unless you explicitly declare and fetch from an SQL-level cursor, your
many GBs of data are going to be transmitted to libpq, which will eat
lots of memory. (The wire protocol does have something like cursors,
but libpq does not use them, it retrieves the entire result set.)
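
A minimal sketch of the cursor approach (the cursor and table names are made
up here); each statement can be sent through an ordinary PQexec() call:

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM huge_table;
FETCH 1000 FROM big_cur;  -- only these 1000 rows are materialized in the PGresult
-- ... repeat the FETCH until it returns zero rows ...
CLOSE big_cur;
COMMIT;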

-- ams



Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Magnus Hagander
On Tue, May 25, 2010 at 12:57, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I started a wiki article for brainstorming the JSON API:
 http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
 substantial changes to the draft of the API based on discussion here
 and on the #postgresql IRC channel.

 Is it alright to use the wiki for brainstorming, or should it stay on
 the mailing list or go somewhere else?

 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

The general idea that most people have been using, and that I think is
correct, is to have the discussion here on the list, and then keep a
summary of the current state of it on the wiki page so it's easier for
someone entering the discussion to catch up on where it is.


 I'll try not to spend too much time quibbling over the specifics as I
 tend to do.  While the brainstorming is going on, I plan to start
 implementing the datatype by itself so I can establish an initial
 working codebase.

 Sounds good.

Agreed.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it that way.

Reviewing this patch, there are a whole slew of problems.

#1: REALLY BIG ISSUE- Insufficient comment updates.  You've changed
function definitions in a pretty serious way as well as moved some code
around such that some of the previous comments don't make sense.  You
have got to update comments when you're writing a patch.  Indeed, the
places I see changes in comments are where you've removed what appear
to still be valid and appropriate comments, or places where you've added
comments which are just blatantly wrong with the submitted patch.

#2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
this patch- don't, we're in feature-freeze right now and should not be
adding hooks at this time.

#3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
utils/acl and instead added executor/executor.h to ri_triggers.c.
I don't particularly like that.  I admit that DoCopy() already knew
about the executor, and if that were the only case outside of the
executor where ExecCheckRTPerms() was getting called it'd probably be
alright, but we already have another place that wants to use it, so
let's move it to a more appropriate place.

#4: As mentioned previously, the hook (which should be added in a
separate patch anyway) makes more sense to me to be in
ExecCheckRTPerms(), not ExecCheckRTEPerms().  This also means that we
need to be calling ExecCheckRTPerms() from DoCopy and
RI_Initial_Check(), to make sure that the hook gets called.  To that
end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c.  Also,
there should be a big comment about not using or calling
ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the
hook would then be skipped.

#5: In DoCopy, you can remove relPerms and remainingPerms, but I'd
probably leave required_access up near the top and then just use it to
set rte->requiredPerms directly rather than moving that bit deep down
into the function.

#6: I haven't checked yet, but if there are other things in an RTE which
would make sense in the DoCopy case, beyond just what's needed for the
permissions checking, and which wouldn't be 'correct' with a NULL'd
value, I would set those.  Yes, we're building the RTE to check
permissions, but we don't want someone downstream to be surprised when
they make a change to something in the permissions checking and discover
that a value in the RTE they expected to be there wasn't valid.  Even more
so, if there are helper functions which can be used to build an RTE, we
should be using them.  The same goes for RI_Initial_Check().

#7: I'd move the conditional if (is_from) into the foreach which is
building the columnsSet and eliminate the need for columnsSet; I don't
see that it's really adding much here.

#8: When moving ExecCheckRTPerms(), you should rename it to be more like
the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
Also, it should return an actual AclResult instead of just true/false.

Thanks,

Stephen




Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Alex Goncharov
,--- Abhijit Menon-Sen (Tue, 25 May 2010 17:26:18 +0530) *
| Unless you explicitly declare and fetch from an SQL-level cursor, your
| many GBs of data are going to be transmitted to libpq, which will eat
| lots of memory. (The wire protocol does have something like cursors,
| but libpq does not use them, it retrieves the entire result set.)
,--- Yeb Havinga (Tue, 25 May 2010 14:08:51 +0200) *
| The GBs of data are gathered at the site of the libpq client (pgresult 
| object gathered/allocated while consuming result input from backend).
`--*

Thank you very much!

-- Alex -- alex-goncha...@comcast.net --



Re: [HACKERS] pg_upgrade docs

2010-05-25 Thread Robert Haas
On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
 Have you read the docs?  It does mention the issue with /contrib and
 stuff.  How do I document a limitation I don't know about?  This is all
 very vague.  Please suggest some wording.

OK, here's an attempt.  Please fact-check.

--

General Limitations

pg_upgrade relies on binary compatibility between the old and new
on-disk formats, including the on-disk formats of individual data
types.  pg_upgrade attempts to detect cases in which the on-disk
format has changed; for example, it verifies that the old and new
clusters have the same value for --enable-integer-datetimes.  However,
there is no systematic way for pg_upgrade to detect problems of this
type; it has hard-coded knowledge of the specific cases known to exist
in core PostgreSQL, including /contrib.  If third-party or
user-defined data types or access methods are used, it is the user's
responsibility to verify that the versions loaded into the old and new
clusters use compatible on-disk formats.  If they do not, pg_upgrade
may appear to work but subsequently crash or silently corrupt data.

pg_upgrade also relies on ABI compatibility between modules loaded
into the old and new clusters.  For example, if an SQL function in the
old cluster is defined to call a particular C function, pg_upgrade
will recreate the SQL function in the new cluster and will configure it to
call the same C function.  If no such C function can be found by the
new cluster, pg_upgrade will simply fail.  However, if a C function of
the same name exists in the new cluster, but expects a different
number of arguments or different types of arguments, then it is likely
to crash the system when called.  In the worst case, data corruption
could result.
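
As a purely hypothetical illustration (the module path, SQL function name,
and C symbol below are invented), the kind of definition involved is:

CREATE FUNCTION my_func(integer) RETURNS integer
    AS '$libdir/my_module', 'my_c_func'
    LANGUAGE C STRICT;

pg_upgrade recreates this definition verbatim, so the my_module library
installed for the new cluster must export a my_c_func with a compatible
signature.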

--

Also, the following sentence appears not to fit with our only to 9.0
policy: For Windows users, note that due to different integer
datetimes settings used by the one-click installer and the MSI
installer, it is only possible to upgrade from version 8.3 of the
one-click distribution to version 8.4 of the one-click distribution.
It is not possible to upgrade from the MSI installer to the one-click
installer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Andrew Chernow

On 05/25/2010 07:35 AM, Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
|
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?



Issue multiple queries and make use of LIMIT/OFFSET.  You'll have to go 
manual on this one.
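
Something along these lines (the table and column names are invented; a
stable ORDER BY is needed so that successive pages don't overlap or skip
rows):

SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 0;
SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 10000;
SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 20000;
-- ... and so on, until a query returns fewer than 10000 rows.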


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
   with a locally built RangeTblEntry.

Maybe I missed it somewhere, but we still need to address the case where
the user doesn't have those SELECT permissions that we're looking for in
RI_Initial_Check(), right?  KaiGai, your patch should be addressing that
in a similar fashion.

Thanks,

Stephen




[HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)

2010-05-25 Thread Mike Fowler
I've been reading the SQL/XML standard and discovered that it defines a 
function named XMLEXISTS that does exactly what the todo item 
xpath_exists defines. My original patch named the function as per the 
todo but I think using the function name from the standard is a better 
idea. So this patch is the same as before, but the function is now named 
XMLEXISTS instead of xpath_exists.


Regards,

--
Mike Fowler
Registered Linux user: 379787

Index: src/backend/utils/adt/xml.c
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.97
diff -c -r1.97 xml.c
*** src/backend/utils/adt/xml.c	3 Mar 2010 17:29:45 -	1.97
--- src/backend/utils/adt/xml.c	25 May 2010 14:02:33 -
***
*** 3495,3497 
--- 3495,3668 
  	return 0;
  #endif
  }
+ 
+ /*
+  * Determines if the node specified by the supplied XPath exists
+  * in a given XML document, returning a boolean.
+  *
+  * It is up to the user to ensure that the XML passed is in fact
+  * an XML document - XPath doesn't work easily on fragments without
+  * a context node being known.
+  */
+ Datum
+ xmlexists(PG_FUNCTION_ARGS)
+ {
+ #ifdef USE_LIBXML
+ 	text	   *xpath_expr_text = PG_GETARG_TEXT_P(0);
+ 	xmltype    *data = PG_GETARG_XML_P(1);
+ 	ArrayType  *namespaces = PG_GETARG_ARRAYTYPE_P(2);
+ 	xmlParserCtxtPtr ctxt = NULL;
+ 	xmlDocPtr	doc = NULL;
+ 	xmlXPathContextPtr xpathctx = NULL;
+ 	xmlXPathCompExprPtr xpathcomp = NULL;
+ 	char	   *datastr;
+ 	int32		len;
+ 	int32		xpath_len;
+ 	xmlChar    *string;
+ 	xmlChar    *xpath_expr;
+ 	int			i;
+ 	int			ndim;
+ 	Datum	   *ns_names_uris;
+ 	bool	   *ns_names_uris_nulls;
+ 	int			ns_count;
+ 	int			result;
+ 
+ 	/*
+ 	 * Namespace mappings are passed as text[].  If an empty array is passed
+ 	 * (ndim = 0, 0-dimensional), then there are no namespace mappings.
+ 	 * Else, a 2-dimensional array with length of the second axis being equal
+ 	 * to 2 should be passed, i.e., every subarray contains 2 elements, the
+ 	 * first element defining the name, the second one the URI.  Example:
+ 	 * ARRAY[ARRAY['myns', 'http://example.com'], ARRAY['myns2',
+ 	 * 'http://example2.com']].
+ 	 */
+ 	ndim = ARR_NDIM(namespaces);
+ 	if (ndim != 0)
+ 	{
+ 		int		   *dims;
+ 
+ 		dims = ARR_DIMS(namespaces);
+ 
+ 		if (ndim != 2 || dims[1] != 2)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_DATA_EXCEPTION),
+ 	 errmsg("invalid array for XML namespace mapping"),
+ 	 errdetail("The array must be two-dimensional with length of the second axis equal to 2.")));
+ 
+ 		Assert(ARR_ELEMTYPE(namespaces) == TEXTOID);
+ 
+ 		deconstruct_array(namespaces, TEXTOID, -1, false, 'i',
+ 		  &ns_names_uris, &ns_names_uris_nulls,
+ 		  &ns_count);
+ 
+ 		Assert((ns_count % 2) == 0);	/* checked above */
+ 		ns_count /= 2;			/* count pairs only */
+ 	}
+ 	else
+ 	{
+ 		ns_names_uris = NULL;
+ 		ns_names_uris_nulls = NULL;
+ 		ns_count = 0;
+ 	}
+ 
+ 	datastr = VARDATA(data);
+ 	len = VARSIZE(data) - VARHDRSZ;
+ 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+ 	if (xpath_len == 0)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_DATA_EXCEPTION),
+  errmsg("empty XPath expression")));
+ 
+ 	string = (xmlChar *) palloc((len + 1) * sizeof(xmlChar));
+ 	memcpy(string, datastr, len);
+ 	string[len] = '\0';
+ 
+ 	xpath_expr = (xmlChar *) palloc((xpath_len + 1) * sizeof(xmlChar));
+ 	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
+ 	xpath_expr[xpath_len] = '\0';
+ 
+ 	pg_xml_init();
+ 	xmlInitParser();
+ 
+ 	PG_TRY();
+ 	{
+ 		/*
+ 		 * redundant XML parsing (two parsings for the same value during one
+ 		 * command execution are possible)
+ 		 */
+ 		ctxt = xmlNewParserCtxt();
+ 		if (ctxt == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not allocate parser context");
+ 		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+ 		if (doc == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
+ 		"could not parse XML document");
+ 		xpathctx = xmlXPathNewContext(doc);
+ 		if (xpathctx == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not allocate XPath context");
+ 		xpathctx->node = xmlDocGetRootElement(doc);
+ 		if (xpathctx->node == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
+ 		"could not find root XML element");
+ 
+ 		/* register namespaces, if any */
+ 		if (ns_count > 0)
+ 		{
+ 			for (i = 0; i < ns_count; i++)
+ 			{
+ char	   *ns_name;
+ char	   *ns_uri;
+ 
+ if (ns_names_uris_nulls[i * 2] ||
+ 	ns_names_uris_nulls[i * 2 + 1])
+ 	ereport(ERROR,
+ 			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 	  errmsg("neither namespace name nor URI may be null")));
+ ns_name = TextDatumGetCString(ns_names_uris[i * 2]);
+ ns_uri = TextDatumGetCString(ns_names_uris[i * 2 + 1]);
+ if (xmlXPathRegisterNs(xpathctx,
+ 	   (xmlChar *) ns_name,
+ 	   (xmlChar *) ns_uri) != 0)
+ 	ereport(ERROR,		/* is 

Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Andrew Dunstan



Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call

| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?


  


You have been given the answer. Please re-read the replies, e.g. the one 
from Abhijit Menon-Sen.


The data is saved on the client side before the call returns. If that 
uses too much memory, use a cursor.


cheers

andrew



Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Joseph Adams
 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

I'll try to explain it in one big nutshell:

Instead of, for instance, json_to_number('5') and number_to_json(5), I
propose changing it to from_json(5)::INT and to_json('5').  Note how
from_json simply returns TEXT containing the underlying value for the
user to cast.  I plan to make calling to_json/from_json with arrays or
objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
an error for now, as implementing all the specifics of this could be
quite distracting.

If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
RETURNS json can't be implemented without augmenting the grammar (as
was done with xmlforest), so I considered making it take a RECORD
parameter like the hstore(RECORD) function does, as was suggested on
IRC.  However, this may be inadequate for selecting some columns but
not others.  Using examples from hstore:

SELECT hstore(foo) FROM foo;  => 'e=>2.71828, pi=>3.14159'
-- this works, but what if we only want one field?

SELECT hstore(pi) FROM foo;
-- function type error

SELECT hstore(row(pi)) FROM foo;  => 'f1=>3.14159'
-- field name is lost

SELECT hstore(bar) FROM (select pi FROM foo) AS bar;  => 'f1=>3.14159'
-- ugly, and field name is *still* lost

To get (and set, which I overlooked before), use json_get and
json_set.  These take JSONPath expressions, but I don't plan to
implement all sorts of fancy features during the summer.  However, I
do plan to support some kind of parameter substitution so you can do
this:

json_get('[0,1,4,9,16,25]', '[%]' %% 2)  => '4'::TEXT

For this use case, though, it would be simpler to say:

'[0,1,4,9,16,25]'::JSON -> 2



Re: [HACKERS] pg_upgrade docs

2010-05-25 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
  Have you read the docs?  It does mention the issue with /contrib and
  stuff.  How do I document a limitation I don't know about?  This is all
  very vague.  Please suggest some wording.
 
 OK, here's an attempt.  Please fact-check.
 
 --
 
 General Limitations
 
 pg_upgrade relies on binary compatibility between the old and new
 on-disk formats, including the on-disk formats of individual data
 types.  pg_upgrade attempts to detect cases in which the on-disk
 format has changed; for example, it verifies that the old and new
 clusters have the same value for --enable-integer-datetimes.  However,
 there is no systematic way for pg_upgrade to detect problems of this
 type; it has hard-coded knowledge of the specific cases known to exist
 in core PostgreSQL, including /contrib.  If third-party or
 user-defined data types or access methods are used, it is the user's
 responsibility to verify that the versions loaded into the old and new
 clusters use compatible on-disk formats.  If they do not, pg_upgrade
 may appear to work but subsequently crash or silently corrupt data.

OK, I have added a mention of the issues above, in a more abbreviated
format.

 pg_upgrade also relies on ABI compatibility between modules loaded
 into the old and new clusters.  For example, if an SQL function in the
 old cluster is defined to call a particular C function, pg_upgrade
 will recreate SQL function in the new cluster and will configure it to
 call the same C function.  If no such C function can be found by the
 new cluster, pg_upgrade will simply fail.  However, if a C function of
 the same name exists in the new cluster, but expects a different
 number of arguments or different types of arguments, then it is likely
 to crash the system when called.  In the worst case, data corruption
 could result.

These issues are not unique to pg_upgrade, and could happen even in a
pg_dump restore.

 Also, the following sentence appears not to fit with our only to 9.0
 policy: For Windows users, note that due to different integer
 datetimes settings used by the one-click installer and the MSI
 installer, it is only possible to upgrade from version 8.3 of the
 one-click distribution to version 8.4 of the one-click distribution.
 It is not possible to upgrade from the MSI installer to the one-click
 installer.

Agreed.  I added an "8.4 or later" mention.  It is not worth calling it
"9.0 or later" because then I would have to update this mention for
every major release.

Applied patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: doc/src/sgml/pgupgrade.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/pgupgrade.sgml,v
retrieving revision 1.10
diff -c -c -r1.10 pgupgrade.sgml
*** doc/src/sgml/pgupgrade.sgml	24 May 2010 17:43:39 -	1.10
--- doc/src/sgml/pgupgrade.sgml	25 May 2010 14:50:36 -
***
*** 16,21 
--- 16,31 
    9.0.1 -> 9.0.4.
   </para>
  
+  <para>
+   <application>pg_upgrade</> works because, though new features are
+   regularly added to Postgres major releases, the internal data storage
+   format rarely changes.  <application>pg_upgrade</> does its best to
+   make sure the old and new clusters are binary-compatible, e.g.  by
+   checking for compatible compile-time settings.  It is important that
+   any external modules are also binary compatible, though this cannot
+   be checked by <application>pg_upgrade</>.
+  </para>
+ 
   <sect2>
    <title>Supported Versions</title>
  
***
*** 440,446 
   <sect2>
    <title>Limitations in migrating <emphasis>from</> PostgreSQL 8.3</title>
   
-  
   <para>
 Upgrading from PostgreSQL 8.3 has additional restrictions not present
 when upgrading from later PostgreSQL releases.  For example,
--- 450,455 
***
*** 502,509 
 For Windows users, note that due to different integer datetimes settings
 used by the one-click installer and the MSI installer, it is only
 possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 of the one-click distribution. It is not possible to upgrade
!    from the MSI installer to the one-click installer.
   </para>
  
   <para>
--- 511,518 
 For Windows users, note that due to different integer datetimes settings
 used by the one-click installer and the MSI installer, it is only
 possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 or later of the one-click distribution. It is not
!    possible to upgrade from the MSI installer to the one-click installer.
   </para>
  
   <para>



Re: [HACKERS] Clearing psql`s input buffer after auto-reconnect

2010-05-25 Thread Greg Sabino Mullane



 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
...
 Now #1 might be the best long-term solution but I have no particular
 appetite to tackle it, and #2 is just too ugly to contemplate.  That
 leaves #3, which is a bit ugly in its own right but seems like the best
 fix we're likely to get.

 Comments, better ideas?

I like #3. If this were a more common event I might lean towards #1, 
but it's not, so #3 seems fine.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005251113
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)

2010-05-25 Thread Erik Rijkers
On Tue, May 25, 2010 16:31, Mike Fowler wrote:
 I've been reading the SQL/XML standard and discovered that it defines a
 function named XMLEXISTS that does exactly what the todo item
 xpath_exists defines. My original patch named the function as per the
 todo but I think using the function name from the standard is a better
 idea. So this patch is the same as before, but the function is now named
 XMLEXISTS instead of xpath_exists.


I tried this patch (cvs HEAD, applies without error), but get this error:

[...]
utils/adt/xml.o: In function `xmlexists':
/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend/utils/adt/xml.c:3639:
 undefined
reference to `xmlXPathCompiledEvalToBoolean'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[2]: Leaving directory 
`/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src'
make: *** [all] Error 2



./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.xmlexists 
--with-pgport=6548
--quiet --enable-depend --enable-cassert --enable-debug --with-openssl 
--with-perl --with-libxml
--with-libxslt


centos 5.4  2.6.18-164.el5  x86_64 GNU/Linux
libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed



Erik Rijkers






Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Mike Fowler

Erik Rijkers wrote:

libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed
  


Thanks for testing my patch, Erik. It turns out I've got libxml2 
installed at version 2.7.5. Searching the gnome mailing lists, it turns 
out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very 
next version from yours, 2.6.27 (see: 
http://mail.gnome.org/archives/xml/2006-October/msg00119.html).


Regards,

--
Mike Fowler
Registered Linux user: 379787




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

I think it would be quite simple to read WAL. WALSender reads the WAL
file after it's been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote:
 On Tue, May 25, 2010 at 10:29 AM, Josh Berkus j...@agliodbs.com wrote:
  I agree that #4 should be done last, but it will be needed, not in the
  least by your employer ;-) .  I don't see any obvious way to make #4
  compatible with any significant query load on the slave, but in general
  I'd think that users of #4 are far more concerned with 0% data loss than
  they are with getting the slave to run read queries.
 
 Since #2 and #3 are enough for 0% data loss, I think that such users
 would be more concerned about what results are visible in the standby.
 No?

Please add #4 also. You can do that easily at the same time as #2 and
#3, and it will leave me free to fix the perceived conflict problems.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote:

 Second, we need to discuss about how to specify the synch
 level. There are three approaches:
 
 * Per standby
   Since the purpose, location and H/W resources often differ
   from one standby to another, specifying the level per standby
   (i.e., we set the level in recovery.conf) is a
   straightforward approach, I think. For example, we can
   choose #3 for a high-availability standby near the master,
   and choose #1 (async) for a remote disaster recovery
   standby.
 
 * Per transaction
   Define a PGC_USERSET option specifying the level and
   set it on the master according to the purpose of the
   transaction. In this approach, for example, we can choose
   #4 for a transaction which should be visible on the
   standby as soon as a success of the commit has been
   returned to a client. We can also choose #1 for
   time-critical but not mission-critical transactions.
 
 * Mix
   Allow users to specify the level per standby and
   transaction at the same time, and then calculate the real
   level from them by using some algorithm.
 
 Which should we adopt for 9.1? I'd like to implement the
 per-standby approach at first since it's simple and seems
 to cover more use cases. Thought?

-1

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form. 

Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes
down, performance will improve and robustness will have been lost. What
good would that be?

Imagine a standby connected over a long distance. DBA brings up standby
in synch mode accidentally and the primary server hits massive
performance problems without any way of directly controlling this.

The worst aspect of standby-level controls is that nobody ever knows how
safe a transaction is. There is no definition or test for us to check
exactly how safe any particular transaction is. Also, the lack of safety
occurs at the time when you least want it - when one of your servers is
already down.

So I call per-standby settings simple, and broken in multiple ways.

Putting the control in the hands of the transaction owner (i.e. on the
master) is exactly where the control should be. I personally like the
idea of that being a USERSET, though could live with system wide
settings if need be. But the control must be on the *master* not on the
standbys.

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, as well as providing a
workaround if replicating across a bursty network where we can't
guarantee response times, even if the typical response time is good.

(We've discussed this many times before over a period of years, and I'm
not really sure why we have to re-discuss this repeatedly just because
people disagree. You don't mention the earlier discussions; I'm not sure
why. If we want to follow the community process, then all previous
discussions need to be taken into account, unless things have changed -
which they haven't: same topic, same people, AFAICS.)


-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Mon, 2010-05-24 at 18:29 -0700, Josh Berkus wrote:

 If people agree that the above is our roadmap, implementing
 per-standby first makes sense, and then we can implement per-session
 GUC later.

IMHO per-standby sounds simple, but is dangerously simplistic, as
explained in another part of the thread.

We need to think clearly about failure modes and how they will be
handled. Failure modes and edge cases completely govern the design here.
The case where everything runs smoothly isn't a major concern, and so
the user interface for it can be done in various ways.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote:
 On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
  Hmm, OK, I think that makes sense.  Would you care to propose a patch?
 
  Yep. Here is the patch.
 
  This patch distinguishes normal shutdown from unexpected exit while the
  server is in recovery. That is, when smart or fast shutdown is requested
  during recovery, the bgwriter sets ControlFile->state to the newly-introduced
  DB_SHUTDOWNED_IN_RECOVERY state.
 
 Is this patch worth applying for 9.0? If not, I'll add it to
 the next CF for 9.1.

Presumably Robert will be applying the patch? It seems to address the
concern raised on the thread.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Synchronous replication implies that a commit should wait. This wait is
 experienced by the transaction, not by other parts of the system. If we
 define robustness at the standby level then robustness depends upon
 unseen administrators, as well as the current up/down state of standbys.
 This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of
this.  The case where someone wants to do sync rep to the machine in
the next rack over and async rep to a server at a remote site seems
too important to ignore.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote:
 On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
  Hmm, OK, I think that makes sense.  Would you care to propose a patch?
 
  Yep. Here is the patch.
 
   This patch distinguishes normal shutdown from unexpected exit while the
   server is in recovery. That is, when smart or fast shutdown is requested
   during recovery, the bgwriter sets ControlFile->state to the newly-introduced
   DB_SHUTDOWNED_IN_RECOVERY state.

  Is this patch worth applying for 9.0? If not, I'll add it to
  the next CF for 9.1.

 Presumably Robert will be applying the patch? It seems to address the
 concern raised on the thread.

Yes, I was planning to review it.  But if you or someone else would
like to cut in, that's OK too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote:
 Erik Rijkers wrote:

 libxml2.x86_64          2.6.26-2.1.2.8  installed
 libxml2-devel.x86_64    2.6.26-2.1.2.8  installed


 Thanks for testing my patch Erik. It turns out I've got libxml2 installed at
 version 2.7.5. Searching the gnome mailing lists, it turns out
 xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next
 version from yours, 2.6.27 (see:
 http://mail.gnome.org/archives/xml/2006-October/msg00119.html).

We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Joshua D. Drake
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  Synchronous replication implies that a commit should wait. This wait is
  experienced by the transaction, not by other parts of the system. If we
  define robustness at the standby level then robustness depends upon
  unseen administrators, as well as the current up/down state of standbys.
  This is action-at-a-distance in its worst form.
 
 Maybe, but I can't help thinking people are going to want some form of
 this.  The case where someone wants to do sync rep to the machine in
 the next rack over and async rep to a server at a remote site seems
 too important to ignore.

Uhh yeah, that is pretty much the standard use case. The next rack is
only 50% of the equation. The next part is the disaster recovery rack
over 100Mb (or even 10Mb) that is halfway across the country. It is
common, very common.

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering





[HACKERS] Confused about the buffer pool size

2010-05-25 Thread MMK
Hello All:
In the code (costsize.c), I see that effective_cache_size is set 
to DEFAULT_EFFECTIVE_CACHE_SIZE.
This is defined as follows in cost.h
#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
But when I say "show shared_buffers" in psql I get:
shared_buffers = 28MB
In the postgresql.conf file, the following lines appear:
shared_buffers = 28MB                   # min 128kB  (change requires restart)
#temp_buffers = 8MB                     # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
So should effective_cache_size be set to 3584 rather than the 16384?
Thanks,
MMK.


Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 10:52 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

 I'll try to explain it in one big nutshell:

 Instead of, for instance, json_to_number('5') and number_to_json(5), I
 propose changing it to from_json('5')::INT and to_json(5).  Note how
 from_json simply returns TEXT containing the underlying value for the
 user to cast.  I plan to make calling to_json/from_json with arrays or
 objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
 an error for now, as implementing all the specifics of this could be
 quite distracting.
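
(A hypothetical sketch of the proposed usage, based only on the
description above -- not committed behavior:

    SELECT to_json(5);            -- '5'   (render a SQL value as JSON text)
    SELECT from_json('5')::int;   -- 5     (from_json returns TEXT; the user casts)
    SELECT from_json('[1,2,3]');  -- would raise an error for now, per the proposal

)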

I don't see how that's an improvement over the previous design.  It
seems like it adds a lot of extra casting and removes useful list
operations without any corresponding advantage.

 If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
 RETURNS json can't be implemented without augmenting the grammar (as
 was done with xmlforest), so I considered making it take a RECORD
 parameter like the hstore(RECORD) function does, as was suggested on
 IRC.  However, this may be inadequate for selecting some columns but
 not others.  Using examples from hstore:

 SELECT hstore(foo) FROM foo;  => 'e=>2.71828, pi=>3.14159'
 -- this works, but what if we only want one field?

 SELECT hstore(pi) FROM foo;
 -- function type error

 SELECT hstore(row(pi)) FROM foo;  => 'f1=>3.14159'
 -- field name is lost

 SELECT hstore(bar) FROM (select pi FROM foo) AS bar;  => 'f1=>3.14159'
 -- ugly, and field name is *still* lost

Yeah.  I'm not sure what to do about this problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 If we define robustness at the standby level then robustness
 depends upon unseen administrators, as well as the current
 up/down state of standbys.  This is action-at-a-distance in its
 worst form.
 
 Maybe, but I can't help thinking people are going to want some
 form of this.  The case where someone wants to do sync rep to the
 machine in the next rack over and async rep to a server at a
 remote site seems too important to ignore.
 
I think there may be a terminology issue here -- I took configure
by standby to mean that *at the master* you would specify rules for
each standby.  I think Simon took it to mean that each standby would
define the rules for replication to it.  Maybe this issue can
resolve gracefully with a bit of clarification?
 
-Kevin



[HACKERS] [PATCH] Add _PG_init to PL language handler documentation

2010-05-25 Thread Jonathan Leto
Howdy,

This tiny doc patch adds _PG_init to the skeleton example code for a
PL. The information is quite valuable to PL authors, who might otherwise
miss it, since it is only described in the shared library documentation.

This patch was based off of 6e2ba96 in the git mirror and a colorized
diff can be viewed here:

http://github.com/leto/postgres/commit/a9e265a7f55a0605fb4c6135f0f689c8b89e9623

Duke

-- 
Jonathan Duke Leto
jonat...@leto.net
http://leto.net


pginit.patch
Description: Binary data



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  Synchronous replication implies that a commit should wait. This wait is
  experienced by the transaction, not by other parts of the system. If we
  define robustness at the standby level then robustness depends upon
  unseen administrators, as well as the current up/down state of standbys.
  This is action-at-a-distance in its worst form.
 
 Maybe, but I can't help thinking people are going to want some form of
 this.  
 The case where someone wants to do sync rep to the machine in
 the next rack over and async rep to a server at a remote site seems
 too important to ignore.

The use case of machine in the next rack over and async rep to a server
at a remote site *is* important, but you give no explanation as to why
that implies per-standby is the solution to it.

If you read the rest of my email, you'll see that I have explained the
problems per-standby settings would cause.

Please don't be so quick to claim it is me ignoring anything.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Mike Fowler

Robert Haas wrote:

On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote:
  

Erik Rijkers wrote:


libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed

  

Thanks for testing my patch Erik. It turns out I've got libxml2 installed at
version 2.7.5. Searching the gnome mailing lists, it turns out
xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next
version from yours, 2.6.27 (see:
http://mail.gnome.org/archives/xml/2006-October/msg00119.html).



We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL


Why? 2.6.27 is almost 4 years old.

I realise that my patch didn't update configure and configure.in, and 
indeed I didn't think of it when I responded to Erik (I'm too used to 
the Java world where people manage their own dependencies). I've now 
attached the updated patch which ups the check from version 2.6.23 to 
2.6.27.


Regards,

--
Mike Fowler
Registered Linux user: 379787

Index: configure
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure,v
retrieving revision 1.679
diff -c -r1.679 configure
*** configure	13 May 2010 22:07:40 -	1.679
--- configure	25 May 2010 16:57:49 -
***
*** 9079,9087 
  
  if test $with_libxml = yes ; then
  
! { $as_echo $as_me:$LINENO: checking for xmlSaveToBuffer in -lxml2 5
! $as_echo_n checking for xmlSaveToBuffer in -lxml2...  6; }
! if test ${ac_cv_lib_xml2_xmlSaveToBuffer+set} = set; then
$as_echo_n (cached)  6
  else
ac_check_lib_save_LIBS=$LIBS
--- 9079,9087 
  
  if test $with_libxml = yes ; then
  
! { $as_echo $as_me:$LINENO: checking for xmlXPathCompiledEvalToBoolean in -lxml2 5
! $as_echo_n checking for xmlXPathCompiledEvalToBoolean in -lxml2...  6; }
! if test ${ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean+set} = set; then
$as_echo_n (cached)  6
  else
ac_check_lib_save_LIBS=$LIBS
***
*** 9099,9109 
  #ifdef __cplusplus
  extern C
  #endif
! char xmlSaveToBuffer ();
  int
  main ()
  {
! return xmlSaveToBuffer ();
;
return 0;
  }
--- 9099,9109 
  #ifdef __cplusplus
  extern C
  #endif
! char xmlXPathCompiledEvalToBoolean ();
  int
  main ()
  {
! return xmlXPathCompiledEvalToBoolean ();
;
return 0;
  }
***
*** 9129,9140 
  	 test $cross_compiling = yes ||
  	 $as_test_x conftest$ac_exeext
 }; then
!   ac_cv_lib_xml2_xmlSaveToBuffer=yes
  else
$as_echo $as_me: failed program was: 5
  sed 's/^/| /' conftest.$ac_ext 5
  
! 	ac_cv_lib_xml2_xmlSaveToBuffer=no
  fi
  
  rm -rf conftest.dSYM
--- 9129,9140 
  	 test $cross_compiling = yes ||
  	 $as_test_x conftest$ac_exeext
 }; then
!   ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=yes
  else
$as_echo $as_me: failed program was: 5
  sed 's/^/| /' conftest.$ac_ext 5
  
! 	ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=no
  fi
  
  rm -rf conftest.dSYM
***
*** 9142,9150 
conftest$ac_exeext conftest.$ac_ext
  LIBS=$ac_check_lib_save_LIBS
  fi
! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlSaveToBuffer 5
! $as_echo $ac_cv_lib_xml2_xmlSaveToBuffer 6; }
! if test x$ac_cv_lib_xml2_xmlSaveToBuffer = xyes; then
cat confdefs.h _ACEOF
  #define HAVE_LIBXML2 1
  _ACEOF
--- 9142,9150 
conftest$ac_exeext conftest.$ac_ext
  LIBS=$ac_check_lib_save_LIBS
  fi
! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 5
! $as_echo $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 6; }
! if test x$ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean = xyes; then
cat confdefs.h _ACEOF
  #define HAVE_LIBXML2 1
  _ACEOF
***
*** 9152,9159 
LIBS=-lxml2 $LIBS
  
  else
!   { { $as_echo $as_me:$LINENO: error: library 'xml2' (version >= 2.6.23) is required for XML support 5
! $as_echo $as_me: error: library 'xml2' (version >= 2.6.23) is required for XML support 2;}
 { (exit 1); exit 1; }; }
  fi
  
--- 9152,9159 
LIBS=-lxml2 $LIBS
  
  else
!   { { $as_echo $as_me:$LINENO: error: library 'xml2' (version >= 2.6.27) is required for XML support 5
! $as_echo $as_me: error: library 'xml2' (version >= 2.6.27) is required for XML support 2;}
 { (exit 1); exit 1; }; }
  fi
  
Index: configure.in
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure.in,v
retrieving revision 1.627
diff -c -r1.627 configure.in
*** configure.in	13 May 2010 22:07:42 -	1.627
--- configure.in	25 May 2010 16:22:32 -
***
*** 940,946 
  fi
  
  if test $with_libxml = yes ; then
!   AC_CHECK_LIB(xml2, xmlSaveToBuffer, [], [AC_MSG_ERROR([library 'xml2' (version >= 2.6.23) is required for XML support])])
  fi
  
  if test $with_libxslt = yes ; then
--- 940,946 
  fi
  
  if test $with_libxml = yes ; then
!   AC_CHECK_LIB(xml2, xmlXPathCompiledEvalToBoolean, [], [AC_MSG_ERROR([library 'xml2' (version >= 2.6.27) is required for XML support])])

Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  Simon Riggs si...@2ndquadrant.com wrote:
  If we define robustness at the standby level then robustness
  depends upon unseen administrators, as well as the current
  up/down state of standbys.  This is action-at-a-distance in its
  worst form.
  
  Maybe, but I can't help thinking people are going to want some
  form of this.  The case where someone wants to do sync rep to the
  machine in the next rack over and async rep to a server at a
  remote site seems too important to ignore.
  
 I think there may be a terminology issue here -- I took configure
 by standby to mean that *at the master* you would specify rules for
 each standby.  I think Simon took it to mean that each standby would
 define the rules for replication to it.  Maybe this issue can
 resolve gracefully with a bit of clarification?

The use case of sync rep to the machine in the next rack over and async
rep to a server at a remote site would require the settings

server.nextrack = synch
server.remotesite = async

which leaves open the question of what happens when nextrack is down.

In many cases, to give adequate performance in that situation people add
an additional server, so the config becomes

server.nextrack1 = synch
server.nextrack2 = synch
server.remotesite = async

We then want to specify for performance reasons that we can get a reply
from either nextrack1 or nextrack2, so it all still works safely and
quickly if one of them is down. How can we express that rule concisely?
With some difficulty.

My suggestion is simply to have a single parameter (name unimportant)

number_of_synch_servers_we_wait_for = N

which is much easier to understand because it is phrased in terms of the
guarantee given to the transaction, not in terms of what the admin
thinks is the situation.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of Tue May 25 04:41:30 -0400 2010:
 On 24/05/10 22:49, Alvaro Herrera wrote:

  I think this is nonsense.  If you have 3-years-old sales transactions,
  and your database has any interesting churn, tuples those pages have
  been frozen for a very long time *already*.

 What's missing from the suggestion is that relfrozenxid and datfrozenxid 
 also need to be expanded to 8-bytes. That way you effectively have 
 8-byte XIDs, which means that you never need to vacuum to avoid XID 
 wraparound.

Hmm, so are we going to use the xid epoch more officially?  That's
an entirely new line of development; perhaps it opens new possibilities.

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere.  Was this discussed in the PGCon devs meeting?

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  Simon Riggs si...@2ndquadrant.com wrote:
  If we define robustness at the standby level then robustness
  depends upon unseen administrators, as well as the current
  up/down state of standbys.  This is action-at-a-distance in its
  worst form.
 
  Maybe, but I can't help thinking people are going to want some
  form of this.  The case where someone wants to do sync rep to the
  machine in the next rack over and async rep to a server at a
  remote site seems too important to ignore.

 I think there may be a terminology issue here -- I took configure
 by standby to mean that *at the master* you would specify rules for
 each standby.  I think Simon took it to mean that each standby would
 define the rules for replication to it.  Maybe this issue can
 resolve gracefully with a bit of clarification?

 The use case of sync rep to the machine in the next rack over and async
 rep to a server at a remote site would require the settings

 server.nextrack = synch
 server.remotesite = async

 which leaves open the question of what happens when nextrack is down.

 In many cases, to give adequate performance in that situation people add
 an additional server, so the config becomes

 server.nextrack1 = synch
 server.nextrack2 = synch
 server.remotesite = async

 We then want to specify for performance reasons that we can get a reply
 from either nextrack1 or nextrack2, so it all still works safely and
 quickly if one of them is down. How can we express that rule concisely?
 With some difficulty.

Perhaps the difficulty here is that those still look like per-server
settings to me.  Just maybe with a different set of semantics.

 My suggestion is simply to have a single parameter (name unimportant)

 number_of_synch_servers_we_wait_for = N

 which is much easier to understand because it is phrased in terms of the
 guarantee given to the transaction, not in terms of what the admin
 thinks is the situation.

So I agree that we need to talk about whether or not we want to do
this.  I'll give my opinion.  I am not sure how useful this really is.
 Consider a master with two standbys.  The master commits a
transaction and waits for one of the two standbys, then acknowledges
the commit back to the user.  Then the master crashes.  Now what?
It's not immediately obvious which standby we should bring online as
the primary, and if we guess wrong we could lose transactions thought
to be committed.  This is probably a solvable problem, with enough
work: we can write a script to check the last LSN received by each of
the two standbys and promote whichever one is further along.
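
(A sketch of such a check, using admin functions that already exist in
9.0 -- run on each standby and promote whichever is further along:

    SELECT pg_last_xlog_receive_location();  -- last WAL location received and synced to disk
    SELECT pg_last_xlog_replay_location();   -- last WAL location replayed during recovery

)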

But... what happens if the master and one standby BOTH crash
simultaneously?  There's no way of knowing (until we get at least one
of them back up) whether it's safe to promote the other standby.

I like the idea of a quorum commit type feature where we promise the
user that things are committed when enough servers have acknowledged
the commit.  But I think most people are not going to want that
configuration unless we also provide some really good management tools
that we don't have today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread Heikki Linnakangas

On 25/05/10 19:49, MMK wrote:

Hello All:
In the code (costsize.c), I see that effective_cache_size is set to 
DEFAULT_EFFECTIVE_CACHE_SIZE.
This is defined as follows in cost.h
#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
But when I say
show shared_buffers in psql I get,
shared_buffers  28MB
In postgresql.conf file, the following lines appear
shared_buffers = 28MB   # min 128kB  (change requires restart)
#temp_buffers = 8MB     # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
So should effective_cache_size be set to 3584 rather than the 16384?


No. Please see the manual for what effective_cache_size means:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
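
(To make the distinction concrete -- a sketch; the two settings are
independent, and effective_cache_size allocates no memory at all:

    SHOW shared_buffers;          -- 28MB: shared memory actually allocated
    SHOW effective_cache_size;    -- 128MB by default in 8.4: a planner estimate only

    -- or both at once, with units, from the catalog:
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'effective_cache_size');

)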

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
  Robert Haas robertmh...@gmail.com wrote:
   Simon Riggs si...@2ndquadrant.com wrote:
   If we define robustness at the standby level then robustness
   depends upon unseen administrators, as well as the current
   up/down state of standbys.  This is action-at-a-distance in its
   worst form.
  
   Maybe, but I can't help thinking people are going to want some
   form of this.  The case where someone wants to do sync rep to the
   machine in the next rack over and async rep to a server at a
   remote site seems too important to ignore.
 
  I think there may be a terminology issue here -- I took configure
  by standby to mean that *at the master* you would specify rules for
  each standby.  I think Simon took it to mean that each standby would
  define the rules for replication to it.  Maybe this issue can
  resolve gracefully with a bit of clarification?
 
  The use case of machine in the next rack over and async rep to a server
  at a remote site would require the settings
 
  server.nextrack = synch
  server.remotesite = async
 
  which leaves open the question of what happens when nextrack is down.
 
  In many cases, to give adequate performance in that situation people add
  an additional server, so the config becomes
 
  server.nextrack1 = synch
  server.nextrack2 = synch
  server.remotesite = async
 
  We then want to specify for performance reasons that we can get a reply
  from either nextrack1 or nextrack2, so it all still works safely and
  quickly if one of them is down. How can we express that rule concisely?
  With some difficulty.
 
 Perhaps the difficulty here is that those still look like per-server
 settings to me.  Just maybe with a different set of semantics.

(Those are the per-server settings.)

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
 We're unlikely to accept this patch if it changes the minimum version
 of libxml2 required to compile PostgreSQL

 Why? 2.6.27 is almost 4 years old.

Because we work hard to minimize our dependencies and make them as
non-onerous as possible.

At a minimum, I think it's fair to say that the burden is on you to
justify why it's worth bumping the version number.  If there is some
major speed or performance advantage to using the newer API, maybe
we'll consider it.  But if it's just a few extra lines of code to work
around it, then it's better to write those extra lines of code rather
than potentially force users to upgrade packages they're otherwise
happy with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote:
 On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 ...
 
  The best parameter we can specify is the number of servers that we wish
  to wait for confirmation from. That is a definition that easily manages
  the complexity of having various servers up/down at any one time. It
  also survives misconfiguration more easily, as well as providing a
  workaround if replicating across a bursty network where we can't
  guarantee response times, even of the typical response time is good.
 
 
 This may be an incredibly naive question, but what happens to the
 transaction on the master if the number of confirmations is not
 received? Is this intended to create a situation where the master
 effectively becomes unavailable for write operations when its
 synchronous slaves are unavailable?

How we handle degraded mode is important, yes. Whatever parameters we
choose the problem will remain the same.

Should we just ignore degraded mode and respond as if nothing bad had
happened? Most people would say not.

If we specify server1 = synch and server2 = async we then also need to
specify what happens if server1 is down. People might often specify
if (server1 == down) server2 = synch.
So now we have 3 configuration settings, one quite complex.

It's much easier to say you want to wait for N servers to respond, but
don't care which they are. One parameter, simple and flexible.

In both cases, we have to figure out what to do if we can't get either
server to respond. In replication there is no such thing as "server
down", just "a server didn't reply within time X". So we need to define
timeouts.

So whatever we do, we need additional parameters to specify timeouts
(including wait-forever as an option) and action-on-timeout: commit or
rollback. 
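
(As a sketch, with invented GUC names purely to illustrate that
parameter space -- nothing like this exists yet:

    SET synch_rep_timeout = '30s';            -- invented: how long a commit waits for acks
    SET synch_rep_timeout_action = 'commit';  -- invented: or 'rollback' once the timeout expires

)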

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Michael Tharp

On 05/25/2010 01:09 PM, Mike Fowler wrote:

Why? 2.6.27 is almost 4 years old.


RHEL 5 ships with 2.6.26. I imagine that supporting it is very 
desirable, regardless of its age, since that is unfortunately still the 
latest version of RHEL.


-- m. tharp



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote:
 So I agree that we need to talk about whether or not we want to do
 this.  I'll give my opinion.  I am not sure how useful this really is.
  Consider a master with two standbys.  The master commits a
 transaction and waits for one of the two standbys, then acknowledges
 the commit back to the user.  Then the master crashes.  Now what?
 It's not immediately obvious which standby we should bring online as
 the primary, and if we guess wrong we could lose transactions thought
 to be committed.  This is probably a solvable problem, with enough
 work: we can write a script to check the last LSN received by each of
 the two standbys and promote whichever one is further along.
 
 But... what happens if the master and one standby BOTH crash
 simultaneously?  There's no way of knowing (until we get at least one
 of them back up) whether it's safe to promote the other standby.

Not much of a problem really, is it? If you have one server left out of
3, then you promote it OR you stay down - your choice.

There is no "safe to promote" knowledge in *any* scenario; you never
know what was on the primary, only what was received by the standby. If
you have N standbys still up, you can pick which using the algorithm you
mention.

Remember that the WAL is sequential, so its not like the commit order of
transactions will differ across servers if we use quorum commit. So not
a problem.

The multiple simultaneous failure case is fairly common for people that
pick the "sync to the server in the next rack" approach, because there
are a hundred reasons why both might be taken out at the same time; ask JD.

 I like the idea of a quorum commit type feature where we promise the
 user that things are committed when enough servers have acknowledged
 the commit.  But I think most people are not going to want that
 configuration unless we also provide some really good management tools
 that we don't have today.

Good name.

Management tools have nothing to do with this; completely orthogonal.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Alastair Turner
On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
...

 The best parameter we can specify is the number of servers that we wish
 to wait for confirmation from. That is a definition that easily manages
 the complexity of having various servers up/down at any one time. It
 also survives misconfiguration more easily, as well as providing a
 workaround if replicating across a bursty network where we can't
 guarantee response times, even of the typical response time is good.


This may be an incredibly naive question, but what happens to the
transaction on the master if the number of confirmations is not
received? Is this intended to create a situation where the master
effectively becomes unavailable for write operations when its
synchronous slaves are unavailable?

Alastair Bell Turner




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Andrew Dunstan



Robert Haas wrote:

On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
  

We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL
  

Why? 2.6.27 is almost 4 years old.



Because we work hard to minimize our dependencies and make them as
non-onerous as possible.

At a minimum, I think it's fair to say that the burden is on you to
justify why it's worth bumping the version number.  If there is some
major speed or performance advantage to using the newer API, maybe
we'll consider it.  But if it's just a few extra lines of code to work
around it, then it's better to write those extra lines of code rather
than potentially force users to upgrade packages they're otherwise
happy with.

  


The real issue is what's going to be available on most of the platforms 
we build on. Unfortunately, 2.6.26 is what's on my CentOS 5.4 boxes, for 
example. I'm sure we don't want to make 9.1 not buildable with the 
installed libraries on still fairly current RedHat-derived platforms.


cheers

andrew



Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
 We're unlikely to accept this patch if it changes the minimum version
 of libxml2 required to compile PostgreSQL
 
 Why? 2.6.27 is almost 4 years old.

 Because we work hard to minimize our dependencies and make them as
 non-onerous as possible.

 At a minimum, I think it's fair to say that the burden is on you to
 justify what it's worth bumping the version number.

Yes.  Increasing the minimum required version of some library is a Big
Deal, we don't do it on a whim.  And we definitely don't do it just
because it's old.

regards, tom lane



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 This sounds like extending Xid to 64 bits, without having to store the
 high bits everywhere.  Was this discussed in the PGCon devs meeting?

Yeah, that's what it would amount to.  It was not discussed at the dev
meeting --- it was an idea that came up one evening at PGCon.

I'm not sure whether this would imply having to widen xid to 64 bits
internally.  That could be a bit unpleasant as far as CPU and shared
memory space go, although every year that goes by makes 32-bit machines
less interesting as DB servers.

regards, tom lane



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

This, unfortunately, isn't true in SSI.

Consider read-only transactions on a single node SSI database -- the
situation is the same for read-only transactions that run on a slave. 
These transactions can be part of anomalies, so they need to be checked
for conflicts and potentially aborted.

Consider Kevin's favorite example, where one table contains the current
date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
  
T3, which is a read-only transaction, sees the incremented date and an
empty list of receipts. But T1 later commits a new entry in the
receipts table with the old date. No serializable ordering allows this.
However, if T3 hadn't performed its read, there'd be no problem; we'd
just serialize T1 before T2 and no one would be the wiser.
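
(The same schedule sketched as SQL -- table names invented for
illustration:

    CREATE TABLE today (d date);      -- one row holding the current date
    CREATE TABLE receipts (d date);   -- initially empty

    -- T1:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO receipts SELECT d FROM today;    -- reads the old date
    -- T2, concurrently:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE today SET d = d + 1;
    COMMIT;
    -- T3, read-only, concurrently:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT d FROM today;                         -- sees the incremented date
    SELECT * FROM receipts;                      -- sees no receipts
    COMMIT;
    -- T1:
    COMMIT;   -- a receipt with the old date now exists; no serial order fits

)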

SSI would detect a potential conflict here, which we could resolve by
aborting T3. (We could also abort T1, but if this is a replicated
system this isn't always an option -- T3 might be running on the
slave, so only the slave will know about the conflict, and it can't
very well abort an update transaction on the master.)


There's another example of a read-only transaction anomaly that could
cause similar problems at
http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think
this one is easier to follow.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:18 , Dan Ports wrote:
 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.
 
 This, unfortunately, isn't true in SSI.
 
 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave. 
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.
 
 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
 
 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
as long as nobody tries to observe it?

Btw, I still don't get how this follows from the Cahill paper. For a 
transaction to lie on a dangerous circle, it needs incoming and outgoing edges 
in the conflict graph, right? But I'd have though that conflicts are always 
between a reader and a writer or between two writers. So how can a read-only 
transaction have incoming and outgoing edges?

best regards,
Florian Pflug





Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-25 Thread Alvaro Herrera
Excerpts from Jesper Krogh's message of Wed May 19 15:01:18 -0400 2010:

 But the distribution is very flat at the end, the last 128 values are 
 exactly
 1.00189e-05
 which means that any term sitting outside the array would get an estimate of
 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

I don't know if this is related, but tsvector stats are computed and
stored per term, not per datum.  This is different from all other
datatypes.  Maybe there's code somewhere that's assuming per-datum and
coming up with the wrong estimates?  Or maybe the tsvector-specific code
contains a bug somewhere; maybe a rounding error?

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, so in fact SSI sometimes allows the database to be
 inconsistent, but only as long as nobody tries to observe it?
 
Not exactly.  The eventually-persisted state is always consistent,
but there can be a transitory committed state which would violate
user-defined constraints or business rules *if viewed*.  This is
what I've been on about -- the commit sequence is not necessarily
the same as the apparent order of execution.  A read-only
transaction, if run before the overlapping commits settle, can
view a state which is not consistent with any serial order of
execution, and might therefore break the rules.  SSI detects that
and rolls one of the transactions back if they're all running at
serializable transaction isolation in a single SSI database, but the
question is how to handle this when the read happens in a replica.
 
 Btw, I still don't get how this follows from the Cahill paper. For
 a transaction to lie on a dangerous circle, it needs incoming and
 outgoing edges in the conflict graph, right?
 
At least one of the transactions participating in the cycle does. 
There's no requirement that they all do.
 
-Kevin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

Yes. Note that even while it's in an inconsistent state, you can still
perform any query that doesn't observe the inconsistency -- hopefully
most queries fall into this category.

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous circle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have though that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?

Right, the read-only transaction can't have incoming edges, but it can
have outgoing edges. So it can't be the pivot itself (the transaction
with both outgoing and incoming edges), but it can cause *another*
transaction to be.

In the example I gave, T3 (the r/o transaction) has an outgoing edge to
T1, because it didn't see T1's concurrent update. T1 already had an
outgoing edge to T2, so adding in this incoming edge from T3 creates
the dangerous structure.
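
(Diagrammatically, the rw-conflict edges in that example:

    T3 --rw--> T1    (T3 did not see T1's concurrent insert)
    T1 --rw--> T2    (T1 did not see T2's concurrent update)

T1 has both an incoming and an outgoing edge, making it the pivot.)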

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 
 Have you ever looked at one of those queries, that Londiste or
 Slony issue against the provider DB in order to get all the log
 data that has been committed between two snapshots? Is that really
 the best you can think of?
 
No, I admit I haven't.  In fact, I was thinking primarily in terms
of log-driven situations, like HS.  What would be the best place for
me to look to come up to speed on your use case?  (I'm relatively
sure that the issue isn't that there's no information to find, but
that a sequential pass over all available information would take a
*long* time.)  I've been working through the issues on WAL-based
replicas, and have some additional ideas and alternatives, but I'd
like to see the big picture, including trigger-based replication,
before posting.
 
-Kevin



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Yeb Havinga

Simon Riggs wrote:

How we handle degraded mode is important, yes. Whatever parameters we
choose the problem will remain the same.

Should we just ignore degraded mode and respond as if nothing bad had
happened? Most people would say not.

If we specify server1 = synch and server2 = async we then also need to
specify what happens if server1 is down. People might often specify
if (server1 == down) server2 = synch.
  
I have a hard time imagining including async servers in the quorum. If 
an async server's vote is necessary to reach quorum due to a 'real' sync 
standby server failure, it would mean that the async-intended standby is 
now also in sync with the master's transactions. IMHO this is a bad 
situation, since instead of the DBA getting the error "not enough sync 
standbys to reach quorum", he'll now get "database is slow" complaints, 
only to find out later that too many sync standby servers went south 
(under the assumption that async servers are mostly on links too slow to 
consider for sync standby).


regards,
Yeb Havinga




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Dimitri Fontaine
Hi,

Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote:
 On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The best parameter we can specify is the number of servers that we wish
  to wait for confirmation from. 
 
 This may be an incredibly naive question, but what happens to the
 transaction on the master if the number of confirmations is not
 received? 

 It's much easier to say you want to wait for N servers to respond, but
 don't care which they are. One parameter, simple and flexible.
[...]
 So whatever we do, we need additional parameters to specify timeouts
 (including wait-forever as an option) and action-on-timeout: commit or
 rollback. 

I was preparing an email along the lines that we need each slave to declare
its desired minimum level of synchronicity, and have the master filter
that with what the transaction wants.

Scratch that.

Thinking about it some more, I see that Simon's proposal is both simpler
and more effective: we already have Hot Standby and admin functions
that tell us the last replayed LSN. The one with the bigger LSN wins. So
in case of failover we know which slave to choose.

The only use case I can see for what I had in mind is to allow the user
to choose which server is trusted to have accurate data or better
read-only performance. But if the link is slow, the code will notice
soon enough, mind you.

I'm still not sure about my preference here, but I can see why Simon's
proposal is simpler and addresses all concerns apart from forcing the
servers into a non-optimal setup for a gain that is uneasy to see.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread MMK
Hello Heikki:

This is what the documentation says (see below).

But it does not tell me anything about what the actual buffer size is.
How do I know what the real buffer size is? I am using 8.4.4 and I am running 
only one query at a time.

Cheers,

MMK.

Sets the planner's assumption about the effective size of the disk
cache that is available to a single query. This is factored into
estimates of the cost of using an index; a higher value makes it more
likely index scans will be used, a lower value makes it more likely
sequential scans will be used. When setting this parameter you should
consider both PostgreSQL's shared buffers and the portion of the 
kernel's disk cache that will be used for PostgreSQL
data files. Also, take into account the expected number of concurrent
queries on different tables, since they will have to share the
available space. This parameter has no effect on the size of shared
memory allocated by PostgreSQL, nor
does it reserve kernel disk cache; it is used only for estimation
purposes. The default is 128 megabytes (128MB).



--- On Tue, 5/25/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com 
wrote:

From: Heikki Linnakangas heikki.linnakan...@enterprisedb.com
Subject: Re: [HACKERS] Confused about the buffer pool size
To: MMK bom...@yahoo.com
Cc: PostgreSQL-development pgsql-hackers@postgresql.org
Date: Tuesday, May 25, 2010, 11:36 AM

On 25/05/10 19:49, MMK wrote:
 Hello All:
 In the code (costsize.c), I see that effective_cache_size is set to 
 DEFAULT_EFFECTIVE_CACHE_SIZE.
 This is defined as follows in cost.h
 #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
 But when I say
 show shared_buffers in psql I get,
 shared_buffers  28MB
 In postgresql.conf file, the following lines appear
 shared_buffers = 28MB                  # min 128kB  (change requires restart)
 #temp_buffers = 8MB                    # min 800kB
 
 So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
 So should effective_cache_size be set to 3584 rather than the 16384?

No. Please see the manual for what effective_cache_size means:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
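
In short, it is only a planner assumption, not an allocation; for
example (a minimal illustration):

SHOW effective_cache_size;        -- the planner's current assumption
SET effective_cache_size = '1GB'; -- changes cost estimates for this
                                  -- session only; no memory is allocated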

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:

 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

 This, unfortunately, isn't true in SSI.

 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave.
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.

 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty).
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 SSI would detect a potential conflict here, which we could resolve by
 aborting T3. (We could also abort T1, but if this is a replicated
 system this isn't always an option -- T3 might be running on the
 slave, so only the slave will know about the conflict, and it can't
 very well abort an update transaction on the master.)

Ah, indeed. I made the same reasoning mistake as Florian (presumably)
did: I didn't think of the fact that the read-only transaction doesn't
need to be the pivot.
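
For anyone wanting to replay it, here is a minimal SQL sketch of that
schedule (schema and names made up; note that under today's
snapshot-isolation SERIALIZABLE all three transactions commit happily,
which is exactly the anomaly SSI is meant to catch by aborting T3 or T1):

-- setup
CREATE TABLE ctrl (cur_date date);
INSERT INTO ctrl VALUES ('2010-05-25');
CREATE TABLE receipts (d date);

-- session 1 (T1):
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO receipts SELECT cur_date FROM ctrl;  -- reads ctrl, writes receipts

-- session 2 (T2):
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ctrl SET cur_date = cur_date + 1;
COMMIT;

-- session 3 (T3, read only):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM ctrl;      -- sees the incremented date
SELECT * FROM receipts;  -- sees no receipts
COMMIT;

-- session 1 (T1):
COMMIT;  -- a receipt with the *old* date now exists; no serial order of
         -- T1, T2, T3 is consistent with what T3 saw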

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:48 , Dan Ports wrote:
 On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but 
 only as long as nobody tries to observe it?
 
 Yes. Note that even while it's in an inconsistent state, you can still
 perform any query that doesn't observe the inconsistency -- hopefully
 most queries fall into this category.

Yeah, as long as you just walk by without looking, the database is happy ;-)

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous cycle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have thought that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?
 
 Right, the read-only transaction can't have incoming edges, but it can
 have outgoing edges. So it can't be the pivot itself (the transaction
 with both outgoing and incoming edges), but it can cause *another*
 transaction to be.
 
 In the example I gave, T3 (the r/o transaction) has an outgoing edge to
 T1, because it didn't see T1's concurrent update. T1 already had an
 outgoing edge to T2, so adding in this incoming edge from T3 creates
 the dangerous structure.

Hm, but for there to be an actual problem (and not a false positive), an actual 
dangerous cycle has to exist in the dependency graph. The existence of a 
dangerous structure is just a necessary (but not sufficient) and easily 
checked-for condition for that, right? Now, if a read-only transaction only 
ever has outgoing edges, it cannot be part of a (dangerous or not) cycle, and 
hence any dangerous structure it is part of is a false positive.

I guess my line of reasoning is flawed somehow, but I cannot figure out why...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 On May 25, 2010, at 20:18 , Dan Ports wrote:

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.

 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

I would not call this an inconsistent state: it would become
inconsistent only after someone (e.g., T3) has observed it _and_ T1
commits.

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Hector Beyers
Hi guys,

(I tried the question in another forum first)

Does someone have any ideas how I can hide data without the metadata
noticing? To explain further, I would like to save some collection of data
that the metadata does not see. I am trying to do some security through
obscurity. It is for research purposes.

For example, populate a table with 1000 rows, but the metadata only knows
about 500 of them? Only in an exported dump can you find all the data
again. Or maybe make a hidden duplicate schema that can point to the
hidden data?

Does someone have any good ideas on how to achieve this or something
similar?

Kind regards
Hector



On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote:


 Hi guys,

 does ANYONE have any tips on hiding data on a database server? This means
 that data is stored in places that are not necessarily picked up in the
 schema of the database. I am doing some research on databases and need some
 direction.

 Any help or direction will be highly appreciated.

 Kind regards

 Hector




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 Hm, but for there to be an actual problem (and not a false positive), an
 actual dangerous cycle has to exist in the dependency graph. The
 existence of a dangerous structure is just a necessary (but not
 sufficient) and easily checked-for condition for that, right? Now, if a
 read-only transaction only ever has outgoing edges, it cannot be part
 of a (dangerous or not) cycle, and hence any dangerous structure it is
 part of is a false positive.

 I guess my line of reasoning is flawed somehow, but I cannot figure out why...

In the general case, wr-dependencies also create "must be serialized
before" edges. It seems that those edges can be discarded when finding
a pivot, but if you want to go back to basics:

("<" means "must be serialized before".)

* T1 < T2, because T1 reads a version of a data element for which T2
later creates a newer version (rw between T1 and T2).
* T3 < T1, because T3 reads a version of a data element for which T1
later creates a newer version (rw between T3 and T1).
* T2 < T3, because T2 creates a version of a data element, which is
then read by T3 (wr between T2 and T3).

(As you can see, those 3 edges form a cycle.)

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, but for there to be an actual problem (and not a false
 positive), an actual dangerous cycle has to exist in the
 dependency graph. The existence of a dangerous structure is just a
 necessary (but not sufficient) and easily checked-for condition
 for that, right? Now, if a read-only transaction only ever has
 outgoing edges, it cannot be part of a (dangerous or not) cycle,
 and hence any dangerous structure it is part of is a false
 positive.
 
 I guess my line of reasoning is flawed somehow, but I cannot
 figure out why...
 
Here's why:
 
We're tracking rw-dependencies, where the time-arrow showing
effective order of execution points from the reader to the writer
(since the reader sees a state prior to the write, it effectively
executes before it).  These are important because there have to be
two such dependencies, one in to the pivot and one out from the
pivot, for a problem to exist.  (See various works by Dr. Alan
Fekete, et al, for details.)  But other dependencies can imply an
order of execution.  In particular, a wr-dependency, where a
transaction *can* see data committed by another transaction, implies
that the *writer* came first in the order of execution.  In this
example, the transaction which lists the receipts successfully reads
the control table update, but is not able to read the receipt
insert.  This completes the cycle, making it a real anomaly and not
a false positive.
 
Note that the wr-dependency can actually exist outside the database,
making it pretty much impossible to accurately tell a false positive
from a true anomaly when the pivot exists and the transaction
writing data which the pivot can't read commits first.  For example,
let's say that the update to the control table is committed from an
application which, seeing that its update came back without error,
proceeds to list the receipts for the old date in a subsequent
transaction.  You have a wr-dependency which is, in reality, quite
real and solid with no way to notice it within the database engine. 
That's why the techniques used in SSI are pretty hard to improve
upon beyond more detailed and accurate tracking of rw-conflicts.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?



There is a difference between an empty result set and an exception.


No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 
Checking the age of the last entry in the file should be simple enough 
to determine if the segment matches the max age configuration (if 
set). In the case of a superuser saying what to purge, he would just 
call a function with a serial number (identifying the obsolete segments).



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] mergejoin null handling (was Re: [PERFORM] merge join killing performance)

2010-05-25 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 So, Tom, so you think it's possible that the planner isn't noticing
 all those nulls and thinks it'll just take a row or two to get to the
 value it needs to join on?

I dug through this and have concluded that it's really an oversight in
the patch I wrote some years ago in response to this:
http://archives.postgresql.org/pgsql-performance/2005-05/msg00219.php

That patch taught nodeMergejoin that a row containing a NULL key can't
possibly match anything on the other side.  However, its response to
observing a NULL is just to advance to the next row of that input.
What we should do, if the NULL is in the first merge column and the sort
order is nulls-high, is realize that every following row in that input
must also contain a NULL and so we can just terminate the mergejoin
immediately.  The original patch works well for cases where there are
just a few nulls in one input and the important factor is to not read
all the rest of the other input --- but it fails to cover the case where
there are many nulls and the important factor is to not read all the
rest of the nulls.  The problem can be demonstrated if you modify the
example given in the above-referenced message so that table t1 contains
lots of nulls rather than just a few: explain analyze will show that
all of t1 gets read by the mergejoin, and that's not necessary.
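
For instance, something along these lines should exhibit it (a sketch,
not tested here; check the plan to make sure t1 ends up as the
nulls-containing merge input):

CREATE TABLE t1 (k integer);
CREATE TABLE t2 (k integer);
INSERT INTO t1 SELECT NULL::integer FROM generate_series(1, 500000);
INSERT INTO t1 SELECT g FROM generate_series(1, 1000) g;
INSERT INTO t2 SELECT g FROM generate_series(1, 1000) g;
ANALYZE t1; ANALYZE t2;
SET enable_hashjoin = off;
SET enable_nestloop = off;
-- the actual row counts show the half million NULLs being fetched and
-- discarded one at a time, rather than the scan stopping at the first
-- NULL key
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 USING (k);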

I'm inclined to think this is a performance bug and should be
back-patched, assuming the fix is simple (which I think it is, but
haven't coded/tested yet).  It'd probably be reasonable to go back to
8.3; before that, sorting nulls high versus nulls low was pretty poorly
defined and so there'd be risk of breaking cases that gave the right
answers before.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Joseph Adams
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers hqbey...@gmail.com wrote:

 Hi guys,
 (I tried the question in another forum first)
 Does someone have any ideas how I can hide data without the metadata
 noticing? To explain further, I would like to save some collection of data
 that the metadata does not see. I am trying to do some security through
 obscurity. It is for research purposes.
 For example, populate a table with 1000 rows, but the metadata only knows
 about 500 of them? Only in an exported dump can you find all the data
 again. Or maybe make a hidden duplicate schema that can point to the
 hidden data?
 Does someone have any good ideas on how to achieve this or something
 similar?
 Kind regards
 Hector


 On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote:

 Hi guys,
 does ANYONE have any tips on hiding data on a database server? This means
 that data is stored in places that are not necessarily picked up in the
 schema of the database. I am doing some research on databases and need some
 direction.
 Any help or direction will be highly appreciated.
 Kind regards
 Hector

Not sure if this is helpful, but be sure to know about views, which can
be used to filter out rows of a table.  Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);

CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible=true;

= SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

= SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to
simulate an updatable view.

You may also want to read about Veil:
http://veil.projects.postgresql.org/curdocs/main.html

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Tom Lane
Jan Wieck janwi...@yahoo.com writes:
 No, I meant how will the *function* know, if a superuser and/or some 
 background process can purge records at any time?

 The data contains timestamps which are supposedly taken in commit order. 

You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 21:19 +0200, Yeb Havinga wrote:
 Simon Riggs wrote:
  How we handle degraded mode is important, yes. Whatever parameters we
  choose the problem will remain the same.
 
  Should we just ignore degraded mode and respond as if nothing bad had
  happened? Most people would say not.
 
  If we specify server1 = synch and server2 = async we then also need to
  specify what happens if server1 is down. People might often specify
  if (server1 == down) server2 = synch.

 I have a hard time imagining including async servers in the quorum. If 
 an async server's vote is necessary to reach quorum due to a 'real' sync 
 standby server failure, it would mean that the async-intended standby is 
 now also in sync with the master's transactions. IMHO this is a bad 
 situation, since instead of the DBA getting the error 'not enough sync 
 standbys to reach quorum', he'll now get 'database is slow' complaints, 
 only to find out later that too many sync standby servers went south 
 (under the assumption that async servers are mostly on links too slow to 
 consider for sync standby).

Yeh, there's difficulty either way. 

We don't need to think of servers as being "synch" or "async"; more
likely we would rate them in terms of typical synchronisation delay. So
yeh, calling them "fast" and "slow" in terms of synchronisation delay
makes sense.

Some people with low xact rate and high need for protection might want
to switch across to the slow server and keep running. If not, the
max_synch_delay would trip and you would then select
synch_failure_action = rollback. 

The realistic response is to add a second fast sync server, to allow
you to stay up even when you lose one of the fast servers. That now
gives you 4 servers and the failure modes start to get real complex.

Specifying rules to achieve what you're after would be much harder. Some
people might want that, but most people won't in the general case and if
they did specify them they'd likely get them wrong.

All of these issues show why I want to specify the synchronisation mode
as a USERSET. That will allow us to specify more easily which parts of
our application are important when the cluster is degraded and which
data is so critical it must reach multiple servers.
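
To illustrate what I mean (the parameter name, its values and the tables
are all hypothetical here; nothing like this exists yet):

-- low-value work: don't wait for any standby
BEGIN;
SET LOCAL synch_level = async;
UPDATE session_state SET last_seen = now() WHERE user_id = 42;
COMMIT;

-- critical data: COMMIT does not return until the quorum confirms
BEGIN;
SET LOCAL synch_level = synch;
INSERT INTO payments (user_id, amount) VALUES (42, 100.00);
COMMIT;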

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 maybe we should get serializable working and committed on one
 node first and then worry about how to distribute it.  I think
 there might be other approaches to this problem
 
Well, I've got two or three other ideas on how we can manage this
for HS, but since I now realize that I've totally misunderstood the
main use case for this (which is to support trigger-based
replication), I'd like to be clear on something before letting it
drop.  The big question is, do such replicas need to support
serializable access to the data modified by serializable
transactions in the source database?  That is, is there a need for
such replicas to only see states which are possible in some serial
order of execution of serializable transactions on the source
database?  Or to phrase the same question a third way, should there
be a way to run queries on such replicas with confidence that what
is viewed is consistent with user-defined constraints and business
rules?
 
If not, there's no intersection between this feature and SSI.  If
there is, I think we should think through at least a general
strategy sooner, rather than later.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 12:03 PM, Simon Riggs wrote:

On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:


In some systems (data warehousing, replication), the order of commits is
important, since that is the order in which changes have become visible.
This information could theoretically be extracted from the WAL, but
scanning the entire WAL just to extract this tidbit of information would
be excruciatingly painful.


I think it would be quite simple to read WAL. WALSender reads the WAL
file after it's been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.


Reading the entire WAL just to find all COMMIT records, then go back to 
the origin database to get the actual replication log you're looking for 
is simpler and more efficient? I don't think so.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Josh Berkus
Alvaro,

 This sounds like extending Xid to 64 bits, without having to store the
 high bits everywhere.  Was this discussed in the PGCon devs meeting?

Essentially, yes.

One of the main objections to raising XID to 64-bit has been the per-row
overhead.  But adding 4 bytes per page wouldn't be much of an impact.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:
 On 5/25/2010 12:03 PM, Simon Riggs wrote:
  On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
  
  In some systems (data warehousing, replication), the order of commits is
  important, since that is the order in which changes have become visible.
  This information could theoretically be extracted from the WAL, but
  scanning the entire WAL just to extract this tidbit of information would
  be excruciatingly painful.
  
  I think it would be quite simple to read WAL. WALSender reads the WAL
  file after it's been flushed, so it would be simple for it to read a blob
  of WAL and then extract the commit order from it.
  
  Overall though, it would be easier and more efficient to *add* info to
  WAL and then do all this processing *after* WAL has been transported
  elsewhere. Extracting info with DDL triggers, normal triggers, commit
  order and everything else seems like too much work to me. Every other
  RDBMS has moved away from trigger-based replication and we should give
  that serious consideration also.
 
 Reading the entire WAL just to find all COMMIT records, then go back to 
 the origin database to get the actual replication log you're looking for 
 is simpler and more efficient? I don't think so.

Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on the *origin*, using the
exact same code that the current WALSender provides, plus some logic to
read through the WAL records and extract commits/aborts. That seems much
simpler than the proposal you outlined and, as SR shows, it's low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Jan Wieck

On 5/24/2010 9:30 AM, Heikki Linnakangas wrote:

On 22/05/10 16:35, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

   From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:


Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak.


Me.


 Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG.


Hmm, we don't rely on setting hint bits to truncate CLOG anymore 
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php). 
It's the replacement of xids with FrozenXid that matters, the hint bits 
are really just hints.


Doesn't change the conclusion, though: you still need to replace XIDs 
with FrozenXids to truncate the clog. Conceivably we could keep around 
more than 2^32 transactions in clog with this scheme, but then you need 
a lot more space for the clog. But perhaps it would be better to do that 
than to launch anti-wraparound vacuums, or to refuse more updates in the 
extreme cases.


Correct. The real problem is aborted transactions. Just because an 
XID is really old doesn't mean it was committed.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread Josh Berkus
MMK,

 But it does not tell me anything about what the actual buffer size is.
 How do I know what the real buffer size is? I am using 8.4.4 and I am
 running only one query at a time.

Please move this discussion to the pgsql-general or pgsql-performance
lists.  pgsql-hackers is for working on PostgreSQL code, and further
questions on this list will probably not be answered.

Other than that, I have no idea what you mean by buffer size, nor why
you need to know it.  I'd suggest starting your post on the other
mailing list by explaining what specific problem you're trying to solve.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Josh Berkus

 Correct. The real problem is aborted transactions. Just because an
 XID is really old doesn't mean it was committed.

Yes, that's the main issue with my idea; XIDs which fell off the CLOG
would become visible even if they'd aborted.

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:50 PM, Simon Riggs wrote:

On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:

On 5/25/2010 12:03 PM, Simon Riggs wrote:
 On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.
 
 I think it would be quite simple to read WAL. WALSender reads the WAL
 file after it's been flushed, so it would be simple for it to read a blob
 of WAL and then extract the commit order from it.
 
 Overall though, it would be easier and more efficient to *add* info to
 WAL and then do all this processing *after* WAL has been transported
 elsewhere. Extracting info with DDL triggers, normal triggers, commit
 order and everything else seems like too much work to me. Every other
 RDBMS has moved away from trigger-based replication and we should give
 that serious consideration also.

Reading the entire WAL just to find all COMMIT records, then go back to
the origin database to get the actual replication log you're looking for
is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on the *origin*, using the
exact same code that the current WALSender provides, plus some logic to
read through the WAL records and extract commits/aborts. That seems much
simpler than the proposal you outlined and, as SR shows, it's low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order data 
to go offline for three days or so to replicate the 5 requested low-volume 
tables, the origin needs to hang on to the entire WAL log from 
all 100 other high-volume tables?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Sam Vilain
Florian Pflug wrote:
 On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:
   
 On 25/05/10 13:03, Florian Pflug wrote:
 
 On May 25, 2010, at 6:08 , Sam Vilain wrote:
   
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html

 Led us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.

 Here is the section:

 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
 
 I'm confused. The sentence in brackets Releasing the newer savepoint will 
 cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
 RELEASE SAVEPOINT implies that you *will* walk backwards through all the 
 savepoints named a if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? 
 If that is not how it actually works, then this whole paragraph is wrong, 
 I'd say.
   
 Releasing the newer savepoint will cause the older one to again become 
 accessible, as the doc says, but rolling back to a savepoint does not 
 implicitly release it. You'll have to use RELEASE SAVEPOINT for that.
 

 Ah, now I get it. Thanks.

 Would changing Releasing the newer savepoint will cause ...  to Explicitly 
 releasing the newer savepoint or maybe even Explicitly releasing the newer 
 savepoint with RELEASE SAVEPOINT will cause ... make things clearer?
   

Yes, probably - your misreading matches my misreading of it :-)

There is another way you can get there - releasing a savepoint established
before the re-used savepoint name will also release the savepoints after it.

ie

   savepoint foo;
   savepoint bar;
   savepoint foo;
   release savepoint bar;
   release savepoint foo;

After the first release, the second 'foo' savepoint is gone.  I think
this is a key advantage in saving the old savepoints.

Cheers,
Sam

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-25 Thread Jan Urbański
On 19/05/10 21:01, Jesper Krogh wrote:
 The document base is arount 350.000 documents and
 I have set the statistics target on the tsvector column
 to 1000 since the 100 seems way off.

So for tsvectors the statistics target means, more or less, "at any time,
track at most 10 * target lexemes simultaneously", where "track" means
keeping them in memory while going through the tuples being analysed.

Remember that the measure is in lexemes, not whole tsvectors and the 10
factor is meant to approximate the average number of unique lexemes in a
tsvector. If your documents are very large, this might not be a good
approximation.
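
For reference, that per-column target is the one set with something like
the following (table and column names taken from your message):

ALTER TABLE reference ALTER COLUMN document_tsvector SET STATISTICS 1000;
ANALYZE reference;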

 # ANALYZE verbose reference (document_tsvector);
 INFO:  analyzing reference
 INFO:  reference: scanned 14486 of 14486 pages, containing 350174 live
 rows and 6027 dead rows; 300000 rows in sample, 350174 estimated total rows
 ANALYZE
 
 Ok, so analyze almost examined all rows. Looking into
 most_common_freqs I find
 # select count(unnest) from (select unnest(most_common_freqs) from
 pg_stats where attname = 'document_tsvector') as foo;
  count
 ---
   2810
 (1 row)

So the size of the most_common_freqs and most_common_vals arrays in
pg_statistics for tsvectors has an upper bound of stats-target * 10
(for the same reasons as mentioned before) and they hold lexemes (not
whole tsvectors). What also happens is that lexemes that were seen only
once while going through the analysed set are discarded, so that's why
you can actually get fewer entries in these arrays, even if your
document set is big.


 But the distribution is very flat at the end, the last 128 values are
 exactly
 1.00189e-05
 which means that any term sitting outside the array would get an
 estimate of
 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

Yeah, this might mean that you could try cranking up the stats target a
lot, to make the set of simultaneously tracked lexemes larger (it will
cost time and memory during analyse though). If the documents have
completely different contents, what can happen is that almost all
lexemes are only seen a few times and get removed during the pruning of
the working set. I have seen similar behaviour while working on the
typanalyze function for tsvectors.

 So far I have no idea if this is bad or good, so a couple of sample runs
 of stuff that
 is sitting outside the most_common_vals array:
 
 [gathered statistics suck]

 So the most_common_vals seems to contain a lot of values that should
 never have been kept in favor
 of other values that are more common.

 In practice, just cranking the statistics estimate up high enough seems
 to solve the problem, but doesn't
 there seem to be something wrong in how the statistics are collected?

The algorithm to determine most common vals does not do it accurately.
That would require keeping all lexemes from the analysed tsvectors in
memory, which would be impractical. If you want to learn more about the
algorithm being used, try reading
http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
ts_typanalyze.c

It would be interesting to know what's the average size of a tsvector in
your document set (ie. how many unique lexemes does a tsvector have on
average). In general, the tsvector typanalyze function is designed to
suck less than the constant factor that has been used previously, but it
only works really well on the most common lexemes (thus preventing most
gross misestimates). I'm not very surprised it misses the difference
between 1612/350174 and 4/350174, and I'm quite happy that it gets it
if you set the stats target really high :o)
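
A quick way to check, using the table and column from your message
(length() on a tsvector returns its number of distinct lexemes):

SELECT avg(length(document_tsvector)) AS avg_unique_lexemes
  FROM reference;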

There's always the possibility that there's some stupid bug there, but I
think you just set your expectations too high for the tsvector typanalyze
function. If you could come up with a better way of doing tsvector
stats, that would be awesome - currently it's just doing its best to
prevent the most outrageous errors.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 21:44), Stephen Frost wrote:
 KaiGai,
 
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it according to the way.
 
 Reviewing this patch, there are a whole slew of problems.
 
 #1: REALLY BIG ISSUE- Insufficient comment updates.  You've changed
 function definitions in a pretty serious way as well as moved some code
 around such that some of the previous comments don't make sense.  You
 have got to update comments when you're writing a patch.  Indeed, the
 places I see a changes in comments are when you've removed what appears
 to still be valid and appropriate comments, or places where you've added
 comments which are just blatently wrong with the submitted patch.

Hmm. I'll revise/add the comments around the patched code.

 #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
 this patch- don't, we're in feature-freeze right now and should not be
 adding hooks at this time.

The patch is intended to be submitted for v9.1 development, not v9.0, isn't it?

 #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
  utils/acl and instead added executor/executor.h to ri_triggers.c.
 I don't particularly like that.  I admit that DoCopy() already knew
 about the executor, and if that were the only case outside of the
 executor where ExecCheckRTPerms() was getting called it'd probably be
 alright, but we already have another place that wants to use it, so
 let's move it to a more appropriate place.

Sorry, I'm a bit confused.
It seemed to me you suggested utilizing ExecCheckRTPerms() rather than
moving its logic anywhere, so I kept it here. (Did I misunderstand?)

If so, I doubt utils/acl is the best place for the moved
ExecCheckRTPerms(), because the checker function calls both the
default acl functions and an optional external security function.
It means ExecCheckRTPerms() is a caller of acl functions, not an acl
function itself, isn't it?
In other words, I wonder whether we should categorize a function X which
calls A and (optionally) B as a part of A.

I agree the checker function is not a part of the executor, but it is
also not a part of the acl functions, in my opinion.

If we are disinclined to create a new directory to hold the checker
function, my preference is src/backend/utils/adt/security.c and
src/include/utils/security.h .

 #4: As mentioned previously, the hook (which should be added in a
 separate patch anyway) makes more sense to me to be in
 ExecCheckRTPerms(), not ExecCheckRTEPerms().  This also means that we
 need to be calling ExecCheckRTPerms() from DoCopy and
 RI_Initial_Check(), to make sure that the hook gets called.  To that
 end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c.  Also,
 there should be a big comment about not using or calling
 ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the
 hook would then be skipped.

I don't have any preference between ExecCheckRTPerms()
and ExecCheckRTEPerms(), except that DoCopy() and RI_Initial_Check()
would have to call the checker function with list_make1(rte) instead of rte.

 #5: In DoCopy, you can remove relPerms and remainingPerms, but I'd
 probably leave required_access up near the top and then just use it to
 set rte-required_access directly rather than moving that bit deep down
 into the function.

OK,

 #6: I haven't checked yet, but if there are other things in an RTE which
 would make sense in the DoCopy case, beyond just what's needed for the
 permissions checking, and which wouldn't be 'correct' with a NULL'd
 value, I would set those.  Yes, we're building the RTE to check
 permissions, but we don't want someone downstream to be suprised when
 they make a change to something in the permissions checking and discover
 that a value in RTE they expected to be there wasn't valid.  Even more
 so, if there are function helpers which can be used to build an RTE, we
 should be using them.  The same goes for RI_Initial_Check().

Are you saying something like makeFuncExpr()?
I basically agree. However, should it be done in this patch?

 #7: I'd move the conditional if (is_from) into the foreach which is
 building the columnsSet and eliminate the need for columnsSet; I don't
 see that it's really adding much here.

OK,

 #8: When moving ExecCheckRTPerms(), you should rename it to be more like
 the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
 Also, it should return an actual AclResult instead of just true/false.

See the comments in #3.
And, if the caller has to handle aclcheck_error(), the user cannot
distinguish access violation errors raised by the default PG permissions
from those raised by any other external security stuff, can they?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 22:59), Stephen Frost wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
with locally built RangeTblEntry.
 
 Maybe I missed it somewhere, but we still need to address the case where
 the user doesn't have those SELECT permissions that we're looking for in
 RI_Initial_Check(), right?  KaiGai, your patch should be addressing that
 in a similar fashion..

The reason why the user must have SELECT privileges on the PK/FK tables is
that validateForeignKeyConstraint() always calls SPI_execute() to verify
that FK constraints can be established between two tables (even in the
fallback path).

And the reason why RI_Initial_Check() now calls pg_class_aclcheck() is
to try to avoid an unexpected access violation error from SPI_execute().
However, the fallback path also always calls SPI_execute(), so I concluded
the permission checks in RI_Initial_Check() are nonsense.

However, it is an independent issue right now, so I kept it as is.

The origin of the matter is that we apply unnecessary permission checks,
although it is purely internal use and the user was already checked for
permission to execute the whole ALTER TABLE statement. Right?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 22:16 , Simon Riggs wrote:
 All of these issues show why I want to specify the synchronisation mode
 as a USERSET. That will allow us to specify more easily which parts of
 our application are important when the cluster is degraded and which
 data is so critical it must reach multiple servers.


Hm, but since flushing an important COMMIT to the slave(s) will also need to 
flush all previous (potentially unimportant) COMMITs to the slave(s), isn't 
there a substantial chance of priority-inversion type problems there?

Then again, if asynchronous commit proved to be effective then so, probably, 
will this, so maybe my fear is unjustified.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Stephen Frost
Hector,

* Hector Beyers (hqbey...@gmail.com) wrote:
 Does someone have any ideas how I can hide data without the meta data
 noticing? To explain further, I would like to save some collection of data
 where the meta-data does not see it. I am trying to do some security through
 obscurity. It is for research purposes.

This explanation doesn't actually explain anything, near as I can tell.
Perhaps if you would share what your actual problem is, we could
recommend a solution.

Thanks,

Stephen



[HACKERS] Open Item: pg_controldata - machine readable?

2010-05-25 Thread Takahiro Itagaki
There is an open item pg_controldata - machine readable? in the list:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items

The proposal by Joe Conway is adding a new contib module.
http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com
http://github.com/jconway/pg_controldata

Should we add the module to 9.0? If we do so, SGML documentation is required.

IMHO, I'd like to put the feature into the core instead of a contrib
module, but we cannot change the catalog version at this time.
So, how about providing control file information through the pg_settings
view? We would retrieve those variables as GUC options.
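
A few values of that general kind are in fact already exposed as
read-only GUCs and visible there today, e.g.:

SELECT name, setting
  FROM pg_settings
 WHERE name IN ('block_size', 'integer_datetimes', 'server_version');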

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: PDXPUG Day at OSCON 2010

2010-05-25 Thread Mark Wong
It was recommended to me to forward this to -hackers.

Regards,
Mark

-- Forwarded message --
From: Mark Wong mark...@gmail.com
Date: Tue, May 18, 2010 at 6:57 AM
Subject: PDXPUG Day at OSCON 2010
To: pgsql-annou...@postgresql.org


Thanks to the generosity of O'Reilly, we will be having a full day of
free PostgreSQL sessions on Sunday, July 18 at the Oregon Convention
Center.  Location details and schedule information can be found on the
wiki at:

http://wiki.postgresql.org/wiki/PDXPUGDay2010

We will ask for a $30 donation towards PostgreSQL at the conference,
but no one will be turned away. Sign up here:

https://spreadsheets.google.com/viewform?hl=en&formkey=dDVBRnJGWVlZRkdycFdXbXVuYTNiU2c6MQ

Please submit your talk proposal here:

http://spreadsheets.google.com/viewform?hl=en&formkey=dHBFMGFIWmxJUzhRM3R6dXVlWWxYQ1E6MQ.

Proposals will be decided upon on June 7th and updated on the wiki.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
  #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
  this patch- don't, we're in feature-freeze right now and should not be
  adding hooks at this time.
 
 The patch is intended to submit for the v9.1 development, not v9.0, isn't it?

That really depends on whether this is actually fixing a bug in the existing
code or not.  I'm on the fence about that at the moment, to be honest.
I was trying to find where we explicitly say that SELECT rights are needed
to reference a column but wasn't able to.  If every code path is
expecting that, then perhaps we should just document it that way and
move on.  In that case, all these changes would be for 9.1.  If we
decide the current behavior is a bug, it might be something which could
be fixed in 9.0 and maybe back-patched.

In *either* case, given that one is a 'clean-up' patch and the other is
'new functionality', they should be independent *anyway*.  Small
incremental changes that don't break things when applied is what we're
shooting for here.

  #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
  utils/acl and instead added executor/executor.h to rt_triggers.c.
  I don't particularly like that.  I admit that DoCopy() already knew
  about the executor, and if that were the only case outside of the
  executor where ExecCheckRTPerms() was getting called it'd probably be
  alright, but we already have another place that wants to use it, so
  let's move it to a more appropriate place.
 
 Sorry, I'm a bit confused.
 It seemed to me you suggested to utilize ExecCheckRTPerms() rather than
 moving its logic anywhere, so I kept it here. (Was it misunderstand?)

I'm talking about moving the whole function (all 3 lines of it) to
somewhere else and then reworking the function to be more appropriate
based on its new location (including renaming and changing arguments
and return values, as appropriate).

 If so, but, I doubt utils/acl is the best placeholder of the moved
 ExecCheckRTPerms(), because the checker function calls both of the
 default acl functions and a optional external security function.

Can you explain why you think that having a function in utils/acl (eg:
include/utils/acl.h and backend/utils/aclchk.c) which calls default acl
functions and allows for an external hook would be a bad idea?

 It means the ExecCheckRTPerms() is caller of acl functions, not acl
 function itself, isn't it?

It's providing a higher-level service, sure, but there's nothing
particularly interesting or special about what it's doing in this case,
and, we need it in multiple places.  Why duplicate it?

 I agreed the checker function is not a part of executor, but it is
 also not a part of acl functions in my opinion.
 
 If it is disinclined to create a new directory to deploy the checker
 function, my preference is src/backend/utils/adt/security.c and
 src/include/utils/security.h .

We don't need a new directory or file for one function, as Robert
already pointed out.

  #6: I haven't checked yet, but if there are other things in an RTE which
  would make sense in the DoCopy case, beyond just what's needed for the
  permissions checking, and which wouldn't be 'correct' with a NULL'd
  value, I would set those.  Yes, we're building the RTE to check
  permissions, but we don't want someone downstream to be suprised when
  they make a change to something in the permissions checking and discover
  that a value in RTE they expected to be there wasn't valid.  Even more
  so, if there are function helpers which can be used to build an RTE, we
  should be using them.  The same goes for RI_Initial_Check().
 
 Are you saying something like makeFuncExpr()?
 I basically agree. However, should it be done in this patch?

Actually, I mean looking for, and using, things like
markRTEForSelectPriv() and addRangeTableEntry() or
addRangeTableEntryForRelation().

  #8: When moving ExecCheckRTPerms(), you should rename it to be more like
  the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
  Also, it should return an actual AclResult instead of just true/false.
 
 See the comments in #3.
 And, if the caller has to handle aclcheck_error(), user cannot distinguish
 access violation errors between the default PG permission and any other
 external security stuff, isn't it?

I'm not suggesting that the caller handle aclcheck_error()..
ExecCheckRTPerms() could just as easily have a flag which indicates if
it will call aclcheck_error() or not, and if not, to return an
AclResult to the caller.  That flag could then be passed to
ExecCheckRTEPerms() as you have it now.
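
Roughly this shape, say (a sketch only; the names and the exact
signature are just what's being discussed here, nothing settled):

/* check all RTEs in the list; ereport() on violation only if asked to */
extern AclResult pg_rangetbl_aclcheck(List *rangeTable,
                                      bool ereport_on_violation);

/* optional entry point for external security providers */
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                              bool ereport_on_violation);
extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;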

Thanks,

Stephen



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 The reason why the user must have SELECT privileges on the PK/FK tables is
 that validateForeignKeyConstraint() always calls SPI_execute() to verify
 that FK constraints can be established between two tables (even in the
 fallback path).
 
 And the reason why RI_Initial_Check() now calls pg_class_aclcheck() is
 to try to avoid an unexpected access violation error from SPI_execute().
 However, the fallback path also always calls SPI_execute(), so I concluded
 the permission checks in RI_Initial_Check() are nonsense.

That may be the case.  I'm certainly more concerned with a bug in the
existing code than any new code that we're working on.  The question is-
is this actually a user-visible bug?  Or do we require that a user
creating an FK needs SELECT rights on the primary table?  If so, it's
still a bug, but at that point it's a bug in our documentation where we
don't mention that SELECT rights are also needed.

Anyone know what the SQL spec says about this (if anything...)?

 However, it is an independent issue right now, so I kept it as is.

Uh, I don't really see it as independent..  If we have a bug there that
we need to fix, and it's because we have two different bits of code
trying to do the same checking, we should fix it by eliminating the
duplicate checking, imv.

 The origin of the matter is that we apply unnecessary permission checks,
 although it is purely internal use and the user was already checked for
 permission to execute the whole ALTER TABLE statement. Right?

That's certainly a nice thought, but given the complexity in ALTER
TABLE, in particular with regard to permissions checking, I have no idea
if what it's doing is intentional or wrong.

Thanks,

Stephen



[HACKERS] Open Item: invalid declspec for PG_MODULE_MAGIC

2010-05-25 Thread Takahiro Itagaki
This open item is for replacing the PGDLLIMPORT markers on PG_MODULE_MAGIC
and PG_FUNCTION_INFO_V1 with __declspec(dllexport), because those symbols
are always exported by user modules rather than by the core code.
http://archives.postgresql.org/message-id/20100329184705.a60e.52131...@oss.ntt.co.jp

The fix is simple, so I think we can include it in 9.0.
Arguable issues for the patch are:
  * Is there a better name than PGMODULEEXPORT?  I like PGDLLEXPORT
because it is similar to PGDLLIMPORT, but it might be too similar.
  * Should we backport the fix to previous releases?
I'd like to backport it; it should not break any existing third-party
modules, because without the fix they cannot even be built on Windows.
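
For concreteness, here is a sketch of the idea (the macro bodies are
simplified for illustration, not the real fmgr.h definitions):

    #ifdef WIN32
    #define PGMODULEEXPORT __declspec(dllexport)
    #else
    #define PGMODULEEXPORT
    #endif

    /* the user-facing macros would then emit exported declarations */
    #define PG_FUNCTION_INFO_V1(funcname) \
        extern PGMODULEEXPORT const Pg_finfo_record * \
            CppConcat(pg_finfo_,funcname)(void)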

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb

2010-05-25 Thread Tom Lane
I've been experimenting with SSL setups involving chains of CA
certificates, ie, where the server or client cert itself is signed by
an intermediate CA rather than a trusted root CA.  This appears to work
well enough on the server side if you configure the server correctly
(see discussion of bug #5468).  However, libpq is not able to work with
a client certificate unless that cert is directly signed by a CA that
the server trusts (ie, one listed directly in the server's root.crt file).
This is because there is no good way to feed back any intermediate CA
certs to the server.  The man page for SSL_CTX_set_client_cert_cb says
in so many words that the client_cert_cb API is maldesigned:

BUGS

The client_cert_cb() cannot return a complete certificate chain,
it can only return one client certificate. If the chain only has
a length of 2, the root CA certificate may be omitted according
to the TLS standard and thus a standard conforming answer can be
sent to the server. For a longer chain, the client must send the
complete chain (with the option to leave out the root CA
certificate). This can only be accomplished by either adding the
intermediate CA certificates into the trusted certificate store
for the SSL_CTX object (resulting in having to add CA
certificates that otherwise maybe would not be trusted), or by
adding the chain certificates using the
SSL_CTX_add_extra_chain_cert(3) function, which is only
available for the SSL_CTX object as a whole and that therefore
probably can only apply for one client certificate, making the
concept of the callback function (to allow the choice from
several certificates) questionable.

It strikes me that we could not only fix this case, but make the libpq
code simpler and more like the backend case, if we got rid of
client_cert_cb and instead preloaded the ~/.postgresql/postgresql.crt
file using SSL_CTX_use_certificate_chain_file().  Then, using an
indirectly signed client cert would only require including the full cert
chain in that file.
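
To illustrate (just a sketch of the OpenSSL calls involved, with error
reporting elided; not proposed libpq code):

    #include <openssl/ssl.h>

    static int
    load_client_cert_chain(SSL_CTX *ctx,
                           const char *certpath,  /* .../postgresql.crt */
                           const char *keypath)   /* .../postgresql.key */
    {
        /* reads the client cert first, then any intermediate CA certs */
        if (SSL_CTX_use_certificate_chain_file(ctx, certpath) != 1)
            return -1;

        if (SSL_CTX_use_PrivateKey_file(ctx, keypath,
                                        SSL_FILETYPE_PEM) != 1)
            return -1;

        /* verify the key matches the (leaf) certificate */
        return SSL_CTX_check_private_key(ctx) == 1 ? 0 : -1;
    }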

So I'm wondering if there was any specific reason behind using the
callback API to start with.  Anybody remember?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:16 PM, Tom Lane wrote:

Jan Wieck janwi...@yahoo.com writes:
No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 


You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)


I am well aware of the fact that commit timestamps within the WAL can go 
backwards and that the serial numbers of this proposed implementation of 
commit order can even be different from what the timestamps AND the WAL 
are saying.


As long as the serial number (the record's position inside the segment) is 
determined while the transaction still holds all its locks, this is 
going to be good enough for what async replication users today are used 
to. Again, it will not magically make it possible to determine, post 
mortem, a serializable order of actions that happened in transactions 
running at the read committed isolation level. I don't even think 
that is possible at all.


And I don't think anyone proposed a solution for that problem anyways.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

