Re: [HACKERS] pg_stat_transaction patch

2010-05-25 Thread Takahiro Itagaki

Joel Jacobson j...@gluefinance.com wrote:

 I applied all the changes on 9.0beta manually and then it compiled without
 any assertion failures.
 
 I also changed the oids to a different unused range, since the ones I used
 before had been taken in 9.0beta1.

Thanks, but you still need to test your patch:

 - You need to check your patch with "make check", because it requires
   adjustments in the rules test; your pg_stat_transaction_function is the
   longest name in the system catalog.

 - You need to configure postgres with --enable-cassert to enable internal
   validations. The attached test case failed with the following TRAPs:

TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File: "pgstat.c", Line: 715)
TRAP: FailedAssertion("!(tabstat->trans == trans)", File: "pgstat.c", Line: 1758)

 I suspect it is because get_tabstat_entry() for some reason returns NULL in,
 for example, pg_stat_get_transaction_tuples_inserted(PG_FUNCTION_ARGS).
 
 Does the function look valid? If you can find the error in it, the other
 functions probably have the same problem.

For the above trap, we can see the comment:
/* Shouldn't have any pending transaction-dependent counts */
We don't expect to read stats entries during transactions. I'm not sure
whether accessing in-transaction stats during a transaction is safe or not.
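
For illustration only (this is not the attached test case, and the function
name comes from the proposed patch), the kind of sequence that trips this
assertion is reading the pending per-transaction counters from inside the
very transaction that produced them, roughly:

BEGIN;
INSERT INTO t VALUES (1);  -- creates a pending transaction-level stats entry
SELECT pg_stat_get_transaction_tuples_inserted('t'::regclass);
                           -- reads the entry while entry->trans is still set
COMMIT;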

We might need to go in other directions, for example:
  - Use session stats instead of transaction stats. You can see the same
information in the difference of counters between before and after the
transaction (a rough illustration follows below).
  - Export pgBufferUsage instead of relation counters. They are
buffer counters for all relations, but we can obviously export
them because they are just plain variables.
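
To sketch the first idea with the existing cumulative statistics functions
(an approximation only: these counters are cluster-wide and the collector
applies them with some delay; the table name here is made up):

SELECT pg_stat_get_tuples_inserted('mytab'::regclass);  -- e.g. returns 100

BEGIN;
INSERT INTO mytab VALUES (1), (2), (3);
COMMIT;

SELECT pg_stat_get_tuples_inserted('mytab'::regclass);  -- e.g. returns 103;
-- the difference, 3, is what the transaction inserted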

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Heikki Linnakangas

On 24/05/10 22:49, Alvaro Herrera wrote:

Excerpts from Josh Berkus's message of Fri May 21 17:57:35 -0400 2010:


Problem: currently, if your database has a large amount of cold data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map.  However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages.  This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.


I think this is nonsense.  If you have 3-year-old sales transactions,
and your database has any interesting churn, tuples in those pages have
been frozen for a very long time *already*.  The problem is vacuum
reading them in so that it can verify there's nothing to do.  If we want
to avoid *reading* those pages, this solution is useless:


Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count.


because you still have to read the page.


What's missing from the suggestion is that relfrozenxid and datfrozenxid 
also need to be expanded to 8 bytes. That way you effectively have 
8-byte XIDs, which means that you never need to vacuum to avoid XID 
wraparound.


You still need to freeze to truncate clog, but if you have the 
disk space, you can now do that, for example, every 100 billion 
transactions if you wish.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 12:19), Robert Haas wrote:
 On Mon, May 24, 2010 at 9:27 PM, Stephen Frost sfr...@snowman.net wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 We have two options; if the checker function takes a list of
 RangeTblEntry,
 it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
 if the checker function takes the arguments as in my patch, it will be
 convenient for DoCopy(), but not for ExecCheckRTPerms().

 In my patch, it takes 6 arguments, but we can reference all of them from
 the given RangeTblEntry. On the other hand, if DoCopy() has to set up
 a pseudo RangeTblEntry to call the checker function, it needs to set up
 a similar or slightly larger number of variables.

 I don't know that it's really all that difficult to set up an RT in
 DoCopy or RI_Initial_Check().  In my opinion, those are the strange or
 corner cases- not the Executor code, through which all 'regular' DML is
 done.  It makes me wonder if COPY shouldn't have been implemented using
 the Executor instead, but that's, again, a completely separate topic.
 It wasn't, but it wants to play like it operates in the same kind of way
 as INSERT, so it needs to pick up the slack.
 
 I think this approach is definitely worth investigating.  KaiGai, can
 you please work up what the patch would look like if we do it this
 way?

OK, the attached patch reworks it that way.

* ExecCheckRTEPerms() now takes a second argument that lets the caller choose
  the behavior on access violation. If the 'abort' argument is true, it raises
  an error using aclcheck_error() or ereport(). Otherwise, it returns
  false immediately without performing the rest of the checks.

* DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
  with a locally built RangeTblEntry.

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 21,26 
--- 21,27 
  #include <arpa/inet.h>
  
  #include "access/heapam.h"
+ #include "access/sysattr.h"
  #include "access/xact.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
***
*** 37,43 
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
- #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
--- 38,43 
***
*** 725,733  DoCopy(const CopyStmt *stmt, const char *queryString)
  	List	   *force_notnull = NIL;
  	bool		force_quote_all = false;
  	bool		format_specified = false;
- 	AclMode		required_access = (is_from ? ACL_INSERT : ACL_SELECT);
- 	AclMode		relPerms;
- 	AclMode		remainingPerms;
  	ListCell   *option;
  	TupleDesc	tupDesc;
  	int			num_phys_attrs;
--- 725,730 
***
*** 988,993  DoCopy(const CopyStmt *stmt, const char *queryString)
--- 985,995 
  
  	if (stmt->relation)
  	{
+ 		RangeTblEntry	rte;
+ 		Bitmapset	   *columnsSet = NULL;
+ 		List		   *attnums;
+ 		ListCell	   *cur;
+ 
  		Assert(!stmt->query);
  		cstate->queryDesc = NULL;
  
***
*** 998,1026  DoCopy(const CopyStmt *stmt, const char *queryString)
  		tupDesc = RelationGetDescr(cstate->rel);
  
  		/* Check relation permissions. */
! 		relPerms = pg_class_aclmask(RelationGetRelid(cstate->rel), GetUserId(),
! 	required_access, ACLMASK_ALL);
! 		remainingPerms = required_access & ~relPerms;
! 		if (remainingPerms != 0)
  		{
! 			/* We don't have table permissions, check per-column permissions */
! 			List	   *attnums;
! 			ListCell   *cur;
! 
! 			attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 			foreach(cur, attnums)
! 			{
! 				int			attnum = lfirst_int(cur);
  
! 				if (pg_attribute_aclcheck(RelationGetRelid(cstate->rel),
! 						  attnum,
! 						  GetUserId(),
! 						  remainingPerms) != ACLCHECK_OK)
! 					aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS,
! 						   RelationGetRelationName(cstate->rel));
! 			}
  		}
  
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
--- 1000,1025 
  		tupDesc = RelationGetDescr(cstate->rel);
  
  		/* Check relation permissions. */
! 		attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 		foreach(cur, attnums)
  		{
! 			int	attnum = lfirst_int(cur) - FirstLowInvalidHeapAttributeNumber;
  
! 			columnsSet = bms_add_member(columnsSet, attnum);
  		}
  
+ 		memset(&rte, 0, sizeof(rte));
+ 		rte.type = T_RangeTblEntry;
+ 		rte.rtekind = RTE_RELATION;
+ 		rte.relid = RelationGetRelid(cstate->rel);
+ 		rte.requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
+ 		if (is_from)
+ 			rte.modifiedCols = columnsSet;
+ 		else
+ 			rte.selectedCols = columnsSet;
+ 
+ 		ExecCheckRTEPerms(&rte, true);
+ 
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***
*** 63,68  

Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Joseph Adams
I started a wiki article for brainstorming the JSON API:
http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
substantial changes to the draft of the API based on discussion here
and on the #postgresql IRC channel.

Is it alright to use the wiki for brainstorming, or should it stay on
the mailing list or go somewhere else?

I'll try not to spend too much time quibbling over the specifics as I
tend to do.  While the brainstorming is going on, I plan to start
implementing the datatype by itself so I can establish an initial
working codebase.



Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 6:08 , Sam Vilain wrote:
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html
 
 Lead us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.
 
 Here is the section:
 
 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.

I'm confused. The sentence in brackets "Releasing the newer savepoint will 
cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
RELEASE SAVEPOINT" implies that you *will* walk backwards through all the 
savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If 
that is not how it actually works, then this whole paragraph is wrong, I'd say.

best regards,
Florian Pflug




Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Fujii Masao
On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, OK, I think that makes sense.  Would you care to propose a patch?

 Yep. Here is the patch.

 This patch distinguishes a normal shutdown from an unexpected exit while the
 server is in recovery. That is, when smart or fast shutdown is requested
 during recovery, the bgwriter sets ControlFile->state to the newly introduced
 DB_SHUTDOWNED_IN_RECOVERY state.

Is this patch worth applying for 9.0? If not, I'll add it to
the next CF for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] Hot Standby performance and deadlocking

2010-05-25 Thread Simon Riggs

Some performance problems have been reported on HS from two users: Erik
and Stefan.

The characteristics of those issues have been that performance is
* sporadically reduced, though mostly runs at full speed
* context switch storms reported as being associated

So we're looking for something that doesn't always happen, but when it
does it involves lots of processes and context switching.

Unfortunately neither test reporter has been able to re-run tests,
leaving me not much to go on. Though since I know the code well, I can
focus in on likely suspects fairly easily; in this case I think I have a
root cause.

Earlier this year I added deadlock detection into Startup process when
it waits for a buffer pin. The deadlock detection was simplified since
it doesn't wait for deadlock_timeout before acting, it just immediately
sends a signal to all active processes to resolve the deadlock, even if
the buffer pin is released very soon afterwards. Heikki questioned this
implementation at the time, though I said it was easier to start simple
and add more code if problems arose and time allowed. It's clear that
with 100+ connections and reasonably frequent buffer pin waits, as would
occur when accessing the same data blocks on both primary and standby,
the current too-simple coding would cause performance issues, as Heikki
implied. Certainly actual deadlocks are much rarer than buffer pin
waits, so the current coding is wasteful.

The following patch adds some simple logic to make the Startup process
wait for deadlock_timeout before it sends the deadlock resolution
signals. It does that by refactoring the API to
enable_standby_sigalrm(), though doesn't change other behaviour or add
new features.

Viewpoints?

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***
*** 388,399  ResolveRecoveryConflictWithBufferPin(void)
  	}
  	else if (MaxStandbyDelay < 0)
  	{
  		/*
! 		 * Send out a request to check for buffer pin deadlocks before we
! 		 * wait. This is fairly cheap, so no need to wait for deadlock timeout
! 		 * before trying to send it out.
  		 */
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
  	}
  	else
  	{
--- 388,402 
  	}
  	else if (MaxStandbyDelay < 0)
  	{
+ 		TimestampTz now = GetCurrentTimestamp();
+ 
  		/*
! 		 * Set timeout for deadlock check (only)
  		 */
! 		if (enable_standby_sig_alarm(now, now, true))
! 			sig_alarm_enabled = true;
! 		else
! 			elog(FATAL, "could not set timer for process wakeup");
  	}
  	else
  	{
***
*** 410,443  ResolveRecoveryConflictWithBufferPin(void)
  		}
  		else
  		{
! 			TimestampTz fin_time;		/* Expected wake-up time by timer */
! 			long		timer_delay_secs;		/* Amount of time we set timer
!  * for */
! 			int			timer_delay_usecs;
! 
! 			/*
! 			 * Send out a request to check for buffer pin deadlocks before we
! 			 * wait. This is fairly cheap, so no need to wait for deadlock
! 			 * timeout before trying to send it out.
! 			 */
! 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
  
  			/*
! 			 * How much longer we should wait?
  			 */
! 			fin_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
! 
! 			TimestampDifference(now, fin_time,
! 								&timer_delay_secs, &timer_delay_usecs);
  
  			/*
! 			 * It's possible that the difference is less than a microsecond;
! 			 * ensure we don't cancel, rather than set, the interrupt.
  			 */
! 			if (timer_delay_secs == 0 && timer_delay_usecs == 0)
! timer_delay_usecs = 1;
! 
! 			if (enable_standby_sig_alarm(timer_delay_secs, timer_delay_usecs, fin_time))
  sig_alarm_enabled = true;
  			else
  				elog(FATAL, "could not set timer for process wakeup");
--- 413,431 
  		}
  		else
  		{
! 			TimestampTz max_standby_time;
  
  			/*
! 			 * At what point in the future do we hit MaxStandbyDelay?
  			 */
! 			max_standby_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
! 			Assert(max_standby_time > now);
  
  			/*
! 			 * Wake up at MaxStandby delay, and check for deadlocks as well
! 			 * if we will be waiting longer than deadlock_timeout
  			 */
! 			if (enable_standby_sig_alarm(now, max_standby_time, false))
  sig_alarm_enabled = true;
  			else
  				elog(FATAL, "could not set timer for process wakeup");
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***
*** 85,90  static TimestampTz timeout_start_time;
--- 85,91 
  
  /* statement_fin_time is valid only if statement_timeout_active is true */
  static TimestampTz statement_fin_time;
+ static TimestampTz statement_fin_time2; /* valid only in recovery */
  
  
  static void RemoveProcFromArray(int code, Datum arg);
***
*** 1619,1641  handle_sig_alarm(SIGNAL_ARGS)
   * To avoid various edge cases, we must be careful to do nothing
   * when there is nothing to be done.  We also need to 

Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Heikki Linnakangas

On 25/05/10 13:03, Florian Pflug wrote:

On May 25, 2010, at 6:08 , Sam Vilain wrote:

http://www.postgresql.org/docs/8.4/static/sql-savepoint.html

Lead us to believe that if you roll back to the same savepoint name
twice in a row, that you might start walking back through the
savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
that is not how it works.

Here is the section:

SQL requires a savepoint to be destroyed automatically when another
savepoint with the same name is established. In PostgreSQL, the old
savepoint is kept, though only the more recent one will be used when
rolling back or releasing. (Releasing the newer savepoint will cause the
older one to again become accessible to ROLLBACK TO SAVEPOINT and
RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.


I'm confused. The sentence in brackets "Releasing the newer savepoint will cause the older one to again 
become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT" implies that you *will* walk backwards 
through all the savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, 
no? If that is not how it actually works, then this whole paragraph is wrong, I'd say.


Releasing the newer savepoint will cause the older one to again become 
accessible, as the doc says, but rolling back to a savepoint does not 
implicitly release it. You'll have to use RELEASE SAVEPOINT for that.
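
A concrete transcript may help (the table and savepoint contents here are
made up for illustration):

BEGIN;
CREATE TABLE t (i int);
SAVEPOINT a;              -- the older savepoint "a"
INSERT INTO t VALUES (1);
SAVEPOINT a;              -- the newer savepoint "a"; the older one is hidden
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT a;  -- undoes only the second INSERT
ROLLBACK TO SAVEPOINT a;  -- no further effect: still rolls back to the newer "a"
RELEASE SAVEPOINT a;      -- destroys the newer "a"; the older one is accessible again
ROLLBACK TO SAVEPOINT a;  -- now undoes the first INSERT as well
COMMIT;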


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-25 Thread Fujii Masao
On Tue, May 18, 2010 at 3:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 (2)
 pg_ctl -ms stop emits the following warning whenever there is the
 backup_label file in $PGDATA.

       WARNING: online backup mode is active
       Shutdown will not complete until pg_stop_backup() is called.

 This warning doesn't fit the shutdown-during-recovery case.
 Since a smart shutdown might be requested by something other than pg_ctl, the
 warning should be emitted on the server side rather than the client, I think.
 How about moving the warning to the server side?

 Though I'm not sure if this should be fixed for 9.0, I attached the
 patch (move_bkp_cancel_warning_v1.patch).

Is this patch worth applying for 9.0? If not, I'll add it to
the next CF for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Regression testing for psql

2010-05-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
  Of course, if people want to suggest tests that just shouldn't be
  included, I can go through and strip things out.
 
 Well...  I'm a little reluctant to believe that we should have 3.3M of
 tests for the entire backend and 5M of tests just for psql.  Then,
 too, there's the fact that many of these tests fail on my machine
 because my username is not sfrost, and/or because of row-ordering
 differences on backslash commands without enough ORDER BY to fully
 determine the output order.

Yeah, you know, I had fully intended to go grepping through the output
last night to check for things like that, but my wife decided I needed
sleep instead. :)  Sorry about that.  Still, it's more of a general
proposal than something I think should be committed as-is.  Should we
try to deal with those kinds of differences, or just eliminate the tests
which are dependent on username, etc?  It definitely strikes me that
there's a fair bit of code in psql we're not exercising in some fashion
in the regression suite... :/

Thanks,

Stephen




Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it that way.

I haven't looked at it yet, but the hook was added to ExecCheckRTPerms(),
not RTE.  This was for two main reasons- it seemed simpler to us and it
meant that any security module implemented would have access to
essentially everything we know the query is going to use all at once
(instead of on a per-range-table basis).  That could be particularly
useful if you wanted to, say, enforce a constraint that says no two
tables of different labels shall ever be used in the same query at the
same time (perhaps with some caveats on that, etc).

Could you change this patch to use ExecCheckRTPerms() instead?

 * ExecCheckRTEPerms() now takes a second argument that lets the caller choose
   the behavior on access violation. If the 'abort' argument is true, it raises
   an error using aclcheck_error() or ereport(). Otherwise, it returns
   false immediately without performing the rest of the checks.
 
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
   with a locally built RangeTblEntry.

Does this change fix the issue you had in RI_Initial_Check()?

Thanks,

Stephen




Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:
 On 25/05/10 13:03, Florian Pflug wrote:
 On May 25, 2010, at 6:08 , Sam Vilain wrote:
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html
 
 Lead us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.
 
 Here is the section:
 
 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
 
 I'm confused. The sentence in brackets "Releasing the newer savepoint will 
 cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
 RELEASE SAVEPOINT" implies that you *will* walk backwards through all the 
 savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? 
 If that is not how it actually works, then this whole paragraph is wrong, 
 I'd say.
 
 Releasing the newer savepoint will cause the older one to again become 
 accessible, as the doc says, but rolling back to a savepoint does not 
 implicitly release it. You'll have to use RELEASE SAVEPOINT for that.

Ah, now I get it. Thanks.

Would changing "Releasing the newer savepoint will cause ..." to "Explicitly 
releasing the newer savepoint" or maybe even "Explicitly releasing the newer 
savepoint with RELEASE SAVEPOINT will cause ..." make things clearer?

best regards,
Florian Pflug






Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I started a wiki article for brainstorming the JSON API:
 http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
 substantial changes to the draft of the API based on discussion here
 and on the #postgresql IRC channel.

 Is it alright to use the wiki for brainstorming, or should it stay on
 the mailing list or go somewhere else?

Well, I think it's fine to use the wiki for brainstorming, but before
you change the design you probably need to talk about it here.  You
can't rely on everyone on -hackers to follow changes on a wiki page
somewhere.  It looks like the API has been overhauled pretty heavily
since the last version we talked about here, and I'm not sure I
understand it.

 I'll try not to spend too much time quibbling over the specifics as I
 tend to do.  While the brainstorming is going on, I plan to start
 implementing the datatype by itself so I can establish an initial
 working codebase.

Sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 3:21 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 The subtle point here is whether you consider the view from the outside 
 (in the sense of what a read-only transaction started at an arbitrary time 
 can or cannot observe), or from the inside (what updating transactions can 
 observe and might base their updates on).
 
 The former case is completely determined by the commit ordering of the 
 transactions, while the latter is not - otherwise serializability wouldn't 
 be such a hard problem.
 
 BTW, doesn't all this logic fall in a heap as soon as you consider
 read-committed transactions?


Why would it? There's still a well-defined point in time at which the 
transaction's effects become visible, and every other transaction commits 
either before that time or after that time. An observer started between two 
transactions sees the first's changes but not the second's. One can replace 
observing read-committed transactions with a series of smaller repeatable-read 
transactions, since the observers are read-only anyway.

This of course says nothing about what state the updating transactions 
themselves see as the current state. For replication, e.g., that is adequate, 
since you'd not replay the original commands but rather the effects they had in 
terms of physical tuple updates. On replay, the effects of a transaction 
therefore do not depend on the state the transaction sees.

best regards,
Florian Pflug




Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Alex Goncharov
,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?

Thanks,

-- Alex -- alex-goncha...@comcast.net --




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:

 Replicating or recreating the whole predicate locking and conflict
 detection on slaves is not feasible for performance reasons. (I
 won't elaborate unless someone feels that's not intuitively
 obvious.) The only sane way I can see to have a slave database allow
 serializable behavior is to WAL-log the acquisition of a snapshot by
 a serializable transaction, and the rollback or commit, on the
 master, and to have the serializable snapshot build on a slave
 exclude any serializable transactions for which there are still
 concurrent serializable transactions. Yes, that does mean WAL-
 logging the snapshot acquisition even if the transaction doesn't yet
 have an xid, and WAL-logging the commit or rollback even if it never
 acquires an xid.

 One important observation is that any anomaly that occurs on the slave
 can be resolved by aborting a local read-only transaction. This is a
 good thing, because the alternatives are too horrible to consider.

 You could possibly cut the costs of predicate locking by having the
 master ship with each transaction the list of predicate locks it
 acquired. But you'd still have to track locks for read-only
 transactions, so maybe that's not a significant cost improvement. On
 the other hand, if you're willing to pay the price of serializability
 on the master, why not the slaves too?

I don't understand the problem. As I see it, in the context of
SSI, a read-only slave can just map SERIALIZABLE to the technical
implementation of REPEATABLE READ (i.e., the currently-existing
SERIALIZABLE). The union of the transactions on the master and the
slave(s) will still exhibit SERIALIZABLE behavior because the
transactions on the slave cannot write anything and are therefore
irrelevant.

Is anything wrong with that reasoning?

Nicolas



Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Yeb Havinga

Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call

| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?
  
The straightforward answer is that the libpq frontend C library does not 
support something like the JDBC client's setFetchSize.


The GBs of data are gathered at the site of the libpq client (the PGresult 
object is gathered/allocated while consuming result input from the backend).


regards,
Yeb Havinga





Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Abhijit Menon-Sen
At 2010-05-25 07:35:34 -0400, alex-goncha...@comcast.net wrote:

 | Where does the result set (GBs of data) reside after I call
 | PQexecPrepared?  On BE, I hope?

Unless you explicitly declare and fetch from an SQL-level cursor, your
many GBs of data are going to be transmitted to libpq, which will eat
lots of memory. (The wire protocol does have something like cursors,
but libpq does not use them, it retrieves the entire result set.)
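
A minimal sketch of the cursor approach (the cursor and table names are made
up here); each statement can be sent through an ordinary PQexec() call:

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM huge_table;
FETCH 1000 FROM big_cur;  -- only these 1000 rows are materialized in the PGresult
-- ... repeat the FETCH until it returns zero rows ...
CLOSE big_cur;
COMMIT;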

-- ams



Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Magnus Hagander
On Tue, May 25, 2010 at 12:57, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I started a wiki article for brainstorming the JSON API:
 http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
 substantial changes to the draft of the API based on discussion here
 and on the #postgresql IRC channel.

 Is it alright to use the wiki for brainstorming, or should it stay on
 the mailing list or go somewhere else?

 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

The general idea that most people have been using, and that I think is
correct, is to have the discussion here on the list, and then keep a
summary of the current state of it on the wiki page so it's easier for
someone entering the discussion to catch up on where it is.


 I'll try not to spend too much time quibbling over the specifics as I
 tend to do.  While the brainstorming is going on, I plan to start
 implementing the datatype by itself so I can establish an initial
 working codebase.

 Sounds good.

Agreed.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it that way.

Reviewing this patch, there are a whole slew of problems.

#1: REALLY BIG ISSUE- Insufficient comment updates.  You've changed
function definitions in a pretty serious way as well as moved some code
around such that some of the previous comments don't make sense.  You
have got to update comments when you're writing a patch.  Indeed, the
places I see changes in comments are where you've removed what appear
to still be valid and appropriate comments, or places where you've added
comments which are just blatantly wrong with the submitted patch.

#2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
this patch- don't, we're in feature-freeze right now and should not be
adding hooks at this time.

#3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
utils/acl and instead added executor/executor.h to ri_triggers.c.
I don't particularly like that.  I admit that DoCopy() already knew
about the executor, and if that were the only case outside of the
executor where ExecCheckRTPerms() was getting called it'd probably be
alright, but we already have another place that wants to use it, so
let's move it to a more appropriate place.

#4: As mentioned previously, the hook (which should be added in a
separate patch anyway) makes more sense to me to be in
ExecCheckRTPerms(), not ExecCheckRTEPerms().  This also means that we
need to be calling ExecCheckRTPerms() from DoCopy and
RI_Initial_Check(), to make sure that the hook gets called.  To that
end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c.  Also,
there should be a big comment about not using or calling
ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the
hook would then be skipped.

#5: In DoCopy, you can remove relPerms and remainingPerms, but I'd
probably leave required_access up near the top and then just use it to
set rte->requiredPerms directly rather than moving that bit deep down
into the function.

#6: I haven't checked yet, but if there are other things in an RTE which
would make sense in the DoCopy case, beyond just what's needed for the
permissions checking, and which wouldn't be 'correct' with a NULL'd
value, I would set those.  Yes, we're building the RTE to check
permissions, but we don't want someone downstream to be surprised when
they make a change to something in the permissions checking and discover
that a value in the RTE they expected to be there wasn't valid.  Even more
so, if there are helper functions which can be used to build an RTE, we
should be using them.  The same goes for RI_Initial_Check().

#7: I'd move the conditional if (is_from) into the foreach which is
building the columnsSet and eliminate the need for columnsSet; I don't
see that it's really adding much here.

#8: When moving ExecCheckRTPerms(), you should rename it to be more like
the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
Also, it should return an actual AclResult instead of just true/false.

Thanks,

Stephen




Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Alex Goncharov
,--- Abhijit Menon-Sen (Tue, 25 May 2010 17:26:18 +0530) *
| Unless you explicitly declare and fetch from an SQL-level cursor, your
| many GBs of data are going to be transmitted to libpq, which will eat
| lots of memory. (The wire protocol does have something like cursors,
| but libpq does not use them, it retrieves the entire result set.)
,--- Yeb Havinga (Tue, 25 May 2010 14:08:51 +0200) *
| The GBs of data are gathered at the site of the libpq client (pgresult 
| object gathered/allocated while consuming result input from backend).
`--*

Thank you very much!

-- Alex -- alex-goncha...@comcast.net --



Re: [HACKERS] pg_upgrade docs

2010-05-25 Thread Robert Haas
On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
 Have you read the docs?  It does mention the issue with /contrib and
 stuff.  How do I document a limitation I don't know about?  This is all
 very vague.  Please suggest some wording.

OK, here's an attempt.  Please fact-check.

--

General Limitations

pg_upgrade relies on binary compatibility between the old and new
on-disk formats, including the on-disk formats of individual data
types.  pg_upgrade attempts to detect cases in which the on-disk
format has changed; for example, it verifies that the old and new
clusters have the same value for --enable-integer-datetimes.  However,
there is no systematic way for pg_upgrade to detect problems of this
type; it has hard-coded knowledge of the specific cases known to exist
in core PostgreSQL, including /contrib.  If third-party or
user-defined data types or access methods are used, it is the user's
responsibility to verify that the versions loaded into the old and new
clusters use compatible on-disk formats.  If they do not, pg_upgrade
may appear to work but subsequently crash or silently corrupt data.

pg_upgrade also relies on ABI compatibility between modules loaded
into the old and new clusters.  For example, if an SQL function in the
old cluster is defined to call a particular C function, pg_upgrade
will recreate the SQL function in the new cluster and will configure it to
call the same C function.  If no such C function can be found by the
new cluster, pg_upgrade will simply fail.  However, if a C function of
the same name exists in the new cluster, but expects a different
number of arguments or different types of arguments, then it is likely
to crash the system when called.  In the worst case, data corruption
could result.
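
As a purely hypothetical illustration (the module path, SQL function name,
and C symbol below are invented), the kind of definition involved is:

CREATE FUNCTION my_func(integer) RETURNS integer
    AS '$libdir/my_module', 'my_c_func'
    LANGUAGE C STRICT;

pg_upgrade recreates this definition verbatim, so the my_module library
installed for the new cluster must export a my_c_func with a compatible
signature.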

--

Also, the following sentence appears not to fit with our only to 9.0
policy: For Windows users, note that due to different integer
datetimes settings used by the one-click installer and the MSI
installer, it is only possible to upgrade from version 8.3 of the
one-click distribution to version 8.4 of the one-click distribution.
It is not possible to upgrade from the MSI installer to the one-click
installer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Andrew Chernow

On 05/25/2010 07:35 AM, Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
|
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?



Issue multiple queries and make use of LIMIT/OFFSET.  You'll have to go 
manual on this one.
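
Something along these lines (the table and column names are invented; a
stable ORDER BY is needed so that successive pages don't overlap or skip
rows):

SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 0;
SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 10000;
SELECT * FROM huge_table ORDER BY id LIMIT 10000 OFFSET 20000;
-- ... and so on, until a query returns fewer than 10000 rows.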


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
   with a locally built RangeTblEntry.

Maybe I missed it somewhere, but we still need to address the case where
the user doesn't have those SELECT permissions that we're looking for in
RI_Initial_Check(), right?  KaiGai, your patch should be addressing that
in a similar fashion.

Thanks,

Stephen




[HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)

2010-05-25 Thread Mike Fowler
I've been reading the SQL/XML standard and discovered that it defines a 
function named XMLEXISTS that does exactly what the todo item 
xpath_exists defines. My original patch named the function as per the 
todo but I think using the function name from the standard is a better 
idea. So this patch is the same as before, but the function is now named 
XMLEXISTS instead of xpath_exists.


Regards,

--
Mike Fowler
Registered Linux user: 379787

Index: src/backend/utils/adt/xml.c
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.97
diff -c -r1.97 xml.c
*** src/backend/utils/adt/xml.c	3 Mar 2010 17:29:45 -	1.97
--- src/backend/utils/adt/xml.c	25 May 2010 14:02:33 -
***
*** 3495,3497 
--- 3495,3668 
  	return 0;
  #endif
  }
+ 
+ /*
+  * Determines if the node specified by the supplied XPath exists
+  * in a given XML document, returning a boolean.
+  *
+  * It is up to the user to ensure that the XML passed is in fact
+  * an XML document - XPath doesn't work easily on fragments without
+  * a context node being known.
+  */
+ Datum
+ xmlexists(PG_FUNCTION_ARGS)
+ {
+ #ifdef USE_LIBXML
+ 	text	   *xpath_expr_text = PG_GETARG_TEXT_P(0);
+ 	xmltype    *data = PG_GETARG_XML_P(1);
+ 	ArrayType  *namespaces = PG_GETARG_ARRAYTYPE_P(2);
+ 	xmlParserCtxtPtr ctxt = NULL;
+ 	xmlDocPtr	doc = NULL;
+ 	xmlXPathContextPtr xpathctx = NULL;
+ 	xmlXPathCompExprPtr xpathcomp = NULL;
+ 	char	   *datastr;
+ 	int32		len;
+ 	int32		xpath_len;
+ 	xmlChar    *string;
+ 	xmlChar    *xpath_expr;
+ 	int			i;
+ 	int			ndim;
+ 	Datum	   *ns_names_uris;
+ 	bool	   *ns_names_uris_nulls;
+ 	int			ns_count;
+ 	int			result;
+ 
+ 	/*
+ 	 * Namespace mappings are passed as text[].  If an empty array is passed
+ 	 * (ndim = 0, 0-dimensional), then there are no namespace mappings.
+ 	 * Else, a 2-dimensional array with length of the second axis being equal
+ 	 * to 2 should be passed, i.e., every subarray contains 2 elements, the
+ 	 * first element defining the name, the second one the URI.  Example:
+ 	 * ARRAY[ARRAY['myns', 'http://example.com'], ARRAY['myns2',
+ 	 * 'http://example2.com']].
+ 	 */
+ 	ndim = ARR_NDIM(namespaces);
+ 	if (ndim != 0)
+ 	{
+ 		int		   *dims;
+ 
+ 		dims = ARR_DIMS(namespaces);
+ 
+ 		if (ndim != 2 || dims[1] != 2)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_DATA_EXCEPTION),
+ 	 errmsg("invalid array for XML namespace mapping"),
+ 	 errdetail("The array must be two-dimensional with length of the second axis equal to 2.")));
+ 
+ 		Assert(ARR_ELEMTYPE(namespaces) == TEXTOID);
+ 
+ 		deconstruct_array(namespaces, TEXTOID, -1, false, 'i',
+ 		  &ns_names_uris, &ns_names_uris_nulls,
+ 		  &ns_count);
+ 
+ 		Assert((ns_count % 2) == 0);	/* checked above */
+ 		ns_count /= 2;			/* count pairs only */
+ 	}
+ 	else
+ 	{
+ 		ns_names_uris = NULL;
+ 		ns_names_uris_nulls = NULL;
+ 		ns_count = 0;
+ 	}
+ 
+ 	datastr = VARDATA(data);
+ 	len = VARSIZE(data) - VARHDRSZ;
+ 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+ 	if (xpath_len == 0)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_DATA_EXCEPTION),
+  errmsg("empty XPath expression")));
+ 
+ 	string = (xmlChar *) palloc((len + 1) * sizeof(xmlChar));
+ 	memcpy(string, datastr, len);
+ 	string[len] = '\0';
+ 
+ 	xpath_expr = (xmlChar *) palloc((xpath_len + 1) * sizeof(xmlChar));
+ 	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
+ 	xpath_expr[xpath_len] = '\0';
+ 
+ 	pg_xml_init();
+ 	xmlInitParser();
+ 
+ 	PG_TRY();
+ 	{
+ 		/*
+ 		 * redundant XML parsing (two parsings for the same value during one
+ 		 * command execution are possible)
+ 		 */
+ 		ctxt = xmlNewParserCtxt();
+ 		if (ctxt == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not allocate parser context");
+ 		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+ 		if (doc == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
+ 		"could not parse XML document");
+ 		xpathctx = xmlXPathNewContext(doc);
+ 		if (xpathctx == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not allocate XPath context");
+ 		xpathctx->node = xmlDocGetRootElement(doc);
+ 		if (xpathctx->node == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
+ 		"could not find root XML element");
+ 
+ 		/* register namespaces, if any */
+ 		if (ns_count > 0)
+ 		{
+ 			for (i = 0; i < ns_count; i++)
+ 			{
+ char	   *ns_name;
+ char	   *ns_uri;
+ 
+ if (ns_names_uris_nulls[i * 2] ||
+ 	ns_names_uris_nulls[i * 2 + 1])
+ 	ereport(ERROR,
+ 			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 	  errmsg("neither namespace name nor URI may be null")));
+ ns_name = TextDatumGetCString(ns_names_uris[i * 2]);
+ ns_uri = TextDatumGetCString(ns_names_uris[i * 2 + 1]);
+ if (xmlXPathRegisterNs(xpathctx,
+ 	   (xmlChar *) ns_name,
+ 	   (xmlChar *) ns_uri) != 0)
+ 	ereport(ERROR,		/* is 

Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-25 Thread Andrew Dunstan



Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
| 
| Where does the result set (GBs of data) reside after I call

| PQexecPrepared?  On BE, I hope?

Sorry for asking again...

No sarcasm meant: is there no straightforward answer here?  Or nobody
is certain?  Or a wrong list?


  


You have been given the answer. Please re-read the replies, e.g. the one 
from Abhijit Menon-Sen.


The data is saved on the client side before the call returns. If that 
uses too much memory, use a cursor.


cheers

andrew



Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Joseph Adams
 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

I'll try to explain it in one big nutshell:

Instead of, for instance, json_to_number('5') and number_to_json(5), I
propose changing it to from_json(5)::INT and to_json('5').  Note how
from_json simply returns TEXT containing the underlying value for the
user to cast.  I plan to make calling to_json/from_json with arrays or
objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
an error for now, as implementing all the specifics of this could be
quite distracting.

If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
RETURNS json can't be implemented without augmenting the grammar (as
was done with xmlforest), so I considered making it take a RECORD
parameter like the hstore(RECORD) function does, as was suggested on
IRC.  However, this may be inadequate for selecting some columns but
not others.  Using examples from hstore:

SELECT hstore(foo) FROM foo;  => 'e=>2.71828, pi=>3.14159'
-- this works, but what if we only want one field?

SELECT hstore(pi) FROM foo;
-- function type error

SELECT hstore(row(pi)) FROM foo;  => 'f1=>3.14159'
-- field name is lost

SELECT hstore(bar) FROM (select pi FROM foo) AS bar;  => 'f1=>3.14159'
-- ugly, and field name is *still* lost

To get (and set, which I overlooked before), use json_get and
json_set.  These take JSONPath expressions, but I don't plan to
implement all sorts of fancy features during the summer.  However, I
do plan to support some kind of parameter substitution so you can do
this:

json_get('[0,1,4,9,16,25]', '[%]' %% 2)  => '4'::TEXT

For this use case, though, it would be simpler to say:

'[0,1,4,9,16,25]'::JSON -> 2



Re: [HACKERS] pg_upgrade docs

2010-05-25 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
  Have you read the docs?  It does mention the issue with /contrib and
  stuff.  How do I document a limitation I don't know about?  This is all
  very vague.  Please suggest some wording.
 
 OK, here's an attempt.  Please fact-check.
 
 --
 
 General Limitations
 
 pg_upgrade relies on binary compatibility between the old and new
 on-disk formats, including the on-disk formats of individual data
 types.  pg_upgrade attempts to detect cases in which the on-disk
 format has changed; for example, it verifies that the old and new
 clusters have the same value for --enable-integer-datetimes.  However,
 there is no systematic way for pg_upgrade to detect problems of this
 type; it has hard-coded knowledge of the specific cases known to exist
 in core PostgreSQL, including /contrib.  If third-party or
 user-defined data types or access methods are used, it is the user's
 responsibility to verify that the versions loaded into the old and new
 clusters use compatible on-disk formats.  If they do not, pg_upgrade
 may appear to work but subsequently crash or silently corrupt data.

OK, I have added a mention of the issues above, in a more abbreviated
format.

 pg_upgrade also relies on ABI compatibility between modules loaded
 into the old and new clusters.  For example, if an SQL function in the
 old cluster is defined to call a particular C function, pg_upgrade
 will recreate SQL function in the new cluster and will configure it to
 call the same C function.  If no such C function can be found by the
 new cluster, pg_upgrade will simply fail.  However, if a C function of
 the same name exists in the new cluster, but expects a different
 number of arguments or different types of arguments, then it is likely
 to crash the system when called.  In the worst case, data corruption
 could result.

These issues are not unique to pg_upgrade, and could happen even in a
pg_dump restore.

 Also, the following sentence appears not to fit with our only to 9.0
 policy: For Windows users, note that due to different integer
 datetimes settings used by the one-click installer and the MSI
 installer, it is only possible to upgrade from version 8.3 of the
 one-click distribution to version 8.4 of the one-click distribution.
 It is not possible to upgrade from the MSI installer to the one-click
 installer.

Agreed.  I added an "8.4 or later" mention.  It is not worth calling it
"9.0 or later" because then I would have to update this mention for
every major release.

Applied patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: doc/src/sgml/pgupgrade.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/pgupgrade.sgml,v
retrieving revision 1.10
diff -c -c -r1.10 pgupgrade.sgml
*** doc/src/sgml/pgupgrade.sgml	24 May 2010 17:43:39 -	1.10
--- doc/src/sgml/pgupgrade.sgml	25 May 2010 14:50:36 -
***
*** 16,21 
--- 16,31 
    9.0.1 -> 9.0.4.
   </para>
  
+  <para>
+   <application>pg_upgrade</> works because, though new features are
+   regularly added to Postgres major releases, the internal data storage
+   format rarely changes.  <application>pg_upgrade</> does its best to
+   make sure the old and new clusters are binary-compatible, e.g.  by
+   checking for compatible compile-time settings.  It is important that
+   any external modules are also binary compatible, though this cannot
+   be checked by <application>pg_upgrade</>.
+  </para>
+ 
   <sect2>
    <title>Supported Versions</title>
  
***
*** 440,446 
   <sect2>
    <title>Limitations in migrating <emphasis>from</> PostgreSQL 8.3</title>
   
-  
   <para>
 Upgrading from PostgreSQL 8.3 has additional restrictions not present
 when upgrading from later PostgreSQL releases.  For example,
--- 450,455 
***
*** 502,509 
 For Windows users, note that due to different integer datetimes settings
 used by the one-click installer and the MSI installer, it is only
 possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 of the one-click distribution. It is not possible to upgrade
!    from the MSI installer to the one-click installer.
   </para>
  
   <para>
--- 511,518 
 For Windows users, note that due to different integer datetimes settings
 used by the one-click installer and the MSI installer, it is only
 possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 or later of the one-click distribution. It is not
!    possible to upgrade from the MSI installer to the one-click installer.
   </para>
  
   <para>



Re: [HACKERS] Clearing psql`s input buffer after auto-reconnect

2010-05-25 Thread Greg Sabino Mullane



 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
...
 Now #1 might be the best long-term solution but I have no particular
 appetite to tackle it, and #2 is just too ugly to contemplate.  That
 leaves #3, which is a bit ugly in its own right but seems like the best
 fix we're likely to get.

 Comments, better ideas?

I like #3. If this were a more common event I might lean towards #1, 
but it's not, so #3 seems fine.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005251113
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)

2010-05-25 Thread Erik Rijkers
On Tue, May 25, 2010 16:31, Mike Fowler wrote:
 I've been reading the SQL/XML standard and discovered that it defines a
 function named XMLEXISTS that does exactly what the todo item
 xpath_exists defines. My original patch named the function as per the
 todo but I think using the function name from the standard is a better
 idea. So this patch is the same as before, but the function is now named
 XMLEXISTS instead of xpath_exists.


I tried this patch (cvs HEAD, applies without error), but get this error:

[...]
utils/adt/xml.o: In function `xmlexists':
/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend/utils/adt/xml.c:3639:
 undefined
reference to `xmlXPathCompiledEvalToBoolean'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[2]: Leaving directory 
`/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src'
make: *** [all] Error 2



./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.xmlexists 
--with-pgport=6548
--quiet --enable-depend --enable-cassert --enable-debug --with-openssl 
--with-perl --with-libxml
--with-libxslt


centos 5.4  2.6.18-164.el5  x86_64 GNU/Linux
libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed



Erik Rijkers






Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Mike Fowler

Erik Rijkers wrote:

libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed
  


Thanks for testing my patch, Erik. It turns out I've got libxml2 
installed at version 2.7.5. Searching the gnome mailing lists, it turns 
out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very 
next version from yours, 2.6.27 (see: 
http://mail.gnome.org/archives/xml/2006-October/msg00119.html).


Regards,

--
Mike Fowler
Registered Linux user: 379787




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

I think it would be quite simple to read WAL. WALSender reads the WAL
file after it's been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote:
 On Tue, May 25, 2010 at 10:29 AM, Josh Berkus j...@agliodbs.com wrote:
  I agree that #4 should be done last, but it will be needed, not in the
  least by your employer ;-) .  I don't see any obvious way to make #4
  compatible with any significant query load on the slave, but in general
  I'd think that users of #4 are far more concerned with 0% data loss than
  they are with getting the slave to run read queries.
 
 Since #2 and #3 are enough for 0% data loss, I think that such users
 would be more concerned about what results are visible in the standby.
 No?

Please add #4 also. You can do that easily at the same time as #2 and
#3, and it will leave me free to fix the perceived conflict problems.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote:

 Second, we need to discuss about how to specify the synch
 level. There are three approaches:
 
 * Per standby
   Since the purpose, location and H/W resources often differ
   from one standby to another, specifying the level per standby
   (i.e., we set the level in recovery.conf) is a
   straightforward approach, I think. For example, we can
   choose #3 for a high-availability standby near the master,
   and choose #1 (async) for a remote disaster recovery
   standby.
 
 * Per transaction
   Define a PGC_USERSET option specifying the level and
   set it on the master according to the purpose of the
   transaction. In this approach, for example, we can choose
   #4 for a transaction which should be visible on the
   standby as soon as a success of the commit has been
   returned to a client. We can also choose #1 for
   time-critical but not mission-critical transactions.
 
 * Mix
   Allow users to specify the level per standby and
   transaction at the same time, and then calculate the real
   level from them by using some algorithm.
 
 Which should we adopt for 9.1? I'd like to implement the
 per-standby approach at first since it's simple and seems
 to cover more use cases. Thought?

-1

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form. 

Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes
down, performance will improve and robustness will have been lost. What
good would that be?

Imagine a standby connected over a long distance. DBA brings up standby
in synch mode accidentally and the primary server hits massive
performance problems without any way of directly controlling this.

The worst aspect of standby-level controls is that nobody ever knows how
safe a transaction is. There is no definition or test for us to check
exactly how safe any particular transaction is. Also, the lack of safety
occurs at the time when you least want it - when one of your servers is
already down.

So I call per-standby settings simple, and broken in multiple ways.

Putting the control in the hands of the transaction owner (i.e. on the
master) is exactly where the control should be. I personally like the
idea of that being a USERSET, though could live with system wide
settings if need be. But the control must be on the *master* not on the
standbys.

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, as well as providing a
workaround if replicating across a bursty network where we can't
guarantee response times, even if the typical response time is good.

(We've discussed this many times before over a period of years, and I'm
not really sure why we have to re-discuss this repeatedly just because
people disagree. You don't mention the earlier discussions; I'm not sure
why. If we want to follow the community process, then all previous
discussions need to be taken into account, unless things have changed -
which they haven't: same topic, same people, AFAICS.)


-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Mon, 2010-05-24 at 18:29 -0700, Josh Berkus wrote:

 If people agree that the above is our roadmap, implementing
 per-standby first makes sense, and then we can implement per-session
 GUC later.

IMHO per-standby sounds simple, but is dangerously simplistic, as
explained in another part of the thread.

We need to think clearly about failure modes and how they will be
handled. Failure modes and edge cases completely govern the design here.
The case where everything runs smoothly isn't a major concern, and so
the user interface for it can be done in various ways.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote:
 On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
  Hmm, OK, I think that makes sense.  Would you care to propose a patch?
 
  Yep. Here is the patch.
 
  This patch distinguishes normal shutdown from unexpected exit while the
  server is in recovery. That is, when smart or fast shutdown is requested
  during recovery, the bgwriter sets ControlFile->state to the newly-introduced
  DB_SHUTDOWNED_IN_RECOVERY state.
 
 Is this patch worth applying for 9.0? If not, I'll add it to
 the next CF for 9.1.

Presumably Robert will be applying the patch? It seems to address the
concern raised on the thread.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Synchronous replication implies that a commit should wait. This wait is
 experienced by the transaction, not by other parts of the system. If we
 define robustness at the standby level then robustness depends upon
 unseen administrators, as well as the current up/down state of standbys.
 This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of
this.  The case where someone wants to do sync rep to the machine in
the next rack over and async rep to a server at a remote site seems
too important to ignore.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote:
 On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote:
  Hmm, OK, I think that makes sense.  Would you care to propose a patch?
 
  Yep. Here is the patch.
 
   This patch distinguishes normal shutdown from unexpected exit while the
   server is in recovery. That is, when smart or fast shutdown is requested
   during recovery, the bgwriter sets ControlFile->state to the newly-introduced
   DB_SHUTDOWNED_IN_RECOVERY state.

  Is this patch worth applying for 9.0? If not, I'll add it to
  the next CF for 9.1.

 Presumably Robert will be applying the patch? It seems to address the
 concern raised on the thread.

Yes, I was planning to review it.  But if you or someone else would
like to cut in, that's OK too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote:
 Erik Rijkers wrote:

 libxml2.x86_64          2.6.26-2.1.2.8  installed
 libxml2-devel.x86_64    2.6.26-2.1.2.8  installed


 Thanks for testing my patch Erik. It turns out I've got libxml2 installed at
 version 2.7.5. Searching the gnome mailing lists, it turns out
 xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next
 version from yours, 2.6.27 (see:
 http://mail.gnome.org/archives/xml/2006-October/msg00119.html).

We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Joshua D. Drake
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  Synchronous replication implies that a commit should wait. This wait is
  experienced by the transaction, not by other parts of the system. If we
  define robustness at the standby level then robustness depends upon
  unseen administrators, as well as the current up/down state of standbys.
  This is action-at-a-distance in its worst form.
 
 Maybe, but I can't help thinking people are going to want some form of
 this.  The case where someone wants to do sync rep to the machine in
 the next rack over and async rep to a server at a remote site seems
 too important to ignore.

Uhh yeah, that is pretty much the standard use case. The next rack is
only 50% of the equation. The next part is the disaster recovery rack
over 100Mb (or even 10Mb) that is halfway across the country. It is
common, very common.

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering





[HACKERS] Confused about the buffer pool size

2010-05-25 Thread MMK
Hello All:
In the code (costsize.c), I see that effective_cache_size is set 
to DEFAULT_EFFECTIVE_CACHE_SIZE.
This is defined as follows in cost.h
#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
But when I say "show shared_buffers" in psql I get:
shared_buffers = 28MB
In the postgresql.conf file, the following lines appear:
shared_buffers = 28MB                   # min 128kB  (change requires restart)
#temp_buffers = 8MB                     # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
So should effective_cache_size be set to 3584 rather than the 16384?
Thanks,
MMK.


Re: [HACKERS] JSON manipulation functions

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 10:52 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 Well, I think it's fine to use the wiki for brainstorming, but before
 you change the design you probably need to talk about it here.  You
 can't rely on everyone on -hackers to follow changes on a wiki page
 somewhere.  It looks like the API has been overhauled pretty heavily
 since the last version we talked about here, and I'm not sure I
 understand it.

 I'll try to explain it in one big nutshell:

 Instead of, for instance, json_to_number('5') and number_to_json(5), I
 propose changing it to from_json('5')::INT and to_json(5).  Note how
 from_json simply returns TEXT containing the underlying value for the
 user to cast.  I plan to make calling to_json/from_json with arrays or
 objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
 an error for now, as implementing all the specifics of this could be
 quite distracting.
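
(A hypothetical sketch of the proposed usage, based only on the
description above -- not committed behavior:

    SELECT to_json(5);            -- '5'   (render a SQL value as JSON text)
    SELECT from_json('5')::int;   -- 5     (from_json returns TEXT; the user casts)
    SELECT from_json('[1,2,3]');  -- would raise an error for now, per the proposal

)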

I don't see how that's an improvement over the previous design.  It
seems like it adds a lot of extra casting and removes useful list
operations without any corresponding advantage.

 If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
 RETURNS json can't be implemented without augmenting the grammar (as
 was done with xmlforest), so I considered making it take a RECORD
 parameter like the hstore(RECORD) function does, as was suggested on
 IRC.  However, this may be inadequate for selecting some columns but
 not others.  Using examples from hstore:

 SELECT hstore(foo) FROM foo;  => 'e=>2.71828, pi=>3.14159'
 -- this works, but what if we only want one field?

 SELECT hstore(pi) FROM foo;
 -- function type error

 SELECT hstore(row(pi)) FROM foo;  => 'f1=>3.14159'
 -- field name is lost

 SELECT hstore(bar) FROM (select pi FROM foo) AS bar;  => 'f1=>3.14159'
 -- ugly, and field name is *still* lost

Yeah.  I'm not sure what to do about this problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 If we define robustness at the standby level then robustness
 depends upon unseen administrators, as well as the current
 up/down state of standbys.  This is action-at-a-distance in its
 worst form.
 
 Maybe, but I can't help thinking people are going to want some
 form of this.  The case where someone wants to do sync rep to the
 machine in the next rack over and async rep to a server at a
 remote site seems too important to ignore.
 
I think there may be a terminology issue here -- I took configure
by standby to mean that *at the master* you would specify rules for
each standby.  I think Simon took it to mean that each standby would
define the rules for replication to it.  Maybe this issue can
resolve gracefully with a bit of clarification?
 
-Kevin



[HACKERS] [PATCH] Add _PG_init to PL language handler documentation

2010-05-25 Thread Jonathan Leto
Howdy,

This tiny doc patch adds _PG_init to the skeleton example code for a
PL. The information is quite valuable to PL authors, who might otherwise
miss it, since it is only described in the shared library documentation.

This patch was based off of 6e2ba96 in the git mirror and a colorized
diff can be viewed here:

http://github.com/leto/postgres/commit/a9e265a7f55a0605fb4c6135f0f689c8b89e9623

Duke

-- 
Jonathan Duke Leto
jonat...@leto.net
http://leto.net


pginit.patch
Description: Binary data



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  Synchronous replication implies that a commit should wait. This wait is
  experienced by the transaction, not by other parts of the system. If we
  define robustness at the standby level then robustness depends upon
  unseen administrators, as well as the current up/down state of standbys.
  This is action-at-a-distance in its worst form.
 
 Maybe, but I can't help thinking people are going to want some form of
 this.  
 The case where someone wants to do sync rep to the machine in
 the next rack over and async rep to a server at a remote site seems
 too important to ignore.

The use case of machine in the next rack over and async rep to a server
at a remote site *is* important, but you give no explanation as to why
that implies per-standby is the solution to it.

If you read the rest of my email, you'll see that I have explained the
problems per-standby settings would cause.

Please don't be so quick to claim it is me ignoring anything.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Mike Fowler

Robert Haas wrote:

On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote:
  

Erik Rijkers wrote:


libxml2.x86_64  2.6.26-2.1.2.8  installed
libxml2-devel.x86_642.6.26-2.1.2.8  installed

  

Thanks for testing my patch Erik. It turns out I've got libxml2 installed at
version 2.7.5. Searching the gnome mailing lists, it turns out
xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next
version from yours, 2.6.27 (see:
http://mail.gnome.org/archives/xml/2006-October/msg00119.html).



We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL


Why? 2.6.27 is almost 4 years old.

I realise that my patch didn't update configure and configure.in, and 
indeed I didn't think of it when I responded to Erik (I'm too used to 
the Java world where people manage their own dependencies). I've now 
attached the updated patch which ups the check from version 2.6.23 to 
2.6.27.


Regards,

--
Mike Fowler
Registered Linux user: 379787

Index: configure
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure,v
retrieving revision 1.679
diff -c -r1.679 configure
*** configure	13 May 2010 22:07:40 -	1.679
--- configure	25 May 2010 16:57:49 -
***
*** 9079,9087 
  
  if test $with_libxml = yes ; then
  
! { $as_echo $as_me:$LINENO: checking for xmlSaveToBuffer in -lxml2 5
! $as_echo_n checking for xmlSaveToBuffer in -lxml2...  6; }
! if test ${ac_cv_lib_xml2_xmlSaveToBuffer+set} = set; then
$as_echo_n (cached)  6
  else
ac_check_lib_save_LIBS=$LIBS
--- 9079,9087 
  
  if test $with_libxml = yes ; then
  
! { $as_echo $as_me:$LINENO: checking for xmlXPathCompiledEvalToBoolean in -lxml2 5
! $as_echo_n checking for xmlXPathCompiledEvalToBoolean in -lxml2...  6; }
! if test ${ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean+set} = set; then
$as_echo_n (cached)  6
  else
ac_check_lib_save_LIBS=$LIBS
***
*** 9099,9109 
  #ifdef __cplusplus
  extern C
  #endif
! char xmlSaveToBuffer ();
  int
  main ()
  {
! return xmlSaveToBuffer ();
;
return 0;
  }
--- 9099,9109 
  #ifdef __cplusplus
  extern C
  #endif
! char xmlXPathCompiledEvalToBoolean ();
  int
  main ()
  {
! return xmlXPathCompiledEvalToBoolean ();
;
return 0;
  }
***
*** 9129,9140 
  	 test $cross_compiling = yes ||
  	 $as_test_x conftest$ac_exeext
 }; then
!   ac_cv_lib_xml2_xmlSaveToBuffer=yes
  else
$as_echo $as_me: failed program was: 5
  sed 's/^/| /' conftest.$ac_ext 5
  
! 	ac_cv_lib_xml2_xmlSaveToBuffer=no
  fi
  
  rm -rf conftest.dSYM
--- 9129,9140 
  	 test $cross_compiling = yes ||
  	 $as_test_x conftest$ac_exeext
 }; then
!   ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=yes
  else
$as_echo $as_me: failed program was: 5
  sed 's/^/| /' conftest.$ac_ext 5
  
! 	ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=no
  fi
  
  rm -rf conftest.dSYM
***
*** 9142,9150 
conftest$ac_exeext conftest.$ac_ext
  LIBS=$ac_check_lib_save_LIBS
  fi
! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlSaveToBuffer 5
! $as_echo $ac_cv_lib_xml2_xmlSaveToBuffer 6; }
! if test x$ac_cv_lib_xml2_xmlSaveToBuffer = xyes; then
cat confdefs.h _ACEOF
  #define HAVE_LIBXML2 1
  _ACEOF
--- 9142,9150 
conftest$ac_exeext conftest.$ac_ext
  LIBS=$ac_check_lib_save_LIBS
  fi
! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 5
! $as_echo $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 6; }
! if test x$ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean = xyes; then
cat confdefs.h _ACEOF
  #define HAVE_LIBXML2 1
  _ACEOF
***
*** 9152,9159 
LIBS=-lxml2 $LIBS
  
  else
!   { { $as_echo $as_me:$LINENO: error: library 'xml2' (version >= 2.6.23) is required for XML support 5
! $as_echo $as_me: error: library 'xml2' (version >= 2.6.23) is required for XML support 2;}
 { (exit 1); exit 1; }; }
  fi
  
--- 9152,9159 
LIBS=-lxml2 $LIBS
  
  else
!   { { $as_echo $as_me:$LINENO: error: library 'xml2' (version >= 2.6.27) is required for XML support 5
! $as_echo $as_me: error: library 'xml2' (version >= 2.6.27) is required for XML support 2;}
 { (exit 1); exit 1; }; }
  fi
  
Index: configure.in
===
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure.in,v
retrieving revision 1.627
diff -c -r1.627 configure.in
*** configure.in	13 May 2010 22:07:42 -	1.627
--- configure.in	25 May 2010 16:22:32 -
***
*** 940,946 
  fi
  
  if test $with_libxml = yes ; then
!   AC_CHECK_LIB(xml2, xmlSaveToBuffer, [], [AC_MSG_ERROR([library 'xml2' (version >= 2.6.23) is required for XML support])])
  fi
  
  if test $with_libxslt = yes ; then
--- 940,946 
  fi
  
  if test $with_libxml = yes ; then
!   AC_CHECK_LIB(xml2, xmlXPathCompiledEvalToBoolean, [], [AC_MSG_ERROR([library 'xml2' (version >= 2.6.27) is required for XML support])])

Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  Simon Riggs si...@2ndquadrant.com wrote:
  If we define robustness at the standby level then robustness
  depends upon unseen administrators, as well as the current
  up/down state of standbys.  This is action-at-a-distance in its
  worst form.
  
  Maybe, but I can't help thinking people are going to want some
  form of this.  The case where someone wants to do sync rep to the
  machine in the next rack over and async rep to a server at a
  remote site seems too important to ignore.
  
 I think there may be a terminology issue here -- I took configure
 by standby to mean that *at the master* you would specify rules for
 each standby.  I think Simon took it to mean that each standby would
 define the rules for replication to it.  Maybe this issue can
 resolve gracefully with a bit of clarification?

The use case of sync rep to the machine in the next rack over and async
rep to a server at a remote site would require the settings

server.nextrack = synch
server.remotesite = async

which leaves open the question of what happens when nextrack is down.

In many cases, to give adequate performance in that situation people add
an additional server, so the config becomes

server.nextrack1 = synch
server.nextrack2 = synch
server.remotesite = async

We then want to specify for performance reasons that we can get a reply
from either nextrack1 or nextrack2, so it all still works safely and
quickly if one of them is down. How can we express that rule concisely?
With some difficulty.

My suggestion is simply to have a single parameter (name unimportant)

number_of_synch_servers_we_wait_for = N

which is much easier to understand because it is phrased in terms of the
guarantee given to the transaction, not in terms of what the admin
thinks is the situation.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of Tue May 25 04:41:30 -0400 2010:
 On 24/05/10 22:49, Alvaro Herrera wrote:

  I think this is nonsense.  If you have 3-years-old sales transactions,
  and your database has any interesting churn, tuples those pages have
  been frozen for a very long time *already*.

 What's missing from the suggestion is that relfrozenxid and datfrozenxid 
 also need to be expanded to 8-bytes. That way you effectively have 
 8-byte XIDs, which means that you never need to vacuum to avoid XID 
 wraparound.

Hmm, so are we going to use the xid epoch more officially?  That's
an entirely new line of development; perhaps it opens new possibilities.

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere.  Was this discussed in the PGCon devs meeting?

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  Simon Riggs si...@2ndquadrant.com wrote:
  If we define robustness at the standby level then robustness
  depends upon unseen administrators, as well as the current
  up/down state of standbys.  This is action-at-a-distance in its
  worst form.
 
  Maybe, but I can't help thinking people are going to want some
  form of this.  The case where someone wants to do sync rep to the
  machine in the next rack over and async rep to a server at a
  remote site seems too important to ignore.

 I think there may be a terminology issue here -- I took configure
 by standby to mean that *at the master* you would specify rules for
 each standby.  I think Simon took it to mean that each standby would
 define the rules for replication to it.  Maybe this issue can
 resolve gracefully with a bit of clarification?

 The use case of sync rep to the machine in the next rack over and async
 rep to a server at a remote site would require the settings

 server.nextrack = synch
 server.remotesite = async

 which leaves open the question of what happens when nextrack is down.

 In many cases, to give adequate performance in that situation people add
 an additional server, so the config becomes

 server.nextrack1 = synch
 server.nextrack2 = synch
 server.remotesite = async

 We then want to specify for performance reasons that we can get a reply
 from either nextrack1 or nextrack2, so it all still works safely and
 quickly if one of them is down. How can we express that rule concisely?
 With some difficulty.

Perhaps the difficulty here is that those still look like per-server
settings to me.  Just maybe with a different set of semantics.

 My suggestion is simply to have a single parameter (name unimportant)

 number_of_synch_servers_we_wait_for = N

 which is much easier to understand because it is phrased in terms of the
 guarantee given to the transaction, not in terms of what the admin
 thinks is the situation.

So I agree that we need to talk about whether or not we want to do
this.  I'll give my opinion.  I am not sure how useful this really is.
 Consider a master with two standbys.  The master commits a
transaction and waits for one of the two standbys, then acknowledges
the commit back to the user.  Then the master crashes.  Now what?
It's not immediately obvious which standby we should bring online as
the primary, and if we guess wrong we could lose transactions thought
to be committed.  This is probably a solvable problem, with enough
work: we can write a script to check the last LSN received by each of
the two standbys and promote whichever one is further along.
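
(A sketch of such a check, using admin functions that already exist in
9.0 -- run on each standby and promote whichever is further along:

    SELECT pg_last_xlog_receive_location();  -- last WAL location received and synced to disk
    SELECT pg_last_xlog_replay_location();   -- last WAL location replayed during recovery

)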

But... what happens if the master and one standby BOTH crash
simultaneously?  There's no way of knowing (until we get at least one
of them back up) whether it's safe to promote the other standby.

I like the idea of a quorum commit type feature where we promise the
user that things are committed when enough servers have acknowledged
the commit.  But I think most people are not going to want that
configuration unless we also provide some really good management tools
that we don't have today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread Heikki Linnakangas

On 25/05/10 19:49, MMK wrote:

Hello All:
In the code (costsize.c), I see that effective_cache_size is set to 
DEFAULT_EFFECTIVE_CACHE_SIZE.
This is defined as follows in cost.h
#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
But when I say
show shared_buffers in psql I get,
shared_buffers  28MB
In postgresql.conf file, the following lines appear
shared_buffers = 28MB   # min 128kB  (change requires restart)
#temp_buffers = 8MB     # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
So should effective_cache_size be set to 3584 rather than the 16384?


No. Please see the manual for what effective_cache_size means:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
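
(To make the distinction concrete -- a sketch; the two settings are
independent, and effective_cache_size allocates no memory at all:

    SHOW shared_buffers;          -- 28MB: shared memory actually allocated
    SHOW effective_cache_size;    -- 128MB by default in 8.4: a planner estimate only

    -- or both at once, with units, from the catalog:
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'effective_cache_size');

)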

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote:
 On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
  Robert Haas robertmh...@gmail.com wrote:
   Simon Riggs si...@2ndquadrant.com wrote:
   If we define robustness at the standby level then robustness
   depends upon unseen administrators, as well as the current
   up/down state of standbys.  This is action-at-a-distance in its
   worst form.
  
   Maybe, but I can't help thinking people are going to want some
   form of this.  The case where someone wants to do sync rep to the
   machine in the next rack over and async rep to a server at a
   remote site seems too important to ignore.
 
  I think there may be a terminology issue here -- I took configure
  by standby to mean that *at the master* you would specify rules for
  each standby.  I think Simon took it to mean that each standby would
  define the rules for replication to it.  Maybe this issue can
  resolve gracefully with a bit of clarification?
 
  The use case of machine in the next rack over and async rep to a server
  at a remote site would require the settings
 
  server.nextrack = synch
  server.remotesite = async
 
  which leaves open the question of what happens when nextrack is down.
 
  In many cases, to give adequate performance in that situation people add
  an additional server, so the config becomes
 
  server.nextrack1 = synch
  server.nextrack2 = synch
  server.remotesite = async
 
  We then want to specify for performance reasons that we can get a reply
  from either nextrack1 or nextrack2, so it all still works safely and
  quickly if one of them is down. How can we express that rule concisely?
  With some difficulty.
 
 Perhaps the difficulty here is that those still look like per-server
 settings to me.  Just maybe with a different set of semantics.

(Those are the per-server settings.)

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Robert Haas
On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
 We're unlikely to accept this patch if it changes the minimum version
 of libxml2 required to compile PostgreSQL

 Why? 2.6.27 is almost 4 years old.

Because we work hard to minimize our dependencies and make them as
non-onerous as possible.

At a minimum, I think it's fair to say that the burden is on you to
justify why it's worth bumping the version number.  If there is some
major speed or performance advantage to using the newer API, maybe
we'll consider it.  But if it's just a few extra lines of code to work
around it, then it's better to write those extra lines of code rather
than potentially force users to upgrade packages they're otherwise
happy with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote:
 On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 ...
 
  The best parameter we can specify is the number of servers that we wish
  to wait for confirmation from. That is a definition that easily manages
  the complexity of having various servers up/down at any one time. It
  also survives misconfiguration more easily, as well as providing a
  workaround if replicating across a bursty network where we can't
  guarantee response times, even of the typical response time is good.
 
 
 This may be an incredibly naive question, but what happens to the
 transaction on the master if the number of confirmations is not
 received? Is this intended to create a situation where the master
 effectively becomes unavailable for write operations when its
 synchronous slaves are unavailable?

How we handle degraded mode is important, yes. Whatever parameters we
choose the problem will remain the same.

Should we just ignore degraded mode and respond as if nothing bad had
happened? Most people would say not.

If we specify server1 = synch and server2 = async we then also need to
specify what happens if server1 is down. People might often specify
if (server1 == down) server2 = synch.
So now we have 3 configuration settings, one quite complex.

It's much easier to say you want to wait for N servers to respond, but
don't care which they are. One parameter, simple and flexible.

In both cases, we have to figure out what to do if we can't get either
server to respond. In replication there is no such thing as "server
down", just "a server didn't reply within time X". So we need to define
timeouts.

So whatever we do, we need additional parameters to specify timeouts
(including wait-forever as an option) and action-on-timeout: commit or
rollback. 
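
(As a sketch, with invented GUC names purely to illustrate that
parameter space -- nothing like this exists yet:

    SET synch_rep_timeout = '30s';            -- invented: how long a commit waits for acks
    SET synch_rep_timeout_action = 'commit';  -- invented: or 'rollback' once the timeout expires

)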

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Michael Tharp

On 05/25/2010 01:09 PM, Mike Fowler wrote:

Why? 2.6.27 is almost 4 years old.


RHEL 5 ships with 2.6.26. I imagine that supporting it is very 
desirable, regardless of its age, since that is unfortunately still the 
latest version of RHEL.


-- m. tharp



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote:
 So I agree that we need to talk about whether or not we want to do
 this.  I'll give my opinion.  I am not sure how useful this really is.
  Consider a master with two standbys.  The master commits a
 transaction and waits for one of the two standbys, then acknowledges
 the commit back to the user.  Then the master crashes.  Now what?
 It's not immediately obvious which standby we should bring online as
 the primary, and if we guess wrong we could lose transactions thought
 to be committed.  This is probably a solvable problem, with enough
 work: we can write a script to check the last LSN received by each of
 the two standbys and promote whichever one is further along.
 
 But... what happens if the master and one standby BOTH crash
 simultaneously?  There's no way of knowing (until we get at least one
 of them back up) whether it's safe to promote the other standby.

Not much of a problem really, is it? If you have one server left out of
3, then you promote it OR you stay down - your choice.

There is no "safe to promote" knowledge in *any* scenario; you never
know what was on the primary, only what was received by the standby. If
you have N standbys still up, you can pick which using the algorithm you
mention.

Remember that the WAL is sequential, so its not like the commit order of
transactions will differ across servers if we use quorum commit. So not
a problem.

The multiple simultaneous failure case is fairly common for people that
pick the "sync to the server in the next rack" approach, because there
are a hundred reasons why both might be taken out at the same time; ask JD.

 I like the idea of a quorum commit type feature where we promise the
 user that things are committed when enough servers have acknowledged
 the commit.  But I think most people are not going to want that
 configuration unless we also provide some really good management tools
 that we don't have today.

Good name.

Management tools have nothing to do with this; completely orthogonal.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Alastair Turner
On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
...

 The best parameter we can specify is the number of servers that we wish
 to wait for confirmation from. That is a definition that easily manages
 the complexity of having various servers up/down at any one time. It
 also survives misconfiguration more easily, as well as providing a
 workaround if replicating across a bursty network where we can't
 guarantee response times, even of the typical response time is good.


This may be an incredibly naive question, but what happens to the
transaction on the master if the number of confirmations is not
received? Is this intended to create a situation where the master
effectively becomes unavailable for write operations when its
synchronous slaves are unavailable?

Alastair Bell Turner




Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Andrew Dunstan



Robert Haas wrote:

On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
  

We're unlikely to accept this patch if it changes the minimum version
of libxml2 required to compile PostgreSQL
  

Why? 2.6.27 is almost 4 years old.



Because we work hard to minimize our dependencies and make them as
non-onerous as possible.

At a minimum, I think it's fair to say that the burden is on you to
justify why it's worth bumping the version number.  If there is some
major speed or performance advantage to using the newer API, maybe
we'll consider it.  But if it's just a few extra lines of code to work
around it, then it's better to write those extra lines of code rather
than potentially force users to upgrade packages they're otherwise
happy with.

  


The real issue is what's going to be available on most of the platforms 
we build on. Unfortunately, 2.6.26 is what's on my CentOS 5.4 boxes, for 
example. I'm sure we don't want to make 9.1 not buildable with the 
installed libraries on still fairly current RedHat-derived platforms.


cheers

andrew



Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote:
 We're unlikely to accept this patch if it changes the minimum version
 of libxml2 required to compile PostgreSQL
 
 Why? 2.6.27 is almost 4 years old.

 Because we work hard to minimize our dependencies and make them as
 non-onerous as possible.

 At a minimum, I think it's fair to say that the burden is on you to
 justify what it's worth bumping the version number.

Yes.  Increasing the minimum required version of some library is a Big
Deal, we don't do it on a whim.  And we definitely don't do it just
because it's old.

regards, tom lane



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 This sounds like extending Xid to 64 bits, without having to store the
 high bits everywhere.  Was this discussed in the PGCon devs meeting?

Yeah, that's what it would amount to.  It was not discussed at the dev
meeting --- it was an idea that came up one evening at PGCon.

I'm not sure whether this would imply having to widen xid to 64 bits
internally.  That could be a bit unpleasant as far as CPU and shared
memory space go, although every year that goes by makes 32-bit machines
less interesting as DB servers.

regards, tom lane



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

This, unfortunately, isn't true in SSI.

Consider read-only transactions on a single node SSI database -- the
situation is the same for read-only transactions that run on a slave. 
These transactions can be part of anomalies, so they need to be checked
for conflicts and potentially aborted.

Consider Kevin's favorite example, where one table contains the current
date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
  
T3, which is a read-only transaction, sees the incremented date and an
empty list of receipts. But T1 later commits a new entry in the
receipts table with the old date. No serializable ordering allows this.
However, if T3 hadn't performed its read, there'd be no problem; we'd
just serialize T1 before T2 and no one would be the wiser.
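
(The same schedule sketched as SQL -- table names invented for
illustration:

    CREATE TABLE today (d date);      -- one row holding the current date
    CREATE TABLE receipts (d date);   -- initially empty

    -- T1:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO receipts SELECT d FROM today;    -- reads the old date
    -- T2, concurrently:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE today SET d = d + 1;
    COMMIT;
    -- T3, read-only, concurrently:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT d FROM today;                         -- sees the incremented date
    SELECT * FROM receipts;                      -- sees no receipts
    COMMIT;
    -- T1:
    COMMIT;   -- a receipt with the old date now exists; no serial order fits

)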

SSI would detect a potential conflict here, which we could resolve by
aborting T3. (We could also abort T1, but if this is a replicated
system this isn't always an option -- T3 might be running on the
slave, so only the slave will know about the conflict, and it can't
very well abort an update transaction on the master.)


There's another example of a read-only transaction anomaly that could
cause similar problems at
http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think
this one is easier to follow.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:18 , Dan Ports wrote:
 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.
 
 This, unfortunately, isn't true in SSI.
 
 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave. 
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.
 
 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
 
 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
as long as nobody tries to observe it?

Btw, I still don't get how this follows from the Cahill paper. For a 
transaction to lie on a dangerous circle, it needs incoming and outgoing edges 
in the conflict graph, right? But I'd have though that conflicts are always 
between a reader and a writer or between two writers. So how can a read-only 
transaction have incoming and outgoing edges?

best regards,
Florian Pflug





Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-25 Thread Alvaro Herrera
Excerpts from Jesper Krogh's message of Wed May 19 15:01:18 -0400 2010:

 But the distribution is very flat at the end, the last 128 values are 
 exactly
 1.00189e-05
 which means that any term sitting outside the array would get an estimate of
 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

I don't know if this is related, but tsvector stats are computed and
stored per term, not per datum.  This is different from all other
datatypes.  Maybe there's code somewhere that's assuming per-datum and
coming up with the wrong estimates?  Or maybe the tsvector-specific code
contains a bug somewhere; maybe a rounding error?

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, so in fact SSI sometimes allows the database to be
 inconsistent, but only as long as nobody tries to observe it?
 
Not exactly.  The eventually-persisted state is always consistent,
but there can be a transitory committed state which would violate
user-defined constraints or business rules *if viewed*.  This is
what I've been on about -- the commit sequence is not necessarily
the same as the apparent order of execution.  A read-only
transaction, if run before the overlapping commits settle, can
view a state which is not consistent with any serial order of
execution, and might therefore break the rules.  SSI detects that
and rolls one of the transactions back if they're all running at
serializable transaction isolation in a single SSI database, but the
question is how to handle this when the read happens in a replica.
 
 Btw, I still don't get how this follows from the Cahill paper. For
 a transaction to lie on a dangerous circle, it needs incoming and
 outgoing edges in the conflict graph, right?
 
At least one of the transactions participating in the cycle does. 
There's no requirement that they all do.
 
-Kevin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

Yes. Note that even while it's in an inconsistent state, you can still
perform any query that doesn't observe the inconsistency -- hopefully
most queries fall into this category.

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous circle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have though that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?

Right, the read-only transaction can't have incoming edges, but it can
have outgoing edges. So it can't be the pivot itself (the transaction
with both outgoing and incoming edges), but it can cause *another*
transaction to be.

In the example I gave, T3 (the r/o transaction) has an outgoing edge to
T1, because it didn't see T1's concurrent update. T1 already had an
outgoing edge to T2, so adding in this incoming edge from T3 creates
the dangerous structure.
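
(Diagrammatically, the rw-conflict edges in that example:

    T3 --rw--> T1    (T3 did not see T1's concurrent insert)
    T1 --rw--> T2    (T1 did not see T2's concurrent update)

T1 has both an incoming and an outgoing edge, making it the pivot.)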

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 
 Have you ever looked at one of those queries, that Londiste or
 Slony issue against the provider DB in order to get all the log
 data that has been committed between two snapshots? Is that really
 the best you can think of?
 
No, I admit I haven't.  In fact, I was thinking primarily in terms
of log-driven situations, like HS.  What would be the best place for
me to look to come up to speed on your use case?  (I'm relatively
sure that the issue isn't that there's no information to find, but
that a sequential pass over all available information would take a
*long* time.)  I've been working through the issues on WAL-based
replicas, and have some additional ideas and alternatives, but I'd
like to see the big picture, including trigger-based replication,
before posting.
 
-Kevin



Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Yeb Havinga

Simon Riggs wrote:

How we handle degraded mode is important, yes. Whatever parameters we
choose the problem will remain the same.

Should we just ignore degraded mode and respond as if nothing bad had
happened? Most people would say not.

If we specify server1 = synch and server2 = async we then also need to
specify what happens if server1 is down. People might often specify
if (server1 == down) server2 = synch.
  
I have a hard time imagining including async servers in the quorum. If 
an async server's vote is necessary to reach quorum due to a 'real' sync 
standby server failure, it would mean that the async-intended standby is 
now also in sync with the master's transactions. IMHO this is a bad 
situation, since instead of the DBA getting the error "not enough sync 
standbys to reach quorum", he'll now get "database is slow" complaints, 
only to find out later that too many sync standby servers went south 
(under the assumption that async servers are mostly on links too slow to 
consider for sync standby).


regards,
Yeb Havinga




Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Dimitri Fontaine
Hi,

Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote:
 On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The best parameter we can specify is the number of servers that we wish
  to wait for confirmation from. 
 
 This may be an incredibly naive question, but what happens to the
 transaction on the master if the number of confirmations is not
 received? 

 It's much easier to say you want to wait for N servers to respond, but
 don't care which they are. One parameter, simple and flexible.
[...]
 So whatever we do, we need additional parameters to specify timeouts
 (including wait-forever as an option) and action-on-timeout: commit or
 rollback. 

I was preparing an email along the lines that we need each slave to declare
its desired minimum level of synchronicity, and have the master filter
that with what the transaction wants.

Scratch that.

Thinking about it some more, I see that Simon's proposal is both simpler
and more effective: we already have Hot Standby and admin functions
that tell us the last replayed LSN. The one with the bigger LSN wins. So
in case of failover we know which slave to choose.

The only use case I can see for what I had in mind is to allow the user
to choose which server is trusted to have accurate data or better
read-only performance. But if the link is slow, the code will notice
soon enough, mind you.

I'm still not sure about my preference here, but I can see why Simon's
proposal is simpler and addresses all concerns apart from forcing the
servers into a non-optimal setup for a gain that is uneasy to see.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread MMK
Hello Heikki:

This is what the documentation says (see below).

But it does not tell me anything about what the actual buffer size is.
How do I know what the real buffer size is? I am using 8.4.4 and I am running 
only one query at a time.

Cheers,

MMK.

Sets the planner's assumption about the effective size of the disk
cache that is available to a single query. This is factored into
estimates of the cost of using an index; a higher value makes it more
likely index scans will be used, a lower value makes it more likely
sequential scans will be used. When setting this parameter you should
consider both PostgreSQL's shared buffers and the portion of the 
kernel's disk cache that will be used for PostgreSQL
data files. Also, take into account the expected number of concurrent
queries on different tables, since they will have to share the
available space. This parameter has no effect on the size of shared
memory allocated by PostgreSQL, nor
does it reserve kernel disk cache; it is used only for estimation
purposes. The default is 128 megabytes (128MB).



--- On Tue, 5/25/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com 
wrote:

From: Heikki Linnakangas heikki.linnakan...@enterprisedb.com
Subject: Re: [HACKERS] Confused about the buffer pool size
To: MMK bom...@yahoo.com
Cc: PostgreSQL-development pgsql-hackers@postgresql.org
Date: Tuesday, May 25, 2010, 11:36 AM

On 25/05/10 19:49, MMK wrote:
 Hello All:
 In the code (costsize.c), I see that effective_cache_size is set to 
 DEFAULT_EFFECTIVE_CACHE_SIZE.
 This is defined as follows in cost.h
 #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384
 But when I say
 show shared_buffers in psql I get,
 shared_buffers  28MB
 In postgresql.conf file, the following lines appear
 shared_buffers = 28MB                  # min 128kB  (change requires restart)
 #temp_buffers = 8MB                    # min 800kB
 
 So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages.
 So should effective_cache_size be set to 3584 rather than the 16384?

No. Please see the manual for what effective_cache_size means:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
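
In short, it is only a planner assumption, not an allocation; for
example (a minimal illustration):

SHOW effective_cache_size;        -- the planner's current assumption
SET effective_cache_size = '1GB'; -- changes cost estimates for this
                                  -- session only; no memory is allocated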

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:

 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

 This, unfortunately, isn't true in SSI.

 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave.
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.

 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty).
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 SSI would detect a potential conflict here, which we could resolve by
 aborting T3. (We could also abort T1, but if this is a replicated
 system this isn't always an option -- T3 might be running on the
 slave, so only the slave will know about the conflict, and it can't
 very well abort an update transaction on the master.)

Ah, indeed. I made the same reasoning mistake as Florian (presumably)
did: I didn't think of the fact that the read-only transaction doesn't
need to be the pivot.
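
For anyone wanting to replay it, here is a minimal SQL sketch of that
schedule (schema and names made up; note that under today's
snapshot-isolation SERIALIZABLE all three transactions commit happily,
which is exactly the anomaly SSI is meant to catch by aborting T3 or T1):

-- setup
CREATE TABLE ctrl (cur_date date);
INSERT INTO ctrl VALUES ('2010-05-25');
CREATE TABLE receipts (d date);

-- session 1 (T1):
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO receipts SELECT cur_date FROM ctrl;  -- reads ctrl, writes receipts

-- session 2 (T2):
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ctrl SET cur_date = cur_date + 1;
COMMIT;

-- session 3 (T3, read only):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM ctrl;      -- sees the incremented date
SELECT * FROM receipts;  -- sees no receipts
COMMIT;

-- session 1 (T1):
COMMIT;  -- a receipt with the *old* date now exists; no serial order of
         -- T1, T2, T3 is consistent with what T3 saw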

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:48 , Dan Ports wrote:
 On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but 
 only as long as nobody tries to observe it?
 
 Yes. Note that even while it's in an inconsistent state, you can still
 perform any query that doesn't observe the inconsistency -- hopefully
 most queries fall into this category.

Yeah, as long as you just walk by without looking, the database is happy ;-)

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous cycle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have thought that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?
 
 Right, the read-only transaction can't have incoming edges, but it can
 have outgoing edges. So it can't be the pivot itself (the transaction
 with both outgoing and incoming edges), but it can cause *another*
 transaction to be.
 
 In the example I gave, T3 (the r/o transaction) has an outgoing edge to
 T1, because it didn't see T1's concurrent update. T1 already had an
 outgoing edge to T2, so adding in this incoming edge from T3 creates
 the dangerous structure.

Hm, but for there to be an actual problem (and not a false positive), an actual 
dangerous cycle has to exist in the dependency graph. The existence of a 
dangerous structure is just a necessary (but not sufficient) and easily 
checked-for condition for that, right? Now, if a read-only transaction only 
ever has outgoing edges, it cannot be part of a (dangerous or not) cycle, and 
hence any dangerous structure it is part of is a false positive.

I guess my line of reasoning is flawed somehow, but I cannot figure out why...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 On May 25, 2010, at 20:18 , Dan Ports wrote:

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.

 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

I would not call this an inconsistent state: it would become
inconsistent only after someone (e.g., T3) has observed it _and_ T1
commits.

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Hector Beyers
Hi guys,

(I tried the question in another forum first)

Does someone have any ideas how I can hide data without the metadata
noticing? To explain further, I would like to save some collection of data
that the metadata does not see. I am trying to do some security through
obscurity. It is for research purposes.

For example, populate a table with 1000 rows, but the metadata only knows
about 500 of them? Only in an exported dump can you find all the data
again. Or maybe make a hidden duplicate schema that can point to the
hidden data?

Does someone have any good ideas on how to achieve this or something
similar?

Kind regards
Hector



On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote:


 Hi guys,

 does ANYONE have any tips on hiding data on a database server? This means
 that data is stored in places that are not necessarily picked up in the
 schema of the database. I am doing some research on databases and need some
 direction.

 Any help or direction will be highly appreciated.

 Kind regards

 Hector




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 Hm, but for there to be an actual problem (and not a false positive), an
 actual dangerous cycle has to exist in the dependency graph. The
 existence of a dangerous structure is just a necessary (but not
 sufficient) and easily checked-for condition for that, right? Now, if a
 read-only transaction only ever has outgoing edges, it cannot be part
 of a (dangerous or not) cycle, and hence any dangerous structure it is
 part of is a false positive.

 I guess my line of reasoning is flawed somehow, but I cannot figure out why...

In the general case, wr-dependencies also create "must be serialized
before" edges. It seems that those edges can be discarded when finding
a pivot, but if you want to go back to basics:

("<" means "must be serialized before".)

* T1 < T2, because T1 reads a version of a data element for which T2
later creates a newer version (rw between T1 and T2).
* T3 < T1, because T3 reads a version of a data element for which T1
later creates a newer version (rw between T3 and T1).
* T2 < T3, because T2 creates a version of a data element, which is
then read by T3 (wr between T2 and T3).

(As you can see, those 3 edges form a cycle.)

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, but for there to be an actual problem (and not a false
 positive), an actual dangerous cycle has to exist in the
 dependency graph. The existence of a dangerous structure is just a
 necessary (but not sufficient) and easily checked-for condition
 for that, right? Now, if a read-only transaction only ever has
 outgoing edges, it cannot be part of a (dangerous or not) cycle,
 and hence any dangerous structure it is part of is a false
 positive.
 
 I guess my line of reasoning is flawed somehow, but I cannot
 figure out why...
 
Here's why:
 
We're tracking rw-dependencies, where the time-arrow showing
effective order of execution points from the reader to the writer
(since the reader sees a state prior to the write, it effectively
executes before it).  These are important because there have to be
two such dependencies, one in to the pivot and one out from the
pivot, for a problem to exist.  (See various works by Dr. Alan
Fekete, et al, for details.)  But other dependencies can imply an
order of execution.  In particular, a wr-dependency, where a
transaction *can* see data committed by another transaction, implies
that the *writer* came first in the order of execution.  In this
example, the transaction which lists the receipts successfully reads
the control table update, but is not able to read the receipt
insert.  This completes the cycle, making it a real anomaly and not
a false positive.
 
Note that the wr-dependency can actually exist outside the database,
making it pretty much impossible to accurately tell a false positive
from a true anomaly when the pivot exists and the transaction
writing data which the pivot can't read commits first.  For example,
let's say that the update to the control table is committed from an
application which, seeing that its update came back without error,
proceeds to list the receipts for the old date in a subsequent
transaction.  You have a wr-dependency which is, in reality, quite
real and solid with no way to notice it within the database engine. 
That's why the techniques used in SSI are pretty hard to improve
upon beyond more detailed and accurate tracking of rw-conflicts.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?



There is a difference between an empty result set and an exception.


No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 
Checking the age of the last entry in the file should be simple enough 
to determine if the segment matches the max age configuration (if 
set). In the case of a superuser saying what to purge, he would just 
call a function with a serial number (identifying the obsolete segments).



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] mergejoin null handling (was Re: [PERFORM] merge join killing performance)

2010-05-25 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 So, Tom, so you think it's possible that the planner isn't noticing
 all those nulls and thinks it'll just take a row or two to get to the
 value it needs to join on?

I dug through this and have concluded that it's really an oversight in
the patch I wrote some years ago in response to this:
http://archives.postgresql.org/pgsql-performance/2005-05/msg00219.php

That patch taught nodeMergejoin that a row containing a NULL key can't
possibly match anything on the other side.  However, its response to
observing a NULL is just to advance to the next row of that input.
What we should do, if the NULL is in the first merge column and the sort
order is nulls-high, is realize that every following row in that input
must also contain a NULL and so we can just terminate the mergejoin
immediately.  The original patch works well for cases where there are
just a few nulls in one input and the important factor is to not read
all the rest of the other input --- but it fails to cover the case where
there are many nulls and the important factor is to not read all the
rest of the nulls.  The problem can be demonstrated if you modify the
example given in the above-referenced message so that table t1 contains
lots of nulls rather than just a few: explain analyze will show that
all of t1 gets read by the mergejoin, and that's not necessary.
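
For instance, something along these lines should exhibit it (a sketch,
not tested here; check the plan to make sure t1 ends up as the
nulls-containing merge input):

CREATE TABLE t1 (k integer);
CREATE TABLE t2 (k integer);
INSERT INTO t1 SELECT NULL::integer FROM generate_series(1, 500000);
INSERT INTO t1 SELECT g FROM generate_series(1, 1000) g;
INSERT INTO t2 SELECT g FROM generate_series(1, 1000) g;
ANALYZE t1; ANALYZE t2;
SET enable_hashjoin = off;
SET enable_nestloop = off;
-- the actual row counts show the half million NULLs being fetched and
-- discarded one at a time, rather than the scan stopping at the first
-- NULL key
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 USING (k);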

I'm inclined to think this is a performance bug and should be
back-patched, assuming the fix is simple (which I think it is, but
haven't coded/tested yet).  It'd probably be reasonable to go back to
8.3; before that, sorting nulls high versus nulls low was pretty poorly
defined and so there'd be risk of breaking cases that gave the right
answers before.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Joseph Adams
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers hqbey...@gmail.com wrote:

 Hi guys,
 (I tried the question in another forum first)
 Does someone have any ideas how I can hide data without the metadata
 noticing? To explain further, I would like to save some collection of data
 that the metadata does not see. I am trying to do some security through
 obscurity. It is for research purposes.
 For example, populate a table with 1000 rows, but the metadata only knows
 about 500 of them? Only in an exported dump can you find all the data
 again. Or maybe make a hidden duplicate schema that can point to the
 hidden data?
 Does someone have any good ideas on how to achieve this or something
 similar?
 Kind regards
 Hector


 On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote:

 Hi guys,
 does ANYONE have any tips on hiding data on a database server? This means
 that data is stored in places that are not necessarily picked up in the
 schema of the database. I am doing some research on databases and need some
 direction.
 Any help or direction will be highly appreciated.
 Kind regards
 Hector

Not sure if this is helpful, but be sure to know about views, which can
be used to filter out rows of a table.  Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);

CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible=true;

= SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

= SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to
simulate an updatable view.

You may also want to read about Veil:
http://veil.projects.postgresql.org/curdocs/main.html

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Tom Lane
Jan Wieck janwi...@yahoo.com writes:
 No, I meant how will the *function* know, if a superuser and/or some 
 background process can purge records at any time?

 The data contains timestamps which are supposedly taken in commit order. 

You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 21:19 +0200, Yeb Havinga wrote:
 Simon Riggs wrote:
  How we handle degraded mode is important, yes. Whatever parameters we
  choose the problem will remain the same.
 
  Should we just ignore degraded mode and respond as if nothing bad had
  happened? Most people would say not.
 
  If we specify server1 = synch and server2 = async we then also need to
  specify what happens if server1 is down. People might often specify
  if (server1 == down) server2 = synch.

 I have a hard time imagining including async servers in the quorum. If 
 an async server's vote is necessary to reach quorum due to a 'real' sync 
 standby server failure, it would mean that the async-intended standby is 
 now also in sync with the master's transactions. IMHO this is a bad 
 situation, since instead of the DBA getting the error 'not enough sync 
 standbys to reach quorum', he'll now get 'database is slow' complaints, 
 only to find out later that too many sync standby servers went south 
 (under the assumption that async servers are mostly on links too slow to 
 consider for sync standby).

Yeh, there's difficulty either way. 

We don't need to think of servers as being "synch" or "async"; more
likely we would rate them in terms of typical synchronisation delay. So
yeh, calling them "fast" and "slow" in terms of synchronisation delay
makes sense.

Some people with low xact rate and high need for protection might want
to switch across to the slow server and keep running. If not, the
max_synch_delay would trip and you would then select
synch_failure_action = rollback. 

The realistic response is to add a second fast sync server, to allow
you to stay up even when you lose one of the fast servers. That now
gives you 4 servers and the failure modes start to get real complex.

Specifying rules to achieve what you're after would be much harder. Some
people might want that, but most people won't in the general case and if
they did specify them they'd likely get them wrong.

All of these issues show why I want to specify the synchronisation mode
as a USERSET. That will allow us to specify more easily which parts of
our application are important when the cluster is degraded and which
data is so critical it must reach multiple servers.
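
To illustrate what I mean (the parameter name, its values and the tables
are all hypothetical here; nothing like this exists yet):

-- low-value work: don't wait for any standby
BEGIN;
SET LOCAL synch_level = async;
UPDATE session_state SET last_seen = now() WHERE user_id = 42;
COMMIT;

-- critical data: COMMIT does not return until the quorum confirms
BEGIN;
SET LOCAL synch_level = synch;
INSERT INTO payments (user_id, amount) VALUES (42, 100.00);
COMMIT;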

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 maybe we should get serializable working and committed on one
 node first and then worry about how to distribute it.  I think
 there might be other approaches to this problem
 
Well, I've got two or three other ideas on how we can manage this
for HS, but since I now realize that I've totally misunderstood the
main use case for this (which is to support trigger-based
replication), I'd like to be clear on something before letting it
drop.  The big question is, do such replicas need to support
serializable access to the data modified by serializable
transactions in the source database?  That is, is there a need for
such replicas to only see states which are possible in some serial
order of execution of serializable transactions on the source
database?  Or to phrase the same question a third way, should there
be a way to run queries on such replicas with confidence that what
is viewed is consistent with user-defined constraints and business
rules?
 
If not, there's no intersection between this feature and SSI.  If
there is, I think we should think through at least a general
strategy sooner, rather than later.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 12:03 PM, Simon Riggs wrote:

On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:


In some systems (data warehousing, replication), the order of commits is
important, since that is the order in which changes have become visible.
This information could theoretically be extracted from the WAL, but
scanning the entire WAL just to extract this tidbit of information would
be excruciatingly painful.


I think it would be quite simple to read WAL. WALSender reads the WAL
file after it's been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.


Reading the entire WAL just to find all COMMIT records, then go back to 
the origin database to get the actual replication log you're looking for 
is simpler and more efficient? I don't think so.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Josh Berkus
Alvaro,

 This sounds like extending Xid to 64 bits, without having to store the
 high bits everywhere.  Was this discussed in the PGCon devs meeting?

Essentially, yes.

One of the main objections to raising XID to 64-bit has been the per-row
overhead.  But adding 4 bytes per page wouldn't be much of an impact.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:
 On 5/25/2010 12:03 PM, Simon Riggs wrote:
  On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
  
  In some systems (data warehousing, replication), the order of commits is
  important, since that is the order in which changes have become visible.
  This information could theoretically be extracted from the WAL, but
  scanning the entire WAL just to extract this tidbit of information would
  be excruciatingly painful.
  
  I think it would be quite simple to read WAL. WALSender reads the WAL
  file after it's been flushed, so it would be simple for it to read a blob
  of WAL and then extract the commit order from it.
  
  Overall though, it would be easier and more efficient to *add* info to
  WAL and then do all this processing *after* WAL has been transported
  elsewhere. Extracting info with DDL triggers, normal triggers, commit
  order and everything else seems like too much work to me. Every other
  RDBMS has moved away from trigger-based replication and we should give
  that serious consideration also.
 
 Reading the entire WAL just to find all COMMIT records, then go back to 
 the origin database to get the actual replication log you're looking for 
 is simpler and more efficient? I don't think so.

Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on the *origin*, using the
exact same code that the current WALSender provides, plus some logic to
read through the WAL records and extract commits/aborts. That seems much
simpler than the proposal you outlined and, as SR shows, it's low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Jan Wieck

On 5/24/2010 9:30 AM, Heikki Linnakangas wrote:

On 22/05/10 16:35, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

   From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:


Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak.


Me.


 Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG.


Hmm, we don't rely on setting hint bits to truncate CLOG anymore 
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php). 
It's the replacement of xids with FrozenXid that matters, the hint bits 
are really just hints.


Doesn't change the conclusion, though: you still need to replace XIDs 
with FrozenXids to truncate the clog. Conceivably we could keep around 
more than 2^32 transactions in clog with this scheme, but then you need 
a lot more space for the clog. But perhaps it would be better to do that 
than to launch anti-wraparound vacuums, or to refuse more updates in the 
extreme cases.


Correct. The real problem is aborted transactions. Just because an 
XID is really old doesn't mean it was committed.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confused about the buffer pool size

2010-05-25 Thread Josh Berkus
MMK,

 But it does not tell me anything about what the actual buffer size is.
 How do I know what the real buffer size is? I am using 8.4.4 and I am
 running only one query at a time.

Please move this discussion to the pgsql-general or pgsql-performance
lists.  pgsql-hackers is for working on PostgreSQL code, and further
questions on this list will probably not be answered.

Other than that, I have no idea what you mean by buffer size, nor why
you need to know it.  I'd suggest starting your post on the other
mailing list by explaining what specific problem you're trying to solve.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-25 Thread Josh Berkus

 Correct. The real problem is aborted transactions. Just because an
 XID is really old doesn't mean it was committed.

Yes, that's the main issue with my idea; XIDs which fell off the CLOG
would become visible even if they'd aborted.

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:50 PM, Simon Riggs wrote:

On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:

On 5/25/2010 12:03 PM, Simon Riggs wrote:
 On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.
 
 I think it would be quite simple to read WAL. WALSender reads the WAL
 file after it's been flushed, so it would be simple for it to read a blob
 of WAL and then extract the commit order from it.
 
 Overall though, it would be easier and more efficient to *add* info to
 WAL and then do all this processing *after* WAL has been transported
 elsewhere. Extracting info with DDL triggers, normal triggers, commit
 order and everything else seems like too much work to me. Every other
 RDBMS has moved away from trigger-based replication and we should give
 that serious consideration also.

Reading the entire WAL just to find all COMMIT records, then go back to
the origin database to get the actual replication log you're looking for
is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on the *origin*, using the
exact same code that the current WALSender provides, plus some logic to
read through the WAL records and extract commits/aborts. That seems much
simpler than the proposal you outlined and, as SR shows, it's low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order data 
to go offline for three days or so to replicate the 5 requested low-volume 
tables, the origin needs to hang on to the entire WAL log from 
all 100 other high-volume tables?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-25 Thread Sam Vilain
Florian Pflug wrote:
 On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:
   
 On 25/05/10 13:03, Florian Pflug wrote:
 
 On May 25, 2010, at 6:08 , Sam Vilain wrote:
   
 http://www.postgresql.org/docs/8.4/static/sql-savepoint.html

 Led us to believe that if you roll back to the same savepoint name
 twice in a row, that you might start walking back through the
 savepoints.  I guess I missed the note on ROLLBACK TO SAVEPOINT that
 that is not how it works.

 Here is the section:

 SQL requires a savepoint to be destroyed automatically when another
 savepoint with the same name is established. In PostgreSQL, the old
 savepoint is kept, though only the more recent one will be used when
 rolling back or releasing. (Releasing the newer savepoint will cause the
 older one to again become accessible to ROLLBACK TO SAVEPOINT and
 RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
 
 I'm confused. The sentence in brackets Releasing the newer savepoint will 
 cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and 
 RELEASE SAVEPOINT implies that you *will* walk backwards through all the 
 savepoints named a if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? 
 If that is not how it actually works, then this whole paragraph is wrong, 
 I'd say.
   
 Releasing the newer savepoint will cause the older one to again become 
 accessible, as the doc says, but rolling back to a savepoint does not 
 implicitly release it. You'll have to use RELEASE SAVEPOINT for that.
 

 Ah, now I get it. Thanks.

 Would changing Releasing the newer savepoint will cause ...  to Explicitly 
 releasing the newer savepoint or maybe even Explicitly releasing the newer 
 savepoint with RELEASE SAVEPOINT will cause ... make things clearer?
   

Yes, probably - your misreading matches my misreading of it :-)

There is another way you can get there - releasing a savepoint established
before the re-used savepoint name will also release the savepoints after it.

ie

   savepoint foo;
   savepoint bar;
   savepoint foo;
   release savepoint bar;
   release savepoint foo;

After the first release, the second 'foo' savepoint is gone.  I think
this is a key advantage in saving the old savepoints.

Cheers,
Sam

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-25 Thread Jan Urbański
On 19/05/10 21:01, Jesper Krogh wrote:
 The document base is arount 350.000 documents and
 I have set the statistics target on the tsvector column
 to 1000 since the 100 seems way off.

So for tsvectors the statistics target means, more or less, "at any time,
track at most 10 * target lexemes simultaneously", where "track" means
keeping them in memory while going through the tuples being analysed.

Remember that the measure is in lexemes, not whole tsvectors and the 10
factor is meant to approximate the average number of unique lexemes in a
tsvector. If your documents are very large, this might not be a good
approximation.
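
For reference, that per-column target is the one set with something like
the following (table and column names taken from your message):

ALTER TABLE reference ALTER COLUMN document_tsvector SET STATISTICS 1000;
ANALYZE reference;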

 # ANALYZE verbose reference (document_tsvector);
 INFO:  analyzing reference
 INFO:  reference: scanned 14486 of 14486 pages, containing 350174 live
 rows and 6027 dead rows; 300000 rows in sample, 350174 estimated total rows
 ANALYZE
 
 Ok, so analyze almost examined all rows. Looking into
 most_common_freqs I find
 # select count(unnest) from (select unnest(most_common_freqs) from
 pg_stats where attname = 'document_tsvector') as foo;
  count
 ---
   2810
 (1 row)

So the size of the most_common_freqs and most_common_vals arrays in
pg_statistics for tsvectors has an upper bound of stats-target * 10
(for the same reasons as mentioned before) and they hold lexemes (not
whole tsvectors). What also happens is that lexemes that were seen only
once while going through the analysed set are discarded, so that's why
you can actually get fewer entries in these arrays, even if your
document set is big.


 But the distribution is very flat at the end, the last 128 values are
 exactly
 1.00189e-05
 which means that any term sitting outside the array would get an
 estimate of
 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

Yeah, this might mean that you could try cranking up the stats target a
lot, to make the set of simultaneously tracked lexemes larger (it will
cost time and memory during analyse though). If the documents have
completely different contents, what can happen is that almost all
lexemes are only seen a few times and get removed during the pruning of
the working set. I have seen similar behaviour while working on the
typanalyze function for tsvectors.

 So far I have no idea if this is bad or good, so a couple of sample runs
 of stuff that
 is sitting outside the most_common_vals array:
 
 [gathered statistics suck]

 So the most_common_vals seems to contain a lot of values that should
 never have been kept in favor
 of other values that are more common.

 In practice, just cranking the statistics estimate up high enough seems
 to solve the problem, but doesn't
 there seem to be something wrong in how the statistics are collected?

The algorithm to determine most common vals does not do it accurately.
That would require keeping all lexemes from the analysed tsvectors in
memory, which would be impractical. If you want to learn more about the
algorithm being used, try reading
http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
ts_typanalyze.c

It would be interesting to know what's the average size of a tsvector in
your document set (ie. how many unique lexemes does a tsvector have on
average). In general, the tsvector typanalyze function is designed to
suck less than the constant factor that has been used previously, but it
only works really well on the most common lexemes (thus preventing most
gross misestimates). I'm not very surprised it misses the difference
between 1612/350174 and 4/350174, and I'm quite happy that it gets it
if you set the stats target really high :o)
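
A quick way to check, using the table and column from your message
(length() on a tsvector returns its number of distinct lexemes):

SELECT avg(length(document_tsvector)) AS avg_unique_lexemes
  FROM reference;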

There's always the possibility that there's some stupid bug there, but I
think you just set your expectations too high for the tsvector typanalyze
function. If you could come up with a better way of doing tsvector
stats, that would be awesome - currently it's just doing its best to
prevent the most outrageous errors.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 21:44), Stephen Frost wrote:
 KaiGai,
 
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 OK, the attached patch reworks it according to the way.
 
 Reviewing this patch, there are a whole slew of problems.
 
 #1: REALLY BIG ISSUE- Insufficient comment updates.  You've changed
 function definitions in a pretty serious way as well as moved some code
 around such that some of the previous comments don't make sense.  You
 have got to update comments when you're writing a patch.  Indeed, the
 places I see a changes in comments are when you've removed what appears
 to still be valid and appropriate comments, or places where you've added
 comments which are just blatently wrong with the submitted patch.

Hmm. I'll revise/add the comments around the patched code.

 #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
 this patch- don't, we're in feature-freeze right now and should not be
 adding hooks at this time.

The patch is intended to be submitted for v9.1 development, not v9.0, isn't it?

 #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
  utils/acl and instead added executor/executor.h to ri_triggers.c.
 I don't particularly like that.  I admit that DoCopy() already knew
 about the executor, and if that were the only case outside of the
 executor where ExecCheckRTPerms() was getting called it'd probably be
 alright, but we already have another place that wants to use it, so
 let's move it to a more appropriate place.

Sorry, I'm a bit confused.
It seemed to me you suggested utilizing ExecCheckRTPerms() rather than
moving its logic anywhere, so I kept it here. (Did I misunderstand?)

If so, I doubt utils/acl is the best place for the moved
ExecCheckRTPerms(), because the checker function calls both the
default acl functions and an optional external security function.
It means ExecCheckRTPerms() is a caller of acl functions, not an acl
function itself, isn't it?
In other words, I wonder whether we should categorize a function X which
calls A and (optionally) B as a part of A.

I agree the checker function is not a part of the executor, but it is
also not a part of the acl functions, in my opinion.

If we are disinclined to create a new directory to hold the checker
function, my preference is src/backend/utils/adt/security.c and
src/include/utils/security.h .

 #4: As mentioned previously, the hook (which should be added in a
 separate patch anyway) makes more sense to me to be in
 ExecCheckRTPerms(), not ExecCheckRTEPerms().  This also means that we
 need to be calling ExecCheckRTPerms() from DoCopy and
 RI_Initial_Check(), to make sure that the hook gets called.  To that
 end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c.  Also,
 there should be a big comment about not using or calling
 ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the
 hook would then be skipped.

I don't have any preference between ExecCheckRTPerms()
and ExecCheckRTEPerms(), except that DoCopy() and RI_Initial_Check()
would have to call the checker function with list_make1(rte) instead of rte.

 #5: In DoCopy, you can remove relPerms and remainingPerms, but I'd
 probably leave required_access up near the top and then just use it to
 set rte-required_access directly rather than moving that bit deep down
 into the function.

OK,

 #6: I haven't checked yet, but if there are other things in an RTE which
 would make sense in the DoCopy case, beyond just what's needed for the
 permissions checking, and which wouldn't be 'correct' with a NULL'd
 value, I would set those.  Yes, we're building the RTE to check
 permissions, but we don't want someone downstream to be suprised when
 they make a change to something in the permissions checking and discover
 that a value in RTE they expected to be there wasn't valid.  Even more
 so, if there are function helpers which can be used to build an RTE, we
 should be using them.  The same goes for RI_Initial_Check().

Are you saying something like makeFuncExpr()?
I basically agree. However, should it be done in this patch?

 #7: I'd move the conditional if (is_from) into the foreach which is
 building the columnsSet and eliminate the need for columnsSet; I don't
 see that it's really adding much here.

OK,

 #8: When moving ExecCheckRTPerms(), you should rename it to be more like
 the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
 Also, it should return an actual AclResult instead of just true/false.

See the comments in #3.
And, if the caller has to handle aclcheck_error(), the user cannot
distinguish access violation errors raised by the default PG permissions
from those raised by any other external security stuff, can they?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread KaiGai Kohei
(2010/05/25 22:59), Stephen Frost wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
with locally built RangeTblEntry.
 
 Maybe I missed it somewhere, but we still need to address the case where
 the user doesn't have those SELECT permissions that we're looking for in
 RI_Initial_Check(), right?  KaiGai, your patch should be addressing that
 in a similar fashion..

The reason why the user must have SELECT privileges on the PK/FK tables is
that validateForeignKeyConstraint() always calls SPI_execute() to verify
that FK constraints can be established between two tables (even in the
fallback path).

And the reason why RI_Initial_Check() now calls pg_class_aclcheck() is
to try to avoid an unexpected access violation error from SPI_execute().
However, the fallback path also always calls SPI_execute(), so I concluded
the permission checks in RI_Initial_Check() are nonsense.

However, it is an independent issue right now, so I kept it as is.

The origin of the matter is that we apply unnecessary permission checks,
although it is purely internal use and the user was already checked for
permission to execute the whole ALTER TABLE statement. Right?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 22:16 , Simon Riggs wrote:
 All of these issues show why I want to specify the synchronisation mode
 as a USERSET. That will allow us to specify more easily which parts of
 our application are important when the cluster is degraded and which
 data is so critical it must reach multiple servers.


Hm, but since flushing an important COMMIT to the slave(s) will also need to 
flush all previous (potentially unimportant) COMMITs to the slave(s), isn't 
there a substantial chance of priority-inversion type problems there?

Then again, if asynchronous commit proved to be effective then so, probably, 
will this, so maybe my fear is unjustified.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Hiding data in postgresql

2010-05-25 Thread Stephen Frost
Hector,

* Hector Beyers (hqbey...@gmail.com) wrote:
 Does someone have any ideas how I can hide data without the meta data
 noticing? To explain further, I would like to save some collection of data
 where the meta-data does not see it. I am trying to do some security through
 obscurity. It is for research purposes.

This explanation doesn't actually explain anything, near as I can tell.
Perhaps if you would share what your actual problem is, we could
recommend a solution.

Thanks,

Stephen



[HACKERS] Open Item: pg_controldata - machine readable?

2010-05-25 Thread Takahiro Itagaki
There is an open item pg_controldata - machine readable? in the list:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items

The proposal by Joe Conway is adding a new contib module.
http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com
http://github.com/jconway/pg_controldata

Should we add the module to 9.0? If we do so, SGML documentation is required.

IMHO, I'd like to put the feature into the core instead of a contrib
module, but we cannot change the catalog version at this time.
So, how about providing control file information through the pg_settings
view? We would retrieve those variables as GUC options.
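
A few values of that general kind are in fact already exposed as
read-only GUCs and visible there today, e.g.:

SELECT name, setting
  FROM pg_settings
 WHERE name IN ('block_size', 'integer_datetimes', 'server_version');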

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: PDXPUG Day at OSCON 2010

2010-05-25 Thread Mark Wong
It was recommended to me to forward this to -hackers.

Regards,
Mark

-- Forwarded message --
From: Mark Wong mark...@gmail.com
Date: Tue, May 18, 2010 at 6:57 AM
Subject: PDXPUG Day at OSCON 2010
To: pgsql-annou...@postgresql.org


Thanks to the generosity of O'Reilly, we will be having a full day of
free PostgreSQL sessions on Sunday, July 18 at the Oregon Convention
Center.  Location details and schedule information can be found on the
wiki at:

http://wiki.postgresql.org/wiki/PDXPUGDay2010

We will ask for a $30 donation towards PostgreSQL at the conference,
but no one will be turned away. Sign up here:

https://spreadsheets.google.com/viewform?hl=en&formkey=dDVBRnJGWVlZRkdycFdXbXVuYTNiU2c6MQ

Please submit your talk proposal here:

http://spreadsheets.google.com/viewform?hl=en&formkey=dHBFMGFIWmxJUzhRM3R6dXVlWWxYQ1E6MQ.

Proposals will be decided upon on June 7th and updated on the wiki.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
  #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of
  this patch- don't, we're in feature-freeze right now and should not be
  adding hooks at this time.
 
 The patch is intended to submit for the v9.1 development, not v9.0, isn't it?

That really depends on whether this is actually fixing a bug in the existing
code or not.  I'm on the fence about that at the moment, to be honest.
I was trying to find where we explicitly say that SELECT rights are needed
to reference a column but wasn't able to.  If every code path is
expecting that, then perhaps we should just document it that way and
move on.  In that case, all these changes would be for 9.1.  If we
decide the current behavior is a bug, it might be something which could
be fixed in 9.0 and maybe back-patched.

In *either* case, given that one is a 'clean-up' patch and the other is
'new functionality', they should be independent *anyway*.  Small
incremental changes that don't break things when applied is what we're
shooting for here.

  #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to
  utils/acl and instead added executor/executor.h to rt_triggers.c.
  I don't particularly like that.  I admit that DoCopy() already knew
  about the executor, and if that were the only case outside of the
  executor where ExecCheckRTPerms() was getting called it'd probably be
  alright, but we already have another place that wants to use it, so
  let's move it to a more appropriate place.
 
 Sorry, I'm a bit confused.
 It seemed to me you suggested to utilize ExecCheckRTPerms() rather than
 moving its logic anywhere, so I kept it here. (Was it misunderstand?)

I'm talking about moving the whole function (all 3 lines of it) to
somewhere else and then reworking the function to be more appropriate
based on its new location (including renaming and changing arguments
and return values, as appropriate).

 If so, but, I doubt utils/acl is the best placeholder of the moved
 ExecCheckRTPerms(), because the checker function calls both of the
 default acl functions and a optional external security function.

Can you explain why you think that having a function in utils/acl (eg:
include/utils/acl.h and backend/utils/aclchk.c) which calls default acl
functions and allows for an external hook would be a bad idea?

 It means the ExecCheckRTPerms() is caller of acl functions, not acl
 function itself, isn't it?

It's providing a higher-level service, sure, but there's nothing
particularly interesting or special about what it's doing in this case,
and, we need it in multiple places.  Why duplicate it?

 I agreed the checker function is not a part of executor, but it is
 also not a part of acl functions in my opinion.
 
 If it is disinclined to create a new directory to deploy the checker
 function, my preference is src/backend/utils/adt/security.c and
 src/include/utils/security.h .

We don't need a new directory or file for one function, as Robert
already pointed out.

  #6: I haven't checked yet, but if there are other things in an RTE which
  would make sense in the DoCopy case, beyond just what's needed for the
  permissions checking, and which wouldn't be 'correct' with a NULL'd
  value, I would set those.  Yes, we're building the RTE to check
  permissions, but we don't want someone downstream to be suprised when
  they make a change to something in the permissions checking and discover
  that a value in RTE they expected to be there wasn't valid.  Even more
  so, if there are function helpers which can be used to build an RTE, we
  should be using them.  The same goes for RI_Initial_Check().
 
 Are you saying something like makeFuncExpr()?
 I basically agree. However, should it be done in this patch?

Actually, I mean looking for, and using, things like
markRTEForSelectPriv() and addRangeTableEntry() or
addRangeTableEntryForRelation().

  #8: When moving ExecCheckRTPerms(), you should rename it to be more like
  the other function calls in acl.h  Perhaps pg_rangetbl_aclcheck()?
  Also, it should return an actual AclResult instead of just true/false.
 
 See the comments in #3.
 And, if the caller has to handle aclcheck_error(), user cannot distinguish
 access violation errors between the default PG permission and any other
 external security stuff, isn't it?

I'm not suggesting that the caller handle aclcheck_error()..
ExecCheckRTPerms() could just as easily have a flag which indicates if
it will call aclcheck_error() or not, and if not, to return an
AclResult to the caller.  That flag could then be passed to
ExecCheckRTEPerms() as you have it now.
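
Roughly this shape, say (a sketch only; the names and the exact
signature are just what's being discussed here, nothing settled):

/* check all RTEs in the list; ereport() on violation only if asked to */
extern AclResult pg_rangetbl_aclcheck(List *rangeTable,
                                      bool ereport_on_violation);

/* optional entry point for external security providers */
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                              bool ereport_on_violation);
extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;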

Thanks,

Stephen



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-25 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 The reason why the user must have SELECT privileges on the PK/FK tables is
 that validateForeignKeyConstraint() always calls SPI_execute() to verify
 that FK constraints can be established between two tables (even in the
 fallback path).
 
 And the reason why RI_Initial_Check() now calls pg_class_aclcheck() is
 to try to avoid an unexpected access violation error from SPI_execute().
 However, the fallback path also always calls SPI_execute(), so I concluded
 the permission checks in RI_Initial_Check() are nonsense.

That may be the case.  I'm certainly more concerned with a bug in the
existing code than any new code that we're working on.  The question is-
is this actually a user-visible bug?  Or do we require that a user
creating an FK needs SELECT rights on the primary table?  If so, it's
still a bug, but at that point it's a bug in our documentation where we
don't mention that SELECT rights are also needed.

Anyone know what the SQL spec says about this (if anything...)?

 However, it is an independent issue right now, so I kept it as is.

Uh, I don't really see it as independent..  If we have a bug there that
we need to fix, and it's because we have two different bits of code
trying to do the same checking, we should fix it by eliminating the
duplicate checking, imv.

 The origin of the matter is that we apply unnecessary permission checks,
 although it is purely internal use and the user was already checked for
 permission to execute the whole ALTER TABLE statement. Right?

That's certainly a nice thought, but given the complexity in ALTER
TABLE, in particular with regard to permissions checking, I have no idea
if what it's doing is intentional or wrong.

Thanks,

Stephen



[HACKERS] Open Item: invalid declspec for PG_MODULE_MAGIC

2010-05-25 Thread Takahiro Itagaki
This open item is for replacing the PGDLLIMPORT markers on PG_MODULE_MAGIC
and PG_FUNCTION_INFO_V1 with __declspec(dllexport), because those symbols
are always exported by user modules rather than by the core code.
http://archives.postgresql.org/message-id/20100329184705.a60e.52131...@oss.ntt.co.jp

The fix is simple, so I think we can include it in 9.0.
Arguable issues for the patch are:
  * Is there a better name than PGMODULEEXPORT?  I like PGDLLEXPORT
because it is similar to PGDLLIMPORT, but it might be too similar.
  * Should we backport the fix to previous releases?
I'd like to backport it; it should not break any existing third-party
modules, because without the fix they cannot even be built on Windows.
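
For concreteness, here is a sketch of the idea (the macro bodies are
simplified for illustration, not the real fmgr.h definitions):

    #ifdef WIN32
    #define PGMODULEEXPORT __declspec(dllexport)
    #else
    #define PGMODULEEXPORT
    #endif

    /* the user-facing macros would then emit exported declarations */
    #define PG_FUNCTION_INFO_V1(funcname) \
        extern PGMODULEEXPORT const Pg_finfo_record * \
            CppConcat(pg_finfo_,funcname)(void)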

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb

2010-05-25 Thread Tom Lane
I've been experimenting with SSL setups involving chains of CA
certificates, ie, where the server or client cert itself is signed by
an intermediate CA rather than a trusted root CA.  This appears to work
well enough on the server side if you configure the server correctly
(see discussion of bug #5468).  However, libpq is not able to work with
a client certificate unless that cert is directly signed by a CA that
the server trusts (ie, one listed directly in the server's root.crt file).
This is because there is no good way to feed back any intermediate CA
certs to the server.  The man page for SSL_CTX_set_client_cert_cb says
in so many words that the client_cert_cb API is maldesigned:

BUGS

The client_cert_cb() cannot return a complete certificate chain,
it can only return one client certificate. If the chain only has
a length of 2, the root CA certificate may be omitted according
to the TLS standard and thus a standard conforming answer can be
sent to the server. For a longer chain, the client must send the
complete chain (with the option to leave out the root CA
certificate). This can only be accomplished by either adding the
intermediate CA certificates into the trusted certificate store
for the SSL_CTX object (resulting in having to add CA
certificates that otherwise maybe would not be trusted), or by
adding the chain certificates using the
SSL_CTX_add_extra_chain_cert(3) function, which is only
available for the SSL_CTX object as a whole and that therefore
probably can only apply for one client certificate, making the
concept of the callback function (to allow the choice from
several certificates) questionable.

It strikes me that we could not only fix this case, but make the libpq
code simpler and more like the backend case, if we got rid of
client_cert_cb and instead preloaded the ~/.postgresql/postgresql.crt
file using SSL_CTX_use_certificate_chain_file().  Then, using an
indirectly signed client cert would only require including the full cert
chain in that file.
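
To illustrate (just a sketch of the OpenSSL calls involved, with error
reporting elided; not proposed libpq code):

    #include <openssl/ssl.h>

    static int
    load_client_cert_chain(SSL_CTX *ctx,
                           const char *certpath,  /* .../postgresql.crt */
                           const char *keypath)   /* .../postgresql.key */
    {
        /* reads the client cert first, then any intermediate CA certs */
        if (SSL_CTX_use_certificate_chain_file(ctx, certpath) != 1)
            return -1;

        if (SSL_CTX_use_PrivateKey_file(ctx, keypath,
                                        SSL_FILETYPE_PEM) != 1)
            return -1;

        /* verify the key matches the (leaf) certificate */
        return SSL_CTX_check_private_key(ctx) == 1 ? 0 : -1;
    }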

So I'm wondering if there was any specific reason behind using the
callback API to start with.  Anybody remember?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:16 PM, Tom Lane wrote:

Jan Wieck janwi...@yahoo.com writes:
No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 


You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)


I am well aware of the fact that commit timestamps within the WAL can go 
backwards and that the serial numbers of this proposed implementation of 
commit order can even be different from what the timestamps AND the WAL 
are saying.


As long as the serial number (the record's position inside the segment) is 
determined while the transaction still holds all its locks, this is 
going to be good enough for what async replication users today are used 
to. Again, it will not magically make it possible to determine, post 
mortem, a serializable order of actions that happened in transactions 
running at the read committed isolation level. I don't even think 
that is possible at all.


And I don't think anyone proposed a solution for that problem anyways.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

