Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
(2012/02/22 9:30), Tom Lane wrote:
> Alvaro Herrera <alvhe...@commandprompt.com> writes:
>> Excerpts from Alvaro Herrera's message of mar feb 21 15:54:03 -0300 2012:
>>> Excerpts from Tom Lane's message of lun feb 20 12:37:45 -0300 2012:
>>>> As per
>>>> http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php
>>>> there is no switch case in shdepReassignOwned for foreign data
>>>> wrappers.
>>>
>>> I'm gonna take a stab at fixing this now (the simple way).
>>
>> Here's the patch I have; last minute opinions.  I intend to push a
>> backpatch to 9.1 and 9.0.  8.4 has the same problem, but since Heikki
>> punted in e356743f3ed45c36dcc4d0dbf6c1e8751b3d70b5, I'm not going to
>> bother either.
>
> Looks roughly like what I expected, but I haven't tested it.

I did some tests.  The results look good to me.  Please find attached a
logfile.  My only concern on the patch is

+static void
+AlterForeignServerOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
+{
+	Form_pg_foreign_server form;

-	srvId = HeapTupleGetOid(tup);
 	form = (Form_pg_foreign_server) GETSTRUCT(tup);

 	if (form->srvowner != newOwnerId)
@@ -366,10 +388,15 @@ AlterForeignServerOwner(const char *name, Oid newOwnerId)
 		/* Superusers can always do it */
 		if (!superuser())
 		{

I wonder if superusers can always do it.  For example, is it OK for
superusers to change the ownership of a foreign server owned by old_role
to new_role that doesn't have USAGE privilege on its foreign data
wrapper?

Best regards,
Etsuro Fujita

$ psql postgres
psql (9.2devel)
Type "help" for help.

postgres=# CREATE ROLE reassign_fdw_user LOGIN SUPERUSER;
CREATE ROLE
postgres=# SET SESSION AUTHORIZATION 'reassign_fdw_user';
SET
postgres=# CREATE ROLE regress_test_role;
CREATE ROLE
postgres=# CREATE ROLE regress_test_role2;
CREATE ROLE
postgres=# CREATE ROLE regress_test_role_super SUPERUSER;
CREATE ROLE
postgres=# CREATE FOREIGN DATA WRAPPER foo;
CREATE FOREIGN DATA WRAPPER
postgres=# \dew
             List of foreign-data wrappers
 Name |       Owner       | Handler | Validator
------+-------------------+---------+-----------
 foo  | reassign_fdw_user | -       | -
(1 row)

postgres=# REASSIGN OWNED BY reassign_fdw_user TO regress_test_role;
ERROR:  permission denied to change owner of foreign-data wrapper "foo"
HINT:  The owner of a foreign-data wrapper must be a superuser.
postgres=# REASSIGN OWNED BY reassign_fdw_user TO regress_test_role_super;
REASSIGN OWNED
postgres=# \dew
                List of foreign-data wrappers
 Name |          Owner          | Handler | Validator
------+-------------------------+---------+-----------
 foo  | regress_test_role_super | -       | -
(1 row)

postgres=# REASSIGN OWNED BY regress_test_role_super TO regress_test_role;
ERROR:  permission denied to change owner of foreign-data wrapper "foo"
HINT:  The owner of a foreign-data wrapper must be a superuser.
postgres=# REASSIGN OWNED BY regress_test_role_super TO reassign_fdw_user;
REASSIGN OWNED
postgres=# GRANT USAGE ON FOREIGN DATA WRAPPER foo TO regress_test_role;
GRANT
postgres=# SET ROLE regress_test_role;
SET
postgres=> CREATE SERVER t1 FOREIGN DATA WRAPPER foo;
CREATE SERVER
postgres=> REASSIGN OWNED by regress_test_role TO regress_test_role2;
ERROR:  permission denied to reassign objects
postgres=> RESET ROLE;
RESET
postgres=# REASSIGN OWNED by regress_test_role TO regress_test_role2;
REASSIGN OWNED
postgres=# \des+
                                       List of foreign servers
 Name |       Owner        | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
------+--------------------+----------------------+-------------------+------+---------+-------------+-------------
 t1   | regress_test_role2 | foo                  |                   |      |         |             |
(1 row)

postgres=#
Extensions default full version (was: [HACKERS] Displaying accumulated autovacuum cost)
Greg Smith <g...@2ndquadrant.com> writes:
> On 02/21/2012 04:44 PM, Dimitri Fontaine wrote:
>> The solution would be to be able to create hstore 1.1 from 1.0
>> automatically and I sent over a very simple patch to do that, albeit
>> after the deadline for the current CF (that's why it's not listed).
>> Maybe that's simple enough to be considered? (re-attaching here)
>
> I can't find any message that matches this description--from you, touches
> extensions in this way, and was submitted after the CF deadline.  Help?

http://archives.postgresql.org/message-id/m2k44m9oyo@2ndquadrant.fr

> If that's something that ripples out to impacting how changes to upgraded
> extensions should look, that should get its own thread I think.  This one
> is pretty deep and already far off its titled topic.

Let's do that then.  Well I'm just changing the subject here.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Runtime SHAREDIR for testing CREATE EXTENSION
On Tue, Feb 21, 2012 at 10:34:42PM +0100, Dimitri Fontaine wrote:
> Sandro Santilli <s...@keybit.net> writes:
>> On Tue, Feb 21, 2012 at 10:21:17AM -0500, Tom Lane wrote:
>>> Sandro Santilli <s...@keybit.net> writes:
>>>> I'm trying to understand what options I have to test CREATE EXTENSION
>>>> w/out installing the extension files in their final destination.
>>>
>>> There aren't any.  Generally speaking, if you want to be testing an
>>> extension, you should be doing it in a test installation anyway; so I
>>> don't see what's the objection to sticking the files into the
>>> installation's real SHAREDIR.
>>
>> We do a staged installation of the extension library and scripts in an
>> ad-hoc dir which does not match the built-in SHAREDIR of PostgreSQL.
>> It works fine for testing the extension directly sourcing the scripts.
>> It's only CREATE EXTENSION that doesn't play nicely with that setup.
>
> Please see the inline extension thread where answers to your problem
> have been discussed.
>
>   https://commitfest.postgresql.org/action/patch_view?id=746

I'm not really looking for inline extensions.  I do want to install the
extension objects somewhere, just NOT in the PostgreSQL builtin SHAREDIR
but in an arbitrary staging directory to use for QA of the extension
before distribution.

--strk;

  Delivering high quality PostGIS 2.0!
  http://strk.keybit.net
[HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
Attached is a feature extracted from Ants Aasma's "add timing of buffer
I/O requests" submission.  That included a tool to measure timing
overhead, from gettimeofday or whatever else INSTR_TIME_SET_CURRENT
happens to call.  That's what I've broken out here; it's a broader topic
than just buffer timing.  I fixed some trivial bugs and cleaned up the
output of the program, then wrote a full documentation section for it.
After that review, I think this could be ready to commit...with a
big-picture caveat below.

Code-wise, I am mainly concerned about its portability, and that's not
often a place we get good review help on.  The tool is based on
pg_test_fsync.  Perhaps Bruce might want to take a look at this, to see
if any of the changes he's recently made to pg_test_fsync impact what
this utility should do?  He might be able to flesh out the FreeBSD
examples too.

As for why this whole topic is important, I've found the results of this
new pg_test_timing track quite well with systems where EXPLAIN ANALYZE
timing overhead is large.  As such, it fills in a gap in the existing
docs, where that possibility is raised but no way was given to measure
it--nor to determine how to improve it.  I expect we'll be worried about
how large timing overhead is more often for future features, with the
first example being the rest of Ants's own submission.

A look back on this now that I'm done with it does raise one large
question though.  I added some examples of how to measure timing overhead
using psql.  While I like the broken-down timing data that this utility
provides, I'm not sure whether it's worth adding a contrib module just to
get it.  An extension that's packaged on something like PGXN and easy to
obtain?  Absolutely--but maybe that's a developer-only level thing.
Maybe the only code worth distributing is the little SQL example of how
to measure the overhead, along with some reference good/bad numbers.
That plus the intro to timer trivia could turn this into a documentation
section only, no code change.  I've dreamed of running something like
this on every system in the build farm.  But even if that's a valuable
exercise, it may only be worth doing once, then reverting.

Anyway, the patch now includes several examples and a short primer on PC
clock hardware, to help guide what good results look like and why they've
been impossible to obtain in the past.  That's a bit Linux-centric, but
the hardware described covers almost all systems using Intel or AMD
processors.  The only difference with most other operating systems is how
aggressively they have adopted newer timer hardware.  At least this gives
a way to measure all of them.
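For anyone who wants the flavor of what the tool measures without reading
the patch below, here is a stripped-down, standalone sketch of the same
loop: read the clock as fast as possible for a few seconds and histogram
the deltas between consecutive readings.  It calls gettimeofday()
directly just to stay self-contained; the contrib program itself goes
through the portable INSTR_TIME_* macros and has real option handling, so
treat this purely as an illustration.

/*
 * Standalone sketch of the pg_test_timing measurement loop.  Not the
 * contrib module: it uses plain gettimeofday() instead of the
 * INSTR_TIME_* wrappers and hard-codes a three-second test duration.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

static int64_t
now_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

int
main(void)
{
    int64_t start = now_usec();
    int64_t prev = start;
    int64_t cur;
    long    histogram[32] = {0};
    long    loops = 0;
    int     i;

    while ((cur = now_usec()) - start < 3 * 1000000)
    {
        int64_t diff = cur - prev;
        int     bucket = 0;

        /* a negative delta would mean the clock ran backwards */
        if (diff < 0)
            diff = 0;
        /* bucket by the power of two the delta falls into */
        while (diff > 1)
        {
            diff >>= 1;
            bucket++;
        }
        histogram[bucket]++;
        prev = cur;
        loops++;
    }

    printf("Per loop time including overhead: %0.2f usec\n",
           (double) (prev - start) / loops);
    for (i = 0; i < 32; i++)
        if (histogram[i] > 0)
            printf("2^%d usec: %ld\n", i, histogram[i]);
    return 0;
}

On hardware with a fast, stable clock source nearly every delta lands in
the lowest bucket; a long tail of slow readings is exactly the symptom
that makes EXPLAIN ANALYZE overhead balloon.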
Some references used to put together the clock source tutorial: Microsoft's intro to HPET: http://msdn.microsoft.com/en-us/windows/hardware/gg463347 Notes on effective clock resolution: http://elinux.org/System_Tap_Timestamp_Notes VMware clock history and impact on VMs: http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf VMware timer suggestions for various Linux versions: http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1006427 -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com diff --git a/contrib/Makefile b/contrib/Makefile index ac0a80a..d230451 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_test_fsync \ + pg_test_timing \ pg_trgm \ pg_upgrade \ pg_upgrade_support \ diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile new file mode 100644 index 000..b8b266a --- /dev/null +++ b/contrib/pg_test_timing/Makefile @@ -0,0 +1,18 @@ +# contrib/pg_test_timing/Makefile + +PGFILEDESC = pg_test_timing - test timing overhead +PGAPPICON = win32 + +PROGRAM = pg_test_timing +OBJS = pg_test_timing.o + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_test_timing +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c new file mode 100644 index 000..4ff0455 --- /dev/null +++ b/contrib/pg_test_timing/pg_test_timing.c @@ -0,0 +1,162 @@ +/* + * pg_test_timing.c + * tests overhead of timing calls and their monoticity: that + * they always move forward + */ + +#include postgres_fe.h + +#include getopt_long.h +#include portability/instr_time.h + +static const char *progname; + +static int32 test_duration = 3; + +static void handle_args(int argc, char *argv[]); +static void test_timing(int32); + +int +main(int argc, char *argv[]) +{ + progname = get_progname(argv[0]); + + handle_args(argc, argv); + + test_timing(test_duration); + + return 0; +} + +static
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 7:06 AM, Noah Misch <n...@leadboat.com> wrote:
> On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:
>> On Sun, Feb 19, 2012 at 11:35 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>>> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
>>>> doesn't have a checksum, PD_CHECKSUM2 is not set?  What good does that
>>>> do?
>>>
>>> As explained in detailed comments, the purpose of this is to implement
>>> Heikki's suggestion that we have a bit set to zero so we can detect
>>> failures that cause a run of 1s.
>>
>> I think it's nonsensical to pretend that there's anything special about
>> that particular bit.  If we want to validate the page header before
>> trusting the lack of a checksum bit, we can do that far more thoroughly
>> than just checking that one bit.  There are a whole bunch of bits that
>> ought to always be zero, and there are other things we can validate as
>> well (e.g. LSN not in future).  If we're concerned about the
>> checksum-enabled bit getting flipped (and I agree that we should be), we
>> can check some combination of that stuff in the hope of catching it, and
>> that'll be a far better guard than just checking one arbitrarily
>> selected bit.
>
> PageHeaderIsValid() (being renamed to PageIsVerified()) already checks
> (page->pd_flags & ~PD_VALID_FLAG_BITS) == 0.  Explicitly naming another
> bit and keeping it unset is redundant with that existing check.  It
> would cease to be redundant if we ever allocate all the flag bits, but
> then we also wouldn't have a bit to spare as PD_CHECKSUM2.

I agree with you on this point.  No problem to remove that.  It was there
at Heikki's request.

>> That having been said, I don't feel very good about the idea of relying
>> on the contents of the page to tell us whether or not the page has a
>> checksum.  There's no guarantee that an error that flips the
>> has-checksum bit will flip any other bit on the page, or that it won't
>> flip everything else we're relying on as a backstop in exactly the
>> manner that foils whatever algorithm we put in place.  Random corruption
>> is, perhaps, unlikely to do that, but somehow I feel like it'll happen
>> more often than random chance suggests.  Things never fail the way you
>> want them to.
>>
>> Another disadvantage of the current scheme is that there's no
>> particularly easy way to know that your whole cluster has checksums.  No
>> matter how we implement checksums, you'll have to rewrite every table in
>> the cluster in order to get them fully turned on.  But with the current
>> design, there's no easy way to know how much of the cluster is actually
>> checksummed.  If you shut checksums off, they'll linger until those
>> pages are rewritten, and there's no easy way to find the relations from
>> which they need to be removed, either.
>
> I'm not seeing value in rewriting pages to remove checksums, as opposed
> to just ignoring those checksums going forward.  Did you have a
> particular scenario in mind?

Agreed.  No reason to change a checksum unless we rewrite the block, no
matter whether page_checksums is on or off.

>> I'm tempted to suggest a relation-level switch: when you want checksums,
>> you use ALTER TABLE to turn them on, and when you don't want them any
>> more you use ALTER TABLE to shut them off again, in each case rewriting
>> the table.  That way, there's never any ambiguity about what's in the
>> data pages in a given relation: they're either all checksummed, or none
>> of them are.
>> This moves the decision about whether checksums are enabled or disabled
>> quite a bit further away from the data itself, and also allows you to
>> determine (by catalog inspection) which parts of the cluster do in fact
>> have checksums.  It might be kind of a pain to implement, though: you'd
>> have to pass the information about how any given relation was configured
>> down to the place where we validate page sanity.  I'm not sure whether
>> that's practical.
>
> This patch implies future opportunities to flesh out its UI, and I don't
> see it locking us out of implementing the above.

Agreed.

> We'll want a weak-lock command that adds checksums to pages lacking them.

VACUUM FREEZE will do this.

> We'll want to expose whether a given relation has full checksum coverage.

Not sure I understand why we'd want that.  If you really want that, it's
a simple function to do it.

> With that, we could produce an error when an apparently-non-checksummed
> page appears in a relation previously known to have full checksum
> coverage.

Additional checking at relation level is just going to slow things down.
Don't see the value of having a relation-level checksum option.

> It would be an easy thing to skip checksums on UNLOGGED relations, and
> possibly desirable, but I don't see the utility of a per relation
> checksum setting in other cases.

There's just no need for fine tuning like that.

> Even supposing an ALTER TABLE t SET {WITH | WITHOUT} CHECKSUMS as the
> only tool for enabling or disabling them, it's helpful to have each page
> declare whether it has a checksum.  That way, the
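To make the backstop checks mentioned earlier in the thread concrete,
here is a sketch of the sort of validation a reader could apply before
believing a clear checksum flag.  The struct layout, the PD_HAS_CHECKSUM
bit, and the constants are illustrative stand-ins, not the real
bufpage.h definitions or anything this patch does.

/*
 * Sketch of cross-checking a page header before trusting a clear
 * "has checksum" flag.  All names and values here are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ       8192
#define PD_HAS_CHECKSUM     0x0008   /* hypothetical flag bit */
#define PD_VALID_FLAG_BITS  0x000F   /* all bits defined in this sketch */

typedef struct PageHeaderSketch
{
    uint64_t    pd_lsn;         /* LSN of last change to the page */
    uint16_t    pd_checksum;    /* page checksum, if enabled */
    uint16_t    pd_flags;       /* flag bits */
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} PageHeaderSketch;

static bool
page_claims_checksum(const PageHeaderSketch *hdr)
{
    return (hdr->pd_flags & PD_HAS_CHECKSUM) != 0;
}

static bool
header_plausible(const PageHeaderSketch *hdr, uint64_t current_insert_lsn)
{
    /* any undefined flag bit set means the header itself is suspect */
    if (hdr->pd_flags & ~PD_VALID_FLAG_BITS)
        return false;

    /* space pointers must be ordered and fall inside the block */
    if (hdr->pd_lower > hdr->pd_upper ||
        hdr->pd_upper > hdr->pd_special ||
        hdr->pd_special > SKETCH_BLCKSZ)
        return false;

    /* an LSN from the future is another sign of corruption */
    if (hdr->pd_lsn > current_insert_lsn)
        return false;

    return true;
}

Only believe page_claims_checksum() when header_plausible() passes; the
point being debated above is how much weight such cross-checks can carry
when the corruption is not random.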
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On 21/02/12 18:28, Jan Urbański wrote: On 21/02/12 18:05, Peter Eisentraut wrote: it might be better to use ereport, to expose the message for translation. After giving it some thought some of these elogs could be changed into PLy_elogs (which is meant to propagate a Python error into Postgres) and the others made into ereports. I'll send updated patches this evening (CET). Inevitably, this turned into the next morning CET. Here are the updated patches which use PLy_elog instead of plain elog. The difference is that they will get marked for translation and that the original Python exception will show up in the errdetail field. Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c index 741980c..332898d 100644 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** get_source_line(const char *src, int lin *** 365,370 --- 365,374 const char *next = src; int current = 0; + /* sanity check */ + if (lineno = 0) + return NULL; + while (current lineno) { s = next; diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c index ae9d87e..f259ecf 100644 *** a/src/pl/plpython/plpy_main.c --- b/src/pl/plpython/plpy_main.c *** PLy_init_interp(void) *** 133,138 --- 133,140 Py_INCREF(mainmod); PLy_interp_globals = PyModule_GetDict(mainmod); PLy_interp_safe_globals = PyDict_New(); + if (PLy_interp_safe_globals == NULL) + PLy_elog(ERROR, could not create globals); PyDict_SetItemString(PLy_interp_globals, GD, PLy_interp_safe_globals); Py_DECREF(mainmod); if (PLy_interp_globals == NULL || PyErr_Occurred()) diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c index d2d0a2a..0c96f03 100644 *** a/src/pl/plpython/plpy_plpymodule.c --- b/src/pl/plpython/plpy_plpymodule.c *** PLy_init_plpy(void) *** 173,181 main_mod = PyImport_AddModule(__main__); main_dict = PyModule_GetDict(main_mod); plpy_mod = PyImport_AddModule(plpy); PyDict_SetItemString(main_dict, plpy, plpy_mod); if (PyErr_Occurred()) ! elog(ERROR, could not initialize plpy); } static void --- 173,183 main_mod = PyImport_AddModule(__main__); main_dict = PyModule_GetDict(main_mod); plpy_mod = PyImport_AddModule(plpy); + if (plpy_mod == NULL) + PLy_elog(ERROR, could not initialize plpy); PyDict_SetItemString(main_dict, plpy, plpy_mod); if (PyErr_Occurred()) ! PLy_elog(ERROR, could not initialize plpy); } static void *** PLy_add_exceptions(PyObject *plpy) *** 208,213 --- 210,220 PLy_exc_fatal = PyErr_NewException(plpy.Fatal, NULL, NULL); PLy_exc_spi_error = PyErr_NewException(plpy.SPIError, NULL, NULL); + if (PLy_exc_error == NULL || + PLy_exc_fatal == NULL || + PLy_exc_spi_error == NULL) + PLy_elog(ERROR, could not create the base SPI exceptions); + Py_INCREF(PLy_exc_error); PyModule_AddObject(plpy, Error, PLy_exc_error); Py_INCREF(PLy_exc_fatal); *** PLy_generate_spi_exceptions(PyObject *mo *** 241,247 --- 248,260 PyObject *sqlstate; PyObject *dict = PyDict_New(); + if (dict == NULL) + PLy_elog(ERROR, could not generate SPI exceptions); + sqlstate = PyString_FromString(unpack_sql_state(exception_map[i].sqlstate)); + if (sqlstate == NULL) + PLy_elog(ERROR, could not generate SPI exceptions); + PyDict_SetItemString(dict, sqlstate, sqlstate); Py_DECREF(sqlstate); exc = PyErr_NewException(exception_map[i].name, base, dict); *** PLy_output(volatile int level, PyObject *** 369,376 * decoration. */ PyObject *o; - PyArg_UnpackTuple(args, plpy.elog, 1, 1, o); so = PyObject_Str(o); } else --- 382,393 * decoration. 
*/ PyObject *o; + int result; + + result = PyArg_UnpackTuple(args, plpy.elog, 1, 1, o); + if (!result) + PLy_elog(ERROR, could not unpack arguments in plpy.elog); so = PyObject_Str(o); } else diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c index 0d63c4f..20b93cd 100644 *** a/src/pl/plpython/plpy_spi.c --- b/src/pl/plpython/plpy_spi.c *** PLy_spi_execute_query(char *query, long *** 338,344 int rv; volatile MemoryContext oldcontext; volatile ResourceOwner oldowner; ! PyObject *ret; oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; --- 338,344 int rv; volatile MemoryContext oldcontext; volatile ResourceOwner oldowner; ! PyObject *ret = NULL; oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; *** PLy_spi_execute_query(char *query, long *** 362,367 --- 362,368 if (rv 0) { + Py_XDECREF(ret); PLy_exception_set(PLy_exc_spi_error, SPI_execute failed: %s, SPI_result_code_string(rv)); diff --git a/src/pl/plpython/plpy_typeio.c
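For readers skimming the diffs, the common thread is simply that most
Python C API constructors return NULL with an exception set, so every
result gets checked before use.  A self-contained illustration of that
pattern follows; the module and exception names are placeholders, not the
actual plpy objects.

/*
 * Sketch of defensive NULL-checking when building objects with the
 * Python C API.  "example.Error" and the module argument are
 * hypothetical stand-ins.
 */
#include <Python.h>

static PyObject *example_exc = NULL;

/* returns 0 on success, -1 with a Python exception set on failure */
static int
add_example_exception(PyObject *module)
{
    PyObject   *dict;

    dict = PyDict_New();
    if (dict == NULL)
        return -1;              /* out of memory; exception already set */

    example_exc = PyErr_NewException("example.Error", NULL, dict);
    Py_DECREF(dict);
    if (example_exc == NULL)
        return -1;

    /* PyModule_AddObject steals the reference on success */
    Py_INCREF(example_exc);
    if (PyModule_AddObject(module, "Error", example_exc) < 0)
    {
        Py_DECREF(example_exc);
        return -1;
    }
    return 0;
}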
[HACKERS] temporal algebra and data type
Hello, I am new here.  This is a question about how to collaborate and/or
where to look, in the scope of temporal databases.

I see Temporal Postgres, a project for making tools and documentation
that people can use to better manage, query, and maintain time data in
Postgres, http://temporal.projects.postgresql.org/ BUT:
 * the project is oversized for my needs;
 * there is NO complementary pure-SQL library for managing scheduling
   events.

Searching, I found something closer to my needs, S. Bailey's TIMESPAN
model, at http://scottrbailey.wordpress.com/2009/05/19/timespan-postgresql/
BUT it is not ready for practical use (it is more for didactic use).
So I created http://code.google.com/p/temporal-type which is what I need.

QUESTIONS:

1) Is there another project, similar to or better than mine?  Are there
   pure standard-SQL libraries for this?
2) Why is something similar (managing events without a pg extension) not
   part of temporal.projects.postgresql.org?
3) Is there another good place to collaborate on a simple, pure-SQL
   library project?  Can I contribute to the wiki page?  See
   http://wiki.postgresql.org/wiki/Temporal_Extensions
Re: [HACKERS] leakproof
On Tue, Feb 21, 2012 at 11:54 AM, Jeff Janes jeff.ja...@gmail.com wrote: I don't think that pure is sufficient to be leakproof. For example, if I have a function which is pure but which takes an unusually long time to evaluate for some unique pathological combination of arguments, I don't think that it would be considered leakproof. I think we've made up our mind to largely ignore such cases, though. There may be information that gets leaked either through the choice of plan or the time it takes to execute it, but plugging that type of hole completely is almost impossible. Suppose the security view filters for rows where user = 'rhaas' and the evil rhaas requests rows where id = 5. The planner chooses an index-scan on id = 5 and then applies a filter of user = 'rhaas' to the result. There's probably some slight timing difference between the case where no id = 5 row exists, and the case where it exists but does not have user = 'rhaas', so a sufficiently dedicated hacker might be able to employ a timing attack to determine the existence of such a row. Also, by jiggering with page costs and EXPLAIN, they might be able to get some idea of the number of rows in the table. I don't believe that this makes the feature useless, though. In most cases, and especially when using surrogate keys (which I typically do, whether Josh Berkus likes it or not), the existence of the row is not a terribly interesting piece of information: it's the content of the row that's sensitive. Neither table-level security nor column-level security attempt to prevent you from determining that the table or column exists; they just hide its contents. And if you need to bar EXPLAIN so people can't peek at query plans, it's certainly possible to do that using ProcessUtility_hook. If you're worried about more subtle timing attacks, you should probably rethink the idea of letting people log into the database in the first place. Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
> so ANALYZE should be faster then VACUUM ANALYZE. But is not true. Why?

I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
general, because VACUUM has to scan the whole table, and ANALYZE only a
fixed-size subset of its pages.  Not sure what's happening in your
particular case...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas <robertmh...@gmail.com>:
> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
>> so ANALYZE should be faster then VACUUM ANALYZE. But is not true. Why?
>
> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
> general, because VACUUM has to scan the whole table, and ANALYZE only a
> fixed-size subset of its pages.

It sounds like you just said the opposite of what you wanted to say.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: [HACKERS] temporal algebra and data type
On 22-02-2012 09:50, Peter Padua Krauss wrote:
> 1) Is there another project, similar to or better than mine?  Are there
> pure standard-SQL libraries for this?

Range Types [1].  It is 9.2 material but it is already in the repository.

[1] http://www.postgresql.org/docs/devel/static/rangetypes.html

-- 
Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] 16-bit page checksums for 9.2
On Tue, Feb 21, 2012 at 5:07 AM, Noah Misch <n...@leadboat.com> wrote:
> We do, in numerous places, drop a shared buffer content lock and
> reacquire it in exclusive mode.  However, the existing users of that
> pattern tend to trade the lock, complete subsequent work, and unlock the
> buffer all within the same function.  So they must, because several of
> them recheck facts that can change during the period of holding no lock.
> SetBufferCommitInfoNeedsSave() callers do not adhere to that pattern;
> they can be quite distant from the original lock acquisition and eventual
> lock release.  Take the prototypical case of SetHintBits().  Our shared
> lock originated in a place like heapgettup(), which expects to hold it
> continually until finished.  We would need to somehow push up through
> HeapTupleSatisfiesVisibility() the fact that we traded the buffer lock,
> then have heapgettup() reorient itself as needed.  Every caller of code
> that can reach SetBufferCommitInfoNeedsSave() would need to do the same.
> Perhaps there's a different way to apply the lock-trade technique that
> avoids this mess, but I'm not seeing it.  Consequently, I find the idea
> of requiring a spinlock acquisition to read or write pd_lsn/pd_tli under
> BUFFER_LOCK_SHARE to be a sensible one.  Within that umbrella, some
> details need attention:

Fair analysis.

> - Not all BUFFER_LOCK_SHARE callers of PageGetLSN() live in bufmgr.c.  I
>   note gistScanPage() and XLogCheckBuffer()[1].  (Perhaps we'll only
>   require the spinlock for heap buffers, letting gistScanPage() off the
>   hook.)  We need a public API, perhaps LockBufferForLSN().

That seems like a good idea.  I am a little worried about Simon's plan to
do the additional locking only for AMs that have no unlogged-type
updates.  It's a reasonable performance optimization, and skipping the
locking when checksums are shut off also seems reasonable, but it seems a
bit fragile: suppose that, in the future, someone fixes GiST to do
something intelligent with kill_prior_tuple.  Now suddenly it needs LSN
locking that it didn't need before, but this fact might not be very
obvious.  It might be a good idea to design LockBufferForLSN to take an
AM OID as an argument, and use the AM OID to decide whether the locking
is required.  That way, we can call it from everywhere that reads an LSN,
and it can simply skip the locking in places where it isn't presently
needed.  Or maybe there's a better design, but I agree with you that some
kind of public API is essential.

> - The use of some spinlock need not imply using the buffer header
>   spinlock.  We could add a dedicated pd_lsn_tli_lock to BufferDesc.
>   That has the usual trade-off of splitting a lock: less contention at
>   the cost of more acquisitions.  I have no intuition on which approach
>   would perform better.

I think I like the idea of a separate lock, but I agree it could stand to
be tested both ways.

Another thought is that we might add SpinLockConditionalAcquire(), that
just tries to TAS the lock and returns false if the lock is already held.
Then, we could grab the spinlock while writing the page, in lieu of
copying it, and anyone who wants to set a hint bit can conditionally
acquire the spinlock long enough to set the bits.  If the spinlock isn't
immediately free then instead of spinning they just don't set the hint
bits after all.  That way you can be sure that no hint bits are set
during the write, but any attempt to set a hint bit only costs you a
single TAS - no loop, as with a regular spinlock acquisition - and of
course the hint itself.
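A rough sketch of that conditional-acquire idea is below.  It uses GCC's
__sync builtins purely for illustration; a real version would sit next to
the existing TAS()/S_UNLOCK() machinery in s_lock.h, and where the
per-page lock actually lives is left open here.

/*
 * Sketch of SpinLockConditionalAcquire(): the page writer holds the lock
 * while writing, and hint-bit setters do a single test-and-set, skipping
 * the hint if the lock is busy.  Names and placement are hypothetical.
 */
#include <stdbool.h>

typedef volatile int slock_sketch;

static inline bool
spin_try_acquire(slock_sketch *lock)
{
    /* previous value of 0 means we got the lock */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static inline void
spin_release(slock_sketch *lock)
{
    __sync_lock_release(lock);
}

/*
 * Hypothetical hint-bit setter: tuple and infomask details are elided,
 * only the locking protocol is shown.
 */
static void
set_hint_bit_if_possible(slock_sketch *page_write_lock,
                         unsigned short *infomask, unsigned short bit)
{
    if (!spin_try_acquire(page_write_lock))
        return;                 /* page is being written out; skip the hint */

    *infomask |= bit;
    spin_release(page_write_lock);
}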
> A possible compromise is to leave the page clean after setting a hint
> bit, much like the patch already has us do under hot standby.  Then
> there's no new WAL and no new rules around pd_lsn.  Wasn't that one of
> the things Merlin benchmarked when he was looking at hint bits?  Does
> anyone recall the result?

It slows things down substantially in the case of repeated scans of the
same unchanging table.  In fact, I tried a much more nuanced approach:
set BM_UNTIDY on the page when SetBufferCommitInfoNeedsSave() is called.
BM_UNTIDY pages are written by the background writer and during
checkpoints, and 5% of the time by backends.  All of that has the
salutary effect of making the first sequential scan less ungodly slow,
but it also has the less-desirable effect of making the next 19 of them
still kinda-sorta slow.  It was unclear to me (and perhaps to others)
whether it really worked out to a win.

However, we might need/want to revisit some of that logic in connection
with this patch, because it seems to me that a large sequential scan of
an unhinted table could be ferociously slow, with the backend writing out
its own pages and WAL-logging them as it goes.  It would be good to test
that to figure out whether some mitigation measures are needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier <nicolas.barb...@gmail.com> wrote:
> 2012/2/22 Robert Haas <robertmh...@gmail.com>:
>> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE. But is not
>>> true. Why?
>>
>> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
>> general, because VACUUM has to scan the whole table, and ANALYZE only a
>> fixed-size subset of its pages.
>
> It sounds like you just said the opposite of what you wanted to say.

Yeah, I did.  Woops.  Let me try that again: ANALYZE should be faster;
reads only some pages.  VACUUM ANALYZE should be slower; reads them all.
Dunno why Pavel's seeing the opposite without more info.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
> so ANALYZE should be faster then VACUUM ANALYZE.

VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
random sample of data blocks.  Random access is slower than sequential
access, so at some threshold of sample size and sequential/random I/O
speed ratio ANALYZE could become slower.

So it depends upon the hardware and the setting of stats_target.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] [PATCH] typo fix
Typo in a comment...

--strk;

From cfca9507df8612a48cad341653f8e9193c6b7e08 Mon Sep 17 00:00:00 2001
From: Sandro Santilli <s...@keybit.net>
Date: Wed, 22 Feb 2012 11:32:48 +0100
Subject: [PATCH] typo fix

---
 src/backend/commands/extension.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6ecbbc7..a9963ac 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -876,7 +876,7 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 
 	/*
 	 * If it's not relocatable, substitute the target schema name for
-	 * occcurrences of @extschema@.
+	 * occurrences of @extschema@.
 	 *
 	 * For a relocatable extension, we needn't do this.  There cannot be
 	 * any need for @extschema@, else it wouldn't be relocatable.
-- 
1.7.0.4
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
Attached patch fixes GiST behaviour without altering operators behaviour. -- With best regards, Alexander Korotkov. *** a/src/backend/access/gist/gistproc.c --- b/src/backend/access/gist/gistproc.c *** *** 836,842 gist_box_picksplit(PG_FUNCTION_ARGS) } /* ! * Equality method * * This is used for both boxes and points. */ --- 836,843 } /* ! * Equality method. Returns true only when boxes are exact same. We can't ! * ignore small extents because of index consistency. * * This is used for both boxes and points. */ *** *** 848,856 gist_box_same(PG_FUNCTION_ARGS) bool *result = (bool *) PG_GETARG_POINTER(2); if (b1 b2) ! *result = DatumGetBool(DirectFunctionCall2(box_same, ! PointerGetDatum(b1), ! PointerGetDatum(b2))); else *result = (b1 == NULL b2 == NULL) ? TRUE : FALSE; PG_RETURN_POINTER(result); --- 849,857 bool *result = (bool *) PG_GETARG_POINTER(2); if (b1 b2) ! *result = (b1-low.x == b2-low.x b1-low.y == b2-low.y ! b1-high.x == b2-high.x b1-high.y == b2-high.y) ! ? TRUE : FALSE; else *result = (b1 == NULL b2 == NULL) ? TRUE : FALSE; PG_RETURN_POINTER(result); *** *** 1326,1331 gist_point_consistent(PG_FUNCTION_ARGS) --- 1327,1333 bool *recheck = (bool *) PG_GETARG_POINTER(4); bool result; StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset; + BOX *query, *key; switch (strategyGroup) { *** *** 1337,1348 gist_point_consistent(PG_FUNCTION_ARGS) *recheck = false; break; case BoxStrategyNumberGroup: ! result = DatumGetBool(DirectFunctionCall5( ! gist_box_consistent, ! PointerGetDatum(entry), ! PG_GETARG_DATUM(1), ! Int16GetDatum(RTOverlapStrategyNumber), ! 0, PointerGetDatum(recheck))); break; case PolygonStrategyNumberGroup: { --- 1339,1356 *recheck = false; break; case BoxStrategyNumberGroup: ! /* ! * This code repeats logic of on_ob which uses simple comparison ! * rather than FP* functions. ! */ ! query = PG_GETARG_BOX_P(1); ! key = DatumGetBoxP(entry-key); ! ! *recheck = false; ! result = key-high.x = query-low.x ! key-low.x = query-high.x ! key-high.y = query-low.y ! key-low.y = query-high.y; break; case PolygonStrategyNumberGroup: { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
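For reviewers, the underlying tension here is between the epsilon-based
comparisons the geometric operators use (the FP* macros in geo_decls.h)
and the exact comparisons an index needs to stay consistent with itself.
A tiny standalone illustration follows; the epsilon value and the macros
only mimic the real ones rather than copying them.

/*
 * Sketch of why fuzzy and exact comparisons can disagree near the
 * epsilon boundary, which is what makes epsilon-based logic unsafe
 * inside GiST support functions.  SKETCH_EPSILON and the helpers are
 * illustrative, not the geo_decls.h definitions.
 */
#include <stdbool.h>
#include <math.h>
#include <stdio.h>

#define SKETCH_EPSILON 1.0e-6

/* fuzzy comparison in the style of the FP* macros */
static bool
fuzzy_eq(double a, double b)
{
    return fabs(a - b) <= SKETCH_EPSILON;
}

/* exact comparison, as the patched gist_box_same now uses */
static bool
exact_eq(double a, double b)
{
    return a == b;
}

int
main(void)
{
    double x = 1.0000001;
    double y = 1.0000002;

    /* the operator would say "same", the index key says "different" */
    printf("fuzzy: %d  exact: %d\n", fuzzy_eq(x, y), exact_eq(x, y));
    return 0;
}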
Re: [HACKERS] leakproof
Robert Haas robertmh...@gmail.com writes: Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. Speaking of hair-splitting ... A strict interpretation of no errors can be thrown would, for example, rule out any function that either takes or returns a varlena datatype, since those are potentially going to throw out-of-memory errors if they can't palloc a result value or a temporary detoasted input value. I don't suppose that we want that, which means that this rule of thumb is wrong in detail, and there had better be some more subtle definition of what is okay or not, perhaps along the line of must not throw any errors that reveal anything useful about the input value. Have we got such a definition? (I confess to not having followed this patch very closely.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Simon Riggs <si...@2ndquadrant.com> writes:
> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
>> so ANALYZE should be faster then VACUUM ANALYZE.

> VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
> random sample of data blocks.  Random access is slower than sequential
> access, so at some threshold of sample size and sequential/random I/O
> speed ratio ANALYZE could become slower.

That analysis is entirely wrong.  In the first place, although ANALYZE
doesn't read all the blocks, what it does read it reads in block number
order.  So it's not like there are random seeks all over the disk that
would not need to happen anyway.  In the second place, VACUUM ANALYZE
consists of two separate passes, VACUUM and then ANALYZE, and the second
pass is going to be "random I/O" by your definition no matter what.

If the filesystem is hugely biased towards sequential I/O for some
reason, and the VACUUM scan causes the whole table to become resident in
RAM where ANALYZE can read it for free, then I guess it might be possible
to arrive at Pavel's result.  But it would be an awfully narrow corner
case.  I cannot believe that his statement is true in general, or even
for a noticeably large fraction of cases.

			regards, tom lane
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 10:29:56AM -0500, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>
>> VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
>> random sample of data blocks.  Random access is slower than sequential
>> access, so at some threshold of sample size and sequential/random I/O
>> speed ratio ANALYZE could become slower.
>
> That analysis is entirely wrong.  In the first place, although ANALYZE
> doesn't read all the blocks, what it does read it reads in block number
> order.  So it's not like there are random seeks all over the disk that
> would not need to happen anyway.  In the second place, VACUUM ANALYZE
> consists of two separate passes, VACUUM and then ANALYZE, and the second
> pass is going to be "random I/O" by your definition no matter what.
>
> If the filesystem is hugely biased towards sequential I/O for some
> reason, and the VACUUM scan causes the whole table to become resident in
> RAM where ANALYZE can read it for free, then I guess it might be possible
> to arrive at Pavel's result.  But it would be an awfully narrow corner
> case.  I cannot believe that his statement is true in general, or even
> for a noticeably large fraction of cases.

Wouldn't a full sequential scan trigger the kernel read-ahead, which
might not trigger for the analyze block reads, even though they are in
order?  That could account for the observation.

Regards,
Ken
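One way to test that theory outside of Postgres is to time a full
sequential pass over a data file against a strided pass that reads, say,
every 16th 8K block in increasing order, which is roughly the access
shape ANALYZE produces.  A rough sketch follows; the file name and stride
are arbitrary, and the OS cache should be dropped between runs or the
second pass mostly measures cache hits.

/*
 * Crude read-ahead experiment: compare a full sequential read of a file
 * with a strided read of every Nth block.  Illustration only.
 */
#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define SKETCH_BLCKSZ 8192

static double
seconds_now(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static double
read_blocks(const char *path, int stride)
{
    char    buf[SKETCH_BLCKSZ];
    int     fd = open(path, O_RDONLY);
    double  start = seconds_now();
    off_t   offset = 0;

    if (fd < 0)
    {
        perror(path);
        exit(1);
    }
    while (pread(fd, buf, SKETCH_BLCKSZ, offset) == SKETCH_BLCKSZ)
        offset += (off_t) stride * SKETCH_BLCKSZ;
    close(fd);
    return seconds_now() - start;
}

int
main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s datafile\n", argv[0]);
        return 1;
    }
    printf("sequential pass:   %.3f s\n", read_blocks(argv[1], 1));
    printf("every 16th block:  %.3f s\n", read_blocks(argv[1], 16));
    return 0;
}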
Re: [HACKERS] leakproof
On 02/22/2012 10:21 AM, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Anyway, to your point, I suppose I might hesitate to mark factorial
>> leak-proof even if it didn't throw an error on overflow, because the
>> time it takes to return an answer for larger inputs does grow rather
>> rapidly.  But it's kind of a moot point because the error makes it not
>> leak-proof anyway.  So maybe we're just splitting hairs here, however we
>> decide to label this.
>
> Speaking of hair-splitting ... A strict interpretation of "no errors can
> be thrown" would, for example, rule out any function that either takes or
> returns a varlena datatype, since those are potentially going to throw
> out-of-memory errors if they can't palloc a result value or a temporary
> detoasted input value.  I don't suppose that we want that, which means
> that this rule of thumb is wrong in detail, and there had better be some
> more subtle definition of what is okay or not, perhaps along the line of
> "must not throw any errors that reveal anything useful about the input
> value".  Have we got such a definition?  (I confess to not having
> followed this patch very closely.)

Yeah.  Returning to the original point, I've come to the conclusion that
"pure" isn't the right way to go.  The trouble with "leakproof" is that
it doesn't point to what it is that's not leaking, which is information
rather than memory, as many might imagine (and I did) without further
hints.  I'm not sure any single English word would be as descriptive as
I'd like.

cheers

andrew
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fu...@gmail.com> wrote:
> Hi,
>
> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>
>   =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>   xlog start point: 2/AC4E2600
>   pg_basebackup: starting background WAL receiver
>   692447/692447 kB (100%), 1/1 tablespace
>   xlog end point: 2/AC4E2600
>   pg_basebackup: waiting for background process to finish streaming...
>   pg_basebackup: base backup completed
>
>   real 3m56.237s
>   user 0m0.224s
>   sys  0m0.936s
>
>   (time is long because this is only test database with no traffic, so I
>   had to make some inserts for it to finish)
>
> The above article points out the problem of pg_basebackup from the
> standby: when -x stream is specified, pg_basebackup from the standby
> gets stuck if there is no traffic in the database.
>
> When -x stream is specified, pg_basebackup forks the background process
> for receiving WAL records during backup, takes an online backup and
> waits for the background process to end.  The forked background process
> keeps receiving WAL records, and whenever it reaches end of WAL file, it
> checks whether it has already received all WAL files required for the
> backup, and exits if yes.  Which means that at least one WAL segment
> switch is required for pg_basebackup with -x stream option to end.
>
> In the backup from the master, WAL file switch always occurs at both
> start and end of backup (i.e., in do_pg_start_backup() and
> do_pg_stop_backup()), so the above logic works fine even if there is no
> traffic.  OTOH, in the backup from the standby, while there is no
> traffic, WAL file switch is not performed at all.  So in that case,
> there is no chance that the background process reaches end of WAL file,
> checks whether all required WAL has arrived and exits.  At the end,
> pg_basebackup gets stuck.
>
> To fix the problem, I'd propose to change the background process so that
> it checks whether all required WAL has arrived, every time data is
> received, even if end of WAL file is not reached.  Patch attached.
> Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though?  I thought only
pg_basebackup had this issue?  I guess it is because of the change of the
API to stream_continue_callback only?

Looking at it after your patch, stream_continue_callback and
segment_finish_callback are the same.  Should we perhaps just fold them
into a single stream_continue_callback, since you had to move the
detection of segment end to the caller anyway?

Another question related to this - since we clearly don't need the xlog
switch in this case, should we make it conditional on the master as well,
so we don't switch unnecessarily there?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
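To restate the proposed logic in code, the background receiver's stop
check reduces to something like the sketch below.  The types and the
callback shape are simplified, hypothetical stand-ins, not the actual
receivelog.c interface, which is exactly what the
stream_continue_callback/segment_finish_callback question above is about.

/*
 * Sketch of the proposed check: run it every time data is received,
 * not only at WAL segment boundaries.  All names are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrSketch;  /* stand-in for XLogRecPtr */

/* set by the foreground process once the base backup has finished */
static volatile bool backup_finished = false;
static volatile XLogRecPtrSketch required_end_ptr = 0;

/*
 * Called whenever data has been received (previously: only when a WAL
 * segment was completed).  Returning true tells the receiver to stop.
 */
static bool
stream_should_stop(XLogRecPtrSketch received_up_to)
{
    if (!backup_finished)
        return false;           /* end point not known yet */

    /* stop as soon as all WAL needed for the backup has been received */
    return received_up_to >= required_end_ptr;
}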
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
Greg Smith wrote: Anyway, the patch does now includes several examples and a short primer on PC clock hardware, to help guide what good results look like and why they've been impossible to obtain in the past. That's a bit Linux-centric, but the hardware described covers almost all systems using Intel or AMD processors. Only difference with most other operating systems is how aggressively they have adopted newer timer hardware. At least this gives a way to measure all of them. N.B.: Windows has at least two clock APIs, timeGetTime and QueryPerformanceCounters (and probably more, these days). They rely on different hardware clocks, and can get out of sync with each other; meanwhile, QueryPerformanceCounters can get out of sync with itself on (older?) multi-CPU boards. So if you're doing high-res timing, it's good to make sure you aren't relying on two different clocks in different places... I ran into this with MIDI drivers years ago; and wrote a doc: http://www.ultimatemetal.com/forum/equipment/315910-midi-latency-problem-nuendo.html#post6315034 and a clock-testing utility: https://github.com/jaylevitt/miditime -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
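For anyone who wants to watch the two clocks drift on their own machine,
here is a small Windows-only sketch that samples both APIs across the
same sleep; timeGetTime() needs winmm.lib, and the two-second interval is
an arbitrary choice.

/*
 * Compare QueryPerformanceCounter() and timeGetTime() over the same
 * interval.  Illustration only; link with winmm.lib.
 */
#include <windows.h>
#include <mmsystem.h>
#include <stdio.h>

int
main(void)
{
    LARGE_INTEGER freq, qpc_start, qpc_end;
    DWORD       tgt_start, tgt_end;

    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&qpc_start);
    tgt_start = timeGetTime();

    Sleep(2000);                /* sleep roughly two seconds */

    QueryPerformanceCounter(&qpc_end);
    tgt_end = timeGetTime();

    printf("QueryPerformanceCounter: %.3f ms\n",
           (qpc_end.QuadPart - qpc_start.QuadPart) * 1000.0 / freq.QuadPart);
    printf("timeGetTime:             %lu ms\n",
           (unsigned long) (tgt_end - tgt_start));
    return 0;
}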
Re: [HACKERS] leakproof
Andrew Dunstan and...@dunslane.net wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. Discreet? http://www.merriam-webster.com/dictionary/discreet I guess the risk is that people would confuse it with discrete. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: proof of concept patch for fixing quantified regex backrefs
Attached is as far as I've gotten with fixing depesz's complaint about backrefs embedded within larger quantified expressions (the complaint being that only the last match of the backref is checked properly). This is per the analysis I posted at https://sourceforge.net/tracker/index.php?func=detailaid=1115587group_id=10894atid=110894 to the effect that the engine really has to have an iteration subre type. The patch is incomplete because I have not written the code yet for the shortest-match-first case, only longest-match-first. Within that restriction, it seems to work, though I have not yet tried the Tcl regression tests on it. I have to set this aside now and go focus on release tasks (like writing release notes), so it's not going to be done in time to include in the upcoming back-branch releases. I have mixed feelings about whether to treat it as a back-patchable bug fix when it's done ... it's certainly a bug fix but the odds of introducing new issues seem higher than average. So maybe it should only go into HEAD anyway. Thoughts? regards, tom lane diff --git a/src/backend/regex/README b/src/backend/regex/README index 3fd58c000119a24d61dff594baec28a27a4437f0..89ba6a62ea2f70bfda729f03efedba4b9b6fce9b 100644 *** a/src/backend/regex/README --- b/src/backend/regex/README *** consists of a tree of sub-expressions ( *** 102,116 either plain regular expressions (which are executed as DFAs in the manner described above) or back-references (which try to match the input to some previous substring). Non-leaf nodes are capture nodes (which save the ! location of the substring currently matching their child node) or ! concatenation or alternation nodes. At execution time, the executor ! recursively scans the tree. At concatenation or alternation nodes, ! it considers each possible alternative way of matching the input string, ! ie each place where the string could be split for a concatenation, or each ! child node for an alternation. It tries the next alternative if the match ! fails according to the child nodes. This is exactly the sort of ! backtracking search done by a traditional NFA regex engine. If there are ! many tree levels it can get very slow. But all is not lost: we can still be smarter than the average pure NFA engine. To do this, each subre node has an associated DFA, which --- 102,116 either plain regular expressions (which are executed as DFAs in the manner described above) or back-references (which try to match the input to some previous substring). Non-leaf nodes are capture nodes (which save the ! location of the substring currently matching their child node), ! concatenation, alternation, or iteration nodes. At execution time, the ! executor recursively scans the tree. At concatenation, alternation, or ! iteration nodes, it considers each possible alternative way of matching the ! input string, that is each place where the string could be split for a ! concatenation or iteration, or each child node for an alternation. It ! tries the next alternative if the match fails according to the child nodes. ! This is exactly the sort of backtracking search done by a traditional NFA ! regex engine. If there are many tree levels it can get very slow. But all is not lost: we can still be smarter than the average pure NFA engine. 
To do this, each subre node has an associated DFA, which diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c index 6b80140e90940b4a348c342090e26fdd0bc82c8f..487d0670d2d9c6ae4b9fc5e500765d136c4e441d 100644 *** a/src/backend/regex/regcomp.c --- b/src/backend/regex/regcomp.c *** parseqatom(struct vars * v, *** 1036,1046 /*-- * Prepare a general-purpose state skeleton. * ! * --- [s] ---prefix--- [begin] ---atom--- [end] rest--- [rp] ! * / / ! * [lp] [s2] bypass- * ! * where bypass is an empty, and prefix is some repetitions of atom *-- */ s = newstate(v-nfa); /* first, new endpoints for the atom */ --- 1038,1054 /*-- * Prepare a general-purpose state skeleton. * ! * In the no-backrefs case, we want this: * ! * [lp] --- [s] ---prefix--- [begin] ---atom--- [end] ---rest--- [rp] ! * ! * where prefix is some repetitions of atom. In the general case we need ! * ! * [lp] --- [s] ---iterator--- [s2] ---rest--- [rp] ! * ! * where the iterator wraps around [begin] ---atom--- [end] ! * ! * We make the s state here for both cases; s2 is made below if needed *-- */ s = newstate(v-nfa); /* first, new endpoints for the atom */ *** parseqatom(struct vars * v, *** 1051,1061 NOERR(); atom-begin = s; atom-end = s2; ! s = newstate(v-nfa); /* and spots for prefix and bypass */ ! s2 = newstate(v-nfa); NOERR(); EMPTYARC(lp, s); - EMPTYARC(lp, s2); NOERR(); /* break
Re: [HACKERS] leakproof
On 02/22/2012 11:14 AM, Kevin Grittner wrote:
> Andrew Dunstan <and...@dunslane.net> wrote:
>> Returning to the original point, I've come to the conclusion that
>> "pure" isn't the right way to go.  The trouble with "leakproof" is that
>> it doesn't point to what it is that's not leaking, which is information
>> rather than memory, as many might imagine (and I did) without further
>> hints.  I'm not sure any single English word would be as descriptive as
>> I'd like.
>
> Discreet?
>
> http://www.merriam-webster.com/dictionary/discreet
>
> I guess the risk is that people would confuse it with "discrete".

Yes, too confusing.  "silent" might be better along those lines.

cheers

andrew
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Feb 22, 2012 at 4:43 PM, Greg Smith g...@2ndquadrant.com wrote: Attached are updated versions of this feature without the pg_test_timing tool part, since I broke that out into another discussion thread. I've split the part that updates pg_stat_statistics out from the main feature too, separate patch attached to here (but I'm not reviewing that yet). Lots of bitrot since this was submitted, and yes I noticed that I've almost recreated earlier versions of this patch--by splitting off the parts that were developed later. Thanks for the review and splitting. Sorry I didn't fix up the bit rot myself. Earlier discussion of this got side tracked on a few things, partly my fault. It's worth taking a look at what this provides before judging it too much. It can demo well. The stated purpose is helping figure out what relations are gobbling up the most access time, presumably to optimize them and/or the storage they are on. What do I put onto SSD is surely a popular request nowadays. I should have stated the purpose more clearly. The original reason for developing this patch was to figure out what queries are taking the most time and why, specifically in the case where OS memory is a lot larger than shared_buffers. Basically the following query to get a quick overview where the bottlenecks are: SELECT query, total_time, (time_read+time_write)/total_time AS io_fraction FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20; This of course hugely benefits from Peter's pg_stat_statements normalization patch. Tracking timings per relation was actually an afterthought. Now, the first critical question to ask is what additional information is this providing above the existing counters? After all, it's possible to tell pgbench_accounts is the hotspot just from comparing heap_blks_read, right? Like I said above, I find it mostly useful to see what is missing the OS cache. With memory being as cheap as it is, a reasonably priced server can have 128G of memory, while max recommended value for shared_buffers is 8GB. It's quite likely to have tables that fit into OS cache but not into shared_buffers, but it's not trivial to figure out which those are. This run looks useful at providing the data wished for--that read times are slower per capita from the accounts table. The first time I tried this I got a bizarre high number for pgbench_branches.heap_blks_time ; I'm not sure how reliable this is yet. One problem that might be easy to fix is that the write timing info doesn't show in any of these system views, only in EXPLAIN and statement level ones. I'm not sure about the source of the huge number, might instability in the clock source. Have you tried running the monotonicity check for a longer period while the system is under load? Another issue with the current timing code is that gettimeofday isn't guaranteed to be monotonic anyway, things like NTP adjustments can make time go backwards. clock_gettime with CLOCK_MONOTONIC_RAW would be better, but that's linux specific :( The reason why I didn't add write timings to relation stats is that I couldn't figure out what the semantics should be. It could be either time spent waiting for this relations blocks to be written out or time spent waiting for some other relations blocks to be written out to free space for this relations block or maybe distribute the cost, background writes could be included or excluded. Writes usually return quickly, unless lots of possibly unrelated writes have dirtied enough of OS cache, etc. 
I figured that whatever choices I made, they wouldn't really help anyone diagnose anything. Having global write timings in pg_stat_bgwriter might be useful, but I feel that is something for another patch. I still think a full wait timing interface is the right long-term direction here. It's hard to reject this idea when it seems to be working right now though, while more comprehensive wait storage is still at least a release off. Opinions welcome; I'm still juggling this around now that I have it working again. I agree that a wait timing interface is the right direction. I have thought a bit about it and could share some ideas - maybe I should create a wiki page where the general design could be hashed out? Anyway, the user-visible information from this patch should be trivial to extract from a general wait timing framework. Pushing my own agenda a bit - having this patch in the current release would help to get some field experience on any issues surrounding timing :) Some implementation notes: this currently fails regression test create_function_3; I haven't looked into why yet. I'll take a look at it. Thanks again. -- Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
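To make the gettimeofday-versus-clock_gettime point above concrete, here is a minimal standalone sketch (not part of the patch; it assumes Linux 2.6.28 or later for CLOCK_MONOTONIC_RAW, and may need -lrt on older glibc):

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval  tv_start, tv_end;
    struct timespec ts_start, ts_end;
    double          wall_ms, mono_ms;

    gettimeofday(&tv_start, NULL);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts_start);

    /* ... the work being timed would go here ... */

    gettimeofday(&tv_end, NULL);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts_end);

    /* wall-clock delta: can go backwards if NTP steps the clock */
    wall_ms = (tv_end.tv_sec - tv_start.tv_sec) * 1000.0 +
              (tv_end.tv_usec - tv_start.tv_usec) / 1000.0;

    /* monotonic delta: immune to clock adjustments, but Linux-specific */
    mono_ms = (ts_end.tv_sec - ts_start.tv_sec) * 1000.0 +
              (ts_end.tv_nsec - ts_start.tv_nsec) / 1000000.0;

    printf("wall: %.3f ms, monotonic: %.3f ms\n", wall_ms, mono_ms);
    return 0;
}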
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On 02/22/2012 11:10 AM, Jay Levitt wrote: N.B.: Windows has at least two clock APIs, timeGetTime and QueryPerformanceCounters (and probably more, these days). They rely on different hardware clocks, and can get out of sync with each other; meanwhile, QueryPerformanceCounters can get out of sync with itself on (older?) multi-CPU boards. The PostgreSQL wrapper in src/include/portability/instr_time.h uses QueryPerformanceCounter and QueryPerformanceFrequency in such a way that the result can be used much like deltas of UNIX timestamps. As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there are also the same frequency/temperature issues. Newer versions of Windows can use the TSC, older versions only use HPET or ACPI, and there's some ability to force bad TSC units to use ACPI instead: http://blogs.msdn.com/b/psssql/archive/2010/08/18/how-it-works-timer-outputs-in-sql-server-2008-r2-invariant-tsc.aspx There is a lot of questionable behavior if you try to use the better timers in Windows XP; check out the obnoxious footnote about XP SP3 at http://en.wikipedia.org/wiki/High_Precision_Event_Timer for example. Since there's little you can do about it--it's not even easy to figure out which timer is being used--I didn't see any easy way to document what Windows does here in a way that helped anyone. For the most part, you get the best timer Windows has a driver for with QueryPerformanceCounter, and that's what the PostgreSQL code uses. The information I gave about how newer systems can have more accurate timing still holds. Maybe it would be useful to add something pointing out that newer Windows versions tend to support faster timers, too. That's something I assume people would guess from what I already wrote; it may be worth stating explicitly though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
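For reference, a rough sketch of the QueryPerformanceCounter/QueryPerformanceFrequency pattern being described (illustrative only, not the actual instr_time.h code):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end;
    double elapsed_ms;

    /* ticks per second of whatever timer Windows selected a driver for */
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    Sleep(100);     /* stand-in for the code being timed */

    QueryPerformanceCounter(&end);

    elapsed_ms = (double) (end.QuadPart - start.QuadPart) * 1000.0 /
                 (double) freq.QuadPart;
    printf("elapsed: %.3f ms\n", elapsed_ms);
    return 0;
}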
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: Okay, so this patch fixes the truncation and wraparound issues through a mechanism much like pg_clog's: it keeps track of the oldest possibly existing multis on each and every table, and then during tuple freezing those are removed. I also took the liberty to make the code remove multis altogether (i.e. resetting the IS_MULTI hint bit) when only the update remains and lockers are all gone. I also cleaned up the code in heapam so that there's a couple of tables mapping MultiXactStatus to LockTupleMode and back, and to heavyweight lock modes (the older patches used functions to do this, which was pretty ugly). I had to add a little helper function to lock.c to make this work. I made a rather large bunch of other minor changes to close minor bugs here and there. Docs have been added, as have new tests for the isolation harness, which I've ensured pass in both read committed and serializable modes; WAL logging for locking updated versions of a tuple, when an old one is locked due to an old snapshot, was also added; there's plenty of room for growth in the MultiXact flag bits; the bit that made tables with no keys lock the entire row all the time was removed; multiple places in code comments were cleaned up that referred to this feature as FOR KEY LOCK and ensured that it also mentions FOR KEY UPDATE; the pg_rowlocks, pageinspect, pg_controldata, pg_resetxlog utilities have been updated. All of the above sounds great. I especially like the growing test coverage. All in all, I think this is in pretty much final shape. Only pg_upgrade bits are still missing. If sharp eyes could give this a critical look and knuckle-cracking testers could give it a spin, that would be helpful. Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far. On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: * Columns that are part of the key Noah thinks the set of columns should only consider those actually referenced by keys, not those that *could* be referenced. Well, do you disagree? To me it's low-hanging fruit, because it isolates the UPDATE-time overhead of this patch to FK-referenced tables rather than all tables having a PK or PK-like index (often just all tables). You have not answered my question above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On Wed, Feb 22, 2012 at 18:44, Greg Smith g...@2ndquadrant.com wrote: As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there's also the same frequency/temperature issues. Not on recent Linux kernel versions. Linux automatically detects when the TSC is unstable (due to power management or out-of-sync cores/sockets) and automatically falls back to the more expensive HPET or ACPI methods. e.g: % dmesg |grep -i tsc [0.00] Fast TSC calibration using PIT [0.164075] checking TSC synchronization [CPU#0 - CPU#1]: passed. [0.197062] Switching to clocksource tsc [0.260960] Marking TSC unstable due to TSC halts in idle Whether these tests cover 100% of the possible conditions, and whether the detection has race conditions or not, I don't know. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
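For what it's worth, the clock source the kernel actually settled on can also be checked directly via sysfs (a quick sketch; the path is standard on recent kernels, and the output shown is just an example):

% cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
% cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm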
Re: [HACKERS] WIP: URI connection string support for libpq
This submission has turned into a bit of a mess. I did the closest thing to a review the day after it was submitted; follow-up review attempts had issues applying the patch. And it's been stuck there. The patch is still fine, I just tested it out to pick this back up myself again. I think this one is a good advocacy feature, and most of the hard work is done already. Smooth some edge cases and this will be ready to go. First thing: the URI prefix. It is possible to connect using a URI in Python+SQLAlchemy, which was mentioned before as not too relevant due to their also requiring a driver name. As documented at http://docs.sqlalchemy.org/en/latest/core/engines.html and demonstrated at http://packages.python.org/Flask-SQLAlchemy/config.html , it is possible to leave off the driver part of the connection string. That assumes the default driver, such that postgresql:// does the same as postgresql+psycopg2://, sensibly. That means we absolutely have an installed base of URI-speaking developers split between postgresql:// (Python) and postgres:// (Ruby). Given that, there really isn't a useful path forward that helps out all those developers without supporting both prefixes. That's where this left off before; I just wanted to emphasize how clear that need seems now. Next thing, also mentioned at that Flask page. SQLite has standardized the idea that sqlite:absolute/path/to/foo.db is a URI pointing to a file. Given that, I wonder if Alex's syntax for specifying a socket file name might adopt that syntax, rather than requiring the hex encoding: postgresql://%2Fvar%2Fpgsql%2Ftmp/mydb It's not a big deal, but it would smooth another rough edge toward making the Postgres URI implementation look as close as possible to others. So far I've found only one syntax that I expected this to handle that it rejects: psql -d postgresql://gsmith@localhost It's picky about needing that third slash, but that shouldn't be hard to fix. I started collecting up all the variants that do work as an initial shell script regression test, so that changes don't break something that already works. Here are all the variations that already work, set up so that a series of 1 outputs means everything is passing: psql -d postgresql://gsmith@localhost:5432/gsmith -At -c "SELECT 1" psql -d postgresql://gsmith@localhost/gsmith -At -c "SELECT 1" psql -d postgresql://localhost:5432/gsmith -At -c "SELECT 1" psql -d postgresql://localhost/gsmith -At -c "SELECT 1" psql -d postgresql://gsmith@localhost:5432/ -At -c "SELECT 1" psql -d postgresql://gsmith@localhost/ -At -c "SELECT 1" psql -d postgresql://localhost:5432/ -At -c "SELECT 1" psql -d postgresql://localhost/gsmith -At -c "SELECT 1" psql -d postgresql://localhost/ -At -c "SELECT 1" psql -d postgresql:/// -At -c "SELECT 1" psql -d postgresql://%6Cocalhost/ -At -c "SELECT 1" psql -d postgresql://localhost/gsmith?user=gsmith -At -c "SELECT 1" psql -d "postgresql://localhost/gsmith?user=gsmith&port=5432" -At -c "SELECT 1" psql -d postgresql://localhost/gsmith?user=gsmith\&port=5432 -At -c "SELECT 1" Replace all the gsmith with $USER to make this usable for others. My eyes are starting to cross when I look at URIs now, so that's enough for today. If Alex wants to rev this soon, great; if not, I have a good idea what I'd like to do with this next, regardless of that.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
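A sketch of what that regression script could look like, wrapping the list above into a simple pass/fail loop (assumes a running server on localhost and a database named after $USER, as in the examples):

#!/bin/sh
# Hypothetical harness: print FAIL for any URI variant that does not
# return a single "1" from psql.
status=0
while read -r uri; do
    out=$(psql -d "$uri" -At -c "SELECT 1" 2>/dev/null)
    if [ "$out" != "1" ]; then
        echo "FAIL: $uri"
        status=1
    fi
done <<EOF
postgresql://$USER@localhost:5432/$USER
postgresql://$USER@localhost/$USER
postgresql://localhost:5432/$USER
postgresql://localhost/$USER
postgresql:///
EOF
exit $status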
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On 02/22/2012 12:25 PM, Marti Raudsepp wrote: On Wed, Feb 22, 2012 at 18:44, Greg Smithg...@2ndquadrant.com wrote: As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there's also the same frequency/temperature issues. Not on recent Linux kernel versions. Linux automatically detects when the TSC is unstable (due to power management or out-of-sync cores/sockets) and automatically falls back to the more expensive HPET or ACPI methods. From the patch: Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn't default to that, it may be disabled for a good reason. I ran into a case like you're showing here in my longer exploration of this at http://archives.postgresql.org/message-id/4edf1871.2010...@2ndquadrant.com I stopped just short of showing what the TSC error message looked like. I hoped that with the above and some examples showing dmesg | grep, that would be enough to lead enough people toward finding this on their own. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On 2012-02-22 16:29, Tom Lane wrote: (Snip context) VACUUM ANALYZE consists of two separate passes, VACUUM and then ANALYZE, and the second pass is going to be random I/O by your definition no matter what. I don't suppose there's a case where the VACUUM (1) gets to delete lots and lots of rows that then don't need ANALYZE'ing, and (2) can do so without actually touching all those pages? Jeroen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] determining a type oid from the name
Say I'm writing an extension X, and I want to process data values from another extension that creates type Y (e.g. an hstore); what's the best way to determine the Oid of type Y in my module X code? SPI code that runs select 'something'::Y and then examines the oid in SPI_tuptable? Or do we have a utility function I have missed that, given a type name and the current search path, will give me back the type Oid? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 22 February 2012 18:00, Andrew Dunstan and...@dunslane.net wrote: Say I'm writing an extension X, and I want to process data values from another extension that creates type Y (e.g. an hstore), what's the best way to determine the Oid of type Y in my module X code? SPI code that runs select 'something'::Y and then examines the oid in SPI_tuptable? Or do we have a utility function I have missed that, given a type name and the current search path will give me back the type Oid? Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Thom Brown t...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 22 February 2012 18:34, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thom Brown t...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) Well I may have misread the problem. I thought it was that for a particular data value, the oid of the type of that value was needed. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas robertmh...@gmail.com: On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier nicolas.barb...@gmail.com wrote: 2012/2/22 Robert Haas robertmh...@gmail.com: On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I had to reply to a query about usage of VACUUM ANALYZE or ANALYZE. I expected that ANALYZE should be faster than VACUUM ANALYZE. But that is not true. Why? I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in general, because VACUUM has to scan the whole table, and ANALYZE only a fixed-size subset of its pages. It sounds like you just said the opposite of what you wanted to say. Yeah, I did. Whoops. Let me try that again: ANALYZE should be faster; it reads only some pages. VACUUM ANALYZE should be slower; it reads them all. Dunno why Pavel's seeing the opposite without more info. usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Regards Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On Wed, Feb 22, 2012 at 19:36, Greg Smith g...@2ndquadrant.com wrote: From the patch: Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn't default to that, it may be disabled for a good reason. Sorry, I was under the impression that the stability of Windows's QueryPerformanceCounter() API is hardware-dependent, but I haven't coded under Windows for a long time -- maybe it's improved in recent versions. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Pavel Stehule pavel.steh...@gmail.com wrote: usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Is the VACUUM ANALYZE step faster, or is the overall job faster if VACUUM ANALYZE is run? You may be running into the need to rewrite pages at an inopportune time or order without the VACUUM. Have you tried timing VACUUM FREEZE ANALYZE on these cache tables instead of plain VACUUM ANALYZE? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 16 January 2012 06:28, Greg Smith g...@2ndquadrant.com wrote: One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the three most useful bits of data you could only get from log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an example: This is a preliminary review of the patch. Attached is a revision of the patch that clears up some minor bit-rot and style issues, just as a convenience to everyone else who might like to try it out against master and doesn't want to repeat my work. There were OID collisions against master, and an upstream change in the pg_proc format with the introduction of the proleakproof column. Regression tests pass (as they must have in the prior revision, as the original authors modified expected/rules.out). Note that I have deliberately avoided changing the patch further than that, though I believe that further changes will be necessary. One beef that I have with the variable name m_write_ms is that ms could equally well refer to microseconds or milliseconds, and these mistakes are very common. Therefore, you might think about making the fields of type instr_time, which is consistent with some other such fields used within the statistics collector - this takes care of issues with overflows, portability, and confusion over units, to as great an extent as is possible. I'm not even confident that this is the right thing though, because PgStat_StatFuncEntry, for example, *does* use PgStat_Counter as the type for time values, in its case apparently *microseconds* rather than milliseconds, even though the pg_stat_user_functions view displays values in milliseconds (it's incorrectly commented). Even the use of instr_time within some pgstat.h structs sees the value converted to microseconds for transmission to the collector. It might be that since we don't really accumulate time values here, it's better to just directly represent that value as PgStat_Counter, but in that case I'd still clearly mark the value as being milliseconds. In any case, I would have the columns in milliseconds, but not ending with *_ms within pg_stat_bgwriter, for consistency with pg_stat_user_functions et al. There is no reason to cast to uint64 when accumulating milliseconds, as the lvalue is currently a PgStat_Counter. Details of existing comment bug: The docs show that pg_stat_*_functions accumulates time in units of milliseconds. However, a PgStat_FunctionEntry payload is commented as sending the time/self time to the stats collector in microseconds. So this comment, in the existing code, is actually wrong: PgStat_Counter f_time; /* times in microseconds */ PgStat_Counter f_time_self; } PgStat_StatFuncEntry; Can someone commit a fix for that please? It could easily result in a user-visible bug. Mixing units, representing them as simple integers, and totally eschewing type safety is sort of a pet peeve of mine, but consistency with the existing code is more important. My general impression of the patch's code quality is that it is more or less idiomatic, and, as Greg said, simple; it generalises from the style of existing code in LogCheckpointEnd(), as well as existing fields within the PgStat_MsgBgWriter and PgStat_GlobalStats structs. As for the substance of the patch, I am in general agreement that this is a good idea.
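For reference, a minimal sketch of how the instr_time macros are used elsewhere in the backend, showing that the milliseconds-versus-microseconds decision only appears at extraction time (names here are illustrative, not taken from the patch):

#include "postgres.h"
#include "portability/instr_time.h"

static void
time_one_sync(void)
{
    instr_time  start, end, diff;
    double      elapsed_ms;

    INSTR_TIME_SET_CURRENT(start);
    /* ... the fsync or other work being timed goes here ... */
    INSTR_TIME_SET_CURRENT(end);

    diff = end;
    INSTR_TIME_SUBTRACT(diff, start);   /* diff = end - start */

    /* the unit is chosen only here: GET_MILLISEC vs. GET_MICROSEC */
    elapsed_ms = INSTR_TIME_GET_MILLISEC(diff);
    (void) elapsed_ms;
}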
Storing the statistics in pg_stat_bgwriter is a more flexible approach than is immediately apparent. Users can usefully calculate delta values as part of the same process through which checkpoint tuning is performed, by comparing output from select now(), * from pg_stat_bgwriter at different intervals. This also puts this information within easy reach of monitoring tools. So, I'd ask Greg and/or Jaime to produce a revision of the patch with those concerns in mind, as well as fixing the md.c usage of log_checkpoints. That variable is currently used in these places only: src/backend/access/transam/xlog.c 80:bool log_checkpoints = false; 7694:* Note: because it is possible for log_checkpoints to change while a 7696:* log_checkpoints is currently off. 7835: if (log_checkpoints) 8035: if (log_checkpoints) 8229:* Note: because it is possible for log_checkpoints to change while a 8231:* log_checkpoints is currently off. 8236: if (log_checkpoints) 8301: if (log_checkpoints) 8305: ereport((log_checkpoints ? LOG : DEBUG2), src/backend/storage/smgr/md.c 1097: if (log_checkpoints) 1105: if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start))) I also agree with separating backend write times due to lagging background writes from the backend writes by design, as described by Jeff. I'm not inclined to worry too much about the usability issues surrounding the relative values of things like
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Kevin Grittner kevin.gritt...@wicourts.gov: Pavel Stehule pavel.steh...@gmail.com wrote: usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Is the VACUUM ANALYZE step faster, or is the overall job faster if VACUUM ANALYZE is run? You may be running into the need to rewrite pages at an inopportune time or order without the VACUUM. Have you tried timing VACUUM FREEZE ANALYZE on these cache tables instead of plain VACUUM ANALYZE? -Kevin vacuum freeze analyze is slower, as expected. vacuum analyze is a little bit faster than, or the same as, analyze in any step. I expected that just analyze should be significantly faster, and it is not. Tom's demonstration is enough for me. ANALYZE doesn't read the complete table, but uses random IO. VACUUM ANALYZE reads the complete table, but it uses sequential IO, and vacuum is fast (because it does nothing) in our case. Thank you Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 02/22/2012 01:36 PM, Thom Brown wrote: On 22 February 2012 18:34, Kevin Grittnerkevin.gritt...@wicourts.gov wrote: Thom Brownt...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) Well I may have misread the problem. I thought it was that for a particular data value, the oid of the type of that value was needed. Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I had to reply to a query about usage of VACUUM ANALYZE or ANALYZE. I expected that ANALYZE should be faster than VACUUM ANALYZE. VACUUM ANALYZE scans the whole table sequentially. ANALYZE accesses a random sample of data blocks. Random access is slower than sequential access, so at some threshold of sample size and sequential/random I/O speed ratio ANALYZE could become slower. That analysis is entirely wrong. In the first place, although ANALYZE doesn't read all the blocks, what it does read it reads in block number order. So it's not like there are random seeks all over the disk that would not need to happen anyway. Entirely right it would seem, since your later comments match my own. The industry-accepted description for non-sequential access is random access, whether or not the function that describes the movement is entirely random. To argue otherwise is merely hairsplitting. The disk access is non-sequential for ANALYZE. Non-sequential access is slower on some hardware, and so given a large enough sample it can account for the observed difference. Additional access to the disk while the ANALYZE was running would actually make it fully random, if anyone really cares. If the filesystem is hugely biased towards sequential I/O for some reason, and the VACUUM scan causes the whole table to become resident in RAM where ANALYZE can read it for free, then I guess it might be possible to arrive at Pavel's result. But it would be an awfully narrow corner case. I cannot believe that his statement is true in general, or even for a noticeably large fraction of cases. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Sun, Feb 19, 2012 at 01:13:10PM +0200, Peter Eisentraut wrote: The documentation of the pg_upgrade -l/--logfile option never made much sense to me: -l, --logfile=FILENAME    log session activity to file I don't know what session means for pg_upgrade, so I never used it. What it actually does is log the output of all the programs that pg_upgrade calls internally, such as pg_ctl, psql, vacuumdb, pg_resetxlog, to the specified file, which is quite useful for analyzing errors such as unable to connect to new postmaster started with the command: /usr/lib/postgresql/9.1/bin/pg_ctl -w -l /dev/null -D /var/lib/postgresql/9.1/main -o -p 5433 -b start >> /dev/null 2>&1 where -l would have put something in the place of /dev/null. So what might be a better wording for this option? Something like log output of internally called programs to file? How about? -l, --logfile=FILENAME    log internal activity to file -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
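For context, a typical invocation using the option looks something like this (paths and versions are illustrative only):

pg_upgrade -b /usr/lib/postgresql/9.0/bin -B /usr/lib/postgresql/9.1/bin \
           -d /var/lib/postgresql/9.0/main -D /var/lib/postgresql/9.1/main \
           -l /tmp/pg_upgrade.log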
Re: [HACKERS] pg_upgrade --logfile option documentation
On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: As a more general comment, I think that the way pg_upgrade does logging right now is absolutely terrible. IME, it is utterly impossible to understand what has gone wrong with pg_upgrade without looking at the log file. And by default, no log file is created. So typically what happens is: - I run pg_upgrade. It fails. - I rename the control file from the old cluster back to its original name. - I rerun pg_upgrade, this time with -l. It fails again. - I read the log file, figure out what the problem is, and correct it. - I rename the control file from the old cluster back to its original name, again. - I run pg_upgrade a third time. - On a good day, it works, else go to step 5. One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. The actual cause of the failure, rather than pg_upgrade's fairly-useless gloss on it, ought to be visible right away, at least IMHO. Well, we have a -d option for debug; we could modify that to have debug levels. Also, from the command line, it is difficult to have multiple processes write into a single file, so it isn't going to work to have pg_upgrade and the server log to the same file on Windows. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Andrew Dunstan and...@dunslane.net writes: Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. What's your basis for identifying those types in the first place? Name? Doesn't seem terribly robust if the other extension can be installed in some random schema. But anyway, something in parser/parse_type.c ought to help you with that --- maybe parseTypeString? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
Excerpts from Bruce Momjian's message of mié feb 22 17:01:10 -0300 2012: On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. ... but if the upgrade failed, clearly this shouldn't be a problem. I agree with Robert, and was bit by this last week -- in case of any error during the procedure, the control file should be renamed back to its original name. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. Not sure about this. If the upgrade completes successfully and the file is not renamed at the last minute due to some error, that would be a problem as well, because now the old cluster would happily run and perhaps corrupt the data files from under the new cluster. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
Excerpts from Etsuro Fujita's message of mié feb 22 05:37:36 -0300 2012: I did some tests. The results look good to me. Please find attached a logfile. Thanks. My only concern on the patch is +static void +AlterForeignServerOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId) +{ +Form_pg_foreign_server form; -srvId = HeapTupleGetOid(tup); form = (Form_pg_foreign_server) GETSTRUCT(tup); if (form->srvowner != newOwnerId) @@ -366,10 +388,15 @@ AlterForeignServerOwner(const char *name, Oid newOwnerId) /* Superusers can always do it */ if (!superuser()) { I wonder if superusers can always do it. For example, is it OK for superusers to change the ownership of a foreign server owned by old_role to new_role that doesn't have USAGE privilege on its foreign data wrapper? Well, the permission checks are just what they were before the patch. I did not change them here. I didn't participate in the discussions that led to the current behavior, but as far as I know the guiding principle here is that superusers can always do whatever they please. Maybe what you point out is a bug in the behavior (both before and after my patch), but if so, please raise it separately. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 02/22/2012 03:20 PM, Tom Lane wrote: Andrew Dunstanand...@dunslane.net writes: Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. What's your basis for identifying those types in the first place? Name? Doesn't seem terribly robust if the other extension can be installed in some random schema. But anyway, something in parser/parse_type.c ought to help you with that --- maybe parseTypeString? Thanks, that might do the trick. I fully agree it's not bulletproof, but I'm not sure what alternative there is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
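For the archives, a rough sketch of the kind of fn_extra caching being discussed; "hstore" is just an example type name, and the parseTypeString() signature shown is roughly the 9.2-era one and has changed in later versions:

#include "postgres.h"
#include "fmgr.h"
#include "parser/parse_type.h"
#include "utils/memutils.h"

typedef struct MyTypeCache
{
    Oid     hstore_oid;     /* resolved once, using the current search_path */
} MyTypeCache;

PG_FUNCTION_INFO_V1(my_composite_processor);

Datum
my_composite_processor(PG_FUNCTION_ARGS)
{
    MyTypeCache *cache = (MyTypeCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        int32   typmod;

        cache = (MyTypeCache *)
            MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, sizeof(MyTypeCache));
        parseTypeString("hstore", &cache->hstore_oid, &typmod);
        fcinfo->flinfo->fn_extra = cache;
    }

    /* ... compare each composite field's type OID against cache->hstore_oid ... */

    PG_RETURN_NULL();
}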
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 05:22:29PM -0300, Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of mié feb 22 17:01:10 -0300 2012: On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. ... but if the upgrade failed, clearly this shouldn't be a problem. I agree with Robert, and was bit by this last week -- in case of any error during the procedure, the control file should be renamed back to its original name. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. Not sure about this. If the upgrade completes successfully and the file is not renamed at the last minute due to some error, that would be a problem as well, because now the old cluster would happily run and perhaps corrupt the data files from under the new cluster. Well, the basic problem is that the user, before pg_upgrade started, installed a new cluster that works. If we rename the old control file, but rename it back on failure, there are cases we will miss, like kill -9 or a server crash, and it will not be obvious to them that the control file was renamed. Of course, if we only rename on success, and there is a kill -9 or server crash, the old cluster is still startable, like the new one. One good argument for renaming early is that on a server crash, the system is probably going to restart the database automatically, and that means the old server. Right now we have a clear message that they need to rename the control file to start the old server. Not sure what the new wording would look like --- let me try. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On ons, 2012-02-22 at 14:38 -0500, Bruce Momjian wrote: How about? -l, --logfile=FILENAME    log internal activity to file That sounds better. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On sön, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
Excerpts from Tom Lane's message of mar feb 21 21:30:39 -0300 2012: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Alvaro Herrera's message of mar feb 21 15:54:03 -0300 2012: Excerpts from Tom Lane's message of lun feb 20 12:37:45 -0300 2012: As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php there is no switch case in shdepReassignOwned for foreign data wrappers. I'm gonna take a stab at fixing this now (the simple way). Here's the patch I have; last minute opinions. I intend to push a backpatch to 9.1 and 9.0. 8.4 has the same problem, but since Heikki punted in e356743f3ed45c36dcc4d0dbf6c1e8751b3d70b5, I'm not going to bother either. Looks roughly like what I expected, but I haven't tested it. Thanks, pushed. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: PL/pgPSM for 9.3
Hello, I am sending a proposal for PSM language support. It is maybe early, but I would like to have a patch for the first 9.3 commitfest. Proposal: PL/pgPSM I propose to integrate a PSM language into the core. This language is defined as part of ANSI SQL - SQL/PSM - and is used in some well-known databases like DB2 and Teradata, and some other less-known RDBMSs like MonetDB. The proposed implementation is based on the same architecture as the current PL/pgSQL interpreter - interpretation of an AST with integration of an SQL parser and expression executor. Reasons to use the same architecture are: reuse of some parts of the interpreter, and good experience with the current interpreter. One year ago I wrote a PSM compiler to pcode and a pcode interpreter - PSM0. This project showed that using pcode doesn't carry any speedup over an AST interpreter - the bottleneck was expression execution - and the PostgreSQL executor is the part that we would not substitute, so we can use a simpler and well-known AST interpreter. The language is specified by the SQL standard SQL/PSM, so I'll describe the main differences against PL/pgSQL: * Exception handlers are subroutines (continue handlers). * Warnings can be handled like an exception - handling warnings is an important and well-known pattern: BEGIN DECLARE success boolean DEFAULT true; DECLARE _x varchar; DECLARE c CURSOR FOR SELECT x FROM tab; DECLARE CONTINUE HANDLER FOR NOT FOUND SET success = false; OPEN c; FETCH c INTO _x; WHILE success DO -- process _x FETCH c INTO _x; END WHILE; CLOSE c; END; * Declaration of variables should be based on the SQL query result - and because we would have to execute and check queries only at runtime, we have to use slightly different rules for variables - variables need not be declared before they are used - this is possible because there is a different syntax for the assign statement. Automatic variables should be read-only. BEGIN DECLARE s text; FOR SELECT course FROM courses DO SET s = COALESCE(s || ',' || course, course); END FOR; END; Other differences are minor. Request: * reusing important parts of the plpgsql executor - simple expressions, using variables, assigning results to variables * support a “check function” statement * be as near to the standard as possible * implement well-known patterns that are not in the standard, but that have an impact on speed or usability and are used in well-known implementations (SQLCODE variable, multi-assign). -- a little bit more effective iteration (well-known pattern) BEGIN OPEN c; FETCH c INTO _x; WHILE SQLCODE = 0 DO -- process _x; FETCH c INTO _x; END WHILE; CLOSE c; END; * check expression syntax strictly - don't allow plpgsql's var := SELECT 10; var := a FROM tab and similar; only set var = 10; or set var = (SELECT 10); or set var = (SELECT a FROM tab) should be allowed LIMITS: * PostgreSQL doesn't support procedures - the CALL statement will not be supported (I would like to see CALL as a PostgreSQL statement, not only a PSM emulation) * I don't plan to implement PSM into the SQL parser in the first step. Reason: this feature requires the introduction of DELIMITERs (like MySQL or DB2). This concept is not adopted in PostgreSQL, and the PostgreSQL design is more user-friendly for work from command-line tools. This can be enhanced in the future, if we find a way to extend the bison parser. Using $$ delimiters for the function body doesn't block future enhancement. Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] leakproof
On Wed, 2012-02-22 at 12:44 -0400, Andrew Dunstan wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. There are also quite a few synonyms in the thesaurus for trustworthy. I kind of like honorable or righteous myself. __ Marc
Re: [HACKERS] leakproof
On 02/22/2012 04:29 PM, Marc Munro wrote: On Wed, 2012-02-22 at 12:44 -0400, Andrew Dunstan wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. There are also quite a few synonyms in the thesaurus for trustworthy. I kind of like honorable or righteous myself. Let's not go there. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Andrew Dunstan and...@dunslane.net writes: I fully agree it's not bulletproof, but I'm not sure what alternative there is. If you know the type has been installed as an extension you can look at the extension's content in pg_depend, much like \dx+ does, limiting to only types whose name matches. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
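Something along these lines, for example (a sketch; 'hstore' stands in for whatever extension is of interest):

SELECT t.oid, t.typname
  FROM pg_depend d
  JOIN pg_type t ON t.oid = d.objid
 WHERE d.classid = 'pg_type'::regclass
   AND d.refclassid = 'pg_extension'::regclass
   AND d.refobjid = (SELECT oid FROM pg_extension WHERE extname = 'hstore')
   AND d.deptype = 'e';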
Re: [HACKERS] leakproof
On Wed, Feb 22, 2012 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. Speaking of hair-splitting ... A strict interpretation of no errors can be thrown would, for example, rule out any function that either takes or returns a varlena datatype, since those are potentially going to throw out-of-memory errors if they can't palloc a result value or a temporary detoasted input value. I don't suppose that we want that, which means that this rule of thumb is wrong in detail, and there had better be some more subtle definition of what is okay or not, perhaps along the line of must not throw any errors that reveal anything useful about the input value. Have we got such a definition? (I confess to not having followed this patch very closely.) Not exactly; I've kind of been playing it by ear, but I agree that out-of-memory errors based on the input value being huge are probably not something we want to stress out about too much. In theory you could probe for the approximate size of the value by using up nearly all the memory on the system, leaving a varying amount behind, and then see whether you get an out of memory error. But again, if people are going to that kind of trouble to ferret out just the approximate size of the data, it was probably a bad idea to let them log into the database at all, in any way, in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs si...@2ndquadrant.com wrote: The industry accepted description for non-sequential access is random access whether or not the function that describes the movement is entirely random. To argue otherwise is merely hairsplitting. I don't think so. For example, a bitmap index scan contrives to speed things up by arranging for the table I/O to happen in ascending block number order, with skips, rather than in random order, as a plain index scan would do, and that seems to be a pretty effective technique. Except to the extent that it interferes with the kernel's ability to do readahead, it really can't be faster to read blocks 1, 2, 3, 4, and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't require more effort than reading it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] swapcache-style cache?
Has anyone considered managing a system like the DragonFly swapcache for a DBMS like PostgreSQL? i.e. where the admin can assign drives with good random read behaviour (but perhaps also-ran random write) such as SSDs to provide a cache for blocks that were dirtied, with async write that hopefully writes them out before they are forcibly discarded. And where a cache failure (whether by timeout, hard fail, or CRC fail) just means having to go back to the real transactional storage. I'd been thinking that swapcache would help where the working set won't fit in RAM, also L2ARC on Solaris - but it seems to me that there is no reason not to allow the DBMS to manage the set-aside area itself where it is given either access to the raw device or to a pre-sized file on the device it can map in segments. While L2ARC is obviously very heavyweight and entwined in ZFS, DragonFly's swapcache seems to me remarkably elegant and, it would seem, very effective. James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 3:51 PM, Peter Eisentraut pete...@gmx.net wrote: On sön, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? I don't know. It might be less of an issue, though. I mean, IME, what typically happens is that psql fails to restore the dump, either because it can't connect to the new database or because it's confused by some stupid case that isn't handled well. So even if we could just improve the error handling to report those types of failures more transparently, I think it would be a big improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: proof of concept patch for fixing quantified regex backrefs
On Wed, Feb 22, 2012 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: Attached is as far as I've gotten with fixing depesz's complaint about backrefs embedded within larger quantified expressions (the complaint being that only the last match of the backref is checked properly). This is per the analysis I posted at https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894 to the effect that the engine really has to have an iteration subre type. The patch is incomplete because I have not written the code yet for the shortest-match-first case, only longest-match-first. Within that restriction, it seems to work, though I have not yet tried the Tcl regression tests on it. I have to set this aside now and go focus on release tasks (like writing release notes), so it's not going to be done in time to include in the upcoming back-branch releases. I have mixed feelings about whether to treat it as a back-patchable bug fix when it's done ... it's certainly a bug fix but the odds of introducing new issues seem higher than average. So maybe it should only go into HEAD anyway. Thoughts? +1 for just doing it in HEAD. This strikes me as the sort of thing that has a higher-than-average chance of breaking applications in subtle ways, and that's exactly the sort of thing that we don't want to do in a minor release, especially in older branches where people are presumably not upgrading precisely because what they have today works for them. If someone really needs the fix, they can always back-port it themselves... I don't think that code has changed much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on' checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off' This test was performed using pgbench-tools. At different client counts and scaling factors 1, 10, and 100, performance of an update.sql workload was tested. This benchmark used Greg Smith's toy server. As he put it recently: The main change to the 8 hyperthreaded core test server (Intel i7-870) for this year is bumping it from 8GB to 16GB of RAM, which effectively doubles the scale I can reach before things slow dramatically. However, while Greg used Scientific Linux for his recent batch of performance numbers, the box was booted into Debian for this, which used kernel 2.6.32, FWIW. Didn't bother with a separate disk for WAL. I put shared_buffers at 1GB, and checkpoint_segments at 50. I took the additional precaution of initdb'ing for each set, lest there be some kind of contamination between sets, which necessitated doing some additional work since I couldn't very well expect the results database to persist. Different sets of figures from different runs were dumped and restored, before finally being combined so that pgbench-tools could produce its regular report. I have attached a config for pgbench-tools, so that others may recreate my work here. I also attach the most relevant image, comparing each test set across scaling levels. I'll make a PDF of the full report available if that would be useful. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services config Description: Binary data attachment: scaling-sets-page-checksums.png -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
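For anyone who wants a rough manual equivalent of one cell of that grid without pgbench-tools, something like the following should be close (a sketch only; the attached config is authoritative, and pgbench's built-in simple-update mode is assumed here as a stand-in for the update.sql workload):

# non-default postgresql.conf settings used for the runs, per the message above:
#   shared_buffers = 1GB
#   checkpoint_segments = 50
#   page_checksums = on      # or off, for the patched-but-disabled set
createdb pgbench
pgbench -i -s 100 pgbench              # initialize at scaling factor 100
pgbench -c 8 -j 4 -T 300 -N pgbench    # 8 clients, 4 threads, 5-minute run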
Re: [HACKERS] leakproof
Andrew Dunstan and...@dunslane.net writes: On 02/22/2012 04:29 PM, Marc Munro wrote: As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. SILENT isn't a bad idea. It seems to lead the mind in the right direction, or at least not encourage people to guess the wrong meaning. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI rw-conflicts and 2PC
On Tue, 2012-02-14 at 19:32 -0500, Dan Ports wrote: On Tue, Feb 14, 2012 at 09:27:58AM -0600, Kevin Grittner wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 14.02.2012 04:57, Dan Ports wrote: The easiest answer would be to just treat every prepared transaction found during recovery as though it had a conflict in and out. This is roughly a one-line change, and it's certainly safe. +1. I don't even see this as much of a problem. Prepared transactions hanging around for arbitrary periods of time cause all kinds of problems already. Those using them need to be careful to resolve them quickly -- and if there's a crash involved, I think it's reasonable to say they should be resolved before continuing normal online operations. Hmm, it occurs to me that if we have to abort a transaction due to serialization failure involving a prepared transaction, we might want to include the prepared transaction's gid in the errdetail. I like this idea. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
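For concreteness, a sketch of the scenario under discussion; the table and GID are made up, and max_prepared_transactions must be set above zero. The proposal is that a later serialization failure involving the prepared transaction would name its GID in the errdetail.

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'xfer-001';
    -- If the server crashes and recovers here, the recovered prepared transaction
    -- would (per the proposed one-line change) be treated as having rw-conflicts
    -- in and out, so overlapping serializable transactions may be cancelled until
    -- it is resolved:
    COMMIT PREPARED 'xfer-001';   -- or ROLLBACK PREPARED 'xfer-001';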
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 05:49:26PM -0500, Robert Haas wrote: On Wed, Feb 22, 2012 at 3:51 PM, Peter Eisentraut pete...@gmx.net wrote: On Sun, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? I don't know. It might be less of an issue, though. I mean, IME, what typically happens is that psql fails to restore the dump, either because it can't connect to the new database or because it's confused by some stupid case that isn't handled well. So even if we could just improve the error handling to report those types of failures more transparently, I think it would be a big improvement. Well, on Unix, it is easy to redirect the server logs to the same place as the pg_upgrade logs. That doesn't help? How would we improve the reporting of SQL restore failures? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 10:50:07PM +0200, Peter Eisentraut wrote: On Wed, 2012-02-22 at 14:38 -0500, Bruce Momjian wrote: How about? -l, --logfile=FILENAME    log internal activity to file That sounds better. Done. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 11:17:53PM +, Peter Geoghegan wrote: I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on' checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off' This test was performed using pgbench-tools. At different client counts and scaling factors 1,10,100, performance of an update.sql workload was tested. Looks interesting. Could you get some error bars around the numbers plotted, and possibly some scaling factors between 10 and 100? For the former, I'm looking for whether those changes are within ordinary variation, and in the latter, some better idea of what the curve looks like. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Commit a445cb92 not tested without OpenSSL support?
Hey folks, It appears that the commit a445cb92ef5b3a31313ebce30e18cc1d6e0bdecb causes ld to bail when building *without* OpenSSL support: utils/misc/guc.o:(.data+0x4d80): undefined reference to `ssl_cert_file' utils/misc/guc.o:(.data+0x4ddc): undefined reference to `ssl_key_file' utils/misc/guc.o:(.data+0x4e38): undefined reference to `ssl_ca_file' utils/misc/guc.o:(.data+0x4e94): undefined reference to `ssl_crl_file' I did verify that the build succeeds with configure --with-openssl, where the conditional compilation predicated upon #ifdef USE_SSL in be-secure.c is satisfied. Regards, Affan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] leakproof
On Wed, Feb 22, 2012 at 6:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 02/22/2012 04:29 PM, Marc Munro wrote: As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. SILENT isn't a bad idea. It seems to lead the mind in the right direction, or at least not encourage people to guess the wrong meaning. TACITURN. UNLOQUACIOUS. TIGHTLIPPED. LACONIC. OK, OK, I'm joking... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] typo fix
On Wed, Feb 22, 2012 at 8:47 AM, Sandro Santilli s...@keybit.net wrote: Typo in a comment... Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Displaying accumulated autovacuum cost
On Tue, Feb 21, 2012 at 11:55 PM, Greg Smith g...@2ndquadrant.com wrote: I just took this for a spin. Everything I tried worked, docs built and read fine. The description of how dirty differs from written is a bit cryptic, but I don't see an easy way to do better without a whole new section on that topic. Once the extension upgrade questions are sorted out, I'd say this is ready to commit. The example I have at the bottom here shows a case where this is a big improvement over the existing tracking. I think this is a must-have improvement if we're going to advocate using pg_stat_statements for more things. Glad you like it; committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
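As a usage sketch, assuming the shared_blks_dirtied and shared_blks_written counters as they ended up in the 9.2-era pg_stat_statements (and that the extension is created in the database and listed in shared_preload_libraries), the distinction Greg mentions can be inspected per statement:

    -- "dirtied" = shared buffers a statement newly made dirty;
    -- "written" = shared buffers the statement itself had to write out.
    SELECT query, shared_blks_dirtied, shared_blks_written
    FROM pg_stat_statements
    ORDER BY shared_blks_dirtied DESC
    LIMIT 10;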
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 6:17 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison; checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on'; checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off'. This test was performed using pgbench-tools. At different client counts and scaling factors of 1, 10 and 100, performance of an update.sql workload was tested. This benchmark used Greg Smith's toy server. As he put it recently: The main change to the 8 hyperthreaded core test server (Intel i7-870) for this year is bumping it from 8GB to 16GB of RAM, which effectively doubles the scale I can reach before things slow dramatically. However, while Greg used Scientific Linux for his recent batch of performance numbers, the box was booted into Debian for this, which used kernel 2.6.32, FWIW. Didn't bother with a separate disk for WAL. I put shared_buffers at 1GB, and checkpoint_segments at 50. I took the additional precaution of initdb'ing for each set, lest there be some kind of contamination between sets, which necessitated doing some additional work since I couldn't very well expect the results database to persist. Different sets of figures from different runs were dumped and restored, before finally being combined so that pgbench-tools could produce its regular report. I have attached a config for pgbench-tools, so that others may recreate my work here. I also attach the most relevant image, comparing each test set across scaling levels. I'll make a PDF of the full report available if that would be useful. Thanks for testing this. The graph obscures a bit how much percentage change we're talking about here - could you post the raw tps numbers? I think we also need to test the case of seq-scanning a large, unhinted table. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
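One way to run that test, sketched here with made-up table and size choices: the point is that the first full scan of freshly loaded, never-read data sets hint bits as it goes, which is exactly the work that becomes WAL-logged when checksums are enabled under this patch.

    CREATE TABLE hint_test AS
      SELECT g AS id, md5(g::text) AS payload
      FROM generate_series(1, 10000000) AS g;
    -- Restart the server (and optionally drop OS caches) so nothing is cached
    -- and no hint bits have been set yet, then compare:
    \timing
    SELECT count(*) FROM hint_test;  -- first scan: sets hint bits on the way through
    SELECT count(*) FROM hint_test;  -- second scan: baseline with hint bits already set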
Re: [HACKERS] Should we add crc32 in libpgport?
On Thu, Feb 16, 2012 at 6:09 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina dan...@heroku.com wrote: Ah, yes, I think my optimizations were off when building, or something. I didn't get such verbosity at first, and then I remember doing something slightly different and then getting a lot of output. I didn't pay attention to the build size. I will investigate. [...] I agree, I was about to say what about a preprocessor hack... after the last paragraph, but saw you beat me to the punch. I'll have a look soon. Ping! Err, yes. Clearly I've managed to not do this, and not see your email until now. Here's what I intend to do: 1) Split the tables into another header file, per Tom's suggestion 2) #include those tables in pgport exactly once. Per Tom's objection that pgport is not always available in distributions, that is not the only way the table will be exposed, but pgport is definitely built and available when building postgres proper. 3) Third-party projects and contribs should use the header file, and not libpgport It's still a bit awkward in that one is including something that's not really a port (except in the degenerate sense, as no system has these tables defined vs., say, gettimeofday, where Windows needs a port), but it's the only thing that I can see that is compiled once and can be linked against repeatedly in postgres's build without having to, say, directly cross-reference the pg_crc.o file (as seen in the two command-line utilities that need CRC). Thoughts? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers with DO functionality
On Fri, Feb 17, 2012 at 11:43:53AM -0500, Andrew Dunstan wrote: On 02/17/2012 11:29 AM, David E. Wheeler wrote: On Feb 17, 2012, at 5:22 AM, Thom Brown wrote: The purpose being to only have a single statement to set up the trigger rather than setting up a separate trigger function which is unlikely to be re-used by other triggers... or is this of dubious benefit? +1, though I imagine it would just give it a generated name and save it anyway, eh? Before we rush into this, let's consider all the wrinkles. For example, what if you need to change the function? And how would you edit the function in psql? It might be a bit more involved than it seems at first glance, although my initial reaction was the same as David's. Another complication: anonymous triggers would either have to be alone, or provide a mechanism to manage a sequence of anonymous triggers on the same table (such as replace the third trigger with ..., or move trigger #4 to position #2, or decide their order of execution). Cheers, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
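For reference, the shape being discussed, with today's two-statement form first and the proposed single-statement form shown purely as hypothetical syntax (table and column names are made up):

    -- Current form: a named trigger function plus a trigger that uses it.
    CREATE FUNCTION touch_modified() RETURNS trigger AS $$
    BEGIN
      NEW.modified := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER set_modified
      BEFORE UPDATE ON items
      FOR EACH ROW EXECUTE PROCEDURE touch_modified();

    -- Hypothetical inline form under discussion (not valid syntax today); the
    -- server would presumably generate and store a function with a made-up name:
    -- CREATE TRIGGER set_modified BEFORE UPDATE ON items FOR EACH ROW
    --   DO $$ BEGIN NEW.modified := now(); RETURN NEW; END; $$ LANGUAGE plpgsql;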