Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
(2012/02/22 9:30), Tom Lane wrote:
> Alvaro Herrera <alvhe...@commandprompt.com> writes:
>> Excerpts from Alvaro Herrera's message of mar feb 21 15:54:03 -0300 2012:
>>> Excerpts from Tom Lane's message of lun feb 20 12:37:45 -0300 2012:
>>>> As per
>>>> http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php
>>>> there is no switch case in shdepReassignOwned for foreign data
>>>> wrappers.
>>>
>>> I'm gonna take a stab at fixing this now (the simple way).
>>
>> Here's the patch I have; last minute opinions.  I intend to push a
>> backpatch to 9.1 and 9.0.  8.4 has the same problem, but since Heikki
>> punted in e356743f3ed45c36dcc4d0dbf6c1e8751b3d70b5, I'm not going to
>> bother either.
>
> Looks roughly like what I expected, but I haven't tested it.

I did some tests.  The results look good to me.  Please find attached a
logfile.  My only concern on the patch is

+static void
+AlterForeignServerOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
+{
+	Form_pg_foreign_server form;

-	srvId = HeapTupleGetOid(tup);
 	form = (Form_pg_foreign_server) GETSTRUCT(tup);

 	if (form->srvowner != newOwnerId)
@@ -366,10 +388,15 @@ AlterForeignServerOwner(const char *name, Oid newOwnerId)
 		/* Superusers can always do it */
 		if (!superuser())
 		{

I wonder if superusers can always do it.  For example, is it OK for
superusers to change the ownership of a foreign server owned by old_role
to new_role that doesn't have USAGE privilege on its foreign data
wrapper?

Best regards,
Etsuro Fujita

$ psql postgres
psql (9.2devel)
Type "help" for help.

postgres=# CREATE ROLE reassign_fdw_user LOGIN SUPERUSER;
CREATE ROLE
postgres=# SET SESSION AUTHORIZATION 'reassign_fdw_user';
SET
postgres=# CREATE ROLE regress_test_role;
CREATE ROLE
postgres=# CREATE ROLE regress_test_role2;
CREATE ROLE
postgres=# CREATE ROLE regress_test_role_super SUPERUSER;
CREATE ROLE
postgres=# CREATE FOREIGN DATA WRAPPER foo;
CREATE FOREIGN DATA WRAPPER
postgres=# \dew
             List of foreign-data wrappers
 Name |       Owner       | Handler | Validator
------+-------------------+---------+-----------
 foo  | reassign_fdw_user | -       | -
(1 row)

postgres=# REASSIGN OWNED BY reassign_fdw_user TO regress_test_role;
ERROR:  permission denied to change owner of foreign-data wrapper "foo"
HINT:  The owner of a foreign-data wrapper must be a superuser.
postgres=# REASSIGN OWNED BY reassign_fdw_user TO regress_test_role_super;
REASSIGN OWNED
postgres=# \dew
                List of foreign-data wrappers
 Name |          Owner          | Handler | Validator
------+-------------------------+---------+-----------
 foo  | regress_test_role_super | -       | -
(1 row)

postgres=# REASSIGN OWNED BY regress_test_role_super TO regress_test_role;
ERROR:  permission denied to change owner of foreign-data wrapper "foo"
HINT:  The owner of a foreign-data wrapper must be a superuser.
postgres=# REASSIGN OWNED BY regress_test_role_super TO reassign_fdw_user;
REASSIGN OWNED
postgres=# GRANT USAGE ON FOREIGN DATA WRAPPER foo TO regress_test_role;
GRANT
postgres=# SET ROLE regress_test_role;
SET
postgres=> CREATE SERVER t1 FOREIGN DATA WRAPPER foo;
CREATE SERVER
postgres=> REASSIGN OWNED by regress_test_role TO regress_test_role2;
ERROR:  permission denied to reassign objects
postgres=> RESET ROLE;
RESET
postgres=# REASSIGN OWNED by regress_test_role TO regress_test_role2;
REASSIGN OWNED
postgres=# \des+
                                       List of foreign servers
 Name |       Owner        | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
------+--------------------+----------------------+-------------------+------+---------+-------------+-------------
 t1   | regress_test_role2 | foo                  |                   |      |         |             |
(1 row)

postgres=#
Extensions default full version (was: [HACKERS] Displaying accumulated autovacuum cost)
Greg Smith <g...@2ndquadrant.com> writes:
> On 02/21/2012 04:44 PM, Dimitri Fontaine wrote:
>> The solution would be to be able to create hstore 1.1 from 1.0
>> automatically and I sent over a very simple patch to do that, albeit
>> after the deadline for the current CF (that's why it's not listed).
>> Maybe that's simple enough to be considered? (re-attaching here)
>
> I can't find any message that matches this description--from you, touches
> extensions in this way, and was submitted after the CF deadline.  Help?

http://archives.postgresql.org/message-id/m2k44m9oyo@2ndquadrant.fr

> If that's something that ripples out to impacting how changes to upgraded
> extensions should look, that should get its own thread I think.  This one
> is pretty deep and already far off its titled topic.

Let's do that then.  Well I'm just changing the subject here.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Runtime SHAREDIR for testing CREATE EXTENSION
On Tue, Feb 21, 2012 at 10:34:42PM +0100, Dimitri Fontaine wrote:
> Sandro Santilli <s...@keybit.net> writes:
>> On Tue, Feb 21, 2012 at 10:21:17AM -0500, Tom Lane wrote:
>>> Sandro Santilli <s...@keybit.net> writes:
>>>> I'm trying to understand what options I have to test CREATE EXTENSION
>>>> w/out installing the extension files in their final destination.
>>>
>>> There aren't any.  Generally speaking, if you want to be testing an
>>> extension, you should be doing it in a test installation anyway; so I
>>> don't see what's the objection to sticking the files into the
>>> installation's real SHAREDIR.
>>
>> We do a staged installation of the extension library and scripts in an
>> ad-hoc dir which does not match the built-in SHAREDIR of PostgreSQL.
>> It works fine for testing the extension directly sourcing the scripts.
>> It's only CREATE EXTENSION that doesn't play nicely with that setup.
>
> Please see the inline extension thread where answers to your problem
> have been discussed.
>
>   https://commitfest.postgresql.org/action/patch_view?id=746

I'm not really looking for inline extensions.  I do want to install the
extension objects somewhere, just NOT in the PostgreSQL builtin SHAREDIR
but in an arbitrary staging directory to use for QA of the extension
before distribution.

--strk;

  Delivering high quality PostGIS 2.0!
  http://strk.keybit.net
[HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
Attached is a feature extracted from Ants Aasma's "add timing of buffer
I/O requests" submission.  That included a tool to measure timing
overhead, from gettimeofday or whatever else INSTR_TIME_SET_CURRENT
happens to call.  That's what I've broken out here; it's a broader topic
than just buffer timing.  I fixed some trivial bugs and cleaned up the
output of the program, then wrote a full documentation section for it.
After that review, I think this could be ready to commit...with a
big-picture caveat below.

Code-wise, I am mainly concerned about its portability, and that's not
often a place we get good review help on.  The tool is based on
pg_test_fsync.  Perhaps Bruce might want to take a look at this, to see
if any of the changes he's recently made to pg_test_fsync impact what
this utility should do?  He might be able to flesh out the FreeBSD
examples too.

As for why this whole topic is important, I've found the results of this
new pg_test_timing track quite well with systems where EXPLAIN ANALYZE
timing overhead is large.  As such, it fills in a gap in the existing
docs, where that possibility is raised but no way was given to measure
it--nor to determine how to improve it.  I expect we'll be worried about
how large timing overhead is more often for future features, with the
first example being the rest of Ants's own submission.

A look back on this now that I'm done with it does raise one large
question though.  I added some examples of how to measure timing overhead
using psql.  While I like the broken-down timing data that this utility
provides, I'm not sure whether it's worth adding a contrib module just to
get it.  An extension that's packaged on something like PGXN and easy to
obtain?  Absolutely--but maybe that's a developer-only level thing.
Maybe the only code worth distributing is the little SQL example of how
to measure the overhead, along with some reference good/bad numbers.
That plus the intro to timer trivia could turn this into a documentation
section only, no code change.  I've dreamed of running something like
this on every system in the build farm.  But even if that's a valuable
exercise, it may only be worth doing once, then reverting.

Anyway, the patch now includes several examples and a short primer on PC
clock hardware, to help guide what good results look like and why they've
been impossible to obtain in the past.  That's a bit Linux-centric, but
the hardware described covers almost all systems using Intel or AMD
processors.  The only difference with most other operating systems is how
aggressively they have adopted newer timer hardware.  At least this gives
a way to measure all of them.
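For anyone who wants the flavor of what the tool measures without reading
the patch below, here is a stripped-down, standalone sketch of the same
loop: read the clock as fast as possible for a few seconds and histogram
the deltas between consecutive readings.  It calls gettimeofday()
directly just to stay self-contained; the contrib program itself goes
through the portable INSTR_TIME_* macros and has real option handling, so
treat this purely as an illustration.

/*
 * Standalone sketch of the pg_test_timing measurement loop.  Not the
 * contrib module: it uses plain gettimeofday() instead of the
 * INSTR_TIME_* wrappers and hard-codes a three-second test duration.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

static int64_t
now_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

int
main(void)
{
    int64_t start = now_usec();
    int64_t prev = start;
    int64_t cur;
    long    histogram[32] = {0};
    long    loops = 0;
    int     i;

    while ((cur = now_usec()) - start < 3 * 1000000)
    {
        int64_t diff = cur - prev;
        int     bucket = 0;

        /* a negative delta would mean the clock ran backwards */
        if (diff < 0)
            diff = 0;
        /* bucket by the power of two the delta falls into */
        while (diff > 1)
        {
            diff >>= 1;
            bucket++;
        }
        histogram[bucket]++;
        prev = cur;
        loops++;
    }

    printf("Per loop time including overhead: %0.2f usec\n",
           (double) (prev - start) / loops);
    for (i = 0; i < 32; i++)
        if (histogram[i] > 0)
            printf("2^%d usec: %ld\n", i, histogram[i]);
    return 0;
}

On hardware with a fast, stable clock source nearly every delta lands in
the lowest bucket; a long tail of slow readings is exactly the symptom
that makes EXPLAIN ANALYZE overhead balloon.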
Some references used to put together the clock source tutorial: Microsoft's intro to HPET: http://msdn.microsoft.com/en-us/windows/hardware/gg463347 Notes on effective clock resolution: http://elinux.org/System_Tap_Timestamp_Notes VMware clock history and impact on VMs: http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf VMware timer suggestions for various Linux versions: http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1006427 -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com diff --git a/contrib/Makefile b/contrib/Makefile index ac0a80a..d230451 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_test_fsync \ + pg_test_timing \ pg_trgm \ pg_upgrade \ pg_upgrade_support \ diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile new file mode 100644 index 000..b8b266a --- /dev/null +++ b/contrib/pg_test_timing/Makefile @@ -0,0 +1,18 @@ +# contrib/pg_test_timing/Makefile + +PGFILEDESC = pg_test_timing - test timing overhead +PGAPPICON = win32 + +PROGRAM = pg_test_timing +OBJS = pg_test_timing.o + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_test_timing +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c new file mode 100644 index 000..4ff0455 --- /dev/null +++ b/contrib/pg_test_timing/pg_test_timing.c @@ -0,0 +1,162 @@ +/* + * pg_test_timing.c + * tests overhead of timing calls and their monoticity: that + * they always move forward + */ + +#include postgres_fe.h + +#include getopt_long.h +#include portability/instr_time.h + +static const char *progname; + +static int32 test_duration = 3; + +static void handle_args(int argc, char *argv[]); +static void test_timing(int32); + +int +main(int argc, char *argv[]) +{ + progname = get_progname(argv[0]); + + handle_args(argc, argv); + + test_timing(test_duration); + + return 0; +} + +static
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 7:06 AM, Noah Misch <n...@leadboat.com> wrote:
> On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:
>> On Sun, Feb 19, 2012 at 11:35 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>>> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
>>>> doesn't have a checksum, PD_CHECKSUM2 is not set?  What good does that
>>>> do?
>>>
>>> As explained in detailed comments, the purpose of this is to implement
>>> Heikki's suggestion that we have a bit set to zero so we can detect
>>> failures that cause a run of 1s.
>>
>> I think it's nonsensical to pretend that there's anything special about
>> that particular bit.  If we want to validate the page header before
>> trusting the lack of a checksum bit, we can do that far more thoroughly
>> than just checking that one bit.  There are a whole bunch of bits that
>> ought to always be zero, and there are other things we can validate as
>> well (e.g. LSN not in future).  If we're concerned about the
>> checksum-enabled bit getting flipped (and I agree that we should be), we
>> can check some combination of that stuff in the hope of catching it, and
>> that'll be a far better guard than just checking one arbitrarily
>> selected bit.
>
> PageHeaderIsValid() (being renamed to PageIsVerified()) already checks
> (page->pd_flags & ~PD_VALID_FLAG_BITS) == 0.  Explicitly naming another
> bit and keeping it unset is redundant with that existing check.  It
> would cease to be redundant if we ever allocate all the flag bits, but
> then we also wouldn't have a bit to spare as PD_CHECKSUM2.

I agree with you on this point.  No problem to remove that.  It was there
at Heikki's request.

>> That having been said, I don't feel very good about the idea of relying
>> on the contents of the page to tell us whether or not the page has a
>> checksum.  There's no guarantee that an error that flips the
>> has-checksum bit will flip any other bit on the page, or that it won't
>> flip everything else we're relying on as a backstop in exactly the
>> manner that foils whatever algorithm we put in place.  Random corruption
>> is, perhaps, unlikely to do that, but somehow I feel like it'll happen
>> more often than random chance suggests.  Things never fail the way you
>> want them to.
>>
>> Another disadvantage of the current scheme is that there's no
>> particularly easy way to know that your whole cluster has checksums.  No
>> matter how we implement checksums, you'll have to rewrite every table in
>> the cluster in order to get them fully turned on.  But with the current
>> design, there's no easy way to know how much of the cluster is actually
>> checksummed.  If you shut checksums off, they'll linger until those
>> pages are rewritten, and there's no easy way to find the relations from
>> which they need to be removed, either.
>
> I'm not seeing value in rewriting pages to remove checksums, as opposed
> to just ignoring those checksums going forward.  Did you have a
> particular scenario in mind?

Agreed.  No reason to change a checksum unless we rewrite the block, no
matter whether page_checksums is on or off.

>> I'm tempted to suggest a relation-level switch: when you want checksums,
>> you use ALTER TABLE to turn them on, and when you don't want them any
>> more you use ALTER TABLE to shut them off again, in each case rewriting
>> the table.  That way, there's never any ambiguity about what's in the
>> data pages in a given relation: they're either all checksummed, or none
>> of them are.
>> This moves the decision about whether checksums are enabled or disabled
>> quite a bit further away from the data itself, and also allows you to
>> determine (by catalog inspection) which parts of the cluster do in fact
>> have checksums.  It might be kind of a pain to implement, though: you'd
>> have to pass the information about how any given relation was configured
>> down to the place where we validate page sanity.  I'm not sure whether
>> that's practical.
>
> This patch implies future opportunities to flesh out its UI, and I don't
> see it locking us out of implementing the above.

Agreed.

> We'll want a weak-lock command that adds checksums to pages lacking them.

VACUUM FREEZE will do this.

> We'll want to expose whether a given relation has full checksum coverage.

Not sure I understand why we'd want that.  If you really want that, it's
a simple function to do it.

> With that, we could produce an error when an apparently-non-checksummed
> page appears in a relation previously known to have full checksum
> coverage.

Additional checking at relation level is just going to slow things down.
Don't see the value of having a relation-level checksum option.

> It would be an easy thing to skip checksums on UNLOGGED relations, and
> possibly desirable, but I don't see the utility of a per relation
> checksum setting in other cases.

There's just no need for fine tuning like that.

> Even supposing an ALTER TABLE t SET {WITH | WITHOUT} CHECKSUMS as the
> only tool for enabling or disabling them, it's helpful to have each page
> declare whether it has a checksum.  That way, the
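To make the backstop checks mentioned earlier in the thread concrete,
here is a sketch of the sort of validation a reader could apply before
believing a clear checksum flag.  The struct layout, the PD_HAS_CHECKSUM
bit, and the constants are illustrative stand-ins, not the real
bufpage.h definitions or anything this patch does.

/*
 * Sketch of cross-checking a page header before trusting a clear
 * "has checksum" flag.  All names and values here are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ       8192
#define PD_HAS_CHECKSUM     0x0008   /* hypothetical flag bit */
#define PD_VALID_FLAG_BITS  0x000F   /* all bits defined in this sketch */

typedef struct PageHeaderSketch
{
    uint64_t    pd_lsn;         /* LSN of last change to the page */
    uint16_t    pd_checksum;    /* page checksum, if enabled */
    uint16_t    pd_flags;       /* flag bits */
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} PageHeaderSketch;

static bool
page_claims_checksum(const PageHeaderSketch *hdr)
{
    return (hdr->pd_flags & PD_HAS_CHECKSUM) != 0;
}

static bool
header_plausible(const PageHeaderSketch *hdr, uint64_t current_insert_lsn)
{
    /* any undefined flag bit set means the header itself is suspect */
    if (hdr->pd_flags & ~PD_VALID_FLAG_BITS)
        return false;

    /* space pointers must be ordered and fall inside the block */
    if (hdr->pd_lower > hdr->pd_upper ||
        hdr->pd_upper > hdr->pd_special ||
        hdr->pd_special > SKETCH_BLCKSZ)
        return false;

    /* an LSN from the future is another sign of corruption */
    if (hdr->pd_lsn > current_insert_lsn)
        return false;

    return true;
}

Only believe page_claims_checksum() when header_plausible() passes; the
point being debated above is how much weight such cross-checks can carry
when the corruption is not random.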
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On 21/02/12 18:28, Jan Urbański wrote: On 21/02/12 18:05, Peter Eisentraut wrote: it might be better to use ereport, to expose the message for translation. After giving it some thought some of these elogs could be changed into PLy_elogs (which is meant to propagate a Python error into Postgres) and the others made into ereports. I'll send updated patches this evening (CET). Inevitably, this turned into the next morning CET. Here are the updated patches which use PLy_elog instead of plain elog. The difference is that they will get marked for translation and that the original Python exception will show up in the errdetail field. Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c index 741980c..332898d 100644 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** get_source_line(const char *src, int lin *** 365,370 --- 365,374 const char *next = src; int current = 0; + /* sanity check */ + if (lineno = 0) + return NULL; + while (current lineno) { s = next; diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c index ae9d87e..f259ecf 100644 *** a/src/pl/plpython/plpy_main.c --- b/src/pl/plpython/plpy_main.c *** PLy_init_interp(void) *** 133,138 --- 133,140 Py_INCREF(mainmod); PLy_interp_globals = PyModule_GetDict(mainmod); PLy_interp_safe_globals = PyDict_New(); + if (PLy_interp_safe_globals == NULL) + PLy_elog(ERROR, could not create globals); PyDict_SetItemString(PLy_interp_globals, GD, PLy_interp_safe_globals); Py_DECREF(mainmod); if (PLy_interp_globals == NULL || PyErr_Occurred()) diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c index d2d0a2a..0c96f03 100644 *** a/src/pl/plpython/plpy_plpymodule.c --- b/src/pl/plpython/plpy_plpymodule.c *** PLy_init_plpy(void) *** 173,181 main_mod = PyImport_AddModule(__main__); main_dict = PyModule_GetDict(main_mod); plpy_mod = PyImport_AddModule(plpy); PyDict_SetItemString(main_dict, plpy, plpy_mod); if (PyErr_Occurred()) ! elog(ERROR, could not initialize plpy); } static void --- 173,183 main_mod = PyImport_AddModule(__main__); main_dict = PyModule_GetDict(main_mod); plpy_mod = PyImport_AddModule(plpy); + if (plpy_mod == NULL) + PLy_elog(ERROR, could not initialize plpy); PyDict_SetItemString(main_dict, plpy, plpy_mod); if (PyErr_Occurred()) ! PLy_elog(ERROR, could not initialize plpy); } static void *** PLy_add_exceptions(PyObject *plpy) *** 208,213 --- 210,220 PLy_exc_fatal = PyErr_NewException(plpy.Fatal, NULL, NULL); PLy_exc_spi_error = PyErr_NewException(plpy.SPIError, NULL, NULL); + if (PLy_exc_error == NULL || + PLy_exc_fatal == NULL || + PLy_exc_spi_error == NULL) + PLy_elog(ERROR, could not create the base SPI exceptions); + Py_INCREF(PLy_exc_error); PyModule_AddObject(plpy, Error, PLy_exc_error); Py_INCREF(PLy_exc_fatal); *** PLy_generate_spi_exceptions(PyObject *mo *** 241,247 --- 248,260 PyObject *sqlstate; PyObject *dict = PyDict_New(); + if (dict == NULL) + PLy_elog(ERROR, could not generate SPI exceptions); + sqlstate = PyString_FromString(unpack_sql_state(exception_map[i].sqlstate)); + if (sqlstate == NULL) + PLy_elog(ERROR, could not generate SPI exceptions); + PyDict_SetItemString(dict, sqlstate, sqlstate); Py_DECREF(sqlstate); exc = PyErr_NewException(exception_map[i].name, base, dict); *** PLy_output(volatile int level, PyObject *** 369,376 * decoration. */ PyObject *o; - PyArg_UnpackTuple(args, plpy.elog, 1, 1, o); so = PyObject_Str(o); } else --- 382,393 * decoration. 
*/ PyObject *o; + int result; + + result = PyArg_UnpackTuple(args, plpy.elog, 1, 1, o); + if (!result) + PLy_elog(ERROR, could not unpack arguments in plpy.elog); so = PyObject_Str(o); } else diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c index 0d63c4f..20b93cd 100644 *** a/src/pl/plpython/plpy_spi.c --- b/src/pl/plpython/plpy_spi.c *** PLy_spi_execute_query(char *query, long *** 338,344 int rv; volatile MemoryContext oldcontext; volatile ResourceOwner oldowner; ! PyObject *ret; oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; --- 338,344 int rv; volatile MemoryContext oldcontext; volatile ResourceOwner oldowner; ! PyObject *ret = NULL; oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; *** PLy_spi_execute_query(char *query, long *** 362,367 --- 362,368 if (rv 0) { + Py_XDECREF(ret); PLy_exception_set(PLy_exc_spi_error, SPI_execute failed: %s, SPI_result_code_string(rv)); diff --git a/src/pl/plpython/plpy_typeio.c
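For readers skimming the diffs, the common thread is simply that most
Python C API constructors return NULL with an exception set, so every
result gets checked before use.  A self-contained illustration of that
pattern follows; the module and exception names are placeholders, not the
actual plpy objects.

/*
 * Sketch of defensive NULL-checking when building objects with the
 * Python C API.  "example.Error" and the module argument are
 * hypothetical stand-ins.
 */
#include <Python.h>

static PyObject *example_exc = NULL;

/* returns 0 on success, -1 with a Python exception set on failure */
static int
add_example_exception(PyObject *module)
{
    PyObject   *dict;

    dict = PyDict_New();
    if (dict == NULL)
        return -1;              /* out of memory; exception already set */

    example_exc = PyErr_NewException("example.Error", NULL, dict);
    Py_DECREF(dict);
    if (example_exc == NULL)
        return -1;

    /* PyModule_AddObject steals the reference on success */
    Py_INCREF(example_exc);
    if (PyModule_AddObject(module, "Error", example_exc) < 0)
    {
        Py_DECREF(example_exc);
        return -1;
    }
    return 0;
}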
[HACKERS] temporal algebra and data type
Hello, I am new here.  This is a question about how to collaborate and/or
where to look, in the scope of temporal databases.

I see Temporal Postgres, a project for making tools and documentation
that people can use to better manage, query, and maintain time data in
Postgres, http://temporal.projects.postgresql.org/ BUT:
 * the project is oversized for my needs;
 * there is NO complementary pure-SQL library for managing scheduling
   events.

Searching, I found something closer to my needs, S. Bailey's TIMESPAN
model, at http://scottrbailey.wordpress.com/2009/05/19/timespan-postgresql/
BUT it is not ready for practical use (it is more for didactic use).
So I created http://code.google.com/p/temporal-type which is what I need.

QUESTIONS:

1) Is there another project, similar to or better than mine?  Are there
   pure standard-SQL libraries for this?
2) Why is something similar (managing events without a pg extension) not
   part of temporal.projects.postgresql.org?
3) Is there another good place to collaborate on a simple, pure-SQL
   library project?  Can I contribute to the wiki page?  See
   http://wiki.postgresql.org/wiki/Temporal_Extensions
Re: [HACKERS] leakproof
On Tue, Feb 21, 2012 at 11:54 AM, Jeff Janes jeff.ja...@gmail.com wrote: I don't think that pure is sufficient to be leakproof. For example, if I have a function which is pure but which takes an unusually long time to evaluate for some unique pathological combination of arguments, I don't think that it would be considered leakproof. I think we've made up our mind to largely ignore such cases, though. There may be information that gets leaked either through the choice of plan or the time it takes to execute it, but plugging that type of hole completely is almost impossible. Suppose the security view filters for rows where user = 'rhaas' and the evil rhaas requests rows where id = 5. The planner chooses an index-scan on id = 5 and then applies a filter of user = 'rhaas' to the result. There's probably some slight timing difference between the case where no id = 5 row exists, and the case where it exists but does not have user = 'rhaas', so a sufficiently dedicated hacker might be able to employ a timing attack to determine the existence of such a row. Also, by jiggering with page costs and EXPLAIN, they might be able to get some idea of the number of rows in the table. I don't believe that this makes the feature useless, though. In most cases, and especially when using surrogate keys (which I typically do, whether Josh Berkus likes it or not), the existence of the row is not a terribly interesting piece of information: it's the content of the row that's sensitive. Neither table-level security nor column-level security attempt to prevent you from determining that the table or column exists; they just hide its contents. And if you need to bar EXPLAIN so people can't peek at query plans, it's certainly possible to do that using ProcessUtility_hook. If you're worried about more subtle timing attacks, you should probably rethink the idea of letting people log into the database in the first place. Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
> so ANALYZE should be faster then VACUUM ANALYZE. But is not true. Why?

I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
general, because VACUUM has to scan the whole table, and ANALYZE only a
fixed-size subset of its pages.  Not sure what's happening in your
particular case...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas <robertmh...@gmail.com>:
> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
>> so ANALYZE should be faster then VACUUM ANALYZE. But is not true. Why?
>
> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
> general, because VACUUM has to scan the whole table, and ANALYZE only a
> fixed-size subset of its pages.

It sounds like you just said the opposite of what you wanted to say.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: [HACKERS] temporal algebra and data type
On 22-02-2012 09:50, Peter Padua Krauss wrote:
> 1) Is there another project, similar to or better than mine?  Are there
> pure standard-SQL libraries for this?

Range Types [1].  It is 9.2 material but it is already in the repository.

[1] http://www.postgresql.org/docs/devel/static/rangetypes.html

-- 
Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] 16-bit page checksums for 9.2
On Tue, Feb 21, 2012 at 5:07 AM, Noah Misch <n...@leadboat.com> wrote:
> We do, in numerous places, drop a shared buffer content lock and
> reacquire it in exclusive mode.  However, the existing users of that
> pattern tend to trade the lock, complete subsequent work, and unlock the
> buffer all within the same function.  So they must, because several of
> them recheck facts that can change during the period of holding no lock.
> SetBufferCommitInfoNeedsSave() callers do not adhere to that pattern;
> they can be quite distant from the original lock acquisition and eventual
> lock release.  Take the prototypical case of SetHintBits().  Our shared
> lock originated in a place like heapgettup(), which expects to hold it
> continually until finished.  We would need to somehow push up through
> HeapTupleSatisfiesVisibility() the fact that we traded the buffer lock,
> then have heapgettup() reorient itself as needed.  Every caller of code
> that can reach SetBufferCommitInfoNeedsSave() would need to do the same.
> Perhaps there's a different way to apply the lock-trade technique that
> avoids this mess, but I'm not seeing it.  Consequently, I find the idea
> of requiring a spinlock acquisition to read or write pd_lsn/pd_tli under
> BUFFER_LOCK_SHARE to be a sensible one.  Within that umbrella, some
> details need attention:

Fair analysis.

> - Not all BUFFER_LOCK_SHARE callers of PageGetLSN() live in bufmgr.c.  I
>   note gistScanPage() and XLogCheckBuffer()[1].  (Perhaps we'll only
>   require the spinlock for heap buffers, letting gistScanPage() off the
>   hook.)  We need a public API, perhaps LockBufferForLSN().

That seems like a good idea.  I am a little worried about Simon's plan to
do the additional locking only for AMs that have no unlogged-type
updates.  It's a reasonable performance optimization, and skipping the
locking when checksums are shut off also seems reasonable, but it seems a
bit fragile: suppose that, in the future, someone fixes GiST to do
something intelligent with kill_prior_tuple.  Now suddenly it needs LSN
locking that it didn't need before, but this fact might not be very
obvious.  It might be a good idea to design LockBufferForLSN to take an
AM OID as an argument, and use the AM OID to decide whether the locking
is required.  That way, we can call it from everywhere that reads an LSN,
and it can simply skip the locking in places where it isn't presently
needed.  Or maybe there's a better design, but I agree with you that some
kind of public API is essential.

> - The use of some spinlock need not imply using the buffer header
>   spinlock.  We could add a dedicated pd_lsn_tli_lock to BufferDesc.
>   That has the usual trade-off of splitting a lock: less contention at
>   the cost of more acquisitions.  I have no intuition on which approach
>   would perform better.

I think I like the idea of a separate lock, but I agree it could stand to
be tested both ways.

Another thought is that we might add SpinLockConditionalAcquire(), that
just tries to TAS the lock and returns false if the lock is already held.
Then, we could grab the spinlock while writing the page, in lieu of
copying it, and anyone who wants to set a hint bit can conditionally
acquire the spinlock long enough to set the bits.  If the spinlock isn't
immediately free then instead of spinning they just don't set the hint
bits after all.  That way you can be sure that no hint bits are set
during the write, but any attempt to set a hint bit only costs you a
single TAS - no loop, as with a regular spinlock acquisition - and of
course the hint itself.
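A rough sketch of that conditional-acquire idea is below.  It uses GCC's
__sync builtins purely for illustration; a real version would sit next to
the existing TAS()/S_UNLOCK() machinery in s_lock.h, and where the
per-page lock actually lives is left open here.

/*
 * Sketch of SpinLockConditionalAcquire(): the page writer holds the lock
 * while writing, and hint-bit setters do a single test-and-set, skipping
 * the hint if the lock is busy.  Names and placement are hypothetical.
 */
#include <stdbool.h>

typedef volatile int slock_sketch;

static inline bool
spin_try_acquire(slock_sketch *lock)
{
    /* previous value of 0 means we got the lock */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static inline void
spin_release(slock_sketch *lock)
{
    __sync_lock_release(lock);
}

/*
 * Hypothetical hint-bit setter: tuple and infomask details are elided,
 * only the locking protocol is shown.
 */
static void
set_hint_bit_if_possible(slock_sketch *page_write_lock,
                         unsigned short *infomask, unsigned short bit)
{
    if (!spin_try_acquire(page_write_lock))
        return;                 /* page is being written out; skip the hint */

    *infomask |= bit;
    spin_release(page_write_lock);
}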
> A possible compromise is to leave the page clean after setting a hint
> bit, much like the patch already has us do under hot standby.  Then
> there's no new WAL and no new rules around pd_lsn.  Wasn't that one of
> the things Merlin benchmarked when he was looking at hint bits?  Does
> anyone recall the result?

It slows things down substantially in the case of repeated scans of the
same unchanging table.  In fact, I tried a much more nuanced approach:
set BM_UNTIDY on the page when SetBufferCommitInfoNeedsSave() is called.
BM_UNTIDY pages are written by the background writer and during
checkpoints, and 5% of the time by backends.  All of that has the
salutary effect of making the first sequential scan less ungodly slow,
but it also has the less-desirable effect of making the next 19 of them
still kinda-sorta slow.  It was unclear to me (and perhaps to others)
whether it really worked out to a win.

However, we might need/want to revisit some of that logic in connection
with this patch, because it seems to me that a large sequential scan of
an unhinted table could be ferociously slow, with the backend writing out
its own pages and WAL-logging them as it goes.  It would be good to test
that to figure out whether some mitigation measures are needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier <nicolas.barb...@gmail.com> wrote:
> 2012/2/22 Robert Haas <robertmh...@gmail.com>:
>> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE. But is not
>>> true. Why?
>>
>> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
>> general, because VACUUM has to scan the whole table, and ANALYZE only a
>> fixed-size subset of its pages.
>
> It sounds like you just said the opposite of what you wanted to say.

Yeah, I did.  Woops.  Let me try that again: ANALYZE should be faster;
reads only some pages.  VACUUM ANALYZE should be slower; reads them all.
Dunno why Pavel's seeing the opposite without more info.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
> so ANALYZE should be faster then VACUUM ANALYZE.

VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
random sample of data blocks.  Random access is slower than sequential
access, so at some threshold of sample size and sequential/random I/O
speed ratio ANALYZE could become slower.

So it depends upon the hardware and the setting of stats_target.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] [PATCH] typo fix
Typo in a comment...

--strk;

From cfca9507df8612a48cad341653f8e9193c6b7e08 Mon Sep 17 00:00:00 2001
From: Sandro Santilli <s...@keybit.net>
Date: Wed, 22 Feb 2012 11:32:48 +0100
Subject: [PATCH] typo fix

---
 src/backend/commands/extension.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6ecbbc7..a9963ac 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -876,7 +876,7 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 
 	/*
 	 * If it's not relocatable, substitute the target schema name for
-	 * occcurrences of @extschema@.
+	 * occurrences of @extschema@.
 	 *
 	 * For a relocatable extension, we needn't do this.  There cannot be
 	 * any need for @extschema@, else it wouldn't be relocatable.
-- 
1.7.0.4
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
Attached patch fixes GiST behaviour without altering operators behaviour. -- With best regards, Alexander Korotkov. *** a/src/backend/access/gist/gistproc.c --- b/src/backend/access/gist/gistproc.c *** *** 836,842 gist_box_picksplit(PG_FUNCTION_ARGS) } /* ! * Equality method * * This is used for both boxes and points. */ --- 836,843 } /* ! * Equality method. Returns true only when boxes are exact same. We can't ! * ignore small extents because of index consistency. * * This is used for both boxes and points. */ *** *** 848,856 gist_box_same(PG_FUNCTION_ARGS) bool *result = (bool *) PG_GETARG_POINTER(2); if (b1 b2) ! *result = DatumGetBool(DirectFunctionCall2(box_same, ! PointerGetDatum(b1), ! PointerGetDatum(b2))); else *result = (b1 == NULL b2 == NULL) ? TRUE : FALSE; PG_RETURN_POINTER(result); --- 849,857 bool *result = (bool *) PG_GETARG_POINTER(2); if (b1 b2) ! *result = (b1-low.x == b2-low.x b1-low.y == b2-low.y ! b1-high.x == b2-high.x b1-high.y == b2-high.y) ! ? TRUE : FALSE; else *result = (b1 == NULL b2 == NULL) ? TRUE : FALSE; PG_RETURN_POINTER(result); *** *** 1326,1331 gist_point_consistent(PG_FUNCTION_ARGS) --- 1327,1333 bool *recheck = (bool *) PG_GETARG_POINTER(4); bool result; StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset; + BOX *query, *key; switch (strategyGroup) { *** *** 1337,1348 gist_point_consistent(PG_FUNCTION_ARGS) *recheck = false; break; case BoxStrategyNumberGroup: ! result = DatumGetBool(DirectFunctionCall5( ! gist_box_consistent, ! PointerGetDatum(entry), ! PG_GETARG_DATUM(1), ! Int16GetDatum(RTOverlapStrategyNumber), ! 0, PointerGetDatum(recheck))); break; case PolygonStrategyNumberGroup: { --- 1339,1356 *recheck = false; break; case BoxStrategyNumberGroup: ! /* ! * This code repeats logic of on_ob which uses simple comparison ! * rather than FP* functions. ! */ ! query = PG_GETARG_BOX_P(1); ! key = DatumGetBoxP(entry-key); ! ! *recheck = false; ! result = key-high.x = query-low.x ! key-low.x = query-high.x ! key-high.y = query-low.y ! key-low.y = query-high.y; break; case PolygonStrategyNumberGroup: { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
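For reviewers, the underlying tension here is between the epsilon-based
comparisons the geometric operators use (the FP* macros in geo_decls.h)
and the exact comparisons an index needs to stay consistent with itself.
A tiny standalone illustration follows; the epsilon value and the macros
only mimic the real ones rather than copying them.

/*
 * Sketch of why fuzzy and exact comparisons can disagree near the
 * epsilon boundary, which is what makes epsilon-based logic unsafe
 * inside GiST support functions.  SKETCH_EPSILON and the helpers are
 * illustrative, not the geo_decls.h definitions.
 */
#include <stdbool.h>
#include <math.h>
#include <stdio.h>

#define SKETCH_EPSILON 1.0e-6

/* fuzzy comparison in the style of the FP* macros */
static bool
fuzzy_eq(double a, double b)
{
    return fabs(a - b) <= SKETCH_EPSILON;
}

/* exact comparison, as the patched gist_box_same now uses */
static bool
exact_eq(double a, double b)
{
    return a == b;
}

int
main(void)
{
    double x = 1.0000001;
    double y = 1.0000002;

    /* the operator would say "same", the index key says "different" */
    printf("fuzzy: %d  exact: %d\n", fuzzy_eq(x, y), exact_eq(x, y));
    return 0;
}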
Re: [HACKERS] leakproof
Robert Haas robertmh...@gmail.com writes: Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. Speaking of hair-splitting ... A strict interpretation of no errors can be thrown would, for example, rule out any function that either takes or returns a varlena datatype, since those are potentially going to throw out-of-memory errors if they can't palloc a result value or a temporary detoasted input value. I don't suppose that we want that, which means that this rule of thumb is wrong in detail, and there had better be some more subtle definition of what is okay or not, perhaps along the line of must not throw any errors that reveal anything useful about the input value. Have we got such a definition? (I confess to not having followed this patch very closely.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Simon Riggs <si...@2ndquadrant.com> writes:
> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected
>> so ANALYZE should be faster then VACUUM ANALYZE.

> VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
> random sample of data blocks.  Random access is slower than sequential
> access, so at some threshold of sample size and sequential/random I/O
> speed ratio ANALYZE could become slower.

That analysis is entirely wrong.  In the first place, although ANALYZE
doesn't read all the blocks, what it does read it reads in block number
order.  So it's not like there are random seeks all over the disk that
would not need to happen anyway.  In the second place, VACUUM ANALYZE
consists of two separate passes, VACUUM and then ANALYZE, and the second
pass is going to be "random I/O" by your definition no matter what.

If the filesystem is hugely biased towards sequential I/O for some
reason, and the VACUUM scan causes the whole table to become resident in
RAM where ANALYZE can read it for free, then I guess it might be possible
to arrive at Pavel's result.  But it would be an awfully narrow corner
case.  I cannot believe that his statement is true in general, or even
for a noticeably large fraction of cases.

			regards, tom lane
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 10:29:56AM -0500, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>
>> VACUUM ANALYZE scans the whole table sequentially.  ANALYZE accesses a
>> random sample of data blocks.  Random access is slower than sequential
>> access, so at some threshold of sample size and sequential/random I/O
>> speed ratio ANALYZE could become slower.
>
> That analysis is entirely wrong.  In the first place, although ANALYZE
> doesn't read all the blocks, what it does read it reads in block number
> order.  So it's not like there are random seeks all over the disk that
> would not need to happen anyway.  In the second place, VACUUM ANALYZE
> consists of two separate passes, VACUUM and then ANALYZE, and the second
> pass is going to be "random I/O" by your definition no matter what.
>
> If the filesystem is hugely biased towards sequential I/O for some
> reason, and the VACUUM scan causes the whole table to become resident in
> RAM where ANALYZE can read it for free, then I guess it might be possible
> to arrive at Pavel's result.  But it would be an awfully narrow corner
> case.  I cannot believe that his statement is true in general, or even
> for a noticeably large fraction of cases.

Wouldn't a full sequential scan trigger the kernel read-ahead, which
might not trigger for the analyze block reads, even though they are in
order?  That could account for the observation.

Regards,
Ken
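One way to test that theory outside of Postgres is to time a full
sequential pass over a data file against a strided pass that reads, say,
every 16th 8K block in increasing order, which is roughly the access
shape ANALYZE produces.  A rough sketch follows; the file name and stride
are arbitrary, and the OS cache should be dropped between runs or the
second pass mostly measures cache hits.

/*
 * Crude read-ahead experiment: compare a full sequential read of a file
 * with a strided read of every Nth block.  Illustration only.
 */
#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define SKETCH_BLCKSZ 8192

static double
seconds_now(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static double
read_blocks(const char *path, int stride)
{
    char    buf[SKETCH_BLCKSZ];
    int     fd = open(path, O_RDONLY);
    double  start = seconds_now();
    off_t   offset = 0;

    if (fd < 0)
    {
        perror(path);
        exit(1);
    }
    while (pread(fd, buf, SKETCH_BLCKSZ, offset) == SKETCH_BLCKSZ)
        offset += (off_t) stride * SKETCH_BLCKSZ;
    close(fd);
    return seconds_now() - start;
}

int
main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s datafile\n", argv[0]);
        return 1;
    }
    printf("sequential pass:   %.3f s\n", read_blocks(argv[1], 1));
    printf("every 16th block:  %.3f s\n", read_blocks(argv[1], 16));
    return 0;
}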
Re: [HACKERS] leakproof
On 02/22/2012 10:21 AM, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Anyway, to your point, I suppose I might hesitate to mark factorial
>> leak-proof even if it didn't throw an error on overflow, because the
>> time it takes to return an answer for larger inputs does grow rather
>> rapidly.  But it's kind of a moot point because the error makes it not
>> leak-proof anyway.  So maybe we're just splitting hairs here, however we
>> decide to label this.
>
> Speaking of hair-splitting ... A strict interpretation of "no errors can
> be thrown" would, for example, rule out any function that either takes or
> returns a varlena datatype, since those are potentially going to throw
> out-of-memory errors if they can't palloc a result value or a temporary
> detoasted input value.  I don't suppose that we want that, which means
> that this rule of thumb is wrong in detail, and there had better be some
> more subtle definition of what is okay or not, perhaps along the line of
> "must not throw any errors that reveal anything useful about the input
> value".  Have we got such a definition?  (I confess to not having
> followed this patch very closely.)

Yeah.  Returning to the original point, I've come to the conclusion that
"pure" isn't the right way to go.  The trouble with "leakproof" is that
it doesn't point to what it is that's not leaking, which is information
rather than memory, as many might imagine (and I did) without further
hints.  I'm not sure any single English word would be as descriptive as
I'd like.

cheers

andrew
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fu...@gmail.com> wrote:
> Hi,
>
> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>
>   =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>   xlog start point: 2/AC4E2600
>   pg_basebackup: starting background WAL receiver
>   692447/692447 kB (100%), 1/1 tablespace
>   xlog end point: 2/AC4E2600
>   pg_basebackup: waiting for background process to finish streaming...
>   pg_basebackup: base backup completed
>
>   real 3m56.237s
>   user 0m0.224s
>   sys  0m0.936s
>
>   (time is long because this is only test database with no traffic, so I
>   had to make some inserts for it to finish)
>
> The above article points out the problem of pg_basebackup from the
> standby: when -x stream is specified, pg_basebackup from the standby
> gets stuck if there is no traffic in the database.
>
> When -x stream is specified, pg_basebackup forks the background process
> for receiving WAL records during backup, takes an online backup and
> waits for the background process to end.  The forked background process
> keeps receiving WAL records, and whenever it reaches end of WAL file, it
> checks whether it has already received all WAL files required for the
> backup, and exits if yes.  Which means that at least one WAL segment
> switch is required for pg_basebackup with -x stream option to end.
>
> In the backup from the master, WAL file switch always occurs at both
> start and end of backup (i.e., in do_pg_start_backup() and
> do_pg_stop_backup()), so the above logic works fine even if there is no
> traffic.  OTOH, in the backup from the standby, while there is no
> traffic, WAL file switch is not performed at all.  So in that case,
> there is no chance that the background process reaches end of WAL file,
> checks whether all required WAL has arrived and exits.  At the end,
> pg_basebackup gets stuck.
>
> To fix the problem, I'd propose to change the background process so that
> it checks whether all required WAL has arrived, every time data is
> received, even if end of WAL file is not reached.  Patch attached.
> Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though?  I thought only
pg_basebackup had this issue?  I guess it is because of the change of the
API to stream_continue_callback only?

Looking at it after your patch, stream_continue_callback and
segment_finish_callback are the same.  Should we perhaps just fold them
into a single stream_continue_callback, since you had to move the
detection of segment end to the caller anyway?

Another question related to this - since we clearly don't need the xlog
switch in this case, should we make it conditional on the master as well,
so we don't switch unnecessarily there?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
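To restate the proposed logic in code, the background receiver's stop
check reduces to something like the sketch below.  The types and the
callback shape are simplified, hypothetical stand-ins, not the actual
receivelog.c interface, which is exactly what the
stream_continue_callback/segment_finish_callback question above is about.

/*
 * Sketch of the proposed check: run it every time data is received,
 * not only at WAL segment boundaries.  All names are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrSketch;  /* stand-in for XLogRecPtr */

/* set by the foreground process once the base backup has finished */
static volatile bool backup_finished = false;
static volatile XLogRecPtrSketch required_end_ptr = 0;

/*
 * Called whenever data has been received (previously: only when a WAL
 * segment was completed).  Returning true tells the receiver to stop.
 */
static bool
stream_should_stop(XLogRecPtrSketch received_up_to)
{
    if (!backup_finished)
        return false;           /* end point not known yet */

    /* stop as soon as all WAL needed for the backup has been received */
    return received_up_to >= required_end_ptr;
}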
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
Greg Smith wrote: Anyway, the patch does now includes several examples and a short primer on PC clock hardware, to help guide what good results look like and why they've been impossible to obtain in the past. That's a bit Linux-centric, but the hardware described covers almost all systems using Intel or AMD processors. Only difference with most other operating systems is how aggressively they have adopted newer timer hardware. At least this gives a way to measure all of them. N.B.: Windows has at least two clock APIs, timeGetTime and QueryPerformanceCounters (and probably more, these days). They rely on different hardware clocks, and can get out of sync with each other; meanwhile, QueryPerformanceCounters can get out of sync with itself on (older?) multi-CPU boards. So if you're doing high-res timing, it's good to make sure you aren't relying on two different clocks in different places... I ran into this with MIDI drivers years ago; and wrote a doc: http://www.ultimatemetal.com/forum/equipment/315910-midi-latency-problem-nuendo.html#post6315034 and a clock-testing utility: https://github.com/jaylevitt/miditime -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
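For anyone who wants to watch the two clocks drift on their own machine,
here is a small Windows-only sketch that samples both APIs across the
same sleep; timeGetTime() needs winmm.lib, and the two-second interval is
an arbitrary choice.

/*
 * Compare QueryPerformanceCounter() and timeGetTime() over the same
 * interval.  Illustration only; link with winmm.lib.
 */
#include <windows.h>
#include <mmsystem.h>
#include <stdio.h>

int
main(void)
{
    LARGE_INTEGER freq, qpc_start, qpc_end;
    DWORD       tgt_start, tgt_end;

    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&qpc_start);
    tgt_start = timeGetTime();

    Sleep(2000);                /* sleep roughly two seconds */

    QueryPerformanceCounter(&qpc_end);
    tgt_end = timeGetTime();

    printf("QueryPerformanceCounter: %.3f ms\n",
           (qpc_end.QuadPart - qpc_start.QuadPart) * 1000.0 / freq.QuadPart);
    printf("timeGetTime:             %lu ms\n",
           (unsigned long) (tgt_end - tgt_start));
    return 0;
}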
Re: [HACKERS] leakproof
Andrew Dunstan and...@dunslane.net wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. Discreet? http://www.merriam-webster.com/dictionary/discreet I guess the risk is that people would confuse it with discrete. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: proof of concept patch for fixing quantified regex backrefs
Attached is as far as I've gotten with fixing depesz's complaint about backrefs embedded within larger quantified expressions (the complaint being that only the last match of the backref is checked properly). This is per the analysis I posted at https://sourceforge.net/tracker/index.php?func=detailaid=1115587group_id=10894atid=110894 to the effect that the engine really has to have an iteration subre type. The patch is incomplete because I have not written the code yet for the shortest-match-first case, only longest-match-first. Within that restriction, it seems to work, though I have not yet tried the Tcl regression tests on it. I have to set this aside now and go focus on release tasks (like writing release notes), so it's not going to be done in time to include in the upcoming back-branch releases. I have mixed feelings about whether to treat it as a back-patchable bug fix when it's done ... it's certainly a bug fix but the odds of introducing new issues seem higher than average. So maybe it should only go into HEAD anyway. Thoughts? regards, tom lane diff --git a/src/backend/regex/README b/src/backend/regex/README index 3fd58c000119a24d61dff594baec28a27a4437f0..89ba6a62ea2f70bfda729f03efedba4b9b6fce9b 100644 *** a/src/backend/regex/README --- b/src/backend/regex/README *** consists of a tree of sub-expressions ( *** 102,116 either plain regular expressions (which are executed as DFAs in the manner described above) or back-references (which try to match the input to some previous substring). Non-leaf nodes are capture nodes (which save the ! location of the substring currently matching their child node) or ! concatenation or alternation nodes. At execution time, the executor ! recursively scans the tree. At concatenation or alternation nodes, ! it considers each possible alternative way of matching the input string, ! ie each place where the string could be split for a concatenation, or each ! child node for an alternation. It tries the next alternative if the match ! fails according to the child nodes. This is exactly the sort of ! backtracking search done by a traditional NFA regex engine. If there are ! many tree levels it can get very slow. But all is not lost: we can still be smarter than the average pure NFA engine. To do this, each subre node has an associated DFA, which --- 102,116 either plain regular expressions (which are executed as DFAs in the manner described above) or back-references (which try to match the input to some previous substring). Non-leaf nodes are capture nodes (which save the ! location of the substring currently matching their child node), ! concatenation, alternation, or iteration nodes. At execution time, the ! executor recursively scans the tree. At concatenation, alternation, or ! iteration nodes, it considers each possible alternative way of matching the ! input string, that is each place where the string could be split for a ! concatenation or iteration, or each child node for an alternation. It ! tries the next alternative if the match fails according to the child nodes. ! This is exactly the sort of backtracking search done by a traditional NFA ! regex engine. If there are many tree levels it can get very slow. But all is not lost: we can still be smarter than the average pure NFA engine. 
To do this, each subre node has an associated DFA, which diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c index 6b80140e90940b4a348c342090e26fdd0bc82c8f..487d0670d2d9c6ae4b9fc5e500765d136c4e441d 100644 *** a/src/backend/regex/regcomp.c --- b/src/backend/regex/regcomp.c *** parseqatom(struct vars * v, *** 1036,1046 /*-- * Prepare a general-purpose state skeleton. * ! * --- [s] ---prefix--- [begin] ---atom--- [end] rest--- [rp] ! * / / ! * [lp] [s2] bypass- * ! * where bypass is an empty, and prefix is some repetitions of atom *-- */ s = newstate(v-nfa); /* first, new endpoints for the atom */ --- 1038,1054 /*-- * Prepare a general-purpose state skeleton. * ! * In the no-backrefs case, we want this: * ! * [lp] --- [s] ---prefix--- [begin] ---atom--- [end] ---rest--- [rp] ! * ! * where prefix is some repetitions of atom. In the general case we need ! * ! * [lp] --- [s] ---iterator--- [s2] ---rest--- [rp] ! * ! * where the iterator wraps around [begin] ---atom--- [end] ! * ! * We make the s state here for both cases; s2 is made below if needed *-- */ s = newstate(v-nfa); /* first, new endpoints for the atom */ *** parseqatom(struct vars * v, *** 1051,1061 NOERR(); atom-begin = s; atom-end = s2; ! s = newstate(v-nfa); /* and spots for prefix and bypass */ ! s2 = newstate(v-nfa); NOERR(); EMPTYARC(lp, s); - EMPTYARC(lp, s2); NOERR(); /* break
Re: [HACKERS] leakproof
On 02/22/2012 11:14 AM, Kevin Grittner wrote:
> Andrew Dunstan <and...@dunslane.net> wrote:
>> Returning to the original point, I've come to the conclusion that
>> "pure" isn't the right way to go.  The trouble with "leakproof" is that
>> it doesn't point to what it is that's not leaking, which is information
>> rather than memory, as many might imagine (and I did) without further
>> hints.  I'm not sure any single English word would be as descriptive as
>> I'd like.
>
> Discreet?
>
> http://www.merriam-webster.com/dictionary/discreet
>
> I guess the risk is that people would confuse it with "discrete".

Yes, too confusing.  "silent" might be better along those lines.

cheers

andrew
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Feb 22, 2012 at 4:43 PM, Greg Smith g...@2ndquadrant.com wrote: Attached are updated versions of this feature without the pg_test_timing tool part, since I broke that out into another discussion thread. I've split the part that updates pg_stat_statistics out from the main feature too, separate patch attached to here (but I'm not reviewing that yet). Lots of bitrot since this was submitted, and yes I noticed that I've almost recreated earlier versions of this patch--by splitting off the parts that were developed later. Thanks for the review and splitting. Sorry I didn't fix up the bit rot myself. Earlier discussion of this got side tracked on a few things, partly my fault. It's worth taking a look at what this provides before judging it too much. It can demo well. The stated purpose is helping figure out what relations are gobbling up the most access time, presumably to optimize them and/or the storage they are on. What do I put onto SSD is surely a popular request nowadays. I should have stated the purpose more clearly. The original reason for developing this patch was to figure out what queries are taking the most time and why, specifically in the case where OS memory is a lot larger than shared_buffers. Basically the following query to get a quick overview where the bottlenecks are: SELECT query, total_time, (time_read+time_write)/total_time AS io_fraction FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20; This of course hugely benefits from Peter's pg_stat_statements normalization patch. Tracking timings per relation was actually an afterthought. Now, the first critical question to ask is what additional information is this providing above the existing counters? After all, it's possible to tell pgbench_accounts is the hotspot just from comparing heap_blks_read, right? Like I said above, I find it mostly useful to see what is missing the OS cache. With memory being as cheap as it is, a reasonably priced server can have 128G of memory, while max recommended value for shared_buffers is 8GB. It's quite likely to have tables that fit into OS cache but not into shared_buffers, but it's not trivial to figure out which those are. This run looks useful at providing the data wished for--that read times are slower per capita from the accounts table. The first time I tried this I got a bizarre high number for pgbench_branches.heap_blks_time ; I'm not sure how reliable this is yet. One problem that might be easy to fix is that the write timing info doesn't show in any of these system views, only in EXPLAIN and statement level ones. I'm not sure about the source of the huge number, might instability in the clock source. Have you tried running the monotonicity check for a longer period while the system is under load? Another issue with the current timing code is that gettimeofday isn't guaranteed to be monotonic anyway, things like NTP adjustments can make time go backwards. clock_gettime with CLOCK_MONOTONIC_RAW would be better, but that's linux specific :( The reason why I didn't add write timings to relation stats is that I couldn't figure out what the semantics should be. It could be either time spent waiting for this relations blocks to be written out or time spent waiting for some other relations blocks to be written out to free space for this relations block or maybe distribute the cost, background writes could be included or excluded. Writes usually return quickly, unless lots of possibly unrelated writes have dirtied enough of OS cache, etc. 
I figured that whatever choices I made, they wouldn't really help anyone diagnose anything. Having global write timings in pg_stat_bgwriter might be useful, but I feel that is something for another patch. I still think a full wait timing interface is the right long-term direction here. It's hard to reject this idea when it seems to be working right now though, while more comprehensive wait storage is still at least a release off. Opinions welcome; I'm still juggling this around now that I have it working again. I agree that a wait timing interface is the right direction. I have thought a bit about it and could share some ideas - maybe I should create a wiki page where the general design could be hashed out? Anyway, the user-visible information from this patch should be trivial to extract from a general wait timing framework. Pushing my own agenda a bit - having this patch in the current release would help to get some field experience on any issues surrounding timing :) Some implementation notes: this currently fails regression test create_function_3; I haven't looked into why yet. I'll take a look at it. Thanks again. -- Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
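To make the gettimeofday-versus-clock_gettime point above concrete, here is a minimal standalone sketch (not part of the patch; it assumes Linux 2.6.28 or later for CLOCK_MONOTONIC_RAW, and may need -lrt on older glibc):

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval  tv_start, tv_end;
    struct timespec ts_start, ts_end;
    double          wall_ms, mono_ms;

    gettimeofday(&tv_start, NULL);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts_start);

    /* ... the work being timed would go here ... */

    gettimeofday(&tv_end, NULL);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts_end);

    /* wall-clock delta: can go backwards if NTP steps the clock */
    wall_ms = (tv_end.tv_sec - tv_start.tv_sec) * 1000.0 +
              (tv_end.tv_usec - tv_start.tv_usec) / 1000.0;

    /* monotonic delta: immune to clock adjustments, but Linux-specific */
    mono_ms = (ts_end.tv_sec - ts_start.tv_sec) * 1000.0 +
              (ts_end.tv_nsec - ts_start.tv_nsec) / 1000000.0;

    printf("wall: %.3f ms, monotonic: %.3f ms\n", wall_ms, mono_ms);
    return 0;
}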
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On 02/22/2012 11:10 AM, Jay Levitt wrote: N.B.: Windows has at least two clock APIs, timeGetTime and QueryPerformanceCounters (and probably more, these days). They rely on different hardware clocks, and can get out of sync with each other; meanwhile, QueryPerformanceCounters can get out of sync with itself on (older?) multi-CPU boards. The PostgreSQL wrapper in src/include/portability/instr_time.h uses QueryPerformanceCounter and QueryPerformanceFrequency in such a way that the result can be used much like deltas of UNIX timestamps. As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there are also the same frequency/temperature issues. Newer versions of Windows can use the TSC, older versions only use HPET or ACPI, and there's some ability to force bad TSC units to use ACPI instead: http://blogs.msdn.com/b/psssql/archive/2010/08/18/how-it-works-timer-outputs-in-sql-server-2008-r2-invariant-tsc.aspx There is a lot of questionable behavior if you try to use the better timers in Windows XP; check out the obnoxious footnote about XP SP3 at http://en.wikipedia.org/wiki/High_Precision_Event_Timer for example. Since there's little you can do about it--it's not even easy to figure out which timer is being used--I didn't see any easy way to document what Windows does here in a way that helped anyone. For the most part, you get the best timer Windows has a driver for with QueryPerformanceCounter, and that's what the PostgreSQL code uses. The information I gave about how newer systems can have more accurate timing still holds. Maybe it would be useful to add something pointing out that newer Windows versions tend to support faster timers, too. That's something I assume people would guess from what I already wrote; it may be worth stating explicitly though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
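For reference, a rough sketch of the QueryPerformanceCounter/QueryPerformanceFrequency pattern being described (illustrative only, not the actual instr_time.h code):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end;
    double elapsed_ms;

    /* ticks per second of whatever timer Windows selected a driver for */
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    Sleep(100);     /* stand-in for the code being timed */

    QueryPerformanceCounter(&end);

    elapsed_ms = (double) (end.QuadPart - start.QuadPart) * 1000.0 /
                 (double) freq.QuadPart;
    printf("elapsed: %.3f ms\n", elapsed_ms);
    return 0;
}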
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: Okay, so this patch fixes the truncation and wraparound issues through a mechanism much like pg_clog's: it keeps track of the oldest possibly existing multis on each and every table, and then during tuple freezing those are removed. I also took the liberty to make the code remove multis altogether (i.e. resetting the IS_MULTI hint bit) when only the update remains and lockers are all gone. I also cleaned up the code in heapam so that there's a couple of tables mapping MultiXactStatus to LockTupleMode and back, and to heavyweight lock modes (the older patches used functions to do this, which was pretty ugly). I had to add a little helper function to lock.c to make this work. I made a rather large bunch of other minor changes to close minor bugs here and there. Docs have been added, as have new tests for the isolation harness, which I've ensured pass in both read committed and serializable modes; WAL logging for locking updated versions of a tuple, when an old one is locked due to an old snapshot, was also added; there's plenty of room for growth in the MultiXact flag bits; the bit that made tables with no keys lock the entire row all the time was removed; multiple places in code comments were cleaned up that referred to this feature as FOR KEY LOCK and ensured that it also mentions FOR KEY UPDATE; the pg_rowlocks, pageinspect, pg_controldata, pg_resetxlog utilities have been updated. All of the above sounds great. I especially like the growing test coverage. All in all, I think this is in pretty much final shape. Only pg_upgrade bits are still missing. If sharp eyes could give this a critical look and knuckle-cracking testers could give it a spin, that would be helpful. Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far. On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: * Columns that are part of the key Noah thinks the set of columns should only consider those actually referenced by keys, not those that *could* be referenced. Well, do you disagree? To me it's low-hanging fruit, because it isolates the UPDATE-time overhead of this patch to FK-referenced tables rather than all tables having a PK or PK-like index (often just all tables). You have not answered my question above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On Wed, Feb 22, 2012 at 18:44, Greg Smith g...@2ndquadrant.com wrote: As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there's also the same frequency/temperature issues. Not on recent Linux kernel versions. Linux automatically detects when the TSC is unstable (due to power management or out-of-sync cores/sockets) and automatically falls back to the more expensive HPET or ACPI methods. e.g: % dmesg |grep -i tsc [0.00] Fast TSC calibration using PIT [0.164075] checking TSC synchronization [CPU#0 - CPU#1]: passed. [0.197062] Switching to clocksource tsc [0.260960] Marking TSC unstable due to TSC halts in idle Whether these tests cover 100% of the possible conditions, and whether the detection has race conditions or not, I don't know. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
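For what it's worth, the clock source the kernel actually settled on can also be checked directly via sysfs (a quick sketch; the path is standard on recent kernels, and the output shown is just an example):

% cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
% cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm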
Re: [HACKERS] WIP: URI connection string support for libpq
This submission has turned into a bit of a mess. I did the closest thing to a review the day after it was submitted; follow-up review attempts had issues applying the patch. And it's been stuck there. The patch is still fine, I just tested it out to pick this back up myself again. I think this one is a good advocacy feature, and most of the hard work is done already. Smooth some edge cases and this will be ready to go. First thing: the URI prefix. It is possible to connect using a URI in Python+SQLAlchemy, which was mentioned before as not too relevant due to their also requiring a driver name. As documented at http://docs.sqlalchemy.org/en/latest/core/engines.html and demonstrated at http://packages.python.org/Flask-SQLAlchemy/config.html , it is possible to leave off the driver part of the connection string. That assumes the default driver, such that postgresql:// does the same as postgresql+psycopg2://, sensibly. That means we absolutely have an installed base of URI-speaking developers split between postgresql:// (Python) and postgres:// (Ruby). Given that, there really isn't a useful path forward that helps out all those developers without supporting both prefixes. That's where this left off before; I just wanted to emphasize how clear that need seems now. Next thing, also mentioned at that Flask page. SQLite has standardized the idea that sqlite:absolute/path/to/foo.db is a URI pointing to a file. Given that, I wonder if Alex's syntax for specifying a socket file name might adopt that syntax, rather than requiring the hex encoding: postgresql://%2Fvar%2Fpgsql%2Ftmp/mydb It's not a big deal, but it would smooth another rough edge toward making the Postgres URI implementation look as close as possible to others. So far I've found only one syntax that I expected this to handle that it rejects: psql -d postgresql://gsmith@localhost It's picky about needing that third slash, but that shouldn't be hard to fix. I started collecting up all the variants that do work as an initial shell script regression test, so that changes don't break something that already works. Here are all the variations that already work, set up so that a series of 1 outputs means everything is passing: psql -d postgresql://gsmith@localhost:5432/gsmith -At -c "SELECT 1" psql -d postgresql://gsmith@localhost/gsmith -At -c "SELECT 1" psql -d postgresql://localhost:5432/gsmith -At -c "SELECT 1" psql -d postgresql://localhost/gsmith -At -c "SELECT 1" psql -d postgresql://gsmith@localhost:5432/ -At -c "SELECT 1" psql -d postgresql://gsmith@localhost/ -At -c "SELECT 1" psql -d postgresql://localhost:5432/ -At -c "SELECT 1" psql -d postgresql://localhost/gsmith -At -c "SELECT 1" psql -d postgresql://localhost/ -At -c "SELECT 1" psql -d postgresql:/// -At -c "SELECT 1" psql -d postgresql://%6Cocalhost/ -At -c "SELECT 1" psql -d postgresql://localhost/gsmith?user=gsmith -At -c "SELECT 1" psql -d "postgresql://localhost/gsmith?user=gsmith&port=5432" -At -c "SELECT 1" psql -d postgresql://localhost/gsmith?user=gsmith\&port=5432 -At -c "SELECT 1" Replace all the gsmith with $USER to make this usable for others. My eyes are starting to cross when I look at URIs now, so that's enough for today. If Alex wants to rev this soon, great; if not, I have a good idea what I'd like to do with this next, regardless of that.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
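A sketch of what that regression script could look like, wrapping the list above into a simple pass/fail loop (assumes a running server on localhost and a database named after $USER, as in the examples):

#!/bin/sh
# Hypothetical harness: print FAIL for any URI variant that does not
# return a single "1" from psql.
status=0
while read -r uri; do
    out=$(psql -d "$uri" -At -c "SELECT 1" 2>/dev/null)
    if [ "$out" != "1" ]; then
        echo "FAIL: $uri"
        status=1
    fi
done <<EOF
postgresql://$USER@localhost:5432/$USER
postgresql://$USER@localhost/$USER
postgresql://localhost:5432/$USER
postgresql://localhost/$USER
postgresql:///
EOF
exit $status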
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On 02/22/2012 12:25 PM, Marti Raudsepp wrote: On Wed, Feb 22, 2012 at 18:44, Greg Smithg...@2ndquadrant.com wrote: As far as I've been able to tell, there aren't any issues unique to Windows there. Multiple cores can have their TSC results get out of sync on Windows for the same reason they do on Linux systems, and there's also the same frequency/temperature issues. Not on recent Linux kernel versions. Linux automatically detects when the TSC is unstable (due to power management or out-of-sync cores/sockets) and automatically falls back to the more expensive HPET or ACPI methods. From the patch: Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn't default to that, it may be disabled for a good reason. I ran into a case like you're showing here in my longer exploration of this at http://archives.postgresql.org/message-id/4edf1871.2010...@2ndquadrant.com I stopped just short of showing what the TSC error message looked like. I hoped that with the above and some examples showing dmesg | grep, that would be enough to lead enough people toward finding this on their own. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On 2012-02-22 16:29, Tom Lane wrote: (Snip context) VACUUM ANALYZE consists of two separate passes, VACUUM and then ANALYZE, and the second pass is going to be random I/O by your definition no matter what. I don't suppose there's a case where the VACUUM (1) gets to delete lots and lots of rows that then don't need ANALYZE'ing, and (2) can do so without actually touching all those pages? Jeroen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] determining a type oid from the name
Say I'm writing an extension X, and I want to process data values from another extension that creates type Y (e.g. an hstore); what's the best way to determine the Oid of type Y in my module X code? SPI code that runs select 'something'::Y and then examines the oid in SPI_tuptable? Or do we have a utility function I have missed that, given a type name and the current search path, will give me back the type Oid? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 22 February 2012 18:00, Andrew Dunstan and...@dunslane.net wrote: Say I'm writing an extension X, and I want to process data values from another extension that creates type Y (e.g. an hstore), what's the best way to determine the Oid of type Y in my module X code? SPI code that runs select 'something'::Y and then examines the oid in SPI_tuptable? Or do we have a utility function I have missed that, given a type name and the current search path will give me back the type Oid? Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Thom Brown t...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 22 February 2012 18:34, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thom Brown t...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) Well I may have misread the problem. I thought it was that for a particular data value, the oid of the type of that value was needed. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas robertmh...@gmail.com: On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier nicolas.barb...@gmail.com wrote: 2012/2/22 Robert Haas robertmh...@gmail.com: On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I had to reply to a query about usage of VACUUM ANALYZE or ANALYZE. I expected that ANALYZE should be faster than VACUUM ANALYZE. But that is not true. Why? I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in general, because VACUUM has to scan the whole table, and ANALYZE only a fixed-size subset of its pages. It sounds like you just said the opposite of what you wanted to say. Yeah, I did. Whoops. Let me try that again: ANALYZE should be faster; it reads only some pages. VACUUM ANALYZE should be slower; it reads them all. Dunno why Pavel's seeing the opposite without more info. usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Regards Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead
On Wed, Feb 22, 2012 at 19:36, Greg Smith g...@2ndquadrant.com wrote: From the patch: Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn't default to that, it may be disabled for a good reason. Sorry, I was under the impression that the stability of Windows's QueryPerformanceCounter() API is hardware-dependent, but I haven't coded under Windows for a long time -- maybe it's improved in recent versions. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Pavel Stehule pavel.steh...@gmail.com wrote: usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Is the VACUUM ANALYZE step faster, or is the overall job faster if VACUUM ANALYZE is run? You may be running into the need to rewrite pages at an inopportune time or order without the VACUUM. Have you tried timing VACUUM FREEZE ANALYZE on these cache tables instead of plain VACUUM ANALYZE? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 16 January 2012 06:28, Greg Smith g...@2ndquadrant.com wrote: One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the three most useful bits of data you could only get from log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an example: This is a preliminary review of the patch. Attached is a revision of the patch that clears up some minor bit-rot and style issues, just as a convenience to everyone else who might like to try it out against master and doesn't want to repeat my work. There were OID collisions against master, and an upstream change in the pg_proc format with the introduction of the proleakproof column. Regression tests pass (as they must have in the prior revision, as the original authors modified expected/rules.out). Note that I have deliberately avoided changing the patch further than that, though I believe that further changes will be necessary. One beef that I have with the variable name m_write_ms is that ms could equally well refer to microseconds or milliseconds, and these mistakes are very common. Therefore, you might think about making the fields of type instr_time, which is consistent with some other such fields used within the statistics collector - this takes care of issues with overflows, portability, and confusion over units, to as great an extent as is possible. I'm not even confident that this is the right thing though, because PgStat_StatFuncEntry, for example, *does* use PgStat_Counter as the type for time values, in its case apparently *microseconds* rather than milliseconds, even though the pg_stat_user_functions view displays values in milliseconds (it's incorrectly commented). Even the use of instr_time within some pgstat.h structs sees the value converted to microseconds for transmission to the collector. It might be that since we don't really accumulate time values here, it's better to just directly represent that value as PgStat_Counter, but in that case I'd still clearly mark the value as being milliseconds. In any case, I would have the columns in milliseconds, but not ending with *_ms within pg_stat_bgwriter, for consistency with pg_stat_user_functions et al. There is no reason to cast to uint64 when accumulating milliseconds, as the lvalue is currently a PgStat_Counter. Details of existing comment bug: The docs show that pg_stat_*_functions accumulates time in units of milliseconds. However, a PgStat_FunctionEntry payload is commented as sending the time/self time to the stats collector in microseconds. So this comment, in the existing code, is actually wrong: PgStat_Counter f_time; /* times in microseconds */ PgStat_Counter f_time_self; } PgStat_StatFuncEntry; Can someone commit a fix for that please? It could easily result in a user-visible bug. Mixing units, representing them as simple integers, and totally eschewing type safety is sort of a pet peeve of mine, but consistency with the existing code is more important. My general impression of the patch's code quality is that it is more or less idiomatic, and, as Greg said, simple; it generalises from the style of existing code in LogCheckpointEnd(), as well as existing fields within the PgStat_MsgBgWriter and PgStat_GlobalStats structs. As for the substance of the patch, I am in general agreement that this is a good idea.
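For reference, a minimal sketch of how the instr_time macros are used elsewhere in the backend, showing that the milliseconds-versus-microseconds decision only appears at extraction time (names here are illustrative, not taken from the patch):

#include "postgres.h"
#include "portability/instr_time.h"

static void
time_one_sync(void)
{
    instr_time  start, end, diff;
    double      elapsed_ms;

    INSTR_TIME_SET_CURRENT(start);
    /* ... the fsync or other work being timed goes here ... */
    INSTR_TIME_SET_CURRENT(end);

    diff = end;
    INSTR_TIME_SUBTRACT(diff, start);   /* diff = end - start */

    /* the unit is chosen only here: GET_MILLISEC vs. GET_MICROSEC */
    elapsed_ms = INSTR_TIME_GET_MILLISEC(diff);
    (void) elapsed_ms;
}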
Storing the statistics in pg_stat_bgwriter is a more flexible approach than is immediately apparent. Users can usefully calculate delta values as part of the same process through which checkpoint tuning is performed, by comparing output from select now(), * from pg_stat_bgwriter at different intervals. This also puts this information within easy reach of monitoring tools. So, I'd ask Greg and/or Jaime to produce a revision of the patch with those concerns in mind, as well as fixing the md.c usage of log_checkpoints. That variable is currently used in these places only: src/backend/access/transam/xlog.c 80:bool log_checkpoints = false; 7694:* Note: because it is possible for log_checkpoints to change while a 7696:* log_checkpoints is currently off. 7835: if (log_checkpoints) 8035: if (log_checkpoints) 8229:* Note: because it is possible for log_checkpoints to change while a 8231:* log_checkpoints is currently off. 8236: if (log_checkpoints) 8301: if (log_checkpoints) 8305: ereport((log_checkpoints ? LOG : DEBUG2), src/backend/storage/smgr/md.c 1097: if (log_checkpoints) 1105: if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start))) I also agree with separating backend write times due to lagging background writes from the backend writes by design, as described by Jeff. I'm not inclined to worry too much about the usability issues surrounding the relative values of things like
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Kevin Grittner kevin.gritt...@wicourts.gov: Pavel Stehule pavel.steh...@gmail.com wrote: usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are used as caches, so we have to refresh statistics early. In this situation I found that VACUUM ANALYZE is faster (by 30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb. This is not a usual pattern for OLTP - the application is strictly OLAP. Is the VACUUM ANALYZE step faster, or is the overall job faster if VACUUM ANALYZE is run? You may be running into the need to rewrite pages at an inopportune time or order without the VACUUM. Have you tried timing VACUUM FREEZE ANALYZE on these cache tables instead of plain VACUUM ANALYZE? -Kevin vacuum freeze analyze is slower, as expected. vacuum analyze is a little bit faster than, or the same as, analyze in any step. I expected that just analyze should be significantly faster, and it is not. Tom's demonstration is enough for me. ANALYZE doesn't read the complete table, but uses random IO. VACUUM ANALYZE reads the complete table, but it uses sequential IO, and vacuum is fast (because it does nothing) in our case. Thank you Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 02/22/2012 01:36 PM, Thom Brown wrote: On 22 February 2012 18:34, Kevin Grittnerkevin.gritt...@wicourts.gov wrote: Thom Brownt...@linux.com wrote: Does this help? test=# SELECT pg_typeof('4834.34'::numeric)::oid; pg_typeof --- 1700 (1 row) Wouldn't it be easier to do this instead? test=# SELECT 'numeric'::regtype::oid; oid -- 1700 (1 row) Well I may have misread the problem. I thought it was that for a particular data value, the oid of the type of that value was needed. Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I had to reply to a query about usage of VACUUM ANALYZE or ANALYZE. I expected that ANALYZE should be faster than VACUUM ANALYZE. VACUUM ANALYZE scans the whole table sequentially. ANALYZE accesses a random sample of data blocks. Random access is slower than sequential access, so at some threshold of sample size and sequential/random I/O speed ratio ANALYZE could become slower. That analysis is entirely wrong. In the first place, although ANALYZE doesn't read all the blocks, what it does read it reads in block number order. So it's not like there are random seeks all over the disk that would not need to happen anyway. Entirely right it would seem, since your later comments match my own. The industry-accepted description for non-sequential access is random access, whether or not the function that describes the movement is entirely random. To argue otherwise is merely hairsplitting. The disk access is non-sequential for ANALYZE. Non-sequential access is slower on some hardware, and so given a large enough sample it can account for the observed difference. Additional access to the disk while the ANALYZE was running would actually make it fully random, if anyone really cares. If the filesystem is hugely biased towards sequential I/O for some reason, and the VACUUM scan causes the whole table to become resident in RAM where ANALYZE can read it for free, then I guess it might be possible to arrive at Pavel's result. But it would be an awfully narrow corner case. I cannot believe that his statement is true in general, or even for a noticeably large fraction of cases. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Sun, Feb 19, 2012 at 01:13:10PM +0200, Peter Eisentraut wrote: The documentation of the pg_upgrade -l/--logfile option never made much sense to me: -l, --logfile=FILENAME    log session activity to file I don't know what session means for pg_upgrade, so I never used it. What it actually does is log the output of all the programs that pg_upgrade calls internally, such as pg_ctl, psql, vacuumdb, pg_resetxlog, to the specified file, which is quite useful for analyzing errors such as unable to connect to new postmaster started with the command: /usr/lib/postgresql/9.1/bin/pg_ctl -w -l /dev/null -D /var/lib/postgresql/9.1/main -o -p 5433 -b start >> /dev/null 2>&1 where -l would have put something in the place of /dev/null. So what might be a better wording for this option? Something like log output of internally called programs to file? How about? -l, --logfile=FILENAME    log internal activity to file -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
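For context, a typical invocation using the option looks something like this (paths and versions are illustrative only):

pg_upgrade -b /usr/lib/postgresql/9.0/bin -B /usr/lib/postgresql/9.1/bin \
           -d /var/lib/postgresql/9.0/main -D /var/lib/postgresql/9.1/main \
           -l /tmp/pg_upgrade.log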
Re: [HACKERS] pg_upgrade --logfile option documentation
On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: As a more general comment, I think that the way pg_upgrade does logging right now is absolutely terrible. IME, it is utterly impossible to understand what has gone wrong with pg_upgrade without looking at the log file. And by default, no log file is created. So typically what happens is: - I run pg_upgrade. It fails. - I rename the control file from the old cluster back to its original name. - I rerun pg_upgrade, this time with -l. It fails again. - I read the log file, figure out what the problem is, and correct it. - I rename the control file from the old cluster back to its original name, again. - I run pg_upgrade a third time. - On a good day, it works, else go to step 5. One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. The actual cause of the failure, rather than pg_upgrade's fairly-useless gloss on it, ought to be visible right away, at least IMHO. Well, we have a -d option for debug; we could modify that to have debug levels. Also, from the command line, it is difficult to have multiple processes write into a single file, so it isn't going to work to have pg_upgrade and the server log to the same file on Windows. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Andrew Dunstan and...@dunslane.net writes: Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. What's your basis for identifying those types in the first place? Name? Doesn't seem terribly robust if the other extension can be installed in some random schema. But anyway, something in parser/parse_type.c ought to help you with that --- maybe parseTypeString? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
Excerpts from Bruce Momjian's message of mié feb 22 17:01:10 -0300 2012: On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. ... but if the upgrade failed, clearly this shouldn't be a problem. I agree with Robert, and was bit by this last week -- in case of any error during the procedure, the control file should be renamed back to its original name. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. Not sure about this. If the upgrade completes successfully and the file is not renamed at the last minute due to some error, that would be a problem as well, because now the old cluster would happily run and perhaps corrupt the data files from under the new cluster. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
Excerpts from Etsuro Fujita's message of mié feb 22 05:37:36 -0300 2012: I did some tests. The results look good to me. Please find attached a logfile. Thanks. My only concern on the patch is +static void +AlterForeignServerOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId) +{ +Form_pg_foreign_server form; -srvId = HeapTupleGetOid(tup); form = (Form_pg_foreign_server) GETSTRUCT(tup); if (form->srvowner != newOwnerId) @@ -366,10 +388,15 @@ AlterForeignServerOwner(const char *name, Oid newOwnerId) /* Superusers can always do it */ if (!superuser()) { I wonder if superusers can always do it. For example, is it OK for superusers to change the ownership of a foreign server owned by old_role to new_role that doesn't have USAGE privilege on its foreign data wrapper? Well, the permission checks are just what they were before the patch. I did not change them here. I didn't participate in the discussions that led to the current behavior, but as far as I know the guiding principle here is that superusers can always do whatever they please. Maybe what you point out is a bug in the behavior (both before and after my patch), but if so, please raise it separately. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
On 02/22/2012 03:20 PM, Tom Lane wrote: Andrew Dunstanand...@dunslane.net writes: Maybe I need to be more clear. The C code I'm writing will process composites. I want to cache the Oids of certain non-builtin types in the function info's fn_extra, and then be able to test whether or not the fields in the composites are of those types. What's your basis for identifying those types in the first place? Name? Doesn't seem terribly robust if the other extension can be installed in some random schema. But anyway, something in parser/parse_type.c ought to help you with that --- maybe parseTypeString? Thanks, that might do the trick. I fully agree it's not bulletproof, but I'm not sure what alternative there is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
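For the archives, a rough sketch of the kind of fn_extra caching being discussed; "hstore" is just an example type name, and the parseTypeString() signature shown is roughly the 9.2-era one and has changed in later versions:

#include "postgres.h"
#include "fmgr.h"
#include "parser/parse_type.h"
#include "utils/memutils.h"

typedef struct MyTypeCache
{
    Oid     hstore_oid;     /* resolved once, using the current search_path */
} MyTypeCache;

PG_FUNCTION_INFO_V1(my_composite_processor);

Datum
my_composite_processor(PG_FUNCTION_ARGS)
{
    MyTypeCache *cache = (MyTypeCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        int32   typmod;

        cache = (MyTypeCache *)
            MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, sizeof(MyTypeCache));
        parseTypeString("hstore", &cache->hstore_oid, &typmod);
        fcinfo->flinfo->fn_extra = cache;
    }

    /* ... compare each composite field's type OID against cache->hstore_oid ... */

    PG_RETURN_NULL();
}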
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 05:22:29PM -0300, Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of mié feb 22 17:01:10 -0300 2012: On Sun, Feb 19, 2012 at 01:24:34PM -0500, Robert Haas wrote: One pretty obvious improvement would be: if pg_upgrade fails after renaming the control file for the old cluster out of the way - say, while loading the schema dump into the new cluster - have it RENAME THE OLD CONTROL FILE BACK before exiting. But I also think the The behavior you are seeing now is the paranoia inherent in pg_upgrade's design. Now that pg_upgrade is being used more, perhaps that needs to be relaxed. However, remember we rename that control file to prevent the old cluster from being run accidentally, which is particularly important in link mode. ... but if the upgrade failed, clearly this shouldn't be a problem. I agree with Robert, and was bit by this last week -- in case of any error during the procedure, the control file should be renamed back to its original name. There might be some error cases that still would not restore the location of that file if we have a revert behavior on error. A more normal behavior would be for pg_upgrade to rename the control file only when the upgrade completes successfully. Not sure about this. If the upgrade completes successfully and the file is not renamed at the last minute due to some error, that would be a problem as well, because now the old cluster would happily run and perhaps corrupt the data files from under the new cluster. Well, the basic problem is that the user, before pg_upgrade started, installed a new cluster that works. If we rename the old control file, but rename it back on failure, there are cases we will miss, like kill -9 or a server crash, and it will not be obvious to them that the control file was renamed. Of course, if we only rename on success, and there is a kill -9 or server crash, the old cluster is still startable, like the new one. One good argument for renaming early is that on a server crash, the system is probably going to restart the database automatically, and that means the old server. Right now we have a clear message that they need to rename the control file to start the old server. Not sure what the new wording would look like --- let me try. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On ons, 2012-02-22 at 14:38 -0500, Bruce Momjian wrote: How about? -l, --logfile=FILENAME    log internal activity to file That sounds better. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On sön, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
Excerpts from Tom Lane's message of mar feb 21 21:30:39 -0300 2012: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Alvaro Herrera's message of mar feb 21 15:54:03 -0300 2012: Excerpts from Tom Lane's message of lun feb 20 12:37:45 -0300 2012: As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php there is no switch case in shdepReassignOwned for foreign data wrappers. I'm gonna take a stab at fixing this now (the simple way). Here's the patch I have; last minute opinions. I intend to push a backpatch to 9.1 and 9.0. 8.4 has the same problem, but since Heikki punted in e356743f3ed45c36dcc4d0dbf6c1e8751b3d70b5, I'm not going to bother either. Looks roughly like what I expected, but I haven't tested it. Thanks, pushed. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: PL/pgPSM for 9.3
Hello, I am sending a proposal for PSM language support. It is maybe early, but I would like to have a patch for the first 9.3 commitfest. Proposal: PL/pgPSM I propose to integrate a PSM language into the core. This language is defined as part of ANSI SQL - SQL/PSM - and is used in some well-known databases like DB2 and Teradata, and some other less-known RDBMSs like MonetDB. The proposed implementation is based on the same architecture as the current PL/pgSQL interpreter - interpretation of an AST with integration of an SQL parser and expression executor. Reasons to use the same architecture are: reuse of some parts of the interpreter, and good experience with the current interpreter. One year ago I wrote a PSM compiler to pcode and a pcode interpreter - PSM0. This project showed that using pcode doesn't carry any speedup over an AST interpreter - the bottleneck was expression execution - and the PostgreSQL executor is the part that we would not substitute, so we can use a simpler and well-known AST interpreter. The language is specified by the SQL standard SQL/PSM, so I'll describe the main differences against PL/pgSQL: * Exception handlers are subroutines (continue handlers). * Warnings can be handled like an exception - handling warnings is an important and well-known pattern: BEGIN DECLARE success boolean DEFAULT true; DECLARE _x varchar; DECLARE c CURSOR FOR SELECT x FROM tab; DECLARE CONTINUE HANDLER FOR NOT FOUND SET success = false; OPEN c; FETCH c INTO _x; WHILE success DO -- process _x FETCH c INTO _x; END WHILE; CLOSE c; END; * Declaration of variables should be based on the SQL query result - and because we would have to execute and check queries only at runtime, we have to use slightly different rules for variables - variables need not be declared before they are used - this is possible because there is a different syntax for the assign statement. Automatic variables should be read-only. BEGIN DECLARE s text; FOR SELECT course FROM courses DO SET s = COALESCE(s || ',' || course, course); END FOR; END; Other differences are minor. Request: * reusing important parts of the plpgsql executor - simple expressions, using variables, assigning results to variables * support a “check function” statement * be as near to the standard as possible * implement well-known patterns that are not in the standard, but that have an impact on speed or usability and are used in well-known implementations (SQLCODE variable, multi-assign). -- a little bit more effective iteration (well-known pattern) BEGIN OPEN c; FETCH c INTO _x; WHILE SQLCODE = 0 DO -- process _x; FETCH c INTO _x; END WHILE; CLOSE c; END; * check expression syntax strictly - don't allow plpgsql's var := SELECT 10; var := a FROM tab and similar; only set var = 10; or set var = (SELECT 10); or set var = (SELECT a FROM tab) should be allowed LIMITS: * PostgreSQL doesn't support procedures - the CALL statement will not be supported (I would like to see CALL as a PostgreSQL statement, not only a PSM emulation) * I don't plan to implement PSM into the SQL parser in the first step. Reason: this feature requires the introduction of DELIMITERs (like MySQL or DB2). This concept is not adopted in PostgreSQL, and the PostgreSQL design is more user-friendly for work from command-line tools. This can be enhanced in the future, if we find a way to extend the bison parser. Using $$ delimiters for the function body doesn't block future enhancement. Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] leakproof
On Wed, 2012-02-22 at 12:44 -0400, Andrew Dunstan wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. There are also quite a few synonyms in the thesaurus for trustworthy. I kind of like honorable or righteous myself. __ Marc
Re: [HACKERS] leakproof
On 02/22/2012 04:29 PM, Marc Munro wrote: On Wed, 2012-02-22 at 12:44 -0400, Andrew Dunstan wrote: Returning to the original point, I've come to the conclusion that pure isn't the right way to go. The trouble with leakproof is that it doesn't point to what it is that's not leaking, which is information rather than memory, as many might imagine (and I did) without further hints. I'm not sure any single English word would be as descriptive as I'd like. As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. There are also quite a few synonyms in the thesaurus for trustworthy. I kind of like honorable or righteous myself. Let's not go there. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determining a type oid from the name
Andrew Dunstan and...@dunslane.net writes: I fully agree it's not bulletproof, but I'm not sure what alternative there is. If you know the type has been installed as an extension you can look at the extension's content in pg_depend, much like \dx+ does, limiting to only types whose name matches. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
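Something along these lines, for example (a sketch; 'hstore' stands in for whatever extension is of interest):

SELECT t.oid, t.typname
  FROM pg_depend d
  JOIN pg_type t ON t.oid = d.objid
 WHERE d.classid = 'pg_type'::regclass
   AND d.refclassid = 'pg_extension'::regclass
   AND d.refobjid = (SELECT oid FROM pg_extension WHERE extname = 'hstore')
   AND d.deptype = 'e';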
Re: [HACKERS] leakproof
On Wed, Feb 22, 2012 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Anyway, to your point, I suppose I might hesitate to mark factorial leak-proof even if it didn't throw an error on overflow, because the time it takes to return an answer for larger inputs does grow rather rapidly. But it's kind of a moot point because the error makes it not leak-proof anyway. So maybe we're just splitting hairs here, however we decide to label this. Speaking of hair-splitting ... A strict interpretation of no errors can be thrown would, for example, rule out any function that either takes or returns a varlena datatype, since those are potentially going to throw out-of-memory errors if they can't palloc a result value or a temporary detoasted input value. I don't suppose that we want that, which means that this rule of thumb is wrong in detail, and there had better be some more subtle definition of what is okay or not, perhaps along the line of must not throw any errors that reveal anything useful about the input value. Have we got such a definition? (I confess to not having followed this patch very closely.) Not exactly; I've kind of been playing it by ear, but I agree that out-of-memory errors based on the input value being huge are probably not something we want to stress out about too much. In theory you could probe for the approximate size of the value by using up nearly all the memory on the system, leaving a varying amount behind, and then see whether you get an out of memory error. But again, if people are going to that kind of trouble to ferret out just the approximate size of the data, it was probably a bad idea to let them log into the database at all, in any way, in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs si...@2ndquadrant.com wrote: The industry accepted description for non-sequential access is random access whether or not the function that describes the movement is entirely random. To argue otherwise is merely hairsplitting. I don't think so. For example, a bitmap index scan contrives to speed things up by arranging for the table I/O to happen in ascending block number order, with skips, rather than in random order, as a plain index scan would do, and that seems to be a pretty effective technique. Except to the extent that it interferes with the kernel's ability to do readahead, it really can't be faster to read blocks 1, 2, 3, 4, and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't require more effort than reading it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] swapcache-style cache?
Has anyone considered managing a system like the DragonFly swapcache for a DBMS like PostgreSQL? i.e. where the admin can assign drives with good random read behaviour (but perhaps also-ran random write) such as SSDs to provide a cache for blocks that were dirtied, with async write that hopefully writes them out before they are forcibly discarded. And where a cache failure (whether by timeout, hard fail, or CRC fail) just means having to go back to the real transactional storage. I'd been thinking that swapcache would help where the working set won't fit in RAM, also L2ARC on Solaris - but it seems to me that there is no reason not to allow the DBMS to manage the set-aside area itself where it is given either access to the raw device or to a pre-sized file on the device it can map in segments. While L2ARC is obviously very heavyweight and entwined in ZFS, DragonFly's swapcache seems to me remarkably elegant and, it would seem, very effective. James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 3:51 PM, Peter Eisentraut pete...@gmx.net wrote: On sön, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? I don't know. It might be less of an issue, though. I mean, IME, what typically happens is that psql fails to restore the dump, either because it can't connect to the new database or because it's confused by some stupid case that isn't handled well. So even if we could just improve the error handling to report those types of failures more transparently, I think it would be a big improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: proof of concept patch for fixing quantified regex backrefs
On Wed, Feb 22, 2012 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: Attached is as far as I've gotten with fixing depesz's complaint about backrefs embedded within larger quantified expressions (the complaint being that only the last match of the backref is checked properly). This is per the analysis I posted at https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894 to the effect that the engine really has to have an iteration subre type. The patch is incomplete because I have not written the code yet for the shortest-match-first case, only longest-match-first. Within that restriction, it seems to work, though I have not yet tried the Tcl regression tests on it. I have to set this aside now and go focus on release tasks (like writing release notes), so it's not going to be done in time to include in the upcoming back-branch releases. I have mixed feelings about whether to treat it as a back-patchable bug fix when it's done ... it's certainly a bug fix but the odds of introducing new issues seem higher than average. So maybe it should only go into HEAD anyway. Thoughts? +1 for just doing it in HEAD. This strikes me as the sort of thing that has a higher-than-average chance of breaking applications in subtle ways, and that's exactly the sort of thing that we don't want to do in a minor release, especially in older branches where people are presumably not upgrading precisely because what they have today works for them. If someone really needs the fix, they can always back-port it themselves... I don't think that code has changed much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on' checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off' This test was performed using pgbench-tools. At different client counts and scaling factors 1, 10, and 100, performance of an update.sql workload was tested. This benchmark used Greg Smith's toy server. As he put it recently: The main change to the 8 hyperthreaded core test server (Intel i7-870) for this year is bumping it from 8GB to 16GB of RAM, which effectively doubles the scale I can reach before things slow dramatically. However, while Greg used Scientific Linux for his recent batch of performance numbers, the box was booted into Debian for this, which used kernel 2.6.32, FWIW. Didn't bother with a separate disk for WAL. I put shared_buffers at 1GB, and checkpoint_segments at 50. I took the additional precaution of initdb'ing for each set, lest there be some kind of contamination between sets, which necessitated doing some additional work since I couldn't very well expect the results database to persist. Different sets of figures from different runs were dumped and restored, before finally being combined so that pgbench-tools could produce its regular report. I have attached a config for pgbench-tools, so that others may recreate my work here. I also attach the most relevant image, comparing each test set across scaling levels. I'll make a PDF of the full report available if that would be useful. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services config Description: Binary data attachment: scaling-sets-page-checksums.png -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
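For anyone who wants a rough manual equivalent of one cell of that grid without pgbench-tools, something like the following should be close (a sketch only; the attached config is authoritative, and pgbench's built-in simple-update mode is assumed here as a stand-in for the update.sql workload):

# non-default postgresql.conf settings used for the runs, per the message above:
#   shared_buffers = 1GB
#   checkpoint_segments = 50
#   page_checksums = on      # or off, for the patched-but-disabled set
createdb pgbench
pgbench -i -s 100 pgbench              # initialize at scaling factor 100
pgbench -c 8 -j 4 -T 300 -N pgbench    # 8 clients, 4 threads, 5-minute run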
Re: [HACKERS] leakproof
Andrew Dunstan and...@dunslane.net writes: On 02/22/2012 04:29 PM, Marc Munro wrote: As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. SILENT isn't a bad idea. It seems to lead the mind in the right direction, or at least not encourage people to guess the wrong meaning. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI rw-conflicts and 2PC
On Tue, 2012-02-14 at 19:32 -0500, Dan Ports wrote: On Tue, Feb 14, 2012 at 09:27:58AM -0600, Kevin Grittner wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 14.02.2012 04:57, Dan Ports wrote: The easiest answer would be to just treat every prepared transaction found during recovery as though it had a conflict in and out. This is roughly a one-line change, and it's certainly safe. +1. I don't even see this as much of a problem. Prepared transactions hanging around for arbitrary periods of time cause all kinds of problems already. Those using them need to be careful to resolve them quickly -- and if there's a crash involved, I think it's reasonable to say they should be resolved before continuing normal online operations. Hmm, it occurs to me that if we have to abort a transaction due to serialization failure involving a prepared transaction, we might want to include the prepared transaction's gid in the errdetail. I like this idea. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
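For concreteness, a sketch of the scenario under discussion; the table and GID are made up, and max_prepared_transactions must be set above zero. The proposal is that a later serialization failure involving the prepared transaction would name its GID in the errdetail.

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'xfer-001';
    -- If the server crashes and recovers here, the recovered prepared transaction
    -- would (per the proposed one-line change) be treated as having rw-conflicts
    -- in and out, so overlapping serializable transactions may be cancelled until
    -- it is resolved:
    COMMIT PREPARED 'xfer-001';   -- or ROLLBACK PREPARED 'xfer-001';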
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 05:49:26PM -0500, Robert Haas wrote: On Wed, Feb 22, 2012 at 3:51 PM, Peter Eisentraut pete...@gmx.net wrote: On Sun, 2012-02-19 at 13:24 -0500, Robert Haas wrote: But I also think the logging needs improvement. Right now, we studiously redirect both stdout and stderr to /dev/null; maybe it would be better to redirect stdout to /dev/null and NOT redirect stderr. If that generates too much chatter in non-failure cases, then let's adjust the output of the commands pg_upgrade is invoking until it doesn't. That should be achievable for calls to psql and vacuumdb, say, but what would you do with the server logs? I don't know. It might be less of an issue, though. I mean, IME, what typically happens is that psql fails to restore the dump, either because it can't connect to the new database or because it's confused by some stupid case that isn't handled well. So even if we could just improve the error handling to report those types of failures more transparently, I think it would be a big improvement. Well, on Unix, it is easy to redirect the server logs to the same place as the pg_upgrade logs. That doesn't help? How would we improve the reporting of SQL restore failures? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 10:50:07PM +0200, Peter Eisentraut wrote: On Wed, 2012-02-22 at 14:38 -0500, Bruce Momjian wrote: How about? -l, --logfile=FILENAME    log internal activity to file That sounds better. Done. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 11:17:53PM +, Peter Geoghegan wrote: I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on' checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off' This test was performed using pgbench-tools. At different client counts and scaling factors 1,10,100, performance of an update.sql workload was tested. Looks interesting. Could you get some error bars around the numbers plotted, and possibly some scaling factors between 10 and 100? For the former, I'm looking for whether those changes are within ordinary variation, and in the latter, some better idea of what the curve looks like. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Commit a445cb92 not tested without OpenSSL support?
Hey folks, It appears that the commit a445cb92ef5b3a31313ebce30e18cc1d6e0bdecb causes ld to bail when building *without* OpenSSL support: utils/misc/guc.o:(.data+0x4d80): undefined reference to `ssl_cert_file' utils/misc/guc.o:(.data+0x4ddc): undefined reference to `ssl_key_file' utils/misc/guc.o:(.data+0x4e38): undefined reference to `ssl_ca_file' utils/misc/guc.o:(.data+0x4e94): undefined reference to `ssl_crl_file' I did verify that the build succeeds with configure --with-openssl, where the conditional compilation predicated upon #ifdef USE_SSL in be-secure.c is satisfied. Regards, Affan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] leakproof
On Wed, Feb 22, 2012 at 6:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 02/22/2012 04:29 PM, Marc Munro wrote: As the developer of veil I feel marginally qualified to bikeshed here: how about silent? A silent function being one that will not blab. I also made this suggestion later in the day. SILENT isn't a bad idea. It seems to lead the mind in the right direction, or at least not encourage people to guess the wrong meaning. TACITURN. UNLOQUACIOUS. TIGHTLIPPED. LACONIC. OK, OK, I'm joking... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] typo fix
On Wed, Feb 22, 2012 at 8:47 AM, Sandro Santilli s...@keybit.net wrote: Typo in a comment... Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Displaying accumulated autovacuum cost
On Tue, Feb 21, 2012 at 11:55 PM, Greg Smith g...@2ndquadrant.com wrote: I just took this for a spin. Everything I tried worked, docs built and read fine. The description of how dirty differs from written is a bit cryptic, but I don't see an easy way to do better without a whole new section on that topic. Once the extension upgrade questions are sorted out, I'd say this is ready to commit. The example I have at the bottom here shows a case where this is a big improvement over the existing tracking. I think this is a must-have improvement if we're going to advocate using pg_stat_statements for more things. Glad you like it; committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
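As a usage sketch, assuming the shared_blks_dirtied and shared_blks_written counters as they ended up in the 9.2-era pg_stat_statements (and that the extension is created in the database and listed in shared_preload_libraries), the distinction Greg mentions can be inspected per statement:

    -- "dirtied" = shared buffers a statement newly made dirty;
    -- "written" = shared buffers the statement itself had to write out.
    SELECT query, shared_blks_dirtied, shared_blks_written
    FROM pg_stat_statements
    ORDER BY shared_blks_dirtied DESC
    LIMIT 10;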
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Feb 22, 2012 at 6:17 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: I decided that it would be worth benchmarking this patch. Specifically, I tested: master, as a basis of comparison; checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'on'; checksum16_with_wallogged_hint_bits.v10.patch, page_checksums = 'off'. This test was performed using pgbench-tools. At different client counts and scaling factors of 1, 10 and 100, performance of an update.sql workload was tested. This benchmark used Greg Smith's toy server. As he put it recently: The main change to the 8 hyperthreaded core test server (Intel i7-870) for this year is bumping it from 8GB to 16GB of RAM, which effectively doubles the scale I can reach before things slow dramatically. However, while Greg used Scientific Linux for his recent batch of performance numbers, the box was booted into Debian for this, which used kernel 2.6.32, FWIW. Didn't bother with a separate disk for WAL. I put shared_buffers at 1GB, and checkpoint_segments at 50. I took the additional precaution of initdb'ing for each set, lest there be some kind of contamination between sets, which necessitated doing some additional work since I couldn't very well expect the results database to persist. Different sets of figures from different runs were dumped and restored, before finally being combined so that pgbench-tools could produce its regular report. I have attached a config for pgbench-tools, so that others may recreate my work here. I also attach the most relevant image, comparing each test set across scaling levels. I'll make a PDF of the full report available if that would be useful. Thanks for testing this. The graph obscures a bit how much percentage change we're talking about here - could you post the raw tps numbers? I think we also need to test the case of seq-scanning a large, unhinted table. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
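One way to run that test, sketched here with made-up table and size choices: the point is that the first full scan of freshly loaded, never-read data sets hint bits as it goes, which is exactly the work that becomes WAL-logged when checksums are enabled under this patch.

    CREATE TABLE hint_test AS
      SELECT g AS id, md5(g::text) AS payload
      FROM generate_series(1, 10000000) AS g;
    -- Restart the server (and optionally drop OS caches) so nothing is cached
    -- and no hint bits have been set yet, then compare:
    \timing
    SELECT count(*) FROM hint_test;  -- first scan: sets hint bits on the way through
    SELECT count(*) FROM hint_test;  -- second scan: baseline with hint bits already set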
Re: [HACKERS] Should we add crc32 in libpgport?
On Thu, Feb 16, 2012 at 6:09 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina dan...@heroku.com wrote: Ah, yes, I think my optimizations were off when building, or something. I didn't get such verbosity at first, and then I remember doing something slightly different and then getting a lot of output. I didn't pay attention to the build size. I will investigate. [...] I agree, I was about to say what about a preprocessor hack... after the last paragraph, but saw you beat me to the punch. I'll have a look soon. Ping! Err, yes. Clearly I've managed to not do this, and not see your email until now. Here's what I intend to do: 1) Split the tables into another header file, per Tom's suggestion 2) #include those tables in pgport exactly once. Per Tom's objection that pgport is not always available in distributions, that is not the only way the table will be exposed, but pgport is definitely built and available when building postgres proper. 3) Third-party projects and contribs should use the header file, and not libpgport It's still a bit awkward in that one is including something that's not really a port (except in the degenerate sense, as no system has these tables defined vs., say, gettimeofday, where Windows needs a port), but it's the only thing that I can see that is compiled once and can be linked against repeatedly in postgres's build without having to, say, directly cross-reference the pg_crc.o file (as seen in the two command-line utilities that need CRC). Thoughts? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers with DO functionality
On Fri, Feb 17, 2012 at 11:43:53AM -0500, Andrew Dunstan wrote: On 02/17/2012 11:29 AM, David E. Wheeler wrote: On Feb 17, 2012, at 5:22 AM, Thom Brown wrote: The purpose being to only have a single statement to set up the trigger rather than setting up a separate trigger function which is unlikely to be re-used by other triggers... or is this of dubious benefit? +1, though I imagine it would just give it a generated name and save it anyway, eh? Before we rush into this, let's consider all the wrinkles. For example, what if you need to change the function? And how would you edit the function in psql? It might be a bit more involved than it seems at first glance, although my initial reaction was the same as David's. Another complication: anonymous triggers would either have to be alone, or provide a mechanism to manage a sequence of anonymous triggers on the same table (such as replace the third trigger with ..., or move trigger #4 to position #2, or decide their order of execution). Cheers, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
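For reference, the shape being discussed, with today's two-statement form first and the proposed single-statement form shown purely as hypothetical syntax (table and column names are made up):

    -- Current form: a named trigger function plus a trigger that uses it.
    CREATE FUNCTION touch_modified() RETURNS trigger AS $$
    BEGIN
      NEW.modified := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER set_modified
      BEFORE UPDATE ON items
      FOR EACH ROW EXECUTE PROCEDURE touch_modified();

    -- Hypothetical inline form under discussion (not valid syntax today); the
    -- server would presumably generate and store a function with a made-up name:
    -- CREATE TRIGGER set_modified BEFORE UPDATE ON items FOR EACH ROW
    --   DO $$ BEGIN NEW.modified := now(); RETURN NEW; END; $$ LANGUAGE plpgsql;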