Re: [HACKERS] cataloguing NOT NULL constraints
On 4 August 2011 18:57, Peter Eisentraut pete...@gmx.net wrote:
> Have you considered just cataloging NOT NULL constraints as CHECK
> constraints and teaching the reverse parser to convert "x CHECK (x IS
> NOT NULL)" to "x NOT NULL"? It seems to me that we're adding a whole
> lot of hoopla here that is essentially identical to the existing CHECK
> constraint support (it must be, per SQL standard), for no additional
> functionality.

With this approach, if the user explicitly wrote CHECK (x IS NOT NULL),
would it end up setting attnotnull? Presumably, since the pg_constraint
entry would be indistinguishable, it would be difficult to do otherwise.
From a performance standpoint that might be a good thing, but it's going
to be bad for compatibility.

What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT FROM
NULL)? How many variants would it reverse-parse? What would this do to
error messages when the constraint is violated?

I'm not convinced this simplifies things much; it just moves the
complexity elsewhere, and at the cost of losing compatibility with the
current behaviour.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
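The variant spellings Dean mentions can be tried directly against a current server; a quick sketch (the table name `t` is invented for illustration):

```sql
-- Four ways of saying "this column must not be null".  Today only the
-- first sets pg_attribute.attnotnull; the others are stored as ordinary
-- CHECK constraints in pg_constraint.
CREATE TABLE t (
    a int NOT NULL,
    b int CHECK (b IS NOT NULL),
    c int CHECK (NOT c IS NULL),
    d int CHECK (d IS DISTINCT FROM NULL)
);

-- All four reject nulls the same way at run time; this row fails the
-- check on d (the expression evaluates to false for a null d):
INSERT INTO t VALUES (1, 2, 3, NULL);
```

The question in the thread is which of these forms a reverse parser would be expected to recognize and fold back into `NOT NULL`.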
Re: [HACKERS] cataloguing NOT NULL constraints
On 6 August 2011 01:01, Peter Eisentraut pete...@gmx.net wrote:
>> Before embarking on rewriting this patch from scratch, I would like to
>> know what's your opinion (or the SQL standard's) on the fact that this
>> patch separated the PRIMARY KEY from NOT NULL constraints, so that
>> they don't act exactly alike (to wit, the not-nullness of a PK does
>> not inherit while the one from a NOT NULL constraint does).
>
> The SQL standard deals with inheritance in terms of composite types,
> which don't have constraints, so that doesn't give any guidance. That
> said, I think the substitutability property of object-oriented systems,
> namely that you can use a child object in place of a parent object,
> requires in principle that we inherit all constraints (by default, at
> least). We don't inherit primary keys because of implementation issues
> with indexes, but at some point in the future we should fix that. So to
> some degree, inheriting the not-null property of primary keys while not
> inheriting the rest of it is a bit wrong, but it would appear to be a
> step in the right direction, and changing established behavior seems a
> bit gratuitous to me.

The current behaviour is inconsistent - the not-null property of a PK is
sometimes inherited and sometimes not, depending on whether the PK is
added at table-creation time or later. So a change in either direction
is a change to some current behaviour, unless we leave it inconsistent,
which seems unacceptable. So I don't think compatibility arguments apply
here, and I would tend to agree that inheriting the not-null property of
PKs while not inheriting the rest seems wrong.

Regards,
Dean
Re: [HACKERS] cataloguing NOT NULL constraints
On 6 August 2011 08:17, Dean Rasheed dean.a.rash...@gmail.com wrote:
> The current behaviour is inconsistent - the not-null property of a PK
> is sometimes inherited and sometimes not, depending on whether the PK
> is added at table-creation time or later.

Oops, that should have been "depending on whether the PK is defined
before or after the inheritance is set up".
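The inconsistency Dean describes can be demonstrated with a short script; this is a sketch following his description (table names invented, behavior as of the era under discussion):

```sql
-- Case 1: PK defined before the inheritance is set up.
-- The child's copy of the column is marked NOT NULL.
CREATE TABLE p1 (id int PRIMARY KEY);
CREATE TABLE c1 () INHERITS (p1);

-- Case 2: PK added after the child already exists.
-- Per Dean's description, the not-null property is not propagated
-- to the existing child, so c2.id still admits nulls.
CREATE TABLE p2 (id int);
CREATE TABLE c2 () INHERITS (p2);
ALTER TABLE p2 ADD PRIMARY KEY (id);

-- Compare the attnotnull flags of the two children:
SELECT attrelid::regclass, attname, attnotnull
FROM pg_attribute
WHERE attrelid IN ('c1'::regclass, 'c2'::regclass) AND attname = 'id';
```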
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
On 06.08.2011 06:32, Gokulakannan Somasundaram wrote:
>> However, for small operations it's a net loss - you avoid writing a
>> WAL record, but you have to fsync() the heap instead. If you only
>> modify a few rows, the extra fsync (or fsyncs if there are indexes
>> too) is more expensive than writing the WAL. We'd need a heuristic to
>> decide whether to write WAL or fsync at the end. For regular INSERTs,
>> UPDATEs and DELETEs, you have the planner's estimate of the number of
>> rows affected. Another thing we should do is move the fsync call from
>> the end of COPY (and other such operations) to the end of transaction.
>> That way if you do e.g. one COPY followed by a bunch of smaller
>> INSERTs or UPDATEs, you only need to fsync the files once.
>
> Have you thought about recovery, especially when the page size is
> greater than the disk block size (4K)? With WAL, there is a way to
> restore the pages to the original state during recovery, if there are
> partial page writes. Is it possible to do the same with direct fsync
> without WAL?

The point of the optimization is that you can only skip WAL when the
table has been created (or truncated) in the same transaction. In that
case, if the transaction aborts because of a crash, you don't care about
the contents of the table anyway.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
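For reference, the case being optimized is the one where the table is created (or truncated) in the same transaction that loads it; a sketch (the table name and file path are invented, and this requires the minimal WAL level with no archiving):

```sql
BEGIN;
CREATE TABLE bulk_target (id int, payload text);
-- With wal_level = minimal, COPY into a table created in the same
-- transaction can skip WAL entirely and fsync the heap at commit:
COPY bulk_target FROM '/tmp/data.csv' WITH (FORMAT csv);
COMMIT;  -- a crash before this point leaves no visible table anyway
```

If the transaction aborts (or the server crashes before commit), the table itself never becomes visible, which is why no WAL is needed to repair its pages during recovery.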
Re: [HACKERS] cataloguing NOT NULL constraints
On lör, 2011-08-06 at 08:04 +0100, Dean Rasheed wrote:
> On 4 August 2011 18:57, Peter Eisentraut pete...@gmx.net wrote:
>> Have you considered just cataloging NOT NULL constraints as CHECK
>> constraints and teaching the reverse parser to convert "x CHECK (x IS
>> NOT NULL)" to "x NOT NULL"? It seems to me that we're adding a whole
>> lot of hoopla here that is essentially identical to the existing CHECK
>> constraint support (it must be, per SQL standard), for no additional
>> functionality.
>
> With this approach, if the user explicitly wrote CHECK (x IS NOT NULL)
> would it end up setting attnotnull?

Yes, I would assume so.

> Presumably, since the pg_constraint entry would be indistinguishable,
> it would be difficult to do otherwise. From a performance standpoint
> that might be a good thing, but it's going to be bad for compatibility.

What compatibility issue are you concerned about specifically?

> What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT FROM
> NULL)? How many variants would it reverse-parse?

Well, it's already the case that the user can write check constraints in
any number of forms that have the effect of restricting null values, and
attnotnull isn't set in those cases. So in the beginning I'd be quite
happy if we just recognized CHECK (x IS NOT NULL). Longer term, I think
we could tie this in with more general nullability detection. For
example, it is occasionally asked that we expose nullability through
views or CREATE TABLE AS. The SQL standard has rules for what cases we
should detect (which don't include the two you give).

> What would this do to error messages when the constraint is violated?

That's a reasonable concern, although not a fatal one, and it can be
solved in any case.

> I'm not convinced this simplifies things much; it just moves the
> complexity elsewhere, and at the cost of losing compatibility with the
> current behaviour.

No, I don't think this would lose compatibility (except perhaps in cases
of error message wording).
Re: [HACKERS] cataloguing NOT NULL constraints
On lör, 2011-08-06 at 08:17 +0100, Dean Rasheed wrote:
> The current behaviour is inconsistent - the not-null property of a PK
> is sometimes inherited and sometimes not, depending on whether the PK
> is added at table-creation time or later. So a change in either
> direction is a change to some current behaviour, unless we leave it
> inconsistent, which seems unacceptable.

Yeah, OK, that's wrong then.
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
On Sat, Aug 6, 2011 at 4:16 AM, Bruce Momjian br...@momjian.us wrote:
> Well, if the table is created in the same transaction (which is the
> only case under consideration), no other sessions can write to the
> table so you are just writing the entire table on commit, rather than
> to the WAL.

Below a certain point, skipping WAL is slower, and over an intermediate
range there is no benefit. So small amounts of data on large servers go
slower. heap_fsync() requires a scan of shared buffers, which may not be
cheap.

There is a difficulty because you would need to calculate what the
cut-off is for a particular database, and then predict ahead of time
whether the number of rows that will be handled by the statement is low
enough to warrant using the optimisation. Both of which I call a hard
problem.

I think we should remove the COPY optimisation because of this, and
definitely not extend INSERT SELECT to perform it automatically.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] cataloguing NOT NULL constraints
On 6 August 2011 11:03, Peter Eisentraut pete...@gmx.net wrote:
>> With this approach, if the user explicitly wrote CHECK (x IS NOT NULL)
>> would it end up setting attnotnull?
>
> Yes, I would assume so.
>
>> Presumably, since the pg_constraint entry would be indistinguishable,
>> it would be difficult to do otherwise. From a performance standpoint
>> that might be a good thing, but it's going to be bad for
>> compatibility.
>
> What compatibility issue are you concerned about specifically?
>
>> What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT FROM
>> NULL)? How many variants would it reverse-parse?
>
> Well, it's already the case that the user can write check constraints
> in any number of forms that have the effect of restricting null values,
> and attnotnull isn't set in those cases. So in the beginning I'd be
> quite happy if we just recognized CHECK (x IS NOT NULL). Longer term, I
> think we could tie this in with more general nullability detection. For
> example, it is occasionally asked that we expose nullability through
> views or CREATE TABLE AS. The SQL standard has rules for what cases we
> should detect (which don't include the two you give).
>
>> What would this do to error messages when the constraint is violated?
>
> That's a reasonable concern, although not a fatal one, and it can be
> solved in any case.
>
>> I'm not convinced this simplifies things much; it just moves the
>> complexity elsewhere, and at the cost of losing compatibility with the
>> current behaviour.
>
> No, I don't think this would lose compatibility (except perhaps in
> cases of error message wording).

Hmm, maybe my compatibility concerns are not so serious. I'm still
trying to work out exactly what user-visible changes this approach would
lead to. Suppose you had:

CREATE TABLE foo
(
  a int NOT NULL,
  b int CHECK (b IS NOT NULL),
  c int CHECK (NOT c IS NULL)
);

Right now \d gives:

     Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer |
 c      | integer |
Check constraints:
    "foo_b_check" CHECK (b IS NOT NULL)
    "foo_c_check" CHECK (NOT c IS NULL)

With this approach, one change would be that you'd gain an extra "not
null" in the Modifiers column for b. But how many CHECK constraints
would show? I guess it would show 3, although it could be changed to
just show 1. But it certainly couldn't continue to show 2, since nothing
in the catalogs could distinguish the constraints on a from those on b.

Regards,
Dean
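The "nothing in the catalogs could distinguish them" point can be made concrete with a direct catalog query; a sketch, using the `foo` table from Dean's example:

```sql
-- Under the proposed approach, the recognized forms would all end up as
-- contype = 'c' rows in pg_constraint; nothing would record whether the
-- user originally wrote "NOT NULL" or "CHECK (... IS NOT NULL)".
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'foo'::regclass AND contype = 'c';
```

Whatever \d chooses to display has to be derivable from that result plus attnotnull, which is why showing exactly the current two constraints would be impossible.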
Re: [HACKERS] Further news on Clang - spurious warnings
On 5 August 2011 20:07, Tom Lane t...@sss.pgh.pa.us wrote:
> This patch is a truly horrid idea, as it eliminates the error checking
> the compiler is trying to provide, and does so globally not only in
> the trouble spots.

I think that the mistake I may have made here was assuming that the
existing code was perfect, and that therefore it was simply a matter of
tweaking. With that assumption in mind, the only fix could have been a
global fix. I should not have acted in haste.

> If I were trying to get rid of this warning, I'd be wondering why
> ReplNodeTag is a distinct enum in the first place.

Indeed, that doesn't seem to be justified anywhere, and seems to be a
violation of the abstraction of Node as a pseudo base class, which would
have broken any downcasting that we might have attempted to do. Since
ReplNodeTag wasn't being used directly, just its enumeration constants,
simply moving the constants to the global, generic NodeTag enum fixes
the issue. That is what the attached patch does.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d8bc6b8..2c5e162 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -403,6 +403,13 @@ typedef enum NodeTag
 	T_CommonTableExpr,

 	/*
+	 * TAGS FOR REPLICATION PARSER (replnodes.h)
+	 */
+	T_IdentifySystemCmd = 950,
+	T_BaseBackupCmd,
+	T_StartReplicationCmd,
+
+	/*
 	 * TAGS FOR RANDOM OTHER STUFF
 	 *
 	 * These are objects that aren't part of parse/plan/execute node tree
@@ -410,7 +417,7 @@ typedef enum NodeTag
 	 * purposes (usually because they are involved in APIs where we want to
 	 * pass multiple object types through the same pointer).
 	 */
-	T_TriggerData = 950,		/* in commands/trigger.h */
+	T_TriggerData = 1000,		/* in commands/trigger.h */
 	T_ReturnSetInfo,			/* in nodes/execnodes.h */
 	T_WindowObjectData,			/* private in nodeWindowAgg.c */
 	T_TIDBitmap,				/* in nodes/tidbitmap.h */
diff --git a/src/include/replication/replnodes.h b/src/include/replication/replnodes.h
index e027f92..8523e7e 100644
--- a/src/include/replication/replnodes.h
+++ b/src/include/replication/replnodes.h
@@ -18,16 +18,6 @@
 #include "nodes/primnodes.h"
 #include "nodes/value.h"

-/*
- * NodeTags for replication parser
- */
-typedef enum ReplNodeTag
-{
-	T_IdentifySystemCmd = 10,
-	T_BaseBackupCmd,
-	T_StartReplicationCmd
-} ReplNodeTag;
-
 /* ----------
  * IDENTIFY_SYSTEM command
  * ----------
Re: [HACKERS] cataloguing NOT NULL constraints
Dean Rasheed dean.a.rash...@gmail.com writes:
> [ wonders what psql's \d will do with NOT NULL constraints ]

I think this might be taking the notion of "it should be backwards
compatible" too far. We're not expecting this patch to not change the
wording of error messages, for instance (in fact, I think a large part
of the point is to add constraint names to not-null errors). Why would
we expect it to not change what \d shows?

			regards, tom lane
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
On 06.08.2011 13:13, Simon Riggs wrote:
> I think we should remove the COPY optimisation because of this, and
> definitely not extend INSERT SELECT to perform it automatically.

It can be very helpful when loading a lot of data, so I'm not in favor
of removing it altogether. Maybe WAL-log the first 10000 rows or such
normally, and skip WAL after that. Of course, loading 10001 rows becomes
the worst case then, but something along those lines...

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
On Sat, Aug 6, 2011 at 11:05 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On 06.08.2011 13:13, Simon Riggs wrote:
>> I think we should remove the COPY optimisation because of this, and
>> definitely not extend INSERT SELECT to perform it automatically.
>
> It can be very helpful when loading a lot of data, so I'm not in favor
> of removing it altogether. Maybe WAL-log the first 10000 rows or such
> normally, and skip WAL after that. Of course, loading 10001 rows
> becomes the worst case then, but something along those lines...

why 10000 rows? maybe the right solution is to move towards making a
normal table unlogged and vice versa... probably that's harder to do,
but we will have better control and less odd heuristics

-- 
Jaime Casanova   www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
Heikki Linnakangas wrote:
>> I think we should remove the COPY optimisation because of this, and
>> definitely not extend INSERT SELECT to perform it automatically.
>
> It can be very helpful when loading a lot of data, so I'm not in favor
> of removing it altogether.

Yeah, it can currently help a lot. Of course, if WAL-logging could in
any way facilitate hint bit and frozen xmin setting during bulk loads,
I'm sure the WAL-logged version would win easily.

-Kevin
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
Jaime Casanova ja...@2ndquadrant.com writes:
> On Sat, Aug 6, 2011 at 11:05 AM, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> It can be very helpful when loading a lot of data, so I'm not in
>> favor of removing it altogether. Maybe WAL-log the first 10000 rows
>> or such normally, and skip WAL after that. Of course, loading 10001
>> rows becomes the worst case then, but something along those lines...
>
> why 10000 rows?

Yeah; any particular number is wrong. Perhaps it'd be better to put the
behavior under user control. In the case of COPY, where we already have
a place to stick random options, you could imagine writing something
like COPY ... WITH (bulk) to cue the system that a lot of data is
coming in. But I don't see any nice way to do something similar for
INSERT/SELECT. I hesitate to suggest a GUC, but something like SET
bulk_load = on would be pretty straightforward to use in pg_dump for
instance.

			regards, tom lane
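Tom's two sketches would look something like this in use; note that both the `bulk` COPY option and the `bulk_load` GUC are hypothetical syntax proposed in this email, not existing features, and the table and file names are invented:

```sql
-- Hypothetical COPY option cueing the system that a lot of data is
-- coming, so WAL can be skipped in favor of an fsync at the end:
COPY big_table FROM '/tmp/big.csv' WITH (FORMAT csv, bulk);

-- Hypothetical GUC that a pg_dump-style restore script could set once
-- up front, covering INSERT ... SELECT as well:
SET bulk_load = on;
INSERT INTO big_table SELECT * FROM staging_table;
```

The GUC form has the advantage of working for any statement, at the cost of being easy to leave on by accident.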
[HACKERS] Module extension for parsing and rewriting functions with infixed syntax
Hello,

I am working on a PostgreSQL extension module which defines new grammar
rules completing the classical SQL syntax defined in the
src/backend/parser/gram.y file.

Basically, I want to handle predicates having an infixed syntax { X IS Y }
and rewrite them as classical Boolean functions with prefixed syntax
{ Y(X) }. For instance, the following query:

SELECT * FROM cars WHERE cars.color IS yellow;

would be rewritten into:

SELECT * FROM cars WHERE yellow(cars.color);

The new predicate could be rewritten as a plpgsql Boolean function with
a unique argument (cars.color IS yellow -> yellow(cars.color)).

I have then added the following rule to the func_expr definition (see
gram.y:10280 in postgresql-9.1beta3 source code):

func_expr:
    ...
    | func_arg_expr IS func_name over_clause
        {
            FuncCall *n = makeNode(FuncCall);
            n->funcname = $3;
            n->args = list_make1($1);
            n->agg_order = NIL;
            n->agg_star = FALSE;
            n->agg_distinct = FALSE;
            n->func_variadic = FALSE;
            n->over = $4;
            n->location = @1;
            $$ = (Node *)n;
        }
    ...

However, my first attempt leads to the following errors:

/usr/bin/bison -d -o gram.c gram.y
gram.y: conflicts: 84 shift/reduce, 807 reduce/reduce
gram.y: expected 0 shift/reduce conflicts
gram.y: expected 0 reduce/reduce conflicts

How can I avoid this kind of error without changing the entire grammar?

In addition, I would rather make this new functionality independent of
the original PostgreSQL source code. Ideally, the newly defined bison
rules would live in an autonomous extension module. I have seen that
some contrib modules (such as SEG or CUBE) define separate bison grammar
rules. However, I don't understand yet how such rules integrate with the
gram.y file without any conflicts.

Can I define my new bison rules separately from the gram.y file? Can I
use the new functionality dynamically after loading an extension module
(LOAD 'MY_EXTENSION';)?

I am new to the PostgreSQL community and any ideas for solving these
problems would be very helpful.

Thanks in advance,

Thomas Girault
Re: [HACKERS] Module extension for parsing and rewriting functions with infixed syntax
Thomas Girault toma.gira...@gmail.com writes:
> func_expr:
>     ...
>     | func_arg_expr IS func_name over_clause
>
> However, my first attempt leads to the following errors:
> /usr/bin/bison -d -o gram.c gram.y
> gram.y: conflicts: 84 shift/reduce, 807 reduce/reduce
> gram.y: expected 0 shift/reduce conflicts
> gram.y: expected 0 reduce/reduce conflicts

Most likely, you need to think about how to distinguish this case from
IS NULL, IS TRUE, and the other things that can follow IS already. I
don't think all those words are considered fully reserved today, but
they'd have to be (or at least not be allowed as func_names) to make
this non-ambiguous. Looking at "bison -v" output will help you figure
out for sure what it thinks is ambiguous.

> In addition, I would rather make this new functionality independent of
> the original PostgreSQL source code. Ideally, the newly defined bison
> rules would live in an autonomous extension module.

Bison doesn't support that; it needs to see the entire grammar at once.

(Many years ago I worked with systems at HP that did have run-time-
extensible parsers. They were a pain: not all that flexible, and it was
easy to get grammar bugs wherein things parsed some other way than you
expected. To me the main value of bison is that it tells you about it
when you wrote something ambiguous; and for that, it really has to see
all the rules, not just some of them.)

> I have seen that some contrib modules (such as SEG or CUBE) define
> separate bison grammar rules.

Those grammars have nothing to do with parsing SQL; they're just used to
parse literal values of those data types.

> Can I define my new bison rules separately from the gram.y file? Can I
> use the new functionality dynamically after loading an extension
> module (LOAD 'MY_EXTENSION';)?

No, and no.

			regards, tom lane
Re: [HACKERS] mosbench revisited
On Wed, Aug 3, 2011 at 11:21 AM, Robert Haas robertmh...@gmail.com wrote:
> About nine months ago, we had a discussion of some benchmarking that
> was done by the mosbench folks at MIT:
>
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg00160.php
>
> Although the authors used PostgreSQL as a test harness for driving
> load, it's pretty clear from reading the paper that their primary goal
> was to stress the Linux kernel, so the applicability of the paper to
> real-world PostgreSQL performance improvement is less than it might
> be. Still, having now actually investigated in some detail many of the
> same performance issues that they were struggling with, I have a much
> clearer understanding of what's really going on here. In PostgreSQL
> terms, here are the bottlenecks they ran into:
>
> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
> because PostgreSQL protects its free-list with a single lock and thus
> scales poorly with smaller caches."
>
> This is a complaint about BufFreeList lock which, in fact, I've seen
> as a huge point of contention on some workloads. In fact, on read-only
> workloads, with my lazy vxid lock patch applied, this is, I believe,
> the only remaining unpartitioned LWLock that is ever taken in
> exclusive mode; or at least the only one that's taken anywhere near
> often enough to matter. I think we're going to do something about
> this, although I don't have a specific idea in mind at the moment.

I was going to ask you if you had done any benchmarks with scale such
that the tables fit in RAM but not in shared_buffers. I guess you have.

The attached experimental patch fixed freelist contention on 8 cores.
It would be nice to see what happens above that. It has been
cherry-picked up to HEAD, but not tested against it. (Last tested in
Dec 2010, my how time flies.)

The approach is to move the important things from a LWLock to a
spinlock, and to not do any locking for increments of the clock hand
and numBufferAllocs.
That means that some buffers might occasionally get inspected twice and
some might not get inspected at all during any given clock cycle, but
this should not lead to any correctness problems. (Disclosure: Tom
didn't like this approach when it was last discussed.) I just offer this
for whatever it is worth to you--I'm not proposing it as an actual patch
to be applied.

When data fits in RAM but not shared_buffers, maybe the easiest fix is
to increase shared_buffers.

Which brings up the other question I had for you about your work with
Nate's celebrated loaner machine. Have you tried to reproduce the
performance problems that have been reported (but without public
disclosure of how to reproduce) with shared_buffers > 8GB on machines
with RAM > 8GB?

Cheers,

Jeff

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index bf9903b..304135e 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -28,7 +28,8 @@ typedef struct
 	int			nextVictimBuffer;

 	int			firstFreeBuffer;	/* Head of list of unused buffers */
-	int			lastFreeBuffer; /* Tail of list of unused buffers */
+	int			lastFreeBuffer; /* Tail of list of unused buffers */
+	slock_t		mutex;	/* Protects all of these, except sometimes not nextVictimBuffer */

 	/*
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
@@ -44,7 +45,7 @@ typedef struct
 } BufferStrategyControl;

 /* Pointers to shared state */
-static BufferStrategyControl *StrategyControl = NULL;
+static volatile BufferStrategyControl *StrategyControl = NULL;

 /*
  * Private (non-shared) state for managing a ring of shared buffers to re-use.
@@ -123,15 +124,13 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 	}

-	/* Nope, so lock the freelist */
-	*lock_held = true;
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
-
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
 	 * the rate of buffer consumption.  Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
+	 * This is now done without a lock, and so updates can be lost
 	 */
+	StrategyControl->numBufferAllocs++;

 	/*
@@ -142,6 +141,12 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	while (StrategyControl->firstFreeBuffer >= 0)
 	{
+		SpinLockAcquire(&StrategyControl->mutex);
+		if (StrategyControl->firstFreeBuffer < 0)
+		{
+			SpinLockRelease(&StrategyControl->mutex);
+			break;
+		}
 		buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
 		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

@@ -149,6 +154,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		StrategyControl->firstFreeBuffer = buf->freeNext;
 		buf->freeNext = FREENEXT_NOT_IN_LIST;

+		SpinLockRelease(&StrategyControl->mutex);
+
 		/*
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; discard it and retry.  (This can only happen if VACUUM put
Re: [HACKERS] Reduce WAL logging of INSERT SELECT
On Fri, 2011-08-05 at 23:16 -0400, Bruce Momjian wrote:
> Well, if the table is created in the same transaction (which is the
> only case under consideration), no other sessions can write to the
> table so you are just writing the entire table on commit, rather than
> to the WAL.

The transaction can still write to many tables that way, and that could
mean many fsyncs. Also, there may be a bunch of other transactions
trying to write to the WAL that have to wait as your transaction has to
seek out to fsync the data file and then seek back to the WAL.

Regards,
	Jeff Davis
Re: [HACKERS] mosbench revisited
On Wed, Aug 3, 2011 at 3:21 PM, Jim Nasby j...@nasby.net wrote:
> On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
>>> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
>>> because PostgreSQL protects its free-list with a single lock and
>>> thus scales poorly with smaller caches."
>>
>> This is a complaint about BufFreeList lock which, in fact, I've seen
>> as a huge point of contention on some workloads. In fact, on
>> read-only workloads, with my lazy vxid lock patch applied, this is, I
>> believe, the only remaining unpartitioned LWLock that is ever taken
>> in exclusive mode; or at least the only one that's taken anywhere
>> near often enough to matter. I think we're going to do something
>> about this, although I don't have a specific idea in mind at the
>> moment.
>
> This has been discussed before:
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php
> (which itself references 2 other threads). The basic idea is: have a
> background process that proactively moves buffers onto the free list
> so that backends should normally never have to run the clock sweep
> (which is rather expensive).
>
> The challenge there is figuring out how to get stuff onto the free
> list with minimal locking impact. I think one possible option would be
> to put the freelist under its own lock (IIRC we currently use it to
> protect the clock sweep as well). Of course, that still means the free
> list lock could be a point of contention, but presumably it's far
> faster to add or remove something from the list than it is to run the
> clock sweep.

Hi Jim,

My experiments have shown that the freelist proper is not substantially
faster than the clock sweep--and that is even under the assumption that
putting things back into the freelist is absolutely free. Under all the
workloads I've been able to contrive, other than ones contrived by
actually hacking the code itself to make it pathological, the average
number of buffers inspected per run of the clock sweep is 2.5.

Under contention, the mere act of acquiring a lock is more traumatic
than the actual work carried out under the lock.

Cheers,

Jeff
Re: [HACKERS] mosbench revisited
Robert Haas robertmh...@gmail.com writes:
> It would be nice if the Linux guys would fix this problem for us, but
> I'm not sure whether they will. For those who may be curious, the
> problem is in generic_file_llseek() in fs/read_write.c. On a platform
> with 8-byte atomic reads, it seems like it ought to be very possible
> to read inode->i_size without taking a spinlock. A little Googling
> around suggests that some patches along these lines have been proposed
> and - for reasons that I don't fully understand - rejected. That now
> seems unfortunate. Barring a kernel-level fix, we could try to
> implement our own cache to work around this problem. However, any such
> cache would need to be darn cheap to check and update (since we can't
> assume that relation extension is an infrequent event) and must
> somehow avoid having the same sort of mutex contention that's killing
> the kernel in this workload.

What about making the relation extension much less frequent? It's been
talked about before here: instead of extending 8kB at a time we could
(should) extend by much larger chunks. I would go as far as
preallocating the whole next segment (1GB) (in the background) as soon
as the current one is more than half full, or some such policy.

Then you have the problem that you can't really use lseek() anymore to
guess'timate a relation size, but Tom said in this thread that the
planner certainly doesn't need something that accurate. Maybe reltuples
would do? If not, it could be that some adapting of its accuracy could
be done?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Transient plans versus the SPI API
Tom Lane t...@sss.pgh.pa.us writes: I think we'll be a lot better off with the framework discussed last year: build a generic plan, as well as custom plans for the first few sets of parameter values, and then observe whether there's a significant reduction in estimated costs for the custom plans. Another way here would be to cache more than a single plan and to keep execution-time samples or some other relevant runtime characteristics. Then what we would need is a way to switch from one plan to another at run time on some conditions, like realizing that the reason why the planner thought a nestloop would be perfect is obviously wrong, or maybe just based on runtime characteristics. But in any case, it's way premature to be debating this until we have the infrastructure in which we can experiment with different policies. That too. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https
On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan and...@dunslane.net wrote: On 08/04/2011 11:23 PM, Alex Hunsaker wrote: [ ... don't let people set signal handlers postgres sets ] This whole thing is a massive over-reaction to a problem we almost certainly know how to fix fairly simply and relatively painlessly, and attempts (unsuccessfully, at least insofar as comprehensiveness is concerned) to fix a problem nobody's actually reported having AFAIK. *shrug* OK. Find attached a version that does the equivalent of local %SIG for each pl/perl(u) call. I was only able to test on 5.14.1, but I looked at 5.8.9 to make sure it looks like it works. It's a tad slower (something like 0.00037 ms per call), but that's quite acceptable IMHO (best of 5 runs):

= create or replace function simple() returns void as $$ $$ language plperl;
CREATE FUNCTION

-- pre patch
= select count(simple()) from generate_series(1, 10000000);
Time: 10219.149 ms

-- patched
= select count(simple()) from generate_series(1, 10000000);
Time: 13924.025 ms

Thoughts?

*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
***************
*** 639,641 ****
  CONTEXT:  PL/Perl anonymous code block
--- 639,643 ----
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
  ERROR:  Useless use of sort in scalar context at line 1.
  CONTEXT:  PL/Perl anonymous code block
+ DO $do$ die "\$SIG{'ALARM'} set!" if ($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n" }; $do$ LANGUAGE plperl;
+ DO $do$ die "\$SIG{'ALARM'} set!" if ($SIG{'ALARM'}); $do$ LANGUAGE plperl;

*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
***************
*** 1895,1906 **** plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo)
--- 1895,1912 ----
  {
  	dSP;
  	SV		   *retval;
+ 	GV		   *gv;
  	int			i;
  	int			count;

  	ENTER;
  	SAVETMPS;
+
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);				/* local %SIG */
+
  	PUSHMARK(SP);
  	EXTEND(sp, desc->nargs);

***************
*** 1976,1981 **** plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
--- 1982,1988 ----
  	dSP;
  	SV		   *retval,
  			   *TDsv;
+ 	GV		   *gv;
  	int			i,
  				count;
  	Trigger    *tg_trigger = ((TriggerData *) fcinfo->context)->tg_trigger;

***************
*** 1986,1995 ****
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
- 	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
--- 1993,2006 ----
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
+
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);				/* local %SIG */
+
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);

*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
***************
*** 415,417 ****
  DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;
--- 415,420 ----
  -- check that we can use warnings (in this case to turn a warn into an error)
  -- yields ERROR:  Useless use of sort in scalar context.
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
+
+ DO $do$ die "\$SIG{'ALARM'} set!" if ($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n" }; $do$ LANGUAGE plperl;
+ DO $do$ die "\$SIG{'ALARM'} set!" if ($SIG{'ALARM'}); $do$ LANGUAGE plperl;
Re: [HACKERS] Further news on Clang - spurious warnings
Peter Geoghegan pe...@2ndquadrant.com writes: On 5 August 2011 20:07, Tom Lane t...@sss.pgh.pa.us wrote: If I were trying to get rid of this warning, I'd be wondering why ReplNodeTag is a distinct enum in the first place. Indeed, that doesn't seem to be justified anywhere, and seems to be a violation of the abstraction of Node as a pseudo base class, which would have broken any downcasting that we might have attempted to do. Since ReplNodeTag wasn't being used directly, just its enumeration constants, simply moving the constants to the global, generic NodeTag enum fixes the issue. That is what the attached patch does. I did this, plus moving replnodes.h into a saner spot: if they're full-fledged Nodes, they ought to be defined in src/include/nodes/. I did not renumber the existing node tags as your patch suggests. We might want to do that at some point to make a bit more breathing room, but I think it's too late in the 9.1 cycle to be doing it now. People have probably already built third-party code that incorporates values like T_ReturnSetInfo. regards, tom lane
Re: [HACKERS] mosbench revisited
Jeff Janes jeff.ja...@gmail.com writes: My experiments have shown that the freelist proper is not substantially faster than the freelist clocksweep--and that is even under the assumption that putting things back into the freelist is absolutely free. The freelist isn't there to make buffer allocation faster, though; it's there for allocation efficiency. The point is that when some buffers have become completely useless (eg, because we dropped the table they were for), they'll be recycled in preference to reclaiming buffers that contain still-possibly-useful data. It would certainly be simple to get rid of the freelist and only recycle dead buffers when the clock sweep reaches them, but I think we'd be paying for that in extra, unnecessary I/O. regards, tom lane
Re: [HACKERS] Transient plans versus the SPI API
Jeff Davis pg...@j-davis.com writes: A control knob sounds limited. For instance, what if the application knows that some parameters will be constant over the time that the plan is saved? It would be nice to be able to bind some parameters to come up with a generic (but less generic) plan, and then execute it many times. Right now that can only be done by inlining such constants in the SQL, which is what we want to avoid. +1 I was already thinking in those terms at the application level for the example I've been using before in this thread, and only on reading your mail did I realize that maybe the backend should be able to do that itself. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] vacuumlo patch
On Tue, Jul 26, 2011 at 12:18 PM, Timothy D. F. Lewis elatl...@gmail.com wrote: I'm not sure what David did for me, so as per Robert's suggestion I have added this patch to the commit fest. I'm hoping I have not put this patch in more than one workflow. Hi Tim, I would be willing to review this patch for the next CommitFest. I'd like to request that you send an updated version of your patch *as an attachment* to avoid the problems with long lines getting automatically wrapped, as Alvaro mentioned. I had trouble getting the existing patches to apply. A few preliminary comments about the patch: 1. It wasn't clear to me whether you're OK with Aron's suggested tweak; please include it in your patch if so. 2. I think it might be better to use INT_MAX instead of hardcoding 2147483647. 3. Your patch has some minor code style differences wrt. the existing code, e.g. +if(param->transaction_limit!=0 && deleted>=param->transaction_limit) should have a space before the first '(' and around the '!=' and '>='. 4. The rest of the usage strings spell out 'large object(s)' instead of abbreviating 'LO': +printf("  -l LIMIT       stop after removing LIMIT LOs\n"); 5. transaction_limit is an int, yet you're using strtol() which returns long. Maybe you want to use atoi() or make transaction_limit a long? Thanks Josh
Re: [HACKERS] vacuumlo patch
Hi Josh, Thanks for the help. Attached is a patch including the changes suggested in your comments. Excerpts from Josh's message On Sat, Aug 6, 2011 at 9:57 PM: 1. It wasn't clear to me whether you're OK with Aron's suggested tweak; please include it in your patch if so. I decided to, and included Aron's tweak. I'm not sure if the PostgreSQL project prefers simplified code over faster run time (if only under rare conditions). Who knows, maybe the compiler will re-optimize it anyway. 2. I think it might be better to use INT_MAX instead of hardcoding 2147483647. Fixed, though I'm not sure if I put the include in the preferred order. 3. ... should have a space before the first '(' and around the '!=' and '>=' Fixed. 4. The rest of the usage strings spell out 'large object(s)' instead of abbreviating 'LO'... Fixed; I omitted the brackets around the 's' to conform with the other usage strings. 5. transaction_limit is an int, yet you're using strtol() which returns long. Maybe you want to use atoi() or make transaction_limit a long? The other int in the code is parsed using strtol(), so I also used strtol() instead of changing more code. I'm not sure if there are any advantages or disadvantages; maybe it's easier to prevent/detect/report overflow or wraparound. I can't think of a good reason anyone would want to limit transactions to something more than INT_MAX. Implementing that would best be done in a separate and larger patch, as PQntuples returns an int and is used on 271 lines across PostgreSQL. Thanks again for the help. vacuumlo.patch Description: Binary data