Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

For research reference, you may want to look at the work done on the Columbia Query Optimization Framework. As I recall, it (or its predecessors) had both cost-based and rule-based optimization. If you need the code for it, I can dig it up on one of my old systems. Albeit dated, another good reference for optimizer implementation is the Cascades query optimization framework.

On 9/15/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Tom, I agree. There have been several occasions where GEQO has performed
> poorly for me. I'll search the archives for the past discussions.
>
> (Sorry for sending this to you twice, Tom... forgot to hit reply-all.)
>
> On 9/14/05, Tom Lane <[EMAIL PROTECTED]> wrote:
>> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
>>> As for using both in the same optimizer, I could only see an algorithm
>>> such as a customized A* being used to plan *some* large queries. The
>>> reason I say this is that the cost calculation, which would still need
>>> to be breadth-first, could calculate and cache the cost of most nodes,
>>> thereby allowing you to possibly perform transformations at the tail
>>> of the calculation.
>>
>> We do already have two different plan search algorithms: the strict
>> bottom-up dynamic programming approach (System R style) and the GEQO
>> optimizer, which we switch to when there are too many joins to allow
>> exhaustive search. The GEQO code still depends on the normal plan cost
>> estimation code, but it doesn't consider every possible plan.
>>
>> I've never been very happy with the GEQO code: the random component of
>> the algorithm means you get unpredictable (and sometimes awful) plans,
>> and the particular variant that we are using is really designed to
>> solve traveling-salesman problems. It's at best a poor fit to the join
>> planning problem.
>>
>> So it seems interesting to me to think about replacing GEQO with a
>> rule-based optimizer for large join search spaces. There are previous
>> discussions about this in the archives, I believe.
>>
>> regards, tom lane

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Re: [HACKERS] About method of PostgreSQL's Optimizer
Tom,

I agree. There have been several occasions where GEQO has performed poorly for me. I'll search the archives for the past discussions.

(Sorry for sending this to you twice, Tom... forgot to hit reply-all.)

On 9/14/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
>> As for using both in the same optimizer, I could only see an algorithm
>> such as a customized A* being used to plan *some* large queries. The
>> reason I say this is that the cost calculation, which would still need
>> to be breadth-first, could calculate and cache the cost of most nodes,
>> thereby allowing you to possibly perform transformations at the tail
>> of the calculation.
>
> We do already have two different plan search algorithms: the strict
> bottom-up dynamic programming approach (System R style) and the GEQO
> optimizer, which we switch to when there are too many joins to allow
> exhaustive search. The GEQO code still depends on the normal plan cost
> estimation code, but it doesn't consider every possible plan.
>
> I've never been very happy with the GEQO code: the random component of
> the algorithm means you get unpredictable (and sometimes awful) plans,
> and the particular variant that we are using is really designed to
> solve traveling-salesman problems. It's at best a poor fit to the join
> planning problem.
>
> So it seems interesting to me to think about replacing GEQO with a
> rule-based optimizer for large join search spaces. There are previous
> discussions about this in the archives, I believe.
>
> regards, tom lane

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Re: [HACKERS] Per-table freeze limit proposal
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> In fact this seems pretty easy to do. Add a field to pg_class, tell
> VACUUM to update it using the determined freezeLimit, and that's it.

I think that it'd be worth fixing things so that the recorded value is not the freeze cutoff value (as now), but the actual lowest not-frozen XID present anywhere in the table. The present code does not do that because it's painful to track across multiple tables, but on a per-table basis it seems easy. In particular, this rule allows you to set a sane value for the pg_class field when the table is created (ie, the current transaction's XMIN, rather than a billion less).

> (Note that if we ever implement partial vacuum, it won't be able to
> update the freeze point. But that was true before anyway.)

Sure.

> We also need to teach autovacuum to update pg_database.datfreezexid,
> using the minimum from pg_class.

No, no, no. autovacuum is not a required part of the system and it's not going to become so any time soon. Updating the pg_database entry will have to be the responsibility of VACUUM itself. It's not that terrible: you don't have to scan pg_class unless you see that the pg_class.relfreezexid value you are replacing is equal to pg_database.datfreezexid, and with the exact computation suggested above, that won't be a common occurrence.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] inverse OR distributive law?
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> * process_duplicate_ors
> *	Given a list of exprs which are ORed together, try to apply
> *	the inverse OR distributive law.
> Anybody enlighten what "inverse OR distributive law" is?

Well, it's defined right above that:

 * The following code attempts to apply the inverse OR distributive law:
 *		((A AND B) OR (A AND C))  =>  (A AND (B OR C))
 * That is, locate OR clauses in which every subclause contains an
 * identical term, and pull out the duplicated terms.

I'm not sure that "inverse OR distributive law" is standard terminology, but I believe the implication in the other direction is usually called the "OR distributive law". Anyone know of better terminology?

regards, tom lane
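The transformation the comment describes is easy to model. Below is a toy Python sketch of the idea only (the real implementation is C code in the optimizer, and `pull_out_common_terms` is a made-up name): represent an OR-of-ANDs as a list of term sets, intersect the arms, and factor out whatever every arm shares.

```python
# Toy model: an OR-of-ANDs predicate is a list of sets of atomic terms.
# The "inverse OR distributive law" pulls out terms common to every arm:
#   ((A AND B) OR (A AND C))  =>  (A AND (B OR C))

def pull_out_common_terms(or_arms):
    """or_arms: list of sets; sets are ANDed internally, ORed together.
    Returns (common_terms, reduced_arms)."""
    common = set.intersection(*map(set, or_arms))
    if not common:
        return common, or_arms          # nothing shared; no rewrite
    reduced = [set(arm) - common for arm in or_arms]
    return common, reduced

common, rest = pull_out_common_terms([{"A", "B"}, {"A", "C"}])
# common == {"A"} and rest == [{"B"}, {"C"}], i.e. A AND (B OR C)
```

This is why the rewrite helps the planner: the factored-out term `A` becomes a standalone qual that can, for example, match an index.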
Re: [HACKERS] Constraint Type Coercion issue?
Martijn van Oosterhout writes:
> Well yes, but given the number of possible locales, creating one class
> for each seems excessive. And each class would have to create 5
> operators (with underlying functions) and 1 comparator function. Unless
> you could shortcut something like:
>
> CREATE OPERATOR CLASS ...
>    OPERATOR 1 <(text,text,'en_US')
>    ...
>    FUNCTION 1 mycompare(text,text,'en_US')
>    ...
>    COLLATE en_us;

The thing that's still fairly unclear to me is whether the collation information is attached to the operators/functions or to the data. I recall there's been some discussion of sticking collation IDs into individual text Datums, which is a completely different path from what you are positing above. Does the SQL spec mandate one or the other of these approaches? If it does, do we want to believe it? (The more I read of SQL2003, the less impressed I get...)

regards, tom lane
[HACKERS] Per-table freeze limit proposal
Hackers,

As you've probably heard too many times already, I'm thinking about improving vacuum so we can keep track of the freeze Xid at the table level, rather than the database level. Hopefully this will eliminate the need for database-wide vacuums.

In fact this seems pretty easy to do. Add a field to pg_class, tell VACUUM to update it using the determined freezeLimit, and that's it.

(Note that if we ever implement partial vacuum, it won't be able to update the freeze point. But that was true before anyway.)

We also need to teach autovacuum to update pg_database.datfreezexid, using the minimum from pg_class. (I don't think it's a good idea to seqscan pg_class to find out the minimum on each VACUUM call.) So an autovacuum iteration would issue all needed VACUUM/ANALYZE calls, then get the minimum freezexid from pg_class to update pg_database. This way, GetNewTransactionId can continue checking pg_database.datfreezexid as the hard limit for issuing warnings about Xid wraparound.

Does anyone see a need for anything other than the autovacuum process to be updating pg_database.datfreezexid? Of course, if autovacuum is not in use, things would continue as now, that is, manual database-wide VACUUM calls updating pg_database.datfreezexid. But note that you can mark all tables as disabled in pg_autovacuum, issue your manual VACUUM calls as needed (from cron or whatever), and use autovacuum to set pg_database.datfreezexid -- so autovacuum would in fact do nothing except set the freeze limit.

The problem is, this seems so awfully simple that I fear I am missing something... Otherwise, does this sound like a plan?

--
Alvaro Herrera -- Valdivia, Chile
Architect, www.EnterpriseDB.com
The easiest way to resolve [trivial code guidelines disputes] is to fire one or both of the people involved. (Damian Conway)
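The bookkeeping being proposed is simple enough to model. The sketch below is only an illustration of the invariant (per-table freeze XIDs, database value = their minimum); the class and field names mirror the catalogs loosely, and real XIDs are 32-bit counters compared modulo wraparound, which plain Python ints ignore.

```python
# Toy model of the proposal: VACUUM records a freeze XID per table
# (pg_class-like), and the database-wide value (pg_database-like) is
# simply the minimum over all tables. Wraparound arithmetic omitted.

class Catalog:
    def __init__(self):
        self.relfreezexid = {}    # table name -> oldest possibly-unfrozen XID
        self.datfreezexid = None  # database-wide minimum

    def vacuum(self, table, freeze_xid):
        self.relfreezexid[table] = freeze_xid
        # Per Tom's follow-up, VACUUM itself (not autovacuum) should
        # maintain the database-wide value from the per-table minima.
        self.datfreezexid = min(self.relfreezexid.values())

cat = Catalog()
cat.vacuum("a", 1000)
cat.vacuum("b", 500)
# datfreezexid is now 500: the oldest per-table value bounds the database
```

In the real design one would avoid recomputing the minimum on every VACUUM (Tom's point: only rescan when the value being replaced equals the current database minimum); the unconditional `min()` here just keeps the invariant visible.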
Re: [HACKERS] inverse OR distributive law?
To find out about boolean logic, take a look here: http://www.laynetworks.com/Boolean%20Algebra.htm Where I work, we took the SIS toolkit from Berkeley and did a simplification of the where clause as if it was a Boolean integrated circuit. Of course, you may get answers that you do not expect if the data has NULL values, so you can turn that simplification option off also. After Boolean simplification, this: SELECT Products.RECORD_NUMBER , Products.PRODUCTID , Products.PRODUCTNAME , Products.PRODUCTPRICE , Products.PRODUCTKEYWORDS , Products.PRODUCTGROUPID FROM Products WHERE ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTGROUPID < 10 ) ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like 
'%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( NOT ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( NOT ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' 
) OR ( Products.PRODUCTGROUPID < 10 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTPRICE > 14 ) AND 0 AND ( Products.PRODUCTPR
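SIS itself is a circuit-synthesis toolkit, but the core idea of the simplification step above can be sanity-checked by brute force in two-valued logic: enumerate every truth assignment of the atomic predicates and confirm the simplified form agrees everywhere. This toy sketch deliberately ignores SQL's three-valued NULL semantics, which is exactly why the poster's tool offers an off switch; `equivalent` and the lambda predicates are illustrative names, not part of any real tool.

```python
from itertools import product

# Two-valued-logic equivalence check: do two predicates agree on every
# assignment of their atomic terms? (NULL / three-valued logic ignored.)

def equivalent(f, g, nvars):
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=nvars))

# p = PRODUCTPRICE > 14, k = PRODUCTKEYWORDS LIKE '%coffee%',
# g = PRODUCTGROUPID < 10 -- a small fragment of the query above:
original   = lambda p, k, g: (p and k and g) or (p and k) or (p and g)
simplified = lambda p, k, g: p and (k or g)

assert equivalent(original, simplified, 3)
```

Truth-table checking is exponential in the number of atoms, which is why real tools use algebraic methods (absorption, factoring) rather than enumeration; for a handful of atoms it is a handy oracle.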
[HACKERS] inverse OR distributive law?
Hi,

I have been looking around the optimizer's code and found a comment:

/*
 * process_duplicate_ors
 *	Given a list of exprs which are ORed together, try to apply
 *	the inverse OR distributive law.

Anybody enlighten what "inverse OR distributive law" is?
--
SRA OSS, Inc. Japan
Tatsuo Ishii
[HACKERS] Bug with cursor declaration in PL/pgSQL in CVS tip?
I am used to declaring cursors in the DECLARE section of a PL/pgSQL function. The example here seems to be broken in CVS tip:

CREATE FUNCTION test () RETURNS void AS '
DECLARE
    credit_cursor CURSOR (p_account integer, p_reference integer) FOR
        SELECT * FROM booking
        WHERE account_id = p_account AND reference = p_reference
          AND unassigned_amount = amount AND amount > 0
          AND side = ''credit'' AND position_closed
          AND type NOT IN (''RC'', ''RP'')
        ORDER BY journal_id ASC;
BEGIN
END
' LANGUAGE PLPGSQL;

I get:

ERROR: syntax error at or near "," at character 237
LINE 9: credit_cursor CURSOR (p_account integer, p_reference integ...

The same function works perfectly well in 7.4.8 and 8.0.3. A bug?

Best Regards,
Michael Paesold
Re: [HACKERS] parameterized fetch
Merlin Moncure wrote:
> I've noticed that trying to parameterize a fetch statement via
> ExecParams returns a syntax error:
>
> fetch $1 from my_cursor;
>
> This is not really a big deal, but maybe it should be documented which
> statements can be parameterized and which can't.

Currently the documentation is "the backend's grammar". You can only put parameters where there is a PARAM node, which currently means "anywhere you can put a c_expr". So if you can replace something with an expression, you can probably also replace it with a parameter.

-O
Re: [HACKERS] Constraint Type Coercion issue?
On Wed, Sep 14, 2005 at 05:28:42PM -0400, Tom Lane wrote:
> > To some extent, collate
> > implies a sort of parameterised operator class...
>
> Hmm. But an index couldn't support more than one collation order
> AFAICS. It'd probably make more sense to create operators and an
> operator class for each collation you want to support; the mapping
> to a call of a common support function would be embedded inside the
> operator definition. Otherwise we have to pass around an additional
> parameter through an awful lot of places...

Well yes, but given the number of possible locales, creating one class for each seems excessive. And each class would have to create 5 operators (with underlying functions) and 1 comparator function. Unless you could shortcut something like:

CREATE OPERATOR CLASS ...
   OPERATOR 1 <(text,text,'en_US')
   ...
   FUNCTION 1 mycompare(text,text,'en_US')
   ...
   COLLATE en_us;

Otherwise you end up with lots of functions which have to be created on the fly as the user decides what collate orders he wants. Invoking SQL functions in the btree index create cycle doesn't seem efficient. You would have to come up with new names for the operators each time because the argument types are going to be the same. Although I guess you could call it OPERATOR(

--
Martijn van Oosterhout   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Re: [HACKERS] Constraint Type Coercion issue?
Martijn van Oosterhout writes:
> How much discussion has there been on this?

None yet; I had a few half-baked ideas but nothing worth presenting to the list.

> To some extent, collate
> implies a sort of parameterised operator class...

Hmm. But an index couldn't support more than one collation order AFAICS. It'd probably make more sense to create operators and an operator class for each collation you want to support; the mapping to a call of a common support function would be embedded inside the operator definition. Otherwise we have to pass around an additional parameter through an awful lot of places...

regards, tom lane
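Tom's suggestion -- one operator class per collation, each a thin wrapper that binds the collation into a common support function -- is essentially partial application. A toy Python model of that shape (everything here is made up for illustration: the `COLLATIONS` table, the toy "Swedish" rule where v sorts with w, and `make_opclass` all stand in for catalog machinery and real OS/ICU collations):

```python
# Toy "common support function": compare two strings under a named
# collation by mapping them through a collation-specific sort key.
COLLATIONS = {
    "C":       lambda s: s,                    # plain codepoint order
    "swedish": lambda s: s.replace("v", "w"),  # toy rule: v sorts as w
}

def mycompare(a, b, collation):
    ka, kb = COLLATIONS[collation](a), COLLATIONS[collation](b)
    return (ka > kb) - (ka < kb)   # -1 / 0 / +1

def make_opclass(collation):
    """Bind one collation into per-class comparison 'operators', so
    callers never pass the collation parameter around themselves."""
    return {
        "lt":  lambda a, b: mycompare(a, b, collation) < 0,
        "cmp": lambda a, b: mycompare(a, b, collation),
    }

c_class = make_opclass("C")
assert c_class["lt"]("abc", "abd")
```

The trade-off being debated is visible even in the toy: the binding is cheap, but you need a separate `make_opclass` product (operators, functions, names) for every collation you intend to index under.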
Re: [HACKERS] Constraint Type Coercion issue?
On Wed, Sep 14, 2005 at 02:23:29PM -0400, Tom Lane wrote:
> I've been thinking about this off and on, and would like to solve it
> in the 8.2 time frame, but it's not happening for 8.1. At a minimum
> it'll require some significant changes in our concept of what an
> operator class is. The half-jelled ideas I have involve inventing
[snip]

How much discussion has there been on this? I've been working my way through COLLATE support and indexes and realised that what I really want is to allow the comparison functions in operator classes to be three-argument functions: the two things to compare and the collate order.

A descending index is really just another collate order, albeit one easily imposed from the outside. Although numbers tend not to have many interesting collate orders, complex numbers do, as obviously do strings. To some extent, collate implies a sort of parameterised operator class...

Definitely 8.2 stuff, and it's not simple stuff either...

--
Martijn van Oosterhout   http://svana.org/kleptog/
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> I wrote:
>> We could ameliorate this if there were a way to acquire ownership of
>> the cache line without necessarily winning the spinlock. I'm imagining
>> that we insert a "dummy" locked instruction just ahead of the xchgb,
>> which touches the spinlock in such a way as to not change its state.
>
> I tried this, using this tas code: ...

I have tried it on the P4 w/ HT.

CVS tip:             1: 37s   2: 78s   4: 159s   8: 324s
only slock-no-cmpb:  1: 37s   2: 82s   4: 178s   8: 362s
only dummy-locking:  1: 44s   2: 96s   4: 197s   ...

I guess there is no reason to try any more.

Best Regards,
Michael Paesold
Re: [HACKERS] Constraint Type Coercion issue?
Josh Berkus writes:
> So, is this a real bug in constraints or does the problem lie somewhere
> else? Is it fixable?

Not readily. The problem is here:

 * We must find a btree opclass that contains both operators, else the
 * implication can't be determined. Also, the pred_op has to be of
 * default subtype (implying left and right input datatypes are the
 * same); otherwise it's unsafe to put the pred_const on the left side
 * of the test. Also, the opclass must contain a suitable test
 * operator matching the clause_const's type (which we take to mean
 * that it has the same subtype as the original clause_operator).

The predtest code depends on btree operator classes to tell it the semantics of comparisons, and the operator class infrastructure just doesn't support making cross-type inferences very well. Given, say,

	int8var <= int4const

we'd like to determine whether

	int8var <= otherint4const

is implied (or refuted), but to do this we need to compare the two int4 constants, for which we need the int4 vs int4 comparison operator, which has no relationship whatever to the int8 operator class in which we find the int8 <= int4 operators that are present in the clauses. There are some related cases in btree index search startup that would be nice to fix too.

I've been thinking about this off and on, and would like to solve it in the 8.2 time frame, but it's not happening for 8.1. At a minimum it'll require some significant changes in our concept of what an operator class is. The half-jelled ideas I have involve inventing families of operator classes, so that we could for instance represent the idea that "int2_ops, int4_ops, and int8_ops are all compatible, and you can get consistent results from any of these operators". There are some related problems involving mergejoin and the ability to deal with reverse-sort indexes that also need to be dealt with, and seem to require extensions to the operator class concept.

regards, tom lane
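The inference itself is trivial once you can compare the two constants: given that `var <= clause_const` holds, `var <= pred_const` follows whenever `clause_const <= pred_const`. The hard part Tom describes is purely about *locating* an operator to do that constant-vs-constant comparison when the clauses use cross-type operators. A sketch of the easy half (the function name is illustrative; Python's built-in integer ordering stands in for the missing opclass lookup):

```python
# Predicate implication for "var <= const" clauses:
#   (var <= clause_const) implies (var <= pred_const)
#   whenever clause_const <= pred_const.
# In PostgreSQL the comparison below requires finding a btree operator
# for the constants' types -- the part the opclass machinery of this
# era couldn't do across types like int8 <= int4.

def le_implies_le(clause_const, pred_const):
    return clause_const <= pred_const

assert le_implies_le(5, 7)        # x <= 5 implies x <= 7
assert not le_implies_le(7, 5)    # x <= 7 does not imply x <= 5
```

Constraint exclusion (as in Josh's example elsewhere in the thread) is exactly this kind of proof: a partition with `CHECK (a <= 3)` can be skipped for `WHERE a >= 5` only if the planner can compare 3 and 5 with a trusted operator.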
Re: [HACKERS] About method of PostgreSQL's Optimizer
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > As for using both in the same optimizer, I could only see an algorithm such > as a customized-A* being used to planning *some* large queries. The reason I > say this is because the cost calculation, which would still need to be > breadth-first, could calculate and cache the cost of most nodes thereby > allowing you to possibly perform transformations at the tail of calculation. We do already have two different plan search algorithms: the strict bottom-up dynamic programming approach (System R style) and the GEQO optimizer, which we switch to when there are too many joins needed to allow exhaustive search. The GEQO code still depends on the normal plan cost estimation code, but it doesn't consider every possible plan. I've never been very happy with the GEQO code: the random component of the algorithm means you get unpredictable (and sometimes awful) plans, and the particular variant that we are using is really designed to solve traveling-salesman problems. It's at best a poor fit to the join planning problem. So it seems interesting to me to think about replacing GEQO with a rule-based optimizer for large join search spaces. There are previous discussions about this in the archives, I believe. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Constraint Type Coercion issue?
Folks,

Bob Ippolito found this while testing Bizgres. It *seems* like our smarter type coercion rules do not apply when constraints are being generated. That is, the types of constants in constraints, if not coerced, still default to the old "dumb" casting where the type of the comparing column isn't checked. This is visible if you run a simple test on constraint exclusion:

CE not used:

set constraint_exclusion=on;
create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1 and 3);
alter table a2 add constraint a2_a check ( a between 4 and 6);
alter table a3 add constraint a3_a check ( a between 7 and 9);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;

CE used:

create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1::BIGINT and 3::BIGINT);
alter table a2 add constraint a2_a check ( a between 4::BIGINT and 6::BIGINT);
alter table a3 add constraint a3_a check ( a between 7::BIGINT and 9::BIGINT);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;

So, is this a real bug in constraints or does the problem lie somewhere else? Is it fixable?

--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
I wrote:
> Another thought came to mind: maybe the current data layout for LWLocks
> is bad. Right now, the spinlock that protects each LWLock data struct
> is itself part of the struct, and since the structs aren't large (circa
> 20 bytes), the whole thing is usually all in the same cache line. ...
> Maybe it'd be better to allocate the spinlocks off by themselves.

Well, this is odd. I made up a patch to do this (attached) and found that it pretty much sucks. Still the 4-way Opteron, previous best (slock-no-cmpb and spin-delay-2):

	1	31s
	2	42s
	4	51s
	8	100s

with lwlock-separate added:

	1	31s
	2	52s
	4	106s
	8	213s

What I had expected to see was a penalty in the single-thread case (due to instructions added to the LWLock functions), but there isn't any. I did not expect to see a factor-of-2 penalty for four threads.

I guess what this means is that there's no real problem with losing the cache line while manipulating the LWLock, which is what the patch was intended to prevent. Instead, we're paying for swapping two cache lines (the spinlock and the LWLock) across processors instead of just one line. But that should at worst be a 2x inflation of the time previously spent in LWLockAcquire/Release, which is surely not yet all of the application ;-). Why the heck is this so bad? Should we expect that apparently minor changes in shared data structures might be costing equivalently huge penalties in SMP performance elsewhere?

Unfortunately I don't have root on the Opteron and can't run oprofile. But I'd really like to see some oprofile stats from these two cases so we can figure out what in the world is going on here. Can anyone help?

regards, tom lane

*** src/backend/storage/lmgr/lwlock.c.orig	Sat Aug 20 19:26:24 2005
--- src/backend/storage/lmgr/lwlock.c	Wed Sep 14 11:50:09 2005
***************
*** 27,35 ****
  #include "storage/spin.h"
  
  typedef struct LWLock
  {
! 	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
  	bool		releaseOK;		/* T if ok to release waiters */
  	char		exclusive;		/* # of exclusive holders (0 or 1) */
  	int			shared;			/* # of shared holders (0..MaxBackends) */
--- 27,44 ----
  #include "storage/spin.h"
  
+ /*
+  * Each LWLock has an associated spinlock, which we consider as locking the
+  * content of the LWLock struct as well as the associated queue of waiting
+  * PGPROCs. To minimize cache-line contention on multiprocessors, the
+  * spinlocks are kept physically separate from their LWLocks --- we don't
+  * want a spinning processor to interfere with access to the LWLock for
+  * the processor holding the spinlock.
+  */
+ 
  typedef struct LWLock
  {
! 	slock_t    *mutex;			/* Protects LWLock and queue of PGPROCs */
  	bool		releaseOK;		/* T if ok to release waiters */
  	char		exclusive;		/* # of exclusive holders (0 or 1) */
  	int			shared;			/* # of shared holders (0..MaxBackends) */
***************
*** 39,44 ****
--- 48,65 ----
  } LWLock;
  
  /*
+  * We allocate CACHE_LINE_SIZE bytes for the fixed LWLocks' spinlocks.
+  * We assume that the fixed locks are few enough and heavily used enough
+  * that it's worth trying to put each one's spinlock in its own cache line.
+  * The dynamic locks are assumed to be many and less heavily hit, so they
+  * get just MAXALIGN space apiece. (Packing spinlocks tightly sounds like a
+  * bad idea on machines where slock_t is just a byte ... even for lesser-used
+  * spinlocks, there would be enough locks per cache line to create a
+  * contention issue.)
+  */
+ #define CACHE_LINE_SIZE		64
+ 
+ /*
   * This points to the array of LWLocks in shared memory. Backends inherit
   * the pointer by fork from the postmaster. LWLockIds are indexes into
   * the array.
***************
*** 135,144 ****
  	Size		size;
  	int			numLocks = NumLWLocks();
  
! 	/* Allocate the LWLocks plus space for shared allocation counter. */
  	size = mul_size(numLocks, sizeof(LWLock));
! 	size = add_size(size, 2 * sizeof(int));
  
  	return size;
  }
--- 156,174 ----
  	Size		size;
  	int			numLocks = NumLWLocks();
  
! 	/* Space for the array of LWLock structs. */
  	size = mul_size(numLocks, sizeof(LWLock));
! 	/* Space for dynamic allocation counter and MAXALIGN padding. */
! 	size = add_size(size, 2 * sizeof(int) + MAXIMUM_ALIGNOF);
! 
! 	/* Space for the spinlocks --- see notes for CACHE_LINE_SIZE. */
! 	size = add_size(size,
! 					mul_size((int) NumFixedLWLocks,
! 							 Max(sizeof(slock_t), CACHE_LINE_SIZE)));
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock. I'm imagining
> that we insert a "dummy" locked instruction just ahead of the xchgb,
> which touches the spinlock in such a way as to not change its state.

I tried this, using this tas code:

static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;
	register slock_t _dummy = 0;

	/* Use a locking test before trying to take the spinlock */
	/* xchg implies a LOCK prefix, so no need to say LOCK for it */
	__asm__ __volatile__(
		"	lock			\n"
		"	xaddb	%2,%1	\n"
		"	xchgb	%0,%1	\n"
:		"+q"(_res), "+m"(*lock), "+q"(_dummy)
:
:		"memory", "cc");
	return (int) _res;
}

At least on Opteron, it's a loser. The previous best results (with the slock-no-cmpb and spin-delay patches) were

	1	31s
	2	42s
	4	51s
	8	100s

and with this instead of slock-no-cmpb,

	1	33s
	2	45s
	4	55s
	8	104s

The xadd may indeed be helping in terms of protecting the xchg from end-of-timeslice --- the rate of select() delays is really tiny, one every few seconds, which is better than I saw before. But the extra cost of the extra locked operation isn't getting repaid overall. Feel free to try it on other hardware, but it doesn't look promising.

BTW, I also determined that on that 4-way Opteron box, the integer modulo idea doesn't make any difference --- that is, spin-delay and what Michael called spin-delay-2 are the same speed. I think I had tried the modulo before adding the variable spin delay, and it did help in that configuration; but most likely, it was just helping by stretching out the amount of time spent looping before entering the kernel. So we can drop that idea too.

regards, tom lane
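The spin-delay behavior under discussion -- spin a bounded number of times on the lock, then back off with escalating sleeps instead of burning CPU -- has this rough shape (a Python stand-in for the C in the backend's spinlock support; the constant values and function names here are illustrative, not PostgreSQL's actual tuning):

```python
import time

# Sketch of a spin loop with escalating backoff: try the atomic
# acquire repeatedly, and if the lock stays busy, sleep with a delay
# that doubles up to a ceiling, yielding the CPU to the lock holder.
SPINS_PER_DELAY = 100
MIN_DELAY, MAX_DELAY = 0.001, 1.0   # seconds

def spin_lock(try_acquire):
    delay = MIN_DELAY
    while True:
        for _ in range(SPINS_PER_DELAY):
            if try_acquire():          # stands in for the tas() above
                return
        time.sleep(delay)              # back off; don't hog the bus
        delay = min(delay * 2, MAX_DELAY)
```

The thread's measurements are about exactly the knobs this sketch exposes: how many raw spins to allow, and how the sleep sequence grows, since too little spinning costs kernel entries and too much costs cache-line traffic.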
Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

Step 2 is basically where you find the difference between a cost-based
optimizer (CBO) and a rule-based optimizer (RBO).  A CBO is based on the
computed execution cost of the query, whereas an RBO uses more generalized
heuristics.

Let's take an example of what you're proposing and see if we can work it
out from there.  Say we have the following (this is a generalized CBO
approach, not PostgreSQL-specific): Oracle's SCOTT.EMP table with a
cardinality of 1 million and indexes on empno and ename.  For storage
purposes, say that the empno index takes up 3600 blocks, the ename index
takes up 7800 blocks, and the table itself takes up 17000 blocks.  We'll
also say that we have a 256 megabyte buffer cache in which we have cached
50% of the empno index, 10% of the ename index, and 5% of the emp table
data.  A user then issues the following query:

SELECT empno, ename FROM emp;

A cost-based optimizer will see the following options:

1. The query can be satisfied by a full table scan (FTS); calculate the
   cost of retrieving all 17000 blocks from disk.
2. The query is a FTS, but it can retrieve all its data from the indexes
   (11400 blocks) and join the data (with which join algorithm?).

Without performing a breadth-first algorithm, how can one evaluate both
options in a way that would allow you to perform heuristic transformations
dynamically?  What transformation/heuristic/rule can you use?  A CBO
implementation has to calculate the amount of I/O needed for each plan
based on several statistics, such as what is *potentially* in the cache,
the access time for block I/O (including prefetching, if the storage
manager has it), and other factors.

If you could name a database that uses a best-first algorithm, such as A*,
please send me the link to its docs; I'd be interested in reading about
the implementation.

As for using both in the same optimizer, I could only see an algorithm
such as a customized A* being used to plan *some* large queries.
The reason I say this is that the cost calculation, which would still need
to be breadth-first, could calculate and cache the cost of most nodes,
thereby allowing you to possibly perform transformations at the tail of
the calculation.

As for references to query optimization possibly using best-first
algorithms, I think I saw several types of algorithms used in work from a
university query optimization engine.  I can't remember if it was Cornell,
Stanford, or Wisconsin... I'll try to get you a link to their info.

-Jonah

On 9/14/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:

Hi Jonah,

Thanks for your email, I really appreciate your opinions.

Is it interesting to use both techniques?  For example, given a query, an
optimizer:

1. Generates one of the possible execution plans.
2. Does transformations on the original plan, based on rules and
   heuristics, resulting in new alternative plans.
3. Evaluates the cost of the generated plans by using statistics.
4. Keeps plans that have a lower cost than the original plan.
5. Repeats 2-4 over the new alternative plans.

What do you think about it?  Are there any restrictions that I haven't
seen?

About another method... have you heard about using PDDL ("Planning Domain
Definition Language") for query optimization?

[]'s
Pryscila

On 9/14/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Pryscila,
>
> While I haven't been too involved in the open source PostgreSQL
> optimizer, I have done some work on it and on optimizers in other
> database systems.
>
> Based on my work, it is my opinion that PostgreSQL, as well as other
> databases which use a cost-based optimizer, prefers a breadth-first
> algorithm because one cannot determine the "real" cost of each node at
> run-time without systematically examining all possibilities through
> calculation.  This is the opposite of a rule-based optimizer, which
> defines heuristics that can be evaluated by a best-first algorithm such
> as A*.
>
> In a cost-based optimizer, the system must calculate the "cost" of each
> path based on data that changes during run-time, including indexing,
> cardinality, tuple size, available memory, CPU usage, disk access times,
> etc.  To a cost-based optimizer, every query is unique and therefore
> cannot follow a weighted path in the same fashion.  I can certainly see
> A* being used in a rule-based optimizer but not in a real-time
> cost-based optimizer.
>
> Perhaps Tom, Bruce, et al. have more specifics on PostgreSQL's
> implementation.
>
> -Jonah
>
> On 9/13/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:
> > Hello all!
> >
> > On my master's course, I'm studying PostgreSQL's optimizer.
> > I don't know if anyone on this list has participated in the
> > development of PostgreSQL's optimizer, but maybe someone can help me
> > with this question.
> > PostgreSQL generates all possible plans for executing the query (using
> > an almost exhaustive search), then gives a cost to each plan, and
> > finally the cheapest one is selected for execution.
> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.
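To make the block-I/O arithmetic in the SCOTT.EMP example concrete, here is a toy sketch of the kind of calculation a CBO performs. `physical_reads`, `fts_cost`, and `index_cost` are invented helpers; "cost" here is simply expected physical block reads, ignoring the CPU, join, and prefetch factors a real CBO also weighs.

```c
/* Toy arithmetic for the SCOTT.EMP example above: expected physical
 * block reads for each plan, given how much of each object is already
 * in the buffer cache.  Invented for illustration only. */
static double physical_reads(double blocks, double cached_fraction)
{
    return blocks * (1.0 - cached_fraction);
}

static double fts_cost(void)
{
    /* option 1: full table scan, 17000 blocks, 5% cached */
    return physical_reads(17000.0, 0.05);
}

static double index_cost(void)
{
    /* option 2: read both indexes (3600 blocks at 50% cached,
     * 7800 blocks at 10% cached) -- ignoring the cost of joining
     * the index data back together */
    return physical_reads(3600.0, 0.50) + physical_reads(7800.0, 0.10);
}
```

With these numbers the index path needs roughly 8820 reads against roughly 16150 for the scan; change the cache fractions and the ordering can flip, which is exactly why the statistics matter.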
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom, et al.,

Updated, with full recompiles between everything and the new modification
(N, runtime):

Tip:              1 31s   2 37s   4 86s    8 159s
no-cmpb:          1 32s   2 43s   4 83s    8 168s
spin:             1 32s   2 51s   4 84s    8 160s
spin+mod:         1 32s   2 51s   4 89s    8 158s
spin+no-cmpb:     1 32s   2 51s   4 87s    8 163s
spin+mod+no-cmpb: 1 32s   2 50s   4 86s    8 161s

Unfortunately, the results don't seem to be terribly consistent between
runs anyway.  Run 2:

Tip:              1 32s   2 43s   4 87s    8 160s
no-cmpb:          1 31s   2 47s   4 83s    8 167s
spin:             1 32s   2 52s   4 88s    8 154s
spin+no-cmpb:     1 32s   2 51s   4 102s   8 166s
spin+mod:         1 32s   2 53s   4 85s    8 154s
spin+mod+no-cmpb: 1 32s   2 51s   4 91s    8 161s

Hope it helps,

Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] About method of PostgreSQL's Optimizer
Hi Jonah,

Thanks for your email, I really appreciate your opinions.

Is it interesting to use both techniques?  For example, given a query, an
optimizer:

1. Generates one of the possible execution plans.
2. Does transformations on the original plan, based on rules and
   heuristics, resulting in new alternative plans.
3. Evaluates the cost of the generated plans by using statistics.
4. Keeps plans that have a lower cost than the original plan.
5. Repeats 2-4 over the new alternative plans.

What do you think about it?  Are there any restrictions that I haven't
seen?

About another method... have you heard about using PDDL ("Planning Domain
Definition Language") for query optimization?

[]'s
Pryscila

On 9/14/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Pryscila,
>
> While I haven't been too involved in the open source PostgreSQL
> optimizer, I have done some work on it and on optimizers in other
> database systems.
>
> Based on my work, it is my opinion that PostgreSQL, as well as other
> databases which use a cost-based optimizer, prefers a breadth-first
> algorithm because one cannot determine the "real" cost of each node at
> run-time without systematically examining all possibilities through
> calculation.  This is the opposite of a rule-based optimizer, which
> defines heuristics that can be evaluated by a best-first algorithm such
> as A*.
>
> In a cost-based optimizer, the system must calculate the "cost" of each
> path based on data that changes during run-time, including indexing,
> cardinality, tuple size, available memory, CPU usage, disk access times,
> etc.  To a cost-based optimizer, every query is unique and therefore
> cannot follow a weighted path in the same fashion.  I can certainly see
> A* being used in a rule-based optimizer but not in a real-time
> cost-based optimizer.
>
> Perhaps Tom, Bruce, et al. have more specifics on PostgreSQL's
> implementation.
>
> -Jonah
>
> On 9/13/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:
> > Hello all!
> >
> > On my master's course, I'm studying PostgreSQL's optimizer.
> > I don't know if anyone on this list has participated in the
> > development of PostgreSQL's optimizer, but maybe someone can help me
> > with this question.
> > PostgreSQL generates all possible plans for executing the query (using
> > an almost exhaustive search), then gives a cost to each plan, and
> > finally the cheapest one is selected for execution.
> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.
> > Does anyone know why this method was chosen?  Are there any papers or
> > research about it?
> >
> > Thanks a lot,
> > Pryscila.
>
> --
> Respectfully,
>
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> http://www.enterprisedb.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
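Pryscila's steps 1-5 amount to an iterative improvement loop. A deliberately tiny sketch, in which a "plan" is reduced to its estimated cost and the transformation rules (`rule_prune` and `rule_use_index`, both invented here) map one cost to a cheaper alternative, might look like this:

```c
#include <stddef.h>

/* Toy rendition of the steps above.  Real plans are trees; here a plan
 * is just its estimated cost, and each rule stands in for a rewrite
 * heuristic that produces an alternative plan. */
typedef double (*transform_fn)(double);

static double rule_prune(double cost)
{
    return cost > 100 ? cost - 50 : cost;   /* e.g. drop a redundant sort */
}

static double rule_use_index(double cost)
{
    return cost > 10 ? cost * 0.9 : cost;   /* e.g. swap a scan for an index */
}

static double optimize(double original, transform_fn *rules, size_t nrules)
{
    double best = original;                 /* step 1: one initial plan */
    int    improved = 1;

    while (improved)                        /* step 5: repeat over new plans */
    {
        improved = 0;
        for (size_t i = 0; i < nrules; i++)
        {
            double alt = rules[i](best);    /* steps 2-3: transform, cost */
            if (alt < best)                 /* step 4: keep cheaper plans */
            {
                best = alt;
                improved = 1;
            }
        }
    }
    return best;
}
```

Jonah's objection lives inside `rules[i](best)`: in a CBO the cost of an alternative cannot be read off a heuristic, it has to be recomputed from run-time statistics each time.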
Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.

We do certainly need a specific optimization for large star-schema joins.
I'm not certain that A* is suitable for our cost model, though; I think we
might need to work up something more particular to Postgres.

> > Does anyone know why this method was chosen?  Are there any papers or
> > research about it?

There probably are papers on ACM, but I've not read them.  Ours is a
pretty straightforward implementation of a cost-based optimizer.  You can
always read the code ;-)

Mark Kirkwood put together this nice paper on planner statistics:
http://www.powerpostgresql.com/PlanStats

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
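For contrast with the transformation-based approach, the "pretty straightforward" cost-based search can be illustrated with a toy System R-style bottom-up enumeration over bitmask subsets. The cost model here (child costs plus cross-product rows) and the `cheapest_join` helper are invented for the sketch; PostgreSQL's real planner prunes by join clauses and uses a far richer cost model.

```c
/* Toy exhaustive dynamic-programming join search for up to 8 relations,
 * each subset of relations encoded as a bitmask.  Invented cost model:
 * joining sets S1 and S2 costs cost(S1) + cost(S2) + rows(S1)*rows(S2). */
#define MAXREL 8

static double cheapest_join(const double *card, int nrel)
{
    double cost[1 << MAXREL];   /* cheapest cost found for each subset */
    double rows[1 << MAXREL];   /* output cardinality of each subset */
    int    full = (1 << nrel) - 1;

    for (int s = 1; s <= full; s++)
        cost[s] = -1.0;                     /* -1 means "not built yet" */

    /* level 1: scanning base relation i costs card[i] */
    for (int i = 0; i < nrel; i++)
    {
        cost[1 << i] = card[i];
        rows[1 << i] = card[i];
    }

    /* build every larger subset from all of its two-way splits */
    for (int s = 1; s <= full; s++)
    {
        for (int sub = (s - 1) & s; sub; sub = (sub - 1) & s)
        {
            int rest = s & ~sub;

            if (cost[sub] < 0 || cost[rest] < 0)
                continue;

            double outrows = rows[sub] * rows[rest];  /* toy: cross join */
            double c = cost[sub] + cost[rest] + outrows;

            if (cost[s] < 0 || c < cost[s])
            {
                cost[s] = c;
                rows[s] = outrows;
            }
        }
    }
    return cost[full];
}
```

The subset enumeration is the part that blows up combinatorially; this is exactly the search space that GEQO (or a hypothetical rule-based fallback) exists to avoid for large join counts.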
Re: [HACKERS] 8.1 system info / admin functions
Stephen Frost wrote:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > Neil Conway <[EMAIL PROTECTED]> writes:
> > > While we're on the subject, the units used by pg_size_pretty() are
> > > incorrect, at least according to the IEC: for example, "MB" is
> > > strictly speaking one million bytes, not 1024^2 bytes.  1024^2 bytes
> > > is 1 MiB (similarly for KiB, GiB, and TiB).  I'll take a look at
> > > fixing this as well, barring any objections.
> >
> > [ itch... ]  The IEC may think they get to define what's correct, but
> > I don't think that squares with common usage.  The only people who
> > think MB is measured in decimal are disk-manufacturer marketroids.
>
> That isn't entirely accurate, unfortunately.  Telecom also generally
> uses decimal.  I'm as unhappy with the current situation as anyone, but
> it's not just disk makers & marketing folks, even among the computing
> industry.

Telcos don't tend to follow standard computer industry thinking...  Kilo,
Mega and so on are used as decimal exponent multipliers for SI units like
the meter or the gram; there's no such thing as a byte as a physical unit,
it's completely artificial.  pg_size_pretty follows the convention of
"df -h" or "du -h" and so on, which use the well-known 1024-multiples, as
every sane user of bytes will.

Regards,
Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
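A minimal sketch of the "df -h"-style 1024-multiple formatting Andreas describes. This is not the server's actual pg_size_pretty implementation (which switches units at different thresholds and rounds differently); it just illustrates the binary-multiple convention, with `size_pretty` as an invented name.

```c
#include <stdio.h>
#include <string.h>

/* Format a byte count using 1024-multiples, df -h style. */
static void size_pretty(long long bytes, char *buf, size_t buflen)
{
    const char *units[] = { "bytes", "kB", "MB", "GB", "TB" };
    int         u = 0;
    double      v = (double) bytes;

    while (v >= 1024 && u < 4)
    {
        v /= 1024;              /* binary multiples: 1 MB = 1024^2 bytes */
        u++;
    }
    snprintf(buf, buflen, "%.0f %s", v, units[u]);
}
```

Under the IEC reading the second unit column would instead be "KiB", "MiB", and so on, with the divisor unchanged; that is the entire substance of the disagreement above.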
[HACKERS] parameterized fetch
I've noticed that trying to parameterize a fetch statement via ExecParams
returns a syntax error:

fetch $1 from my_cursor;

This is not really a big deal, but maybe it should be documented which
statements can be parameterized and which can't.  (I take it that any
statement that can be prepared can be parameterized;

prepare test as fetch 1 from my_cursor;

also returns a syntax error.)

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Bug with cursor declaration in plpgsql? (Repost)
"Michael Paesold" <[EMAIL PROTECTED]> writes: > I get: > ERROR: syntax error at or near "," at character 237 > LINE 9: credit_cursor CURSOR (p_account integer, p_reference integ... > The same function works perfectly well in 7.4.8 and 8.0.3. > A bug? Yeah, looks like Neil accidentally dropped the comma from decl_cursor_arglist when he whacked that code around to use Lists instead of handmade arrays. Thanks for catching it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.1 system info / admin functions
* Tom Lane ([EMAIL PROTECTED]) wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > While we're on the subject, the units used by pg_size_pretty() are > > incorrect, at least according to the IEC: for example, "MB" is > > strictly-speaking one million bytes, not 1024^2 bytes. 1024^2 bytes is 1 > > MiB (similarly for KiB, GiB, and TiB). I'll take a look at fixing this > > as well, barring any objections. > > [ itch... ] The IEC may think they get to define what's correct, but > I don't think that squares with common usage. The only people who > think MB is measured in decimal are disk-manufacturer marketroids. That isn't entirely accurate, unfortunately. Telecom also generally uses decimal. I'm as unhappy with the current situation as anyone but it's not just disk makers & marketing folks, even among the computing industry. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] VACUUM VERBOSE 8.1dev
I'm not even sure that the new output does tell me the same thing.  I
certainly prefer the previous output.  I think this will be very confusing
to users who aren't familiar with the internals of postgres.

Dave

On 13-Sep-05, at 11:44 PM, Joshua D. Drake wrote:

Hello,

It seems the new VACUUM VERBOSE output is not quite as helpful as 8.0.
In 8.0 I get a nice output at the end like this:

INFO:  free space map: 1377 relations, 22478 pages stored; 44112 total pages needed
DETAIL:  Allocated FSM size: 10 relations + 200 pages = 17702 kB shared memory.

Which tells me exactly what I need to do with max_fsm_pages/relations.
The new output appears:

booktown=# VACUUM FULL VERBOSE books;
INFO:  vacuuming "public.books"
INFO:  "books": found 32804 removable, 14 nonremovable row versions in 242 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 48 to 72 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1845484 bytes.
241 pages are or will become empty, including 241 at the end of the table.
1 pages containing 6768 free bytes are potential move destinations.

I am sure the above tells me the same thing, but not as easily.

Comments?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc.  1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Tue, Sep 13, 2005 at 08:14:47PM -0400, Stephen Frost wrote:
> I suppose another option would be to provide separate packages...  Could
> this be done as a shared library so it's more 'plug-and-play' to switch
> between the two?  I dunno, just trying to think about how to deal with
> this without making it suck terribly bad for me (as a Debian user and
> maintainer).

Note that the Linux kernel has played with moving spinlock code out of
line.  Because there are so many call sites, the memory saved by moving
the code out of line turned out to be a net benefit overall.

An unconditional call to a function can't be that expensive, surely.
However, it would *have* to be in the same compile unit; no shared
libraries.  ELF imposes some overhead for calls in different compile
units, and using function pointers won't save you (see Ulrich Drepper's
paper on it).  However, to make it flexible you would need a pointer to a
function pointer.  Fortunately these variables won't change often, so the
function pointer and the function itself should be in the cache if used
often enough.

Finally, the kernel solves the problem by saying: if you compile for
uniprocessor, optimize the code away, since a lot of the issues don't
apply.  My main concern is how you even detect the number of processors
in a portable way...

Have a nice day,
--
Martijn van Oosterhout   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgpLWptHgoZDN.pgp
Description: PGP signature
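The dispatch idea Martijn describes can be sketched as follows. `init_locking`, `lock_smp`, and `lock_uniprocessor` are invented names, the processor-count detection he worries about is simply taken as a parameter, and the "uniprocessor" path is a toy stand-in for whatever cheaper sequence would actually be safe there.

```c
#include <stdatomic.h>

/* Sketch only: choose a lock implementation once at startup, then
 * always call through the pointer. */
typedef void (*lock_fn)(atomic_int *);

static void lock_uniprocessor(atomic_int *l)
{
    /* toy cheap path: in this sketch, just set the lock word */
    atomic_store_explicit(l, 1, memory_order_relaxed);
}

static void lock_smp(atomic_int *l)
{
    /* spin on a bus-locked exchange, as a multiprocessor must */
    while (atomic_exchange(l, 1))
        ;
}

static lock_fn acquire_lock;    /* set once at startup, rarely changes */

static void init_locking(int nprocs)
{
    acquire_lock = (nprocs > 1) ? lock_smp : lock_uniprocessor;
}
```

Since `acquire_lock` is written once and then only read, the pointer should stay cache-resident, which is the basis of Martijn's argument that the indirection is cheap.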
Re: [HACKERS] 8.1 system info / admin functions
Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > (2) pg_cancel_backend(), pg_reload_conf(), and pg_rotate_logfile() all
> > return an int indicating success (1) or failure (0).  Why shouldn't
> > these functions return a boolean?

I would have used boolean as the return code for success and failure, but
the previously existing pg_cancel_backend did return int, so I matched
that behaviour.

> > (Presumably there is a good reason why these functions return a status
> > code at all, rather than aborting via elog on error -- right?)
>
> I agree with both of those criticisms: total is more in line with our
> nomenclature than complete, and the other functions should return void
> and ereport when they are unhappy.  (Saying "I failed" and not having
> any mechanism to report why sucks.)

These functions will only treat being called by a non-superuser as a fatal
error; failures from normal execution (i.e. returning 0) will give an elog
WARNING and thus a complete report about the cause, if needed.
Non-execution of these functions isn't really a problem (and not
synchronizable in transactions for following statements anyway); OTOH,
having to deal with errors in the client that could be safely ignored, or
adding savepoints, sounds like overkill and a solution for a non-problem.

Regards,
Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql CVS callgraph data from dbt2
Mark Wong wrote:
> Hi everyone,
> For those of you watching the daily results generated from STP
> (http://developer.osdl.org/markw/postgrescvs/dbt2/) I have callgraph
> data from oprofile collected starting from the Sept 9 results.  Here is
> an example of what it looks like:
> http://www.testing.osdl.org/stp/303184/results/0/callgraph.txt
> I hope it's useful.

Rather unrelated, but the tests here use the standard settings for
wal_buffers.  As I have read on this list, 8.1 needs a much higher setting
for wal_buffers for good performance (128 or 256?).  I don't know if you
want to change this setting because it will make the results incomparable,
but it should improve overall performance.  See
http://archives.postgresql.org/pgsql-hackers/2005-07/msg01004.php

Best Regards,
Michael Paesold

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> "Michael Paesold" <[EMAIL PROTECTED]> writes:
> > To have other data, I have retested the patches on a single-cpu Intel
> > P4 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T.  Comparing to the 2,4
> > dual-Xeon results it's clear that this is in reality only one cpu.
> > While the runtime for N=1 is better than the other system, for N=4
> > it's already worse.  The situation with the patches is quite
> > different, though.  Unfortunately.
> >
> > CVS tip from 2005-09-12:
> > 1: 36s   2: 77s (cpu ~85%)   4: 159s (cpu ~98%)
> >
> > only slock-no-cmpb:
> > 1: 36s   2: 81s (cpu ~79%)   4: 177s (cpu ~94%)   (doesn't help this time)
>
> Hm.  This is the first configuration we've seen in which slock-no-cmpb
> was a loss.  Could you double-check that result?

The first tests were compiled with CFLAGS='-O2 -mcpu=pentium4
-march=pentium4'.  I had redone the tests with just CFLAGS='-O2'
yesterday.  The difference was just about a second, but the result from
the patch was the same.  The results for N=4 and N=8 show the effect more
clearly.

configure: CFLAGS='-O2' --enable-casserts
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s   2: 78s   4: 159s   8: 324s

only slock-no-cmpb:
1: 37s   2: 82s (5%)   4: 178s (12%)   8: 362s (12%)

(Btw., I have always done "make clean; make; make install" between tests.)

Best Regards,
Michael Paesold

> I can't see any reasonable way to do runtime switching of the cmpb test
> --- whatever logic we put in to control it would cost as much or more
> than the cmpb anyway :-(.  I think that has to be a compile-time choice.
> From my perspective it'd be acceptable to remove the cmpb only for
> x86_64, since only there does it seem to be a really significant win.
> On the other hand it seems that removing the cmpb is a net win on most
> x86 setups too, so maybe we should just do it and accept that there are
> some cases where it's not perfect.

How many test cases do we have yet?
Summary of the effects without the cmpb instruction seems to be:

8-way Opteron:           better
Dual/HT Xeon w/o EM64T:  better
Dual/HT EM64T:           better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T:         worse (stronger for N>cpus)

Have I missed other reports that did test the slock-no-cmpb.patch alone?

Two of the systems with positive effects are x86_64; one is an older
high-end Intel x86 chip.  The negative effect is on a low-cost Pentium 4
with only hyper-threading.  According to the mentioned thread's title,
this was an optimization for hyper-threading, not regular multi-cpus.

We could have more data, especially on newer and high-end systems.  Could
some of you test the slock-no-cmpb.patch?  You'll need an otherwise idle
system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the
archive it's not clear anymore what should go into which file.  See the
instructions in the first messages above.  The patch applies to CVS tip
with: patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold

slock-no-cmpb.patch
Description: Binary data

test_setup.sql
Description: Binary data

test_run_small.sql
Description: Binary data

startn.sh
Description: Binary data

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > I'm starting to think that we might have to succumb to having a
> > compile option "optimize for multiprocessor" or "optimize for single
> > processor".  It's pretty hard to see how we'd alter a data structure
> > decision like this on the fly.
>
> I'd really hate to see this happen.

I suspect you're thinking too hard about this.  It's not like some LWLocks
would need SMP spinlocks and others uniprocessor locks.  And it's not like
it's going to change dynamically.  Wouldn't it just be a couple of:

    if (multiprocessor)
    {
        /* complex version */
    }
    else
    {
        /* simple version */
    }

Ugly to be sure, but it's not going to spread to other areas of the source
base, and you know the if statement isn't going to be growing any more
arms.  It's no uglier than doing the same thing with an #ifdef anyway.

If you really, really want to have two versions and you think using
function pointers would impose too big a performance penalty, you could
play linker games.  But that way lies madness.

--
greg

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: 13 September 2005 23:03
> To: Marko Kreen
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Spinlocks, yet again: analysis and proposed
> patches
>
> I'm starting to think that we might have to succumb to having a compile
> option "optimize for multiprocessor" or "optimize for single processor".
> It's pretty hard to see how we'd alter a data structure decision like
> this on the fly.

That would be a major pita for packagers of binary distributions.

Regards, Dave.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> But the cmpb instruction in the 8.0 version of TAS would have done that,
> and I think we've already established that the cmpb is a loss on most
> machines (except maybe single-physical-CPU Xeons).

Note that this was a regular Pentium 4 system, not a Xeon.

Best Regards,
Michael Paesold

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster