Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

For research reference, you may want to look at the work done on the Columbia Query Optimization Framework. As I recall, it (or its predecessors) had both cost-based and rule-based optimization. If you need the code for it, I can dig it up on one of my old systems. Albeit dated, another good reference for optimizer implementation is the Cascades query optimization framework.

On 9/15/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Tom, I agree. There have been several occasions where GEQO has performed
> poorly for me. I'll search the archives for the past discussions.
>
> (Sorry for sending this to you twice, Tom... forgot to hit reply-all.)
>
> On 9/14/05, Tom Lane <[EMAIL PROTECTED]> wrote:
>> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
>>> As for using both in the same optimizer, I could only see an algorithm
>>> such as a customized A* being used to plan *some* large queries. The
>>> reason I say this is that the cost calculation, which would still need
>>> to be breadth-first, could calculate and cache the cost of most nodes,
>>> thereby allowing you to possibly perform transformations at the tail
>>> of the calculation.
>>
>> We do already have two different plan search algorithms: the strict
>> bottom-up dynamic programming approach (System R style) and the GEQO
>> optimizer, which we switch to when there are too many joins to allow
>> exhaustive search. The GEQO code still depends on the normal plan cost
>> estimation code, but it doesn't consider every possible plan.
>>
>> I've never been very happy with the GEQO code: the random component of
>> the algorithm means you get unpredictable (and sometimes awful) plans,
>> and the particular variant that we are using is really designed to
>> solve traveling-salesman problems. It's at best a poor fit to the join
>> planning problem.
>>
>> So it seems interesting to me to think about replacing GEQO with a
>> rule-based optimizer for large join search spaces. There are previous
>> discussions about this in the archives, I believe.
>>
>> regards, tom lane

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Re: [HACKERS] About method of PostgreSQL's Optimizer
Tom,

I agree. There have been several occasions where GEQO has performed poorly for me. I'll search the archives for the past discussions.

(Sorry for sending this to you twice, Tom... forgot to hit reply-all.)

On 9/14/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
>> As for using both in the same optimizer, I could only see an algorithm
>> such as a customized A* being used to plan *some* large queries. The
>> reason I say this is that the cost calculation, which would still need
>> to be breadth-first, could calculate and cache the cost of most nodes,
>> thereby allowing you to possibly perform transformations at the tail
>> of the calculation.
>
> We do already have two different plan search algorithms: the strict
> bottom-up dynamic programming approach (System R style) and the GEQO
> optimizer, which we switch to when there are too many joins to allow
> exhaustive search. The GEQO code still depends on the normal plan cost
> estimation code, but it doesn't consider every possible plan.
>
> I've never been very happy with the GEQO code: the random component of
> the algorithm means you get unpredictable (and sometimes awful) plans,
> and the particular variant that we are using is really designed to
> solve traveling-salesman problems. It's at best a poor fit to the join
> planning problem.
>
> So it seems interesting to me to think about replacing GEQO with a
> rule-based optimizer for large join search spaces. There are previous
> discussions about this in the archives, I believe.
>
> regards, tom lane

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Re: [HACKERS] Per-table freeze limit proposal
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> In fact this seems pretty easy to do. Add a field to pg_class, tell
> VACUUM to update it using the determined freezeLimit, and that's it.

I think that it'd be worth fixing things so that the recorded value is not the freeze cutoff value (as now), but the actual lowest not-frozen XID present anywhere in the table. The present code does not do that because it's painful to track across multiple tables, but on a per-table basis it seems easy. In particular, this rule allows you to set a sane value for the pg_class field when the table is created (ie, the current transaction's XMIN, rather than a billion less).

> (Note that if we ever implement partial vacuum, it won't be able to
> update the freeze point. But that was true before anyway.)

Sure.

> We also need to teach autovacuum to update pg_database.datfreezexid,
> using the minimum from pg_class.

No, no, no. autovacuum is not a required part of the system and it's not going to become so any time soon. Updating the pg_database entry will have to be the responsibility of VACUUM itself. It's not that terrible: you don't have to scan pg_class unless you see that the pg_class.relfreezexid value you are replacing is equal to pg_database.datfreezexid, and with the exact computation suggested above, that won't be a common occurrence.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] inverse OR distributive law?
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> * process_duplicate_ors
> *	Given a list of exprs which are ORed together, try to apply
> *	the inverse OR distributive law.
> Anybody enlighten what "inverse OR distributive law" is?

Well, it's defined right above that:

 * The following code attempts to apply the inverse OR distributive law:
 *		((A AND B) OR (A AND C))  =>  (A AND (B OR C))
 * That is, locate OR clauses in which every subclause contains an
 * identical term, and pull out the duplicated terms.

I'm not sure that "inverse OR distributive law" is standard terminology, but I believe the implication in the other direction is usually called the "OR distributive law". Anyone know of better terminology?

regards, tom lane
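The transformation the comment describes is easy to model. Below is a toy Python sketch of the idea only (the real implementation is C code in the optimizer, and `pull_out_common_terms` is a made-up name): represent an OR-of-ANDs as a list of term sets, intersect the arms, and factor out whatever every arm shares.

```python
# Toy model: an OR-of-ANDs predicate is a list of sets of atomic terms.
# The "inverse OR distributive law" pulls out terms common to every arm:
#   ((A AND B) OR (A AND C))  =>  (A AND (B OR C))

def pull_out_common_terms(or_arms):
    """or_arms: list of sets; sets are ANDed internally, ORed together.
    Returns (common_terms, reduced_arms)."""
    common = set.intersection(*map(set, or_arms))
    if not common:
        return common, or_arms          # nothing shared; no rewrite
    reduced = [set(arm) - common for arm in or_arms]
    return common, reduced

common, rest = pull_out_common_terms([{"A", "B"}, {"A", "C"}])
# common == {"A"} and rest == [{"B"}, {"C"}], i.e. A AND (B OR C)
```

This is why the rewrite helps the planner: the factored-out term `A` becomes a standalone qual that can, for example, match an index.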
Re: [HACKERS] Constraint Type Coercion issue?
Martijn van Oosterhout writes:
> Well yes, but given the number of possible locales, creating one class
> for each seems excessive. And each class would have to create 5
> operators (with underlying functions) and 1 comparator function. Unless
> you could shortcut something like:
>
> CREATE OPERATOR CLASS ...
>    OPERATOR 1 <(text,text,'en_US')
>    ...
>    FUNCTION 1 mycompare(text,text,'en_US')
>    ...
>    COLLATE en_us;

The thing that's still fairly unclear to me is whether the collation information is attached to the operators/functions or to the data. I recall there's been some discussion of sticking collation IDs into individual text Datums, which is a completely different path from what you are positing above. Does the SQL spec mandate one or the other of these approaches? If it does, do we want to believe it? (The more I read of SQL2003, the less impressed I get...)

regards, tom lane
[HACKERS] Per-table freeze limit proposal
Hackers,

As you've probably heard too many times already, I'm thinking about improving vacuum so we can keep track of the freeze Xid at the table level, rather than the database level. Hopefully this will eliminate the need for database-wide vacuums.

In fact this seems pretty easy to do. Add a field to pg_class, tell VACUUM to update it using the determined freezeLimit, and that's it.

(Note that if we ever implement partial vacuum, it won't be able to update the freeze point. But that was true before anyway.)

We also need to teach autovacuum to update pg_database.datfreezexid, using the minimum from pg_class. (I don't think it's a good idea to seqscan pg_class to find out the minimum on each VACUUM call.) So an autovacuum iteration would issue all needed VACUUM/ANALYZE calls, then get the minimum freezexid from pg_class to update pg_database. This way, GetNewTransactionId can continue checking pg_database.datfreezexid as the hard limit for issuing warnings about Xid wraparound.

Does anyone see a need for anything other than the autovacuum process to be updating pg_database.datfreezexid? Of course, if autovacuum is not in use, things would continue as now, that is, manual database-wide VACUUM calls updating pg_database.datfreezexid. But note that you can mark all tables as disabled in pg_autovacuum, issue your manual VACUUM calls as needed (from cron or whatever), and use autovacuum to set pg_database.datfreezexid -- so autovacuum would in fact do nothing except set the freeze limit.

The problem is, this seems so awfully simple that I fear I am missing something... Otherwise, does this sound like a plan?

--
Alvaro Herrera -- Valdivia, Chile
Architect, www.EnterpriseDB.com
The easiest way to resolve [trivial code guidelines disputes] is to fire one or both of the people involved. (Damian Conway)
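The bookkeeping being proposed is simple enough to model. The sketch below is only an illustration of the invariant (per-table freeze XIDs, database value = their minimum); the class and field names mirror the catalogs loosely, and real XIDs are 32-bit counters compared modulo wraparound, which plain Python ints ignore.

```python
# Toy model of the proposal: VACUUM records a freeze XID per table
# (pg_class-like), and the database-wide value (pg_database-like) is
# simply the minimum over all tables. Wraparound arithmetic omitted.

class Catalog:
    def __init__(self):
        self.relfreezexid = {}    # table name -> oldest possibly-unfrozen XID
        self.datfreezexid = None  # database-wide minimum

    def vacuum(self, table, freeze_xid):
        self.relfreezexid[table] = freeze_xid
        # Per Tom's follow-up, VACUUM itself (not autovacuum) should
        # maintain the database-wide value from the per-table minima.
        self.datfreezexid = min(self.relfreezexid.values())

cat = Catalog()
cat.vacuum("a", 1000)
cat.vacuum("b", 500)
# datfreezexid is now 500: the oldest per-table value bounds the database
```

In the real design one would avoid recomputing the minimum on every VACUUM (Tom's point: only rescan when the value being replaced equals the current database minimum); the unconditional `min()` here just keeps the invariant visible.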
Re: [HACKERS] inverse OR distributive law?
To find out about boolean logic, take a look here: http://www.laynetworks.com/Boolean%20Algebra.htm Where I work, we took the SIS toolkit from Berkeley and did a simplification of the where clause as if it was a Boolean integrated circuit. Of course, you may get answers that you do not expect if the data has NULL values, so you can turn that simplification option off also. After Boolean simplification, this: SELECT Products.RECORD_NUMBER , Products.PRODUCTID , Products.PRODUCTNAME , Products.PRODUCTPRICE , Products.PRODUCTKEYWORDS , Products.PRODUCTGROUPID FROM Products WHERE ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTGROUPID < 10 ) ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTGROUPID < 10 ) ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like 
'%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( NOT ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( NOT ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( NOT ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTGROUPID < 10 ) OR ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTKEYWORDS ) Like '%coffee%' 
) OR ( Products.PRODUCTGROUPID < 10 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) AND ( Products.PRODUCTPRICE > 14 ) AND ( ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTKEYWORDS ) Like '%coffee%' ) OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND NOT ( Products.PRODUCTKEYWORDS ) Like '%coffee%' OR ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) AND ( Products.PRODUCTPRICE > 14 ) OR ( Products.PRODUCTPRICE > 14 ) AND 0 AND ( Products.PRODUCTPR
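SIS itself is a circuit-synthesis toolkit, but the core idea of the simplification step above can be sanity-checked by brute force in two-valued logic: enumerate every truth assignment of the atomic predicates and confirm the simplified form agrees everywhere. This toy sketch deliberately ignores SQL's three-valued NULL semantics, which is exactly why the poster's tool offers an off switch; `equivalent` and the lambda predicates are illustrative names, not part of any real tool.

```python
from itertools import product

# Two-valued-logic equivalence check: do two predicates agree on every
# assignment of their atomic terms? (NULL / three-valued logic ignored.)

def equivalent(f, g, nvars):
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=nvars))

# p = PRODUCTPRICE > 14, k = PRODUCTKEYWORDS LIKE '%coffee%',
# g = PRODUCTGROUPID < 10 -- a small fragment of the query above:
original   = lambda p, k, g: (p and k and g) or (p and k) or (p and g)
simplified = lambda p, k, g: p and (k or g)

assert equivalent(original, simplified, 3)
```

Truth-table checking is exponential in the number of atoms, which is why real tools use algebraic methods (absorption, factoring) rather than enumeration; for a handful of atoms it is a handy oracle.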
[HACKERS] inverse OR distributive law?
Hi,

I have been looking around the optimizer's code and found a comment:

/*
 * process_duplicate_ors
 *	Given a list of exprs which are ORed together, try to apply
 *	the inverse OR distributive law.

Anybody enlighten what "inverse OR distributive law" is?
--
SRA OSS, Inc. Japan
Tatsuo Ishii
[HACKERS] Bug with cursor declaration in PL/pgSQL in CVS tip?
I am used to declaring cursors in the DECLARE section of a PL/pgSQL function. The example here seems to be broken in CVS tip:

CREATE FUNCTION test () RETURNS void AS '
DECLARE
    credit_cursor CURSOR (p_account integer, p_reference integer) FOR
        SELECT * FROM booking
        WHERE account_id = p_account AND reference = p_reference
          AND unassigned_amount = amount AND amount > 0
          AND side = ''credit'' AND position_closed
          AND type NOT IN (''RC'', ''RP'')
        ORDER BY journal_id ASC;
BEGIN
END
' LANGUAGE PLPGSQL;

I get:

ERROR: syntax error at or near "," at character 237
LINE 9: credit_cursor CURSOR (p_account integer, p_reference integ...

The same function works perfectly well in 7.4.8 and 8.0.3. A bug?

Best Regards,
Michael Paesold
Re: [HACKERS] parameterized fetch
Merlin Moncure wrote:
> I've noticed that trying to parameterize a fetch statement via
> ExecParams returns a syntax error:
>
> fetch $1 from my_cursor;
>
> This is not really a big deal, but maybe it should be documented which
> statements can be parameterized and which can't.

Currently the documentation is "the backend's grammar". You can only put parameters where there is a PARAM node, which currently means "anywhere you can put a c_expr". So if you can replace something with an expression, you can probably also replace it with a parameter.

-O
Re: [HACKERS] Constraint Type Coercion issue?
On Wed, Sep 14, 2005 at 05:28:42PM -0400, Tom Lane wrote:
> > To some extent, collate
> > implies a sort of parameterised operator class...
>
> Hmm. But an index couldn't support more than one collation order
> AFAICS. It'd probably make more sense to create operators and an
> operator class for each collation you want to support; the mapping
> to a call of a common support function would be embedded inside the
> operator definition. Otherwise we have to pass around an additional
> parameter through an awful lot of places...

Well yes, but given the number of possible locales, creating one class for each seems excessive. And each class would have to create 5 operators (with underlying functions) and 1 comparator function. Unless you could shortcut something like:

CREATE OPERATOR CLASS ...
   OPERATOR 1 <(text,text,'en_US')
   ...
   FUNCTION 1 mycompare(text,text,'en_US')
   ...
   COLLATE en_us;

Otherwise you end up with lots of functions which have to be created on the fly as the user decides what collate orders he wants. Invoking SQL functions in the btree index create cycle doesn't seem efficient. You would have to come up with new names for the operators each time because the argument types are going to be the same. Although I guess you could call it OPERATOR(

--
Martijn van Oosterhout   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Re: [HACKERS] Constraint Type Coercion issue?
Martijn van Oosterhout writes:
> How much discussion has there been on this?

None yet; I had a few half-baked ideas but nothing worth presenting to the list.

> To some extent, collate
> implies a sort of parameterised operator class...

Hmm. But an index couldn't support more than one collation order AFAICS. It'd probably make more sense to create operators and an operator class for each collation you want to support; the mapping to a call of a common support function would be embedded inside the operator definition. Otherwise we have to pass around an additional parameter through an awful lot of places...

regards, tom lane
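Tom's suggestion -- one operator class per collation, each a thin wrapper that binds the collation into a common support function -- is essentially partial application. A toy Python model of that shape (everything here is made up for illustration: the `COLLATIONS` table, the toy "Swedish" rule where v sorts with w, and `make_opclass` all stand in for catalog machinery and real OS/ICU collations):

```python
# Toy "common support function": compare two strings under a named
# collation by mapping them through a collation-specific sort key.
COLLATIONS = {
    "C":       lambda s: s,                    # plain codepoint order
    "swedish": lambda s: s.replace("v", "w"),  # toy rule: v sorts as w
}

def mycompare(a, b, collation):
    ka, kb = COLLATIONS[collation](a), COLLATIONS[collation](b)
    return (ka > kb) - (ka < kb)   # -1 / 0 / +1

def make_opclass(collation):
    """Bind one collation into per-class comparison 'operators', so
    callers never pass the collation parameter around themselves."""
    return {
        "lt":  lambda a, b: mycompare(a, b, collation) < 0,
        "cmp": lambda a, b: mycompare(a, b, collation),
    }

c_class = make_opclass("C")
assert c_class["lt"]("abc", "abd")
```

The trade-off being debated is visible even in the toy: the binding is cheap, but you need a separate `make_opclass` product (operators, functions, names) for every collation you intend to index under.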
Re: [HACKERS] Constraint Type Coercion issue?
On Wed, Sep 14, 2005 at 02:23:29PM -0400, Tom Lane wrote:
> I've been thinking about this off and on, and would like to solve it
> in the 8.2 time frame, but it's not happening for 8.1. At a minimum
> it'll require some significant changes in our concept of what an
> operator class is. The half-jelled ideas I have involve inventing
[snip]

How much discussion has there been on this? I've been working my way through COLLATE support and indexes and realised that what I really want is to allow the comparison functions in operator classes to be three-argument functions: the two things to compare and the collate order.

A descending index is really just another collate order, albeit one easily imposed from the outside. Although numbers tend not to have many interesting collate orders, complex numbers do, as obviously do strings. To some extent, collate implies a sort of parameterised operator class...

Definitely 8.2 stuff, and it's not simple stuff either...

--
Martijn van Oosterhout   http://svana.org/kleptog/
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> I wrote:
>> We could ameliorate this if there were a way to acquire ownership of
>> the cache line without necessarily winning the spinlock. I'm imagining
>> that we insert a "dummy" locked instruction just ahead of the xchgb,
>> which touches the spinlock in such a way as to not change its state.
>
> I tried this, using this tas code: ...

I have tried it on the P4 w/ HT.

CVS tip:             1: 37s   2: 78s   4: 159s   8: 324s
only slock-no-cmpb:  1: 37s   2: 82s   4: 178s   8: 362s
only dummy-locking:  1: 44s   2: 96s   4: 197s   ...

I guess there is no reason to try any more.

Best Regards,
Michael Paesold
Re: [HACKERS] Constraint Type Coercion issue?
Josh Berkus writes:
> So, is this a real bug in constraints or does the problem lie somewhere
> else? Is it fixable?

Not readily. The problem is here:

 * We must find a btree opclass that contains both operators, else the
 * implication can't be determined. Also, the pred_op has to be of
 * default subtype (implying left and right input datatypes are the
 * same); otherwise it's unsafe to put the pred_const on the left side
 * of the test. Also, the opclass must contain a suitable test
 * operator matching the clause_const's type (which we take to mean
 * that it has the same subtype as the original clause_operator).

The predtest code depends on btree operator classes to tell it the semantics of comparisons, and the operator class infrastructure just doesn't support making cross-type inferences very well. Given, say,

	int8var <= int4const

we'd like to determine whether

	int8var <= otherint4const

is implied (or refuted), but to do this we need to compare the two int4 constants, for which we need the int4 vs int4 comparison operator, which has no relationship whatever to the int8 operator class in which we find the int8 <= int4 operators that are present in the clauses. There are some related cases in btree index search startup that would be nice to fix too.

I've been thinking about this off and on, and would like to solve it in the 8.2 time frame, but it's not happening for 8.1. At a minimum it'll require some significant changes in our concept of what an operator class is. The half-jelled ideas I have involve inventing families of operator classes, so that we could for instance represent the idea that "int2_ops, int4_ops, and int8_ops are all compatible, and you can get consistent results from any of these operators". There are some related problems involving mergejoin and the ability to deal with reverse-sort indexes that also need to be dealt with, and seem to require extensions to the operator class concept.

regards, tom lane
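The inference itself is trivial once you can compare the two constants: given that `var <= clause_const` holds, `var <= pred_const` follows whenever `clause_const <= pred_const`. The hard part Tom describes is purely about *locating* an operator to do that constant-vs-constant comparison when the clauses use cross-type operators. A sketch of the easy half (the function name is illustrative; Python's built-in integer ordering stands in for the missing opclass lookup):

```python
# Predicate implication for "var <= const" clauses:
#   (var <= clause_const) implies (var <= pred_const)
#   whenever clause_const <= pred_const.
# In PostgreSQL the comparison below requires finding a btree operator
# for the constants' types -- the part the opclass machinery of this
# era couldn't do across types like int8 <= int4.

def le_implies_le(clause_const, pred_const):
    return clause_const <= pred_const

assert le_implies_le(5, 7)        # x <= 5 implies x <= 7
assert not le_implies_le(7, 5)    # x <= 7 does not imply x <= 5
```

Constraint exclusion (as in Josh's example elsewhere in the thread) is exactly this kind of proof: a partition with `CHECK (a <= 3)` can be skipped for `WHERE a >= 5` only if the planner can compare 3 and 5 with a trusted operator.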
Re: [HACKERS] About method of PostgreSQL's Optimizer
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > As for using both in the same optimizer, I could only see an algorithm such > as a customized-A* being used to planning *some* large queries. The reason I > say this is because the cost calculation, which would still need to be > breadth-first, could calculate and cache the cost of most nodes thereby > allowing you to possibly perform transformations at the tail of calculation. We do already have two different plan search algorithms: the strict bottom-up dynamic programming approach (System R style) and the GEQO optimizer, which we switch to when there are too many joins needed to allow exhaustive search. The GEQO code still depends on the normal plan cost estimation code, but it doesn't consider every possible plan. I've never been very happy with the GEQO code: the random component of the algorithm means you get unpredictable (and sometimes awful) plans, and the particular variant that we are using is really designed to solve traveling-salesman problems. It's at best a poor fit to the join planning problem. So it seems interesting to me to think about replacing GEQO with a rule-based optimizer for large join search spaces. There are previous discussions about this in the archives, I believe. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Constraint Type Coercion issue?
Folks,

Bob Ippolito found this while testing Bizgres. It *seems* like our smarter type coercion rules do not apply when constraints are being generated. That is, the types of constants in constraints, if not coerced, still default to the old "dumb" casting where the type of the comparing column isn't checked. This is visible if you run a simple test on constraint exclusion:

CE not used:

set constraint_exclusion=on;
create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1 and 3);
alter table a2 add constraint a2_a check ( a between 4 and 6);
alter table a3 add constraint a3_a check ( a between 7 and 9);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;

CE used:

create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1::BIGINT and 3::BIGINT);
alter table a2 add constraint a2_a check ( a between 4::BIGINT and 6::BIGINT);
alter table a3 add constraint a3_a check ( a between 7::BIGINT and 9::BIGINT);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;

So, is this a real bug in constraints or does the problem lie somewhere else? Is it fixable?

--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
I wrote:
> Another thought came to mind: maybe the current data layout for LWLocks
> is bad. Right now, the spinlock that protects each LWLock data struct
> is itself part of the struct, and since the structs aren't large (circa
> 20 bytes), the whole thing is usually all in the same cache line. ...
> Maybe it'd be better to allocate the spinlocks off by themselves.

Well, this is odd. I made up a patch to do this (attached) and found that it pretty much sucks. Still the 4-way Opteron, previous best (slock-no-cmpb and spin-delay-2):

	1	31s
	2	42s
	4	51s
	8	100s

with lwlock-separate added:

	1	31s
	2	52s
	4	106s
	8	213s

What I had expected to see was a penalty in the single-thread case (due to instructions added to the LWLock functions), but there isn't any. I did not expect to see a factor-of-2 penalty for four threads.

I guess what this means is that there's no real problem with losing the cache line while manipulating the LWLock, which is what the patch was intended to prevent. Instead, we're paying for swapping two cache lines (the spinlock and the LWLock) across processors instead of just one line. But that should at worst be a 2x inflation of the time previously spent in LWLockAcquire/Release, which is surely not yet all of the application ;-). Why the heck is this so bad? Should we expect that apparently minor changes in shared data structures might be costing equivalently huge penalties in SMP performance elsewhere?

Unfortunately I don't have root on the Opteron and can't run oprofile. But I'd really like to see some oprofile stats from these two cases so we can figure out what in the world is going on here. Can anyone help?

regards, tom lane

*** src/backend/storage/lmgr/lwlock.c.orig	Sat Aug 20 19:26:24 2005
--- src/backend/storage/lmgr/lwlock.c	Wed Sep 14 11:50:09 2005
***************
*** 27,35 ****
  #include "storage/spin.h"
  
  typedef struct LWLock
  {
! 	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
  	bool		releaseOK;		/* T if ok to release waiters */
  	char		exclusive;		/* # of exclusive holders (0 or 1) */
  	int			shared;			/* # of shared holders (0..MaxBackends) */
--- 27,44 ----
  #include "storage/spin.h"
  
+ /*
+  * Each LWLock has an associated spinlock, which we consider as locking the
+  * content of the LWLock struct as well as the associated queue of waiting
+  * PGPROCs. To minimize cache-line contention on multiprocessors, the
+  * spinlocks are kept physically separate from their LWLocks --- we don't
+  * want a spinning processor to interfere with access to the LWLock for
+  * the processor holding the spinlock.
+  */
+ 
  typedef struct LWLock
  {
! 	slock_t    *mutex;			/* Protects LWLock and queue of PGPROCs */
  	bool		releaseOK;		/* T if ok to release waiters */
  	char		exclusive;		/* # of exclusive holders (0 or 1) */
  	int			shared;			/* # of shared holders (0..MaxBackends) */
***************
*** 39,44 ****
--- 48,65 ----
  } LWLock;
  
  /*
+  * We allocate CACHE_LINE_SIZE bytes for the fixed LWLocks' spinlocks.
+  * We assume that the fixed locks are few enough and heavily used enough
+  * that it's worth trying to put each one's spinlock in its own cache line.
+  * The dynamic locks are assumed to be many and less heavily hit, so they
+  * get just MAXALIGN space apiece. (Packing spinlocks tightly sounds like a
+  * bad idea on machines where slock_t is just a byte ... even for lesser-used
+  * spinlocks, there would be enough locks per cache line to create a
+  * contention issue.)
+  */
+ #define CACHE_LINE_SIZE		64
+ 
+ /*
   * This points to the array of LWLocks in shared memory. Backends inherit
   * the pointer by fork from the postmaster. LWLockIds are indexes into
   * the array.
***************
*** 135,144 ****
  	Size		size;
  	int			numLocks = NumLWLocks();
  
! 	/* Allocate the LWLocks plus space for shared allocation counter. */
  	size = mul_size(numLocks, sizeof(LWLock));
! 	size = add_size(size, 2 * sizeof(int));
  
  	return size;
  }
--- 156,174 ----
  	Size		size;
  	int			numLocks = NumLWLocks();
  
! 	/* Space for the array of LWLock structs. */
  	size = mul_size(numLocks, sizeof(LWLock));
! 	/* Space for dynamic allocation counter and MAXALIGN padding. */
! 	size = add_size(size, 2 * sizeof(int) + MAXIMUM_ALIGNOF);
! 
! 	/* Space for the spinlocks --- see notes for CACHE_LINE_SIZE. */
! 	size = add_size(size,
! 					mul_size((int) NumFixedLWLocks,
! 							 Max(sizeof(slock_t), CACHE_LINE_SIZE)));
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock. I'm imagining
> that we insert a "dummy" locked instruction just ahead of the xchgb,
> which touches the spinlock in such a way as to not change its state.

I tried this, using this tas code:

static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;
	register slock_t _dummy = 0;

	/* Use a locking test before trying to take the spinlock */
	/* xchg implies a LOCK prefix, so no need to say LOCK for it */
	__asm__ __volatile__(
		"	lock			\n"
		"	xaddb	%2,%1	\n"
		"	xchgb	%0,%1	\n"
:		"+q"(_res), "+m"(*lock), "+q"(_dummy)
:
:		"memory", "cc");
	return (int) _res;
}

At least on Opteron, it's a loser. The previous best results (with the slock-no-cmpb and spin-delay patches) were

	1	31s
	2	42s
	4	51s
	8	100s

and with this instead of slock-no-cmpb,

	1	33s
	2	45s
	4	55s
	8	104s

The xadd may indeed be helping in terms of protecting the xchg from end-of-timeslice --- the rate of select() delays is really tiny, one every few seconds, which is better than I saw before. But the extra cost of the extra locked operation isn't getting repaid overall. Feel free to try it on other hardware, but it doesn't look promising.

BTW, I also determined that on that 4-way Opteron box, the integer modulo idea doesn't make any difference --- that is, spin-delay and what Michael called spin-delay-2 are the same speed. I think I had tried the modulo before adding the variable spin delay, and it did help in that configuration; but most likely, it was just helping by stretching out the amount of time spent looping before entering the kernel. So we can drop that idea too.

regards, tom lane
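The spin-delay behavior under discussion -- spin a bounded number of times on the lock, then back off with escalating sleeps instead of burning CPU -- has this rough shape (a Python stand-in for the C in the backend's spinlock support; the constant values and function names here are illustrative, not PostgreSQL's actual tuning):

```python
import time

# Sketch of a spin loop with escalating backoff: try the atomic
# acquire repeatedly, and if the lock stays busy, sleep with a delay
# that doubles up to a ceiling, yielding the CPU to the lock holder.
SPINS_PER_DELAY = 100
MIN_DELAY, MAX_DELAY = 0.001, 1.0   # seconds

def spin_lock(try_acquire):
    delay = MIN_DELAY
    while True:
        for _ in range(SPINS_PER_DELAY):
            if try_acquire():          # stands in for the tas() above
                return
        time.sleep(delay)              # back off; don't hog the bus
        delay = min(delay * 2, MAX_DELAY)
```

The thread's measurements are about exactly the knobs this sketch exposes: how many raw spins to allow, and how the sleep sequence grows, since too little spinning costs kernel entries and too much costs cache-line traffic.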
Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

Step 2 is basically where you find the difference between a cost-based
optimizer (CBO) and a rule-based optimizer (RBO).  A CBO is based on the
computed execution cost of the query, whereas an RBO uses more generalized
heuristics.

Let's take an example of what you're proposing and see if we can work it
out from there.  Say we have the following (this is a generalized CBO
approach, not PostgreSQL-specific): Oracle's SCOTT.EMP table with a
cardinality of 1 million and indexes on empno and ename.  For storage
purposes, say that the empno index takes up 3600 blocks, the ename index
takes up 7800 blocks, and the table itself takes up 17000 blocks.  We'll
also say that we have a 256 megabyte buffer cache in which we have cached
50% of the empno index, 10% of the ename index, and 5% of the emp table
data.  A user then issues the following query:

SELECT empno, ename FROM emp;

A cost-based optimizer will see the following options:

1. The query can be satisfied by a full table scan (FTS); calculate the
   cost of retrieving all 17000 blocks from disk.
2. The query is a FTS, but it can retrieve all its data from the indexes
   (11400 blocks) and join the data (with which join algorithm?).

Without performing a breadth-first algorithm, how can one evaluate both
options in a way that would allow you to perform heuristic transformations
dynamically?  What transformation/heuristic/rule can you use?  A CBO
implementation has to calculate the amount of I/O needed for each plan
based on several statistics, such as what is *potentially* in the cache,
the access time for block I/O (including prefetching, if the storage
manager has it), and other factors.

If you could name a database that uses a best-first algorithm, such as A*,
please send me the link to its docs; I'd be interested in reading about
the implementation.

As for using both in the same optimizer, I could only see an algorithm
such as a customized A* being used to plan *some* large queries.
The reason I say this is that the cost calculation, which would still need
to be breadth-first, could calculate and cache the cost of most nodes,
thereby allowing you to possibly perform transformations at the tail of
the calculation.

As for references to query optimization possibly using best-first
algorithms, I think I saw several types of algorithms used in work from a
university query optimization engine.  I can't remember if it was Cornell,
Stanford, or Wisconsin... I'll try to get you a link to their info.

-Jonah

On 9/14/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:

Hi Jonah,

Thanks for your email, I really appreciate your opinions.

Is it interesting to use both techniques?  For example, given a query, an
optimizer:

1. Generates one of the possible execution plans.
2. Does transformations on the original plan, based on rules and
   heuristics, resulting in new alternative plans.
3. Evaluates the cost of the generated plans by using statistics.
4. Keeps plans that have a lower cost than the original plan.
5. Repeats 2-4 over the new alternative plans.

What do you think about it?  Are there any restrictions that I haven't
seen?

About another method... have you heard about using PDDL ("Planning Domain
Definition Language") for query optimization?

[]'s
Pryscila

On 9/14/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Pryscila,
>
> While I haven't been too involved in the open source PostgreSQL
> optimizer, I have done some work on it and on optimizers in other
> database systems.
>
> Based on my work, it is my opinion that PostgreSQL, as well as other
> databases which use a cost-based optimizer, prefers a breadth-first
> algorithm because one cannot determine the "real" cost of each node at
> run-time without systematically examining all possibilities through
> calculation.  This is the opposite of a rule-based optimizer, which
> defines heuristics that can be evaluated by a best-first algorithm such
> as A*.
>
> In a cost-based optimizer, the system must calculate the "cost" of each
> path based on data that changes during run-time, including indexing,
> cardinality, tuple size, available memory, CPU usage, disk access times,
> etc.  To a cost-based optimizer, every query is unique and therefore
> cannot follow a weighted path in the same fashion.  I can certainly see
> A* being used in a rule-based optimizer but not in a real-time
> cost-based optimizer.
>
> Perhaps Tom, Bruce, et al. have more specifics on PostgreSQL's
> implementation.
>
> -Jonah
>
> On 9/13/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:
> > Hello all!
> >
> > On my master's course, I'm studying PostgreSQL's optimizer.
> > I don't know if anyone on this list has participated in the
> > development of PostgreSQL's optimizer, but maybe someone can help me
> > with this question.
> > PostgreSQL generates all possible plans for executing the query (using
> > an almost exhaustive search), then gives a cost to each plan, and
> > finally the cheapest one is selected for execution.
> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.
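To make the block-I/O arithmetic in the SCOTT.EMP example concrete, here is a toy sketch of the kind of calculation a CBO performs. `physical_reads`, `fts_cost`, and `index_cost` are invented helpers; "cost" here is simply expected physical block reads, ignoring the CPU, join, and prefetch factors a real CBO also weighs.

```c
/* Toy arithmetic for the SCOTT.EMP example above: expected physical
 * block reads for each plan, given how much of each object is already
 * in the buffer cache.  Invented for illustration only. */
static double physical_reads(double blocks, double cached_fraction)
{
    return blocks * (1.0 - cached_fraction);
}

static double fts_cost(void)
{
    /* option 1: full table scan, 17000 blocks, 5% cached */
    return physical_reads(17000.0, 0.05);
}

static double index_cost(void)
{
    /* option 2: read both indexes (3600 blocks at 50% cached,
     * 7800 blocks at 10% cached) -- ignoring the cost of joining
     * the index data back together */
    return physical_reads(3600.0, 0.50) + physical_reads(7800.0, 0.10);
}
```

With these numbers the index path needs roughly 8820 reads against roughly 16150 for the scan; change the cache fractions and the ordering can flip, which is exactly why the statistics matter.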
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom, et al.,

Updated, with full recompiles between everything and the new modification
(N, runtime):

Tip:              1 31s   2 37s   4 86s    8 159s
no-cmpb:          1 32s   2 43s   4 83s    8 168s
spin:             1 32s   2 51s   4 84s    8 160s
spin+mod:         1 32s   2 51s   4 89s    8 158s
spin+no-cmpb:     1 32s   2 51s   4 87s    8 163s
spin+mod+no-cmpb: 1 32s   2 50s   4 86s    8 161s

Unfortunately, the results don't seem to be terribly consistent between
runs anyway.  Run 2:

Tip:              1 32s   2 43s   4 87s    8 160s
no-cmpb:          1 31s   2 47s   4 83s    8 167s
spin:             1 32s   2 52s   4 88s    8 154s
spin+no-cmpb:     1 32s   2 51s   4 102s   8 166s
spin+mod:         1 32s   2 53s   4 85s    8 154s
spin+mod+no-cmpb: 1 32s   2 51s   4 91s    8 161s

Hope it helps,

Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] About method of PostgreSQL's Optimizer
Hi Jonah,

Thanks for your email, I really appreciate your opinions.

Is it interesting to use both techniques?  For example, given a query, an
optimizer:

1. Generates one of the possible execution plans.
2. Does transformations on the original plan, based on rules and
   heuristics, resulting in new alternative plans.
3. Evaluates the cost of the generated plans by using statistics.
4. Keeps plans that have a lower cost than the original plan.
5. Repeats 2-4 over the new alternative plans.

What do you think about it?  Are there any restrictions that I haven't
seen?

About another method... have you heard about using PDDL ("Planning Domain
Definition Language") for query optimization?

[]'s
Pryscila

On 9/14/05, Jonah H. Harris <[EMAIL PROTECTED]> wrote:
> Pryscila,
>
> While I haven't been too involved in the open source PostgreSQL
> optimizer, I have done some work on it and on optimizers in other
> database systems.
>
> Based on my work, it is my opinion that PostgreSQL, as well as other
> databases which use a cost-based optimizer, prefers a breadth-first
> algorithm because one cannot determine the "real" cost of each node at
> run-time without systematically examining all possibilities through
> calculation.  This is the opposite of a rule-based optimizer, which
> defines heuristics that can be evaluated by a best-first algorithm such
> as A*.
>
> In a cost-based optimizer, the system must calculate the "cost" of each
> path based on data that changes during run-time, including indexing,
> cardinality, tuple size, available memory, CPU usage, disk access times,
> etc.  To a cost-based optimizer, every query is unique and therefore
> cannot follow a weighted path in the same fashion.  I can certainly see
> A* being used in a rule-based optimizer but not in a real-time
> cost-based optimizer.
>
> Perhaps Tom, Bruce, et al. have more specifics on PostgreSQL's
> implementation.
>
> -Jonah
>
> On 9/13/05, Pryscila B Guttoski <[EMAIL PROTECTED]> wrote:
> > Hello all!
> >
> > On my master's course, I'm studying PostgreSQL's optimizer.
> > I don't know if anyone on this list has participated in the
> > development of PostgreSQL's optimizer, but maybe someone can help me
> > with this question.
> > PostgreSQL generates all possible plans for executing the query (using
> > an almost exhaustive search), then gives a cost to each plan, and
> > finally the cheapest one is selected for execution.
> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.
> > Does anyone know why this method was chosen?  Are there any papers or
> > research about it?
> >
> > Thanks a lot,
> > Pryscila.
>
> --
> Respectfully,
>
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> http://www.enterprisedb.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
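Pryscila's steps 1-5 amount to an iterative improvement loop. A deliberately tiny sketch, in which a "plan" is reduced to its estimated cost and the transformation rules (`rule_prune` and `rule_use_index`, both invented here) map one cost to a cheaper alternative, might look like this:

```c
#include <stddef.h>

/* Toy rendition of the steps above.  Real plans are trees; here a plan
 * is just its estimated cost, and each rule stands in for a rewrite
 * heuristic that produces an alternative plan. */
typedef double (*transform_fn)(double);

static double rule_prune(double cost)
{
    return cost > 100 ? cost - 50 : cost;   /* e.g. drop a redundant sort */
}

static double rule_use_index(double cost)
{
    return cost > 10 ? cost * 0.9 : cost;   /* e.g. swap a scan for an index */
}

static double optimize(double original, transform_fn *rules, size_t nrules)
{
    double best = original;                 /* step 1: one initial plan */
    int    improved = 1;

    while (improved)                        /* step 5: repeat over new plans */
    {
        improved = 0;
        for (size_t i = 0; i < nrules; i++)
        {
            double alt = rules[i](best);    /* steps 2-3: transform, cost */
            if (alt < best)                 /* step 4: keep cheaper plans */
            {
                best = alt;
                improved = 1;
            }
        }
    }
    return best;
}
```

Jonah's objection lives inside `rules[i](best)`: in a CBO the cost of an alternative cannot be read off a heuristic, it has to be recomputed from run-time statistics each time.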
Re: [HACKERS] About method of PostgreSQL's Optimizer
Pryscila,

> > There are other methods for query optimization; one of them is based
> > on plan transformations (for example, using the A* algorithm) instead
> > of the plan constructions used by PostgreSQL.

We do certainly need a specific optimization for large star-schema joins.
I'm not certain that A* is suitable for our cost model, though; I think we
might need to work up something more particular to Postgres.

> > Does anyone know why this method was chosen?  Are there any papers or
> > research about it?

There probably are papers on ACM, but I've not read them.  Ours is a
pretty straightforward implementation of a cost-based optimizer.  You can
always read the code ;-)

Mark Kirkwood put together this nice paper on planner statistics:
http://www.powerpostgresql.com/PlanStats

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
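For contrast with the transformation-based approach, the "pretty straightforward" cost-based search can be illustrated with a toy System R-style bottom-up enumeration over bitmask subsets. The cost model here (child costs plus cross-product rows) and the `cheapest_join` helper are invented for the sketch; PostgreSQL's real planner prunes by join clauses and uses a far richer cost model.

```c
/* Toy exhaustive dynamic-programming join search for up to 8 relations,
 * each subset of relations encoded as a bitmask.  Invented cost model:
 * joining sets S1 and S2 costs cost(S1) + cost(S2) + rows(S1)*rows(S2). */
#define MAXREL 8

static double cheapest_join(const double *card, int nrel)
{
    double cost[1 << MAXREL];   /* cheapest cost found for each subset */
    double rows[1 << MAXREL];   /* output cardinality of each subset */
    int    full = (1 << nrel) - 1;

    for (int s = 1; s <= full; s++)
        cost[s] = -1.0;                     /* -1 means "not built yet" */

    /* level 1: scanning base relation i costs card[i] */
    for (int i = 0; i < nrel; i++)
    {
        cost[1 << i] = card[i];
        rows[1 << i] = card[i];
    }

    /* build every larger subset from all of its two-way splits */
    for (int s = 1; s <= full; s++)
    {
        for (int sub = (s - 1) & s; sub; sub = (sub - 1) & s)
        {
            int rest = s & ~sub;

            if (cost[sub] < 0 || cost[rest] < 0)
                continue;

            double outrows = rows[sub] * rows[rest];  /* toy: cross join */
            double c = cost[sub] + cost[rest] + outrows;

            if (cost[s] < 0 || c < cost[s])
            {
                cost[s] = c;
                rows[s] = outrows;
            }
        }
    }
    return cost[full];
}
```

The subset enumeration is the part that blows up combinatorially; this is exactly the search space that GEQO (or a hypothetical rule-based fallback) exists to avoid for large join counts.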
Re: [HACKERS] 8.1 system info / admin functions
Stephen Frost wrote:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > Neil Conway <[EMAIL PROTECTED]> writes:
> > > While we're on the subject, the units used by pg_size_pretty() are
> > > incorrect, at least according to the IEC: for example, "MB" is
> > > strictly speaking one million bytes, not 1024^2 bytes.  1024^2 bytes
> > > is 1 MiB (similarly for KiB, GiB, and TiB).  I'll take a look at
> > > fixing this as well, barring any objections.
> >
> > [ itch... ]  The IEC may think they get to define what's correct, but
> > I don't think that squares with common usage.  The only people who
> > think MB is measured in decimal are disk-manufacturer marketroids.
>
> That isn't entirely accurate, unfortunately.  Telecom also generally
> uses decimal.  I'm as unhappy with the current situation as anyone, but
> it's not just disk makers & marketing folks, even among the computing
> industry.

Telcos don't tend to follow standard computer industry thinking...  Kilo,
Mega and so on are used as decimal exponent multipliers for SI units like
the meter or the gram; there's no such thing as a byte as a physical unit,
it's completely artificial.  pg_size_pretty follows the convention of
"df -h" or "du -h" and so on, which use the well-known 1024-multiples, as
every sane user of bytes will.

Regards,
Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
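A minimal sketch of the "df -h"-style 1024-multiple formatting Andreas describes. This is not the server's actual pg_size_pretty implementation (which switches units at different thresholds and rounds differently); it just illustrates the binary-multiple convention, with `size_pretty` as an invented name.

```c
#include <stdio.h>
#include <string.h>

/* Format a byte count using 1024-multiples, df -h style. */
static void size_pretty(long long bytes, char *buf, size_t buflen)
{
    const char *units[] = { "bytes", "kB", "MB", "GB", "TB" };
    int         u = 0;
    double      v = (double) bytes;

    while (v >= 1024 && u < 4)
    {
        v /= 1024;              /* binary multiples: 1 MB = 1024^2 bytes */
        u++;
    }
    snprintf(buf, buflen, "%.0f %s", v, units[u]);
}
```

Under the IEC reading the second unit column would instead be "KiB", "MiB", and so on, with the divisor unchanged; that is the entire substance of the disagreement above.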
[HACKERS] parameterized fetch
I've noticed that trying to parameterize a fetch statement via ExecParams
returns a syntax error:

fetch $1 from my_cursor;

This is not really a big deal, but maybe it should be documented which
statements can be parameterized and which can't.  (I take it that any
statement that can be prepared can be parameterized;

prepare test as fetch 1 from my_cursor;

also returns a syntax error.)

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Bug with cursor declaration in plpgsql? (Repost)
"Michael Paesold" <[EMAIL PROTECTED]> writes: > I get: > ERROR: syntax error at or near "," at character 237 > LINE 9: credit_cursor CURSOR (p_account integer, p_reference integ... > The same function works perfectly well in 7.4.8 and 8.0.3. > A bug? Yeah, looks like Neil accidentally dropped the comma from decl_cursor_arglist when he whacked that code around to use Lists instead of handmade arrays. Thanks for catching it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.1 system info / admin functions
* Tom Lane ([EMAIL PROTECTED]) wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > While we're on the subject, the units used by pg_size_pretty() are > > incorrect, at least according to the IEC: for example, "MB" is > > strictly-speaking one million bytes, not 1024^2 bytes. 1024^2 bytes is 1 > > MiB (similarly for KiB, GiB, and TiB). I'll take a look at fixing this > > as well, barring any objections. > > [ itch... ] The IEC may think they get to define what's correct, but > I don't think that squares with common usage. The only people who > think MB is measured in decimal are disk-manufacturer marketroids. That isn't entirely accurate, unfortunately. Telecom also generally uses decimal. I'm as unhappy with the current situation as anyone but it's not just disk makers & marketing folks, even among the computing industry. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] VACUUM VERBOSE 8.1dev
I'm not even sure that the new output does tell me the same thing.  I
certainly prefer the previous output.  I think this will be very confusing
to users who aren't familiar with the internals of postgres.

Dave

On 13-Sep-05, at 11:44 PM, Joshua D. Drake wrote:

Hello,

It seems the new VACUUM VERBOSE output is not quite as helpful as 8.0.
In 8.0 I get a nice output at the end like this:

INFO:  free space map: 1377 relations, 22478 pages stored; 44112 total pages needed
DETAIL:  Allocated FSM size: 10 relations + 200 pages = 17702 kB shared memory.

Which tells me exactly what I need to do with max_fsm_pages/relations.
The new output appears:

booktown=# VACUUM FULL VERBOSE books;
INFO:  vacuuming "public.books"
INFO:  "books": found 32804 removable, 14 nonremovable row versions in 242 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 48 to 72 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1845484 bytes.
241 pages are or will become empty, including 241 at the end of the table.
1 pages containing 6768 free bytes are potential move destinations.

I am sure the above tells me the same thing, but not as easily.

Comments?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc.  1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Tue, Sep 13, 2005 at 08:14:47PM -0400, Stephen Frost wrote:
> I suppose another option would be to provide separate packages...  Could
> this be done as a shared library so it's more 'plug-and-play' to switch
> between the two?  I dunno, just trying to think about how to deal with
> this without making it suck terribly bad for me (as a Debian user and
> maintainer).

Note that the Linux kernel has played with moving spinlock code out of
line.  Because there are so many call sites, the memory saved by moving
the code out of line turned out to be a net benefit overall.

An unconditional call to a function can't be that expensive, surely.
However, it would *have* to be in the same compile unit; no shared
libraries.  ELF imposes some overhead for calls in different compile
units, and using function pointers won't save you (see Ulrich Drepper's
paper on it).  However, to make it flexible you would need a pointer to a
function pointer.  Fortunately these variables won't change often, so the
function pointer and the function itself should be in the cache if used
often enough.

Finally, the kernel solves the problem by saying: if you compile for
uniprocessor, optimize the code away, since a lot of the issues don't
apply.  My main concern is how you even detect the number of processors
in a portable way...

Have a nice day,
--
Martijn van Oosterhout   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgpLWptHgoZDN.pgp
Description: PGP signature
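The dispatch idea Martijn describes can be sketched as follows. `init_locking`, `lock_smp`, and `lock_uniprocessor` are invented names, the processor-count detection he worries about is simply taken as a parameter, and the "uniprocessor" path is a toy stand-in for whatever cheaper sequence would actually be safe there.

```c
#include <stdatomic.h>

/* Sketch only: choose a lock implementation once at startup, then
 * always call through the pointer. */
typedef void (*lock_fn)(atomic_int *);

static void lock_uniprocessor(atomic_int *l)
{
    /* toy cheap path: in this sketch, just set the lock word */
    atomic_store_explicit(l, 1, memory_order_relaxed);
}

static void lock_smp(atomic_int *l)
{
    /* spin on a bus-locked exchange, as a multiprocessor must */
    while (atomic_exchange(l, 1))
        ;
}

static lock_fn acquire_lock;    /* set once at startup, rarely changes */

static void init_locking(int nprocs)
{
    acquire_lock = (nprocs > 1) ? lock_smp : lock_uniprocessor;
}
```

Since `acquire_lock` is written once and then only read, the pointer should stay cache-resident, which is the basis of Martijn's argument that the indirection is cheap.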
Re: [HACKERS] 8.1 system info / admin functions
Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > (2) pg_cancel_backend(), pg_reload_conf(), and pg_rotate_logfile() all
> > return an int indicating success (1) or failure (0).  Why shouldn't
> > these functions return a boolean?

I would have used boolean as the return code for success and failure, but
the previously existing pg_cancel_backend did return int, so I matched
that behaviour.

> > (Presumably there is a good reason why these functions return a status
> > code at all, rather than aborting via elog on error -- right?)
>
> I agree with both of those criticisms: total is more in line with our
> nomenclature than complete, and the other functions should return void
> and ereport when they are unhappy.  (Saying "I failed" and not having
> any mechanism to report why sucks.)

These functions will only treat being called by a non-superuser as a fatal
error; failures from normal execution (i.e. returning 0) will give an elog
WARNING and thus a complete report about the cause, if needed.
Non-execution of these functions isn't really a problem (and not
synchronizable in transactions for following statements anyway); OTOH,
having to deal with errors in the client that could be safely ignored, or
adding savepoints, sounds like overkill and a solution for a non-problem.

Regards,
Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql CVS callgraph data from dbt2
Mark Wong wrote:
> Hi everyone,
> For those of you watching the daily results generated from STP
> (http://developer.osdl.org/markw/postgrescvs/dbt2/) I have callgraph
> data from oprofile collected starting from the Sept 9 results.  Here is
> an example of what it looks like:
> http://www.testing.osdl.org/stp/303184/results/0/callgraph.txt
> I hope it's useful.

Rather unrelated, but the tests here use the standard settings for
wal_buffers.  As I have read on this list, 8.1 needs a much higher setting
for wal_buffers for good performance (128 or 256?).  I don't know if you
want to change this setting because it will make the results incomparable,
but it should improve overall performance.  See
http://archives.postgresql.org/pgsql-hackers/2005-07/msg01004.php

Best Regards,
Michael Paesold

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> "Michael Paesold" <[EMAIL PROTECTED]> writes:
> > To have other data, I have retested the patches on a single-cpu Intel
> > P4 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T.  Comparing to the 2,4
> > dual-Xeon results it's clear that this is in reality only one cpu.
> > While the runtime for N=1 is better than the other system, for N=4
> > it's already worse.  The situation with the patches is quite
> > different, though.  Unfortunately.
> >
> > CVS tip from 2005-09-12:
> > 1: 36s   2: 77s (cpu ~85%)   4: 159s (cpu ~98%)
> >
> > only slock-no-cmpb:
> > 1: 36s   2: 81s (cpu ~79%)   4: 177s (cpu ~94%)   (doesn't help this time)
>
> Hm.  This is the first configuration we've seen in which slock-no-cmpb
> was a loss.  Could you double-check that result?

The first tests were compiled with CFLAGS='-O2 -mcpu=pentium4
-march=pentium4'.  I had redone the tests with just CFLAGS='-O2'
yesterday.  The difference was just about a second, but the result from
the patch was the same.  The results for N=4 and N=8 show the effect more
clearly.

configure: CFLAGS='-O2' --enable-casserts
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s   2: 78s   4: 159s   8: 324s

only slock-no-cmpb:
1: 37s   2: 82s (5%)   4: 178s (12%)   8: 362s (12%)

(Btw., I have always done "make clean; make; make install" between tests.)

Best Regards,
Michael Paesold

> I can't see any reasonable way to do runtime switching of the cmpb test
> --- whatever logic we put in to control it would cost as much or more
> than the cmpb anyway :-(.  I think that has to be a compile-time choice.
> From my perspective it'd be acceptable to remove the cmpb only for
> x86_64, since only there does it seem to be a really significant win.
> On the other hand it seems that removing the cmpb is a net win on most
> x86 setups too, so maybe we should just do it and accept that there are
> some cases where it's not perfect.

How many test cases do we have yet?
Summary of the effects without the cmpb instruction seems to be:

8-way Opteron:           better
Dual/HT Xeon w/o EM64T:  better
Dual/HT EM64T:           better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T:         worse (stronger for N>cpus)

Have I missed other reports that did test the slock-no-cmpb.patch alone?

Two of the systems with positive effects are x86_64; one is an older
high-end Intel x86 chip.  The negative effect is on a low-cost Pentium 4
with only hyper-threading.  According to the mentioned thread's title,
this was an optimization for hyper-threading, not regular multi-cpus.

We could have more data, especially on newer and high-end systems.  Could
some of you test the slock-no-cmpb.patch?  You'll need an otherwise idle
system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the
archive it's not clear anymore what should go into which file.  See the
instructions in the first messages above.  The patch applies to CVS tip
with: patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold

slock-no-cmpb.patch
Description: Binary data

test_setup.sql
Description: Binary data

test_run_small.sql
Description: Binary data

startn.sh
Description: Binary data

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > I'm starting to think that we might have to succumb to having a
> > compile option "optimize for multiprocessor" or "optimize for single
> > processor".  It's pretty hard to see how we'd alter a data structure
> > decision like this on the fly.
>
> I'd really hate to see this happen.

I suspect you're thinking too hard about this.  It's not like some LWLocks
would need SMP spinlocks and others uniprocessor locks.  And it's not like
it's going to change dynamically.  Wouldn't it just be a couple of:

    if (multiprocessor)
    {
        /* complex version */
    }
    else
    {
        /* simple version */
    }

Ugly to be sure, but it's not going to spread to other areas of the source
base, and you know the if statement isn't going to be growing any more
arms.  It's no uglier than doing the same thing with an #ifdef anyway.

If you really, really want to have two versions and you think using
function pointers would impose too big a performance penalty, you could
play linker games.  But that way lies madness.

--
greg

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: 13 September 2005 23:03
> To: Marko Kreen
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Spinlocks, yet again: analysis and proposed
> patches
>
> I'm starting to think that we might have to succumb to having a compile
> option "optimize for multiprocessor" or "optimize for single processor".
> It's pretty hard to see how we'd alter a data structure decision like
> this on the fly.

That would be a major pita for packagers of binary distributions.

Regards, Dave.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote:
> But the cmpb instruction in the 8.0 version of TAS would have done that,
> and I think we've already established that the cmpb is a loss on most
> machines (except maybe single-physical-CPU Xeons).

Note that this was a regular Pentium 4 system, not a Xeon.

Best Regards,
Michael Paesold

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster