Re: [PERFORM] fast read of binary data

2012-11-21 Thread Eildert Groeneveld
On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing using a compressed binary storage scheme
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> > 
> > Our current implementation uses the data type bit varying.
> > 
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer it unaltered to the client which will do
> > something (uncompressing) with it. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to and fro
> > conversions.
> > 
> > Our current implementation uses Perl DBI; we can retrieve the data ok,
> > but apparently there is some converting going on.
> > 
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C-library)  for such transfers. However, all sorts funny things happen
> > here which look like conversion issues.
> > 
> > In old fashioned network database some decade ago (in pre SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can between transferred UNALTERED between
> > backend and clients.
> 
> Using the C API you can specify binary mode for your data, which
> meand that they won't be converted.
> 
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary corsors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html),
> but I don't know if DBI or ODBC handles them well.
> 
> If you can avoid DBI or ODBC, that would be best.
ok, I did have a look at the libpq librar, and you are right, there is a
way to obtain binary data from the backend through the  PQexecParams

res = PQexecParams(conn,
   "DECLARE myportal CURSOR FOR select genotype_bits
from v_genotype_data",
   0,   /* zero param */
   NULL,/* let the backend deduce param type */
   paramValues,
   NULL,/* don't need param lengths since text*/
   NULL,/* default to all text params */
   1);  /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:
for (i = 0; i < PQntuples(res); i++)
{
for (j = 0; j < nFields; j++)
  fwrite(PQgetvalue(res, i, j),10,1,f);
}

it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060

clearly, these are nice 0 and 1 in ASCII and not as I need it as a bit
stream. 

Also, (and in line with this) PQgetvalue(res, i, j) seems to be of type
text.

What am I missing?






-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Jeff Janes
On Wed, Nov 21, 2012 at 8:05 AM, Heikki Linnakangas
 wrote:
> On 21.11.2012 15:42, Kevin Grittner wrote:
>>
>> Better, IMV, would be to identify what sorts of hints people actually
>> find useful, and use that as the basis for TODO items for optimizer
>> improvement as well as inventing clear ways to specify the desired
>> coercion. I liked the suggestion that a CTE which didn't need to be
>> materialized because of side-effects or multiple references have a
>> keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
>> would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
>> specify that.
>
>
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and quals
> involved in the query. A typical source of bad plans is when the planner
> gets its cost estimates wrong. So rather than telling the planner to use a
> nested loop join for "a INNER JOIN b ON a.id = b.id", the user could tell
> the planner that there are only 10 rows that match the "a.id = b.id" qual.

For each a.id there are 10 b.id, or for each b.id there are 10 a.id?

> That gives the planner the information it needs to choose the right plan on
> its own. That kind of hints would be much less implementation specific and
> much more likely to still be useful, or at least not outright
> counter-productive, in a future version with a smarter planner.

When I run into unexpectedly poor performance, I have an intuitive
enough feel for my own data that I know what plan it ought to be
using.  Figuring out why it is not using it is very hard.  For one
thing, EXPLAIN tells you about the "winning" plan, but there is no
visibility into what ought to be the winning plan but isn't, so no way
to see why it isn't.So you first have to use our existing non-hint
hints (enable_*, doing weird things with cost_*, CTE stuff) to trick
it into using the plan I want it to use, before I can figure out why
it isn't using it, before I could figure out what hints of the style
you are suggesting to supply to get it to use it.

So I think the type of hints you are suggesting would be about as hard
for the user to use as debugging the planner for the particular case
would be.  While the more traditional type of hint is easy to use,
because the end user understands their data more than they understand
the guts of the planner.


Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Craig Ringer
On 11/22/2012 08:38 AM, Gavin Flower wrote:
> I suspect most people are blissfully unaware of CTE's being fenced, or
> at least not really sure what it means. So I suspect NOT FENCE would
> be the better default.
It's also probably more standard, and a better fit with what other DBs do.

Pg would still need to detect conditions like the use of functions with
side effects or (obviously) INSERT/UPDATE/DELETE wCTEs and not push
conditions down into them / pull conditions up from them, etc. That's
how I read the standard, though; must have the same effect as if the
queries were executed as written, so Pg is free to transform them so
long as it doesn't change the results.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Craig Ringer
On 11/22/2012 03:30 AM, Gavin Flower wrote:
> On 22/11/12 04:56, Heikki Linnakangas wrote:
>> On 21.11.2012 17:42, Gavin Flower wrote:
>>> On 22/11/12 04:32, Andres Freund wrote:
 On 2012-11-21 10:21:16 -0500, Andrew Dunstan wrote:
> I wasn't talking about removing it. My point was that if the
> optimization
> fence around CTEs is removed a lot of people will need to rework apps
> where
> they have used them for that purpose. And I continue to think that
> spelling
> it "OFFSET 0" is horribly obscure.
 +1
>>
>> FWIW, I'm happy with "OFFSET 0". Granted, it's pretty obscure, but
>> that's what we've historically recommended, and it's pretty ugly to
>> have to specify a fence like that in the first place. Whenever you
>> have to resort to it, you ought have a comment in the query
>> explaining why you need to force the planner like that, anyway.
>>
 WITH foo AS (SELECT ...) (barrier=on|off)?

 9.3 introduces the syntax, defaulting to on
 9.4 switches the default to off.
>>>
>>> WITH foo AS (SELECT ...) (fence=on|off)?
>>>
>>> WITH foo AS (SELECT ...) (optimisation_fence=on|off)?
>>
>> If we are to invent a new syntax for this, can we please come up with
>> something that's more widely applicable than just the WITH syntax.
>> Something that you could use to replace OFFSET 0 in a subquery, too.
>>
>> - Heikki
> WITH FENCE foo AS (SELECT ...)
> default?
That doesn't bind tightly enough to a specific CTE term. Consider:

WITH
  FENCE foo AS (SELECT ...),
  bar AS (SELECT ...)
SELECT * FROM bar;

Are we fencing just foo? Or all expressions?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 02:30 PM, Gavin Flower wrote:

WITH FENCE foo AS (SELECT ...)
default?


WITHOUT FENCE foo AS (SELECT ...) :-)
Nah!
I prefer this, but it is too specific to 'WITH',
and very unSQL standardish!

Alternatively one of the following

 1. WITH UNFENCED foo AS (SELECT ...)
 2. WITH NO FENCE foo AS (SELECT ...)
 3. WITH NOT FENCE foo AS (SELECT ...)

I loke the firsat variant, but the 3rd is
most SQL standardish!



As Tom (I think) pointed out, we should not have a syntax tied to CTEs.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints - experiences from other rdbms

2012-11-21 Thread Willem Leenen



> 
> Rather than telling the planner what to do or not to do, I'd much rather 
> have hints that give the planner more information about the tables and 
> quals involved in the query. A typical source of bad plans is when the 
> planner gets its cost estimates wrong. So rather than telling the 
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id", 
> the user could tell the planner that there are only 10 rows that match 
> the "a.id = b.id" qual. 



Instead of gathering statistics for all possible joins ( and join orders) , in 
Oracle there is a functionality that can be switched on where the optimizer is 
given cardinality feedback for the chosen plans, so it can choose another plan 
if the same statement comes around. 

Secondly, there is functionality to insert a hint into an SQL statement. That's 
very good for COTS apps where the statement can't be altered. Now I know that 
there's relatively not much COTS for the Postgresql, ( hence arguments like 'we 
should not implement hints so we're forcing people to solve the underlying 
problem' ), but as Postgresql will replace oracle in the lower end of the 
market, this functionality is usefull. 


  

Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Andreas Kretschmer
Craig James  wrote:

> On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> > Rather than telling the planner what to do or not to do, I'd much rather
> > have hints that give the planner more information about the tables and
> > quals involved in the query. A typical source of bad plans is when the
> > planner gets its cost estimates wrong. So rather than telling the
> > planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> > the user could tell the planner that there are only 10 rows that match
> > the "a.id = b.id" qual. That gives the planner the information it needs
> > to choose the right plan on its own. That kind of hints would be much
> > less implementation specific and much more likely to still be useful, or
> > at least not outright counter-productive, in a future version with a
> > smarter planner.
> >
> > You could also attach that kind of hints to tables and columns, which
> > would be more portable and nicer than decorating all queries.
> 
> I like this idea, but also think that if we have a syntax to allow
> hints, it would be nice to have a simple way to ignore all hints (yes, I
> suppose I'm suggesting yet another GUC). That way after sprinkling your
> SQL with hints, you could easily periodically (e.g. after a Postgres
> upgrade) test what would happen if the hints were removed.
> 
> 
> Or a three-way choice: Allow, ignore, or generate an error.  That would allow
> developers to identify where hints are being used.
> 
> Craig

+1

I think, we HAVE a smart planner, but hints in this direction are okay,
and we need a simple way to make such hints obsolete - for/in the future. 


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Tom Lane
Craig James  writes:
> On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway  wrote:
>> I like this idea, but also think that if we have a syntax to allow
>> hints, it would be nice to have a simple way to ignore all hints (yes, I
>> suppose I'm suggesting yet another GUC). That way after sprinkling your
>> SQL with hints, you could easily periodically (e.g. after a Postgres
>> upgrade) test what would happen if the hints were removed.

> Or a three-way choice: Allow, ignore, or generate an error.  That would
> allow developers to identify where hints are being used.

Throwing errors would likely prevent you from reaching all parts of your
application, thus preventing complete testing.  Much more sensible to
just log such queries.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Joe Conway
On 11/21/2012 09:28 AM, Craig James wrote:
> 
> 
> On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway  > wrote:
> 
> On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> > Rather than telling the planner what to do or not to do, I'd much
> rather
> > have hints that give the planner more information about the tables and
> > quals involved in the query. A typical source of bad plans is when the
> > planner gets its cost estimates wrong. So rather than telling the
> > planner to use a nested loop join for "a INNER JOIN b ON a.id
>  = b.id ",
> > the user could tell the planner that there are only 10 rows that match
> > the "a.id  = b.id " qual. That gives the
> planner the information it needs
> > to choose the right plan on its own. That kind of hints would be much
> > less implementation specific and much more likely to still be
> useful, or
> > at least not outright counter-productive, in a future version with a
> > smarter planner.
> >
> > You could also attach that kind of hints to tables and columns, which
> > would be more portable and nicer than decorating all queries.
> 
> I like this idea, but also think that if we have a syntax to allow
> hints, it would be nice to have a simple way to ignore all hints (yes, I
> suppose I'm suggesting yet another GUC). That way after sprinkling your
> SQL with hints, you could easily periodically (e.g. after a Postgres
> upgrade) test what would happen if the hints were removed.
> 
> 
> Or a three-way choice: Allow, ignore, or generate an error.  That would
> allow developers to identify where hints are being used.

+1

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Craig James
On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway  wrote:

> On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> > Rather than telling the planner what to do or not to do, I'd much rather
> > have hints that give the planner more information about the tables and
> > quals involved in the query. A typical source of bad plans is when the
> > planner gets its cost estimates wrong. So rather than telling the
> > planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> > the user could tell the planner that there are only 10 rows that match
> > the "a.id = b.id" qual. That gives the planner the information it needs
> > to choose the right plan on its own. That kind of hints would be much
> > less implementation specific and much more likely to still be useful, or
> > at least not outright counter-productive, in a future version with a
> > smarter planner.
> >
> > You could also attach that kind of hints to tables and columns, which
> > would be more portable and nicer than decorating all queries.
>
> I like this idea, but also think that if we have a syntax to allow
> hints, it would be nice to have a simple way to ignore all hints (yes, I
> suppose I'm suggesting yet another GUC). That way after sprinkling your
> SQL with hints, you could easily periodically (e.g. after a Postgres
> upgrade) test what would happen if the hints were removed.
>

Or a three-way choice: Allow, ignore, or generate an error.  That would
allow developers to identify where hints are being used.

Craig


>
> Joe
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
>
>
>


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Joe Conway
On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and
> quals involved in the query. A typical source of bad plans is when the
> planner gets its cost estimates wrong. So rather than telling the
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> the user could tell the planner that there are only 10 rows that match
> the "a.id = b.id" qual. That gives the planner the information it needs
> to choose the right plan on its own. That kind of hints would be much
> less implementation specific and much more likely to still be useful, or
> at least not outright counter-productive, in a future version with a
> smarter planner.
> 
> You could also attach that kind of hints to tables and columns, which
> would be more portable and nicer than decorating all queries.

I like this idea, but also think that if we have a syntax to allow
hints, it would be nice to have a simple way to ignore all hints (yes, I
suppose I'm suggesting yet another GUC). That way after sprinkling your
SQL with hints, you could easily periodically (e.g. after a Postgres
upgrade) test what would happen if the hints were removed.

Joe
-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Tom Lane
Claudio Freire  writes:
> collapse_cte_limit?

The join collapse limits address a completely different problem (ie,
explosion of planning time with too many relations), and are pretty much
useless as a model for this.  As multiple people told you already,
optimization fences are typically wanted for only specific subqueries.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Claudio Freire
On Wed, Nov 21, 2012 at 1:35 PM, Andrew Dunstan  wrote:
 Why syntax? What about a guc?

 collapse_cte_limit?
>>>
>>> Because there are very good reasons to want to current behaviour. A guc
>>> is a global either/or so I don't see it helping much.
>>
>> set collapse_cte_limit=8;
>> with blah as (blah) select blah;
>>
>> Not global at all.
>>
>
> Then you have to unset it again, which is ugly. You might even want it
> applying to *part* of a query, not the whole thing, so this strikes me as a
> dead end.

Really?

Because I've seen here people that want it generally (because
Oracle/MSSQL/your favourite db does it), and people that don't want it
(generally because they need it). I haven't seen any mention to mixing
fenced and unfenced usage.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Craig James
On Wed, Nov 21, 2012 at 5:42 AM, Kevin Grittner  wrote:

>
> It's a tough problem. Disguising and not documenting the available
> optimizer hints leads to more reports on where the optimizer should
> be smarter, and has spurred optimizer improvements. ...
> Regarding the above-mentioned benefits we would stand to lose by
> having clear and documented hints, perhaps we could occasionally
> solicit input on where people are finding hints useful to get ideas
> on where we might want to improve the optimizer. As far as worrying
> about people using hints to force a plan which is sub-optimal --
> isn't that getting into nanny mode a bit too much?
>

Toward that end, the hint documentation (which is almost always viewed as
HTML) could be prefaced by a strong suggestion to post performance
questions in this group first, with links to the "subscribe" page and the
"how to report performance problems" FAQ. The hint documentation could even
be minimalistic; suggest to developers that they should post their
problematic queries here before resorting to hints.  That would give the
experts an opportunity to provide the normal advice.  The correct hint
syntax would be suggested only when all other avenues failed.

Craig James


>
> -Kevin
>


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 11:32 AM, Claudio Freire wrote:

On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund  wrote:

On 2012-11-21 13:16:25 -0300, Claudio Freire wrote:

On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund  wrote:

+1

WITH foo AS (SELECT ...) (barrier=on|off)?

9.3 introduces the syntax, defaulting to on
9.4 switches the default to off.

Why syntax? What about a guc?

collapse_cte_limit?

Because there are very good reasons to want to current behaviour. A guc
is a global either/or so I don't see it helping much.

set collapse_cte_limit=8;
with blah as (blah) select blah;

Not global at all.



Then you have to unset it again, which is ugly. You might even want it 
applying to *part* of a query, not the whole thing, so this strikes me 
as a dead end.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andres Freund
On 2012-11-21 13:32:45 -0300, Claudio Freire wrote:
> On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund  wrote:
> > On 2012-11-21 13:16:25 -0300, Claudio Freire wrote:
> >> On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund  wrote:
> >> > +1
> >> >
> >> > WITH foo AS (SELECT ...) (barrier=on|off)?
> >> >
> >> > 9.3 introduces the syntax, defaulting to on
> >> > 9.4 switches the default to off.
> >>
> >> Why syntax? What about a guc?
> >>
> >> collapse_cte_limit?
> >
> > Because there are very good reasons to want to current behaviour. A guc
> > is a global either/or so I don't see it helping much.
> 
> set collapse_cte_limit=8;
> with blah as (blah) select blah;
> 
> Not global at all.

Not very manageable though. And it doesn't help if you need both in a
query which isn't actually that unlikely.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Claudio Freire
On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund  wrote:
> On 2012-11-21 13:16:25 -0300, Claudio Freire wrote:
>> On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund  wrote:
>> > +1
>> >
>> > WITH foo AS (SELECT ...) (barrier=on|off)?
>> >
>> > 9.3 introduces the syntax, defaulting to on
>> > 9.4 switches the default to off.
>>
>> Why syntax? What about a guc?
>>
>> collapse_cte_limit?
>
> Because there are very good reasons to want to current behaviour. A guc
> is a global either/or so I don't see it helping much.

set collapse_cte_limit=8;
with blah as (blah) select blah;

Not global at all.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andres Freund
On 2012-11-21 13:16:25 -0300, Claudio Freire wrote:
> On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund  wrote:
> > +1
> >
> > WITH foo AS (SELECT ...) (barrier=on|off)?
> >
> > 9.3 introduces the syntax, defaulting to on
> > 9.4 switches the default to off.
>
> Why syntax? What about a guc?
>
> collapse_cte_limit?

Because there are very good reasons to want to current behaviour. A guc
is a global either/or so I don't see it helping much.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Claudio Freire
On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund  wrote:
> +1
>
> WITH foo AS (SELECT ...) (barrier=on|off)?
>
> 9.3 introduces the syntax, defaulting to on
> 9.4 switches the default to off.

Why syntax? What about a guc?

collapse_cte_limit?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 15:42, Kevin Grittner wrote:

Better, IMV, would be to identify what sorts of hints people actually
find useful, and use that as the basis for TODO items for optimizer
improvement as well as inventing clear ways to specify the desired
coercion. I liked the suggestion that a CTE which didn't need to be
materialized because of side-effects or multiple references have a
keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
specify that.


Rather than telling the planner what to do or not to do, I'd much rather 
have hints that give the planner more information about the tables and 
quals involved in the query. A typical source of bad plans is when the 
planner gets its cost estimates wrong. So rather than telling the 
planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id", 
the user could tell the planner that there are only 10 rows that match 
the "a.id = b.id" qual. That gives the planner the information it needs 
to choose the right plan on its own. That kind of hints would be much 
less implementation specific and much more likely to still be useful, or 
at least not outright counter-productive, in a future version with a 
smarter planner.


You could also attach that kind of hints to tables and columns, which 
would be more portable and nicer than decorating all queries.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 17:42, Gavin Flower wrote:

On 22/11/12 04:32, Andres Freund wrote:

On 2012-11-21 10:21:16 -0500, Andrew Dunstan wrote:

I wasn't talking about removing it. My point was that if the
optimization
fence around CTEs is removed a lot of people will need to rework apps
where
they have used them for that purpose. And I continue to think that
spelling
it "OFFSET 0" is horribly obscure.

+1


FWIW, I'm happy with "OFFSET 0". Granted, it's pretty obscure, but 
that's what we've historically recommended, and it's pretty ugly to have 
to specify a fence like that in the first place. Whenever you have to 
resort to it, you ought have a comment in the query explaining why you 
need to force the planner like that, anyway.



WITH foo AS (SELECT ...) (barrier=on|off)?

9.3 introduces the syntax, defaulting to on
9.4 switches the default to off.


WITH foo AS (SELECT ...) (fence=on|off)?

WITH foo AS (SELECT ...) (optimisation_fence=on|off)?


If we are to invent a new syntax for this, can we please come up with 
something that's more widely applicable than just the WITH syntax. 
Something that you could use to replace OFFSET 0 in a subquery, too.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Peter Geoghegan
On 21 November 2012 15:21, Andrew Dunstan  wrote:
> And I continue to think that spelling it "OFFSET 0" is horribly obscure.

I'm not sure that it's any more obscure than the very idea of an
optimisation fence.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andres Freund
On 2012-11-21 10:21:16 -0500, Andrew Dunstan wrote:
> 
> On 11/21/2012 09:59 AM, Tom Lane wrote:
> >Andrew Dunstan  writes:
> >>If we're going to do it can we please come up with something more
> >>intuitive and much, much more documented than "OFFSET 0"? And if/when we
> >>do this we'll need to have big red warnings all over then release notes,
> >>since a lot of people I know will need to do some extensive remediation
> >>before moving to such a release.
> >The probability that we would actually *remove* that behavior of OFFSET
> >0 is not distinguishable from zero.  I'm not terribly excited about
> >having an alternate syntax to specify an optimization fence, but even
> >if we do create such a thing, there's no need to break the historical
> >usage.
> >
> 
> I wasn't talking about removing it. My point was that if the optimization
> fence around CTEs is removed a lot of people will need to rework apps where
> they have used them for that purpose. And I continue to think that spelling
> it "OFFSET 0" is horribly obscure.

+1

WITH foo AS (SELECT ...) (barrier=on|off)?

9.3 introduces the syntax, defaulting to on
9.4 switches the default to off.

Greetings,

Andres Freund


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 09:59 AM, Tom Lane wrote:

Andrew Dunstan  writes:

If we're going to do it can we please come up with something more
intuitive and much, much more documented than "OFFSET 0"? And if/when we
do this we'll need to have big red warnings all over then release notes,
since a lot of people I know will need to do some extensive remediation
before moving to such a release.

The probability that we would actually *remove* that behavior of OFFSET
0 is not distinguishable from zero.  I'm not terribly excited about
having an alternate syntax to specify an optimization fence, but even
if we do create such a thing, there's no need to break the historical
usage.



I wasn't talking about removing it. My point was that if the 
optimization fence around CTEs is removed a lot of people will need to 
rework apps where they have used them for that purpose. And I continue 
to think that spelling it "OFFSET 0" is horribly obscure.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Tom Lane
Andrew Dunstan  writes:
> If we're going to do it can we please come up with something more 
> intuitive and much, much more documented than "OFFSET 0"? And if/when we 
> do this we'll need to have big red warnings all over then release notes, 
> since a lot of people I know will need to do some extensive remediation 
> before moving to such a release.

The probability that we would actually *remove* that behavior of OFFSET
0 is not distinguishable from zero.  I'm not terribly excited about
having an alternate syntax to specify an optimization fence, but even
if we do create such a thing, there's no need to break the historical
usage.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 08:04 AM, Heikki Linnakangas wrote:

On 21.11.2012 01:53, Tom Lane wrote:

I think the more interesting question is what cases wouldn't be covered
by such a rule.  Typically you need to use OFFSET 0 in situations where
the planner has guessed wrong about costs or rowcounts, and I think
people are likely using WITH for that as well.  Should we be telling
people that they ought to insert OFFSET 0 in WITH queries if they want
to be sure there's an optimization fence?


Yes, I strongly feel that we should. Writing a query using WITH often 
makes it more readable. It would be a shame if people have to refrain 
from using it, because the planner treats it as an optimization fence.







If we're going to do it can we please come up with something more 
intuitive and much, much more documented than "OFFSET 0"? And if/when we 
do this we'll need to have big red warnings all over then release notes, 
since a lot of people I know will need to do some extensive remediation 
before moving to such a release.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Kevin Grittner
Craig Ringer wrote:
> On 11/21/2012 09:35 AM, Craig James wrote:
>> Why not make an explicit hint syntax and document it? I've still
>> don't understand why "hint" is a dirty word in Postgres. There are
>> a half-dozen or so ways in common use to circumvent or correct
>> sub-optimal plans.
> 
> The reason usually given is that hints provide easy workarounds for
> planner and stats issues, so people don't report problems or fix
> the underlying problem.
> 
> Of course, if that's all there was to it, `OFFSET 0` would be made
> into an error or warning, or ignored and not fenced.
> 
> The reality is, as you say, that there's a need, because the
> planner can never be perfect - or rather, if it were nearly
> perfect, it'd take so long to read the stats and calculate plans
> that everything would be glacially slow anyway. The planner has to
> compromise, and so cases will always arise where it needs a little
> help.
> 
> I think it's time to admit that and get the syntax in place for
> CTEs so there's room to optimize them later, rather than cementing
> CTEs-as-fences in forever as a Pg quirk.

It's a tough problem. Disguising and not documenting the available
optimizer hints leads to more reports on where the optimizer should
be smarter, and has spurred optimizer improvements. And many type of
hints would undoubtedly cause people to force what they *think* would
be the best plan in many cases where they are wrong, or become wrong
as data scales up. But it does seem odd every time I hear people
saying that they don't want to eliminate some optimization fence
because "they find it useful" while simultaneously arguing that we
don't have or want hints.

Having a way to coerce the optimizer from the plan it would take with
straightforward coding *is* a hint, and one down-side of hiding the
hints inside syntax mostly supported for other reasons is that people
who don't know about these clever devices can't do reasonable
refactoring of queries for readability without risking performance
regressions. Another down-side is that perfectly reasonable queries
ported from other databases that use hint syntax for hints run afoul
of the secret hints when trying to run queries on PostgreSQL, and get
performance potentially orders of magnitude worse than they expect.

I'm not sure what the best answer is, but as long as we have hints,
but only through OFFSET 0 or CTE usage, that should be documented.
Better, IMV, would be to identify what sorts of hints people actually
find useful, and use that as the basis for TODO items for optimizer
improvement as well as inventing clear ways to specify the desired
coercion. I liked the suggestion that a CTE which didn't need to be
materialized because of side-effects or multiple references have a
keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
specify that.

Regarding the above-mentioned benefits we would stand to lose by
having clear and documented hints, perhaps we could occasionally
solicit input on where people are finding hints useful to get ideas
on where we might want to improve the optimizer. As far as worrying
about people using hints to force a plan which is sub-optimal --
isn't that getting into nanny mode a bit too much?

-Kevin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Shaun Thomas

On 11/20/2012 08:15 PM, Craig Ringer wrote:


I think it's time to admit that and get the syntax in place for CTEs so
there's room to optimize them later, rather than cementing
CTEs-as-fences in forever as a Pg quirk.


I know I'm just some schmo, but I'd vote for this. I'm certainly guilty 
of using OFFSET 0. Undocumented hints are still hints. As much as I 
think they're a bad idea by cementing a certain plan that may not get 
the benefits of future versions, non-intuitive side-effects by using 
overloaded syntax are worse.


I've been using CTEs as temp tables because I know that's how they work. 
But I'd be more than willing to modify my syntax one way or the other to 
adopt non-materialized CTEs, provided there's some way to get the 
current behavior.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Peter Geoghegan
On 21 November 2012 13:04, Heikki Linnakangas  wrote:
> Yes, I strongly feel that we should. Writing a query using WITH often makes
> it more readable. It would be a shame if people have to refrain from using
> it, because the planner treats it as an optimization fence.

+1

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 01:53, Tom Lane wrote:

I think the more interesting question is what cases wouldn't be covered
by such a rule.  Typically you need to use OFFSET 0 in situations where
the planner has guessed wrong about costs or rowcounts, and I think
people are likely using WITH for that as well.  Should we be telling
people that they ought to insert OFFSET 0 in WITH queries if they want
to be sure there's an optimization fence?


Yes, I strongly feel that we should. Writing a query using WITH often 
makes it more readable. It would be a shame if people have to refrain 
from using it, because the planner treats it as an optimization fence.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance