Re: [GENERAL] A query planner that learns

2006-10-17 Thread AgentM


On Oct 16, 2006, at 16:17 , Madison Kelly wrote:


Alvaro Herrera wrote:

Jochem van Dieten wrote:

Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and  
perform is
good, and I have a few queries that could live with a simple  
hint system
pop up now and again, I keep thinking that a query planner that  
learns

from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to  
look at
query plans it's run and look for obvious mistakes its made,  
like being
off by a factor of 10 or more in estimations, and slowly learn  
to apply

its own hints?
Technically it is very feasible. But I think you might want to  
check US Patent 6,763,359 before you start writing any code.
I think it would be a very good idea if you guys stopped looking  
at the
US patent database.  It does no good to anyone.  There's no way we  
can
avoid stomping on a patent or another -- there are patents for  
everything.


Hasn't IBM release a pile of it's patents for use (or at least  
stated they won't sue) to OSS projects? If so, is this patent  
covered by that amnesty?


Simply ignoring patents because there is a patent for everything  
is a recipe for disaster. Companies like MS are running out of ways  
to tear open OSS and they are certainly not above (below?) suing  
the heck out of OSS projects for patent infringement.


What's needed is reform in the USPO. Call you congress (wo)man and  
complain, but don't flaunt the law; you will lose.


Alvaro's advice is sound. If the patent holder can prove that a  
developer looked at a patent (for example, from an email in a mailing  
list archive) and the project proceeded with the implementation  
regardless, malice can been shown and damages can be substantially  
higher. You're screwed either way but your safest bet is to never  
look at patents.


Disclaimer: I am not a lawyer- I don't even like lawyers.

-M

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Madison Kelly

AgentM wrote:
Alvaro's advice is sound. If the patent holder can prove that a 
developer looked at a patent (for example, from an email in a mailing 
list archive) and the project proceeded with the implementation 
regardless, malice can been shown and damages can be substantially 
higher. You're screwed either way but your safest bet is to never look 
at patents.


Disclaimer: I am not a lawyer- I don't even like lawyers.



Nor am I a lawyer, but I still hold that hoping ignorance will be a 
decent defense is very, very risky. In the end I am not a pgSQL 
developer so it isn't in my hands either way.


Madi

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Brian Mathis
On 10/17/06, Madison Kelly [EMAIL PROTECTED] wrote:
AgentM wrote: Alvaro's advice is sound. If the patent holder can prove that a developer looked at a patent (for example, from an email in a mailing list archive) and the project proceeded with the implementation
 regardless, malice can been shown and damages can be substantially higher. You're screwed either way but your safest bet is to never look at patents. Disclaimer: I am not a lawyer- I don't even like lawyers.
Nor am I a lawyer, but I still hold that hoping ignorance will be adecent defense is very, very risky. In the end I am not a pgSQLdeveloper so it isn't in my hands either way.Madi
I also am NAL, but I know enough about the patent system (in the US) to know that ignorance *IS* a defense. If you are ignorant of the patent, you only have to pay the damages. If you knew about the patent and did it anyway, you have to pay *triple* damages. Ignorance will save you lots of money.
You may not like it, but that's the way it is.


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Madison Kelly

Brian Mathis wrote:
I also am NAL, but I know enough about the patent system (in the US) to 
know that ignorance *IS* a defense.  If you are ignorant of the patent, 
you only have to pay the damages.  If you knew about the patent and did 
it anyway, you have to pay *triple* damages.  Ignorance will save you 
lots of money.


You may not like it, but that's the way it is.



I got that part. :) If you _do_ end up in court, plausible deniability 
helps.


My position though is that it is better, in the long term, to be aware 
of the patents and take the time to work around them so that *no* 
damages need to be paid. Or, as might be that chance in this case, to 
get a written okay from the patent holder for the use of the methods 
protected by the patent in a given program.


Colour me funny, but wouldn't staying out of the courts in the first 
place not be the best option?


Madi

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Brian Mathis
On 10/17/06, Madison Kelly [EMAIL PROTECTED] wrote:
Brian Mathis wrote: I also am NAL, but I know enough about the patent system (in the US) to know that ignorance *IS* a defense.If you are ignorant of the patent, you only have to pay the damages.If you knew about the patent and did
 it anyway, you have to pay *triple* damages.Ignorance will save you lots of money. You may not like it, but that's the way it is.I got that part. :) If you _do_ end up in court, plausible deniability
helps.My position though is that it is better, in the long term, to be awareof the patents and take the time to work around them so that *no*damages need to be paid. Or, as might be that chance in this case, to
get a written okay from the patent holder for the use of the methodsprotected by the patent in a given program.Colour me funny, but wouldn't staying out of the courts in the firstplace not be the best option?
MadiYes, good idea :)


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Scott Ribe
 Colour me funny, but wouldn't staying out of the courts in the first
 place not be the best option?

Yes, however some people feel that given the way the patent office is
spewing huge quantities of patents, many on old well-known techniques, and
the the absurd difficulty of reading patent claims, and the deliberately
vague language used in the hope of broadening the scope later, that it is
actually not possible to keep track of them, therefore not possible to avoid
infringement with any certainty. So, better to fall back to the 2nd-best
position...

It's a difficult question to answer because there are 0 good answers.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Alvaro Herrera
Madison Kelly wrote:
 Brian Mathis wrote:
 I also am NAL, but I know enough about the patent system (in the US) to 
 know that ignorance *IS* a defense.  If you are ignorant of the patent, 
 you only have to pay the damages.  If you knew about the patent and did 
 it anyway, you have to pay *triple* damages.  Ignorance will save you 
 lots of money.
 
 You may not like it, but that's the way it is.
 
 
 I got that part. :) If you _do_ end up in court, plausible deniability 
 helps.
 
 My position though is that it is better, in the long term, to be aware 
 of the patents and take the time to work around them so that *no* 
 damages need to be paid. Or, as might be that chance in this case, to 
 get a written okay from the patent holder for the use of the methods 
 protected by the patent in a given program.
 
 Colour me funny, but wouldn't staying out of the courts in the first 
 place not be the best option?

Yeah.  I invite you to do all the extra (useless) development work
required.  But please do not charge other people with it.  Whoever
investigates patents and lets pgsql-hackers know about them, is charging
the Postgres community with that work.  We sure don't need it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Madison Kelly

Alvaro Herrera wrote:

Yeah.  I invite you to do all the extra (useless) development work
required.  But please do not charge other people with it.  Whoever
investigates patents and lets pgsql-hackers know about them, is charging
the Postgres community with that work.  We sure don't need it.


As a developer (different project that uses pgsql), I am no more happy 
about the current status of the USPO than you are. I think it's a giant 
cess-pool of greed, ignorance and waste... BUT... it is currently the 
law in the USA.


How fast would Oracle, Microsoft or others jump on a chance to tear 
PostgreSQL apart by dragging it through the courts and crushing it under 
any fines (regardless of the amount)? If that happened, *all* of the 
pgsql-hacker's time would be wasted, never mind the countless other 
projects that rely on PgSQL.


As Scott said, there are 0 good answers to this problem. If PgSQL is 
going to be active in the US then it has to play by the (bent and 
twisted) rules. That is why I started my comments with call your 
congress(wo)man. Simply ignoring that which you don't like is just far 
to risky with the sharks in our waters.


It's extra work, and it's unfair to the developers who already put their 
free time into this program, but IMHO it's the only safe way to go. 
Again though, not being a pgsql-hacker my comments here are just that.


Madi

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] A query planner that learns

2006-10-17 Thread AgentM


On Oct 17, 2006, at 10:46 , Madison Kelly wrote:


Brian Mathis wrote:
I also am NAL, but I know enough about the patent system (in the  
US) to know that ignorance *IS* a defense.  If you are ignorant of  
the patent, you only have to pay the damages.  If you knew about  
the patent and did it anyway, you have to pay *triple* damages.   
Ignorance will save you lots of money.

You may not like it, but that's the way it is.


I got that part. :) If you _do_ end up in court, plausible  
deniability helps.


My position though is that it is better, in the long term, to be  
aware of the patents and take the time to work around them so that  
*no* damages need to be paid. Or, as might be that chance in this  
case, to get a written okay from the patent holder for the use of  
the methods protected by the patent in a given program.


Colour me funny, but wouldn't staying out of the courts in the  
first place not be the best option?


That would be a nice, but naïve, approach. It is likely that (without  
admitting any guilt by any party) postgresql already steps on some  
patents. In fact, any project you can think of likely steps on some  
patents. There are patents on network communication: the getting a  
message from a server to client sort of thing.


If you spent the next twenty years searching through patents and  
creating patches for postgresql to circumvent the patents, not only  
would you turn postgresql into a shriveled raisin of its current  
self, you would be exposing postgresql to greater damages than if you  
had never looked at the patents.


***Please- for the safety of everyone on these lists- do not mention  
or link to any specific patents. This mailing list has a publicly- 
accessible archive which could be used against me or you (at least in  
the USA).***


Best regards,
M

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] A query planner that learns

2006-10-17 Thread Rafal Pietrak
On Tue, 2006-10-17 at 10:24 -0400, Madison Kelly wrote:
 Nor am I a lawyer, but I still hold that hoping ignorance will be a 
 decent defense is very, very risky. In the end I am not a pgSQL 
 developer so it isn't in my hands either way.

If I may.

The hoping, ignorance will save you line of defence is in fact very
very risky.

But civil disobidience is not. Well, it may be risky, but it is valid.

There is already strong support for the opinion (e.g. European battle
against software patents), that current patent/copyright regulations,
are devastating to inovations. And in fact contradict their 'intended
goal' - the stimulation of inovations.

At this point, there is no other way but to ignore such legislation - to
the point where we can: like, by not reading patent files. Be warrned,
when the law strikes back, you loose.

Calming down a little, I'd add, that this particular piece of 16'th
century law does not quite fit todays reality.

Just my 2c  and I admit, it's not on the subject in subject :)

-- 
-R

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Alban Hertroys

Jim C. Nasby wrote:

On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:

It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.


Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.


You'd need something that contains the query plan (obviously), but also 
all relevant information about the underlying data model and data 
distribution. Some meta-data, like the version of PostgreSQL, is 
probably required as well.


The current statistics contain some of this information, but from 
reading this list I know that that's rarely enough information to 
determine an error made by the planner.


Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Jochem van Dieten

Scott Marlowe wrote:

While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.

Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?


Technically it is very feasible. But I think you might want to check US 
Patent 6,763,359 before you start writing any code.




It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info.  Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful.  I could see something like that
eventually evolving into a self tuning system.


I think it would be a good start if we can specify a 
log_selectivity_error_threshold and if estimates are more then that 
factor off, the query, parameters and planner estimates get logged for 
later analysis. That would be driven entirely by selectivity estimates 
and not (estimated) cost since cost is influenced by outside factors 
such as other processes competing for resources. If a system for 
statistical hints emerges from the current discussion we would indeed 
have the input to start tuning the selectivity estimations.


Jochem


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Alvaro Herrera
Jochem van Dieten wrote:
 Scott Marlowe wrote:
 While all the talk of a hinting system over in hackers and perform is
 good, and I have a few queries that could live with a simple hint system
 pop up now and again, I keep thinking that a query planner that learns
 from its mistakes over time is far more desirable.
 
 Is it reasonable or possible for the system to have a way to look at
 query plans it's run and look for obvious mistakes its made, like being
 off by a factor of 10 or more in estimations, and slowly learn to apply
 its own hints?
 
 Technically it is very feasible. But I think you might want to check US 
 Patent 6,763,359 before you start writing any code.

I think it would be a very good idea if you guys stopped looking at the
US patent database.  It does no good to anyone.  There's no way we can
avoid stomping on a patent or another -- there are patents for everything.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] A query planner that learns

2006-10-16 Thread John D. Burger

Jochem van Dieten wrote:

I think you might want to check US Patent 6,763,359 before you  
start writing any code.


http://tinyurl.com/yzjdve

- John D. Burger
  MITRE


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Madison Kelly

Alvaro Herrera wrote:

Jochem van Dieten wrote:

Scott Marlowe wrote:

While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns

from its mistakes over time is far more desirable.

Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to check US 
Patent 6,763,359 before you start writing any code.


I think it would be a very good idea if you guys stopped looking at the
US patent database.  It does no good to anyone.  There's no way we can
avoid stomping on a patent or another -- there are patents for everything.



Hasn't IBM release a pile of it's patents for use (or at least stated 
they won't sue) to OSS projects? If so, is this patent covered by that 
amnesty?


Simply ignoring patents because there is a patent for everything is a 
recipe for disaster. Companies like MS are running out of ways to tear 
open OSS and they are certainly not above (below?) suing the heck out of 
OSS projects for patent infringement.


What's needed is reform in the USPO. Call you congress (wo)man and 
complain, but don't flaunt the law; you will lose.


Madi

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Alvaro Herrera
Madison Kelly wrote:
 Alvaro Herrera wrote:
 Jochem van Dieten wrote:
 Scott Marlowe wrote:
 While all the talk of a hinting system over in hackers and perform is
 good, and I have a few queries that could live with a simple hint system
 pop up now and again, I keep thinking that a query planner that learns
 from its mistakes over time is far more desirable.
 Is it reasonable or possible for the system to have a way to look at
 query plans it's run and look for obvious mistakes its made, like being
 off by a factor of 10 or more in estimations, and slowly learn to apply
 its own hints?
 Technically it is very feasible. But I think you might want to check US 
 Patent 6,763,359 before you start writing any code.
 
 I think it would be a very good idea if you guys stopped looking at the
 US patent database.  It does no good to anyone.  There's no way we can
 avoid stomping on a patent or another -- there are patents for everything.
 
 Hasn't IBM release a pile of it's patents for use (or at least stated 
 they won't sue) to OSS projects? If so, is this patent covered by that 
 amnesty?

This is useless as a policy, because we have plenty of companies basing
their proprietary code on PostgreSQL, which wouldn't be subject to the
grant (EnterpriseDB, Command Prompt, Fujitsu, SRA).  We do support them.


 Simply ignoring patents because there is a patent for everything is a 
 recipe for disaster. Companies like MS are running out of ways to tear 
 open OSS and they are certainly not above (below?) suing the heck out of 
 OSS projects for patent infringement.

It has been said that unknowingly infringing a patent is much less
problematic than knowingly doing same.  We don't have the manpower to
implement the whole Postgres without infringing a single patent, so the
best approach is to refrain from researching possible patents applying
to us in the first place.

If people comes here and points at patents that we infringe or may
infringe, it will cause much more (useless) work for hackers which then
have to search alternative ways of doing the same things.


 What's needed is reform in the USPO. Call you congress (wo)man and 
 complain, but don't flaunt the law; you will lose.

I agree.  However, I am not an US inhabitant in the first place, and
bless my parents for that.  Heck, I was even denied a visa -- twice.
Please do us all a favor and write to your congresspeople.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] A query planner that learns

2006-10-16 Thread Madison Kelly

Alvaro Herrera wrote:
Hasn't IBM release a pile of it's patents for use (or at least stated 
they won't sue) to OSS projects? If so, is this patent covered by that 
amnesty?


This is useless as a policy, because we have plenty of companies basing
their proprietary code on PostgreSQL, which wouldn't be subject to the
grant (EnterpriseDB, Command Prompt, Fujitsu, SRA).  We do support them.


More specifically, it is then up to the 3rd party (non-OSS) developers 
to clear the patents. It's not the PgSQL team's problem in this case 
(assuming it's the case at all).


Simply ignoring patents because there is a patent for everything is a 
recipe for disaster. Companies like MS are running out of ways to tear 
open OSS and they are certainly not above (below?) suing the heck out of 
OSS projects for patent infringement.


It has been said that unknowingly infringing a patent is much less
problematic than knowingly doing same.  We don't have the manpower to
implement the whole Postgres without infringing a single patent, so the
best approach is to refrain from researching possible patents applying
to us in the first place.

If people comes here and points at patents that we infringe or may
infringe, it will cause much more (useless) work for hackers which then
have to search alternative ways of doing the same things.


Plausible Deniability and all that jazz. There is another truism 
though; Ignorance of the law is no excuse. Besides, claiming ignorance 
doesn't keep you out of the courts in the first place. The people who 
would attack OSS applications generally have very, very deep pockets and 
can run a project out of money before the trial was over. They could do 
that non-the-less (SCO, hello?) but I still suggest NOT tempting fate.


It's unfortunate that this legal mess causes the developers more 
headaches than they need, but it still can't be ignored, imho.


What's needed is reform in the USPO. Call you congress (wo)man and 
complain, but don't flaunt the law; you will lose.


I agree.  However, I am not an US inhabitant in the first place, and
bless my parents for that.  Heck, I was even denied a visa -- twice.
Please do us all a favor and write to your congresspeople.


Heh, I'm not an American either, so I can't do anything but shake my 
head (and be equally glad that my own personal OSS program is here in 
Canada).


American industry wonders why they are losing so many IT positions... 
It's such a difficult and unfriendly environment there for anyone but 
the biggest companies. Far too litigious.


Madi

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-14 Thread Guy Rouillier
Jim C. Nasby wrote:
 
 Just recording the query plan and actual vs estimated rowcounts would
 be a good start, though. And useful to DBA's, provided you had some
 means to query against it.  

If the DBMS stores the incoming query (or some parsed version of it) and
identifies frequency of usage over time, it can then spend spare cycles
more deeply analyzing frequently used queries.  Many DB servers have
usage patterns just like end-user workstations (not all, I realize):
they are busy for predictable periods of the day and fairly idle at
other times.  To provide acceptable response times, DBMSs don't have the
luxury of analyzing numerous query paths when queries are received.  But
they could do this offline so that the next time it sees the same query,
it can use a better-optimized plan.

Storing the query itself is probably a better idea than storing the
plan, since the plan may change over time.

-- 
Guy Rouillier


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] A query planner that learns

2006-10-14 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 10/13/06 10:47, John D. Burger wrote:
 Erik Jones wrote:
 
[snip]
 But with both approaches, the planner is just using the static
 statistics gathered by ANALYZE to estimate the cost of each candidate
 plan, and these statistics are based on sampling your data - they may be
 wrong, or at least misleading.  (In particular, the statistic for total
 number of unique values is frequently =way= off, per a recent thread
 here.  I have been reading about this, idly thinking about how to
 improve the estimate.)

What about an ANALYZE FULL table, reading every record in the
table, and ever node in every index, storing in pg_statistic (or
some new, similar table) such items as the AVG and STD of the number
of records in each page, and b-tree depth, keys per node, records
per key and per segment?  Maybe even average distance between pages
in the tablespace.

This would let the optimizer know things like the value of the
field which is first segment of an index (and which is the only part
of the index in the WHERE clause) describes 75% of the rows in the
table, and the records are all packed in tightly in the pages, and
the pages are close together, so the optimizer could decide a
table scan would be much more efficient.

In some ways, this would be similar in functionality to the existing
histogram created by ANALYZE, but would provide a slightly different
picture.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is common sense really valid?
For example, it is common sense to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that common sense is obviously wrong.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFMcu5S9HxQb37XmcRAvVJAJ0VFfEoxwrKn15VqPaZz54SNY4tPACg47zB
r3hZ+HqHE/1bCJK/xNZzNRE=
=OP9+
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] A query planner that learns

2006-10-13 Thread Erik Jones
Forgive me if I'm way off here as I'm not all that familiar with the 
internals of postgres, but isn't this what the genetic query optimizer 
discussed the one of the manual's appendixes is supposed to do.  Or, is 
that more about using the the known costs of atomic operations that make 
up a query plan, i.e. more of a bottom up approach than what you're 
discussing.  If it's the latter, then it sounds like what you're looking 
for is a classifier system.  See a 
href=http://en.wikipedia.org/wiki/Learning_classifier_system;this/a 
wikepedia article for a short description of them along with a couple of 
reference links, you can google for many more.


Scott Marlowe wrote:

On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
  

On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:


While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.

Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?

Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.

I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.

It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info.  Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful.  I could see something like that
eventually evolving into a self tuning system.
  
 
Saves it and then... does what? That's the whole key...



It's meant as a first step.  I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze.  Again, my
point was to do it incrementally.  This is something someone could do
now, and someone could build on later.

To start with, it does nothing.  Just saves it for the DBA to look at. 
Later, it could feed any number of the different hinting systems people

have been proposing.

It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.

  

Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it.  But it would be a very cool project for the next college student
who shows up looking for one.
  

Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?



Long story.  I do get to do lots of pgsql stuff.  But right now I'm
learning Oracle as well, cause we use both DBs.  It's just that since I
know pgsql pretty well, and know oracle hardly at all, Oracle is taking
up lots more of my time.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
  



--
erik jones [EMAIL PROTECTED]
software development
emma(r)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] A query planner that learns

2006-10-13 Thread John D. Burger

Erik Jones wrote:

Forgive me if I'm way off here as I'm not all that familiar with  
the internals of postgres, but isn't this what the genetic query  
optimizer discussed the one of the manual's appendixes is supposed  
to do.


No - it's not an optimizer in that sense.  When there are a small  
enough set of tables involved, the planner uses a dynamic programming  
algorithm to explore the entire space of all possible plans.  But the  
space grows exponentially (I think) with the number of tables - when  
this would take too long, the planner switches to a genetic algorithm  
approach, which explores a small fraction of the plan space, in a  
guided manner.


But with both approaches, the planner is just using the static  
statistics gathered by ANALYZE to estimate the cost of each candidate  
plan, and these statistics are based on sampling your data - they may  
be wrong, or at least misleading.  (In particular, the statistic for  
total number of unique values is frequently =way= off, per a recent  
thread here.  I have been reading about this, idly thinking about how  
to improve the estimate.)


The idea of a learning planner, I suppose, would be one that examines  
cases where these statistics lead to very misguided expectations.   
The simplest version of a learning planner could simply bump up the  
statistics targets on certain columns.  A slightly more sophisticated  
idea would be for some of the statistics to optionally use parametric  
modeling (this column is a Gaussian, let's estimate the mean and  
variance, this one is a Beta distribution ...).  Then the smarter  
planner could spend some cycles applying more sophisticated  
statistical modeling to problematic tables/columns.


- John D. Burger
  MITRE


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-13 Thread AgentM


On Oct 13, 2006, at 11:47 , John D. Burger wrote:


Erik Jones wrote:

Forgive me if I'm way off here as I'm not all that familiar with  
the internals of postgres, but isn't this what the genetic query  
optimizer discussed the one of the manual's appendixes is supposed  
to do.


No - it's not an optimizer in that sense.  When there are a small  
enough set of tables involved, the planner uses a dynamic  
programming algorithm to explore the entire space of all possible  
plans.  But the space grows exponentially (I think) with the number  
of tables - when this would take too long, the planner switches to  
a genetic algorithm approach, which explores a small fraction of  
the plan space, in a guided manner.


But with both approaches, the planner is just using the static  
statistics gathered by ANALYZE to estimate the cost of each  
candidate plan, and these statistics are based on sampling your  
data - they may be wrong, or at least misleading.  (In particular,  
the statistic for total number of unique values is frequently =way=  
off, per a recent thread here.  I have been reading about this,  
idly thinking about how to improve the estimate.)


The idea of a learning planner, I suppose, would be one that  
examines cases where these statistics lead to very misguided  
expectations.  The simplest version of a learning planner could  
simply bump up the statistics targets on certain columns.  A  
slightly more sophisticated idea would be for some of the  
statistics to optionally use parametric modeling (this column is a  
Gaussian, let's estimate the mean and variance, this one is a Beta  
distribution ...).  Then the smarter planner could spend some  
cycles applying more sophisticated statistical modeling to  
problematic tables/columns.


One simple first step would be to run an ANALYZE whenever a  
sequential scan is executed. Is there a reason not to do this? It  
could be controlled by a GUC variable in case someone wants  
repeatable plans.


Further down the line, statistics could be collected during the  
execution of any query- updating histograms on delete and update, as  
well.


-M

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-13 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
   It seems to me the first logical step would be having the ability to
   flip a switch and when the postmaster hits a slow query, it saves both
   the query that ran long, as well as the output of explain or explain
   analyze or some bastardized version missing some of the inner timing
   info.  Even just saving the parts of the plan where the planner thought
   it would get 1 row and got instead 350,000 and was using a nested loop
   to join would be VERY useful.  I could see something like that
   eventually evolving into a self tuning system.
   
  Saves it and then... does what? That's the whole key...
 
 It's meant as a first step.  I could certainly use a daily report on
 which queries had bad plans so I'd know which ones to investigate
 without having to run them each myself in explain analyze.  Again, my
 point was to do it incrementally.  This is something someone could do
 now, and someone could build on later.
 
 To start with, it does nothing.  Just saves it for the DBA to look at. 
 Later, it could feed any number of the different hinting systems people
 have been proposing.
 
 It may well be that by first looking at the data collected from problems
 queries, the solution for how to adjust the planner becomes more
 obvious.

Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] A query planner that learns

2006-10-13 Thread Jim C. Nasby
On Fri, Oct 13, 2006 at 11:53:15AM -0400, AgentM wrote:
 One simple first step would be to run an ANALYZE whenever a  
 sequential scan is executed. Is there a reason not to do this? It  

Yes. You want a seqscan on a small (couple pages) table, and ANALYZE has
a very high overhead on some platforms.

Just recording the query plan and actual vs estimated rowcounts would be
a good start, though. And useful to DBA's, provided you had some means
to query against it.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] A query planner that learns

2006-10-13 Thread Scott Marlowe
On Fri, 2006-10-13 at 12:48, Jim C. Nasby wrote:
 On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info.  Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful.  I could see something like that
eventually evolving into a self tuning system.

   Saves it and then... does what? That's the whole key...
  
  It's meant as a first step.  I could certainly use a daily report on
  which queries had bad plans so I'd know which ones to investigate
  without having to run them each myself in explain analyze.  Again, my
  point was to do it incrementally.  This is something someone could do
  now, and someone could build on later.
  
  To start with, it does nothing.  Just saves it for the DBA to look at. 
  Later, it could feed any number of the different hinting systems people
  have been proposing.
  
  It may well be that by first looking at the data collected from problems
  queries, the solution for how to adjust the planner becomes more
  obvious.
 
 Yeah, that would be useful to have. The problem I see is storing that
 info in a format that's actually useful... and I'm thinking that a
 logfile doesn't qualify since you can't really query it.

grep / sed / awk can do amazing things to a text file.  

I'd actually recommend URL encoding (or something like that) so they'd
be single lines, then you could grep for certain things and feed the
lines to a simple de-encoder.

We do it with our log files at work and can search through some fairly
large files for the exact entry we need fairly quickly.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] A query planner that learns

2006-10-12 Thread Scott Marlowe
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.

Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?

Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.

I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.

It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info.  Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful.  I could see something like that
eventually evolving into a self tuning system.

Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it.  But it would be a very cool project for the next college student
who shows up looking for one.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
 While all the talk of a hinting system over in hackers and perform is
 good, and I have a few queries that could live with a simple hint system
 pop up now and again, I keep thinking that a query planner that learns
 from its mistakes over time is far more desirable.
 
 Is it reasonable or possible for the system to have a way to look at
 query plans it's run and look for obvious mistakes its made, like being
 off by a factor of 10 or more in estimations, and slowly learn to apply
 its own hints?
 
 Seems to me that would be far more useful than my having to babysit the
 queries that are running slow and come up with hints to have the
 database do what I want.
 
 I already log slow queries and review them once a week by running them
 with explain analyze and adjust what little I can, like stats targets
 and such.
 
 It seems to me the first logical step would be having the ability to
 flip a switch and when the postmaster hits a slow query, it saves both
 the query that ran long, as well as the output of explain or explain
 analyze or some bastardized version missing some of the inner timing
 info.  Even just saving the parts of the plan where the planner thought
 it would get 1 row and got instead 350,000 and was using a nested loop
 to join would be VERY useful.  I could see something like that
 eventually evolving into a self tuning system.
 
Saves it and then... does what? That's the whole key...

 Well, I'm busy learning to be an Oracle DBA right now, so I can't do
 it.  But it would be a very cool project for the next college student
 who shows up looking for one.

Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] A query planner that learns

2006-10-12 Thread Scott Marlowe
On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
 On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
  While all the talk of a hinting system over in hackers and perform is
  good, and I have a few queries that could live with a simple hint system
  pop up now and again, I keep thinking that a query planner that learns
  from its mistakes over time is far more desirable.
  
  Is it reasonable or possible for the system to have a way to look at
  query plans it's run and look for obvious mistakes its made, like being
  off by a factor of 10 or more in estimations, and slowly learn to apply
  its own hints?
  
  Seems to me that would be far more useful than my having to babysit the
  queries that are running slow and come up with hints to have the
  database do what I want.
  
  I already log slow queries and review them once a week by running them
  with explain analyze and adjust what little I can, like stats targets
  and such.
  
  It seems to me the first logical step would be having the ability to
  flip a switch and when the postmaster hits a slow query, it saves both
  the query that ran long, as well as the output of explain or explain
  analyze or some bastardized version missing some of the inner timing
  info.  Even just saving the parts of the plan where the planner thought
  it would get 1 row and got instead 350,000 and was using a nested loop
  to join would be VERY useful.  I could see something like that
  eventually evolving into a self tuning system.
  
 Saves it and then... does what? That's the whole key...

It's meant as a first step.  I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze.  Again, my
point was to do it incrementally.  This is something someone could do
now, and someone could build on later.

To start with, it does nothing.  Just saves it for the DBA to look at. 
Later, it could feed any number of the different hinting systems people
have been proposing.

It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.

  Well, I'm busy learning to be an Oracle DBA right now, so I can't do
  it.  But it would be a very cool project for the next college student
  who shows up looking for one.
 
 Why? There's a huge demand for PostgreSQL experts out there... or is
 this for a current job?

Long story.  I do get to do lots of pgsql stuff.  But right now I'm
learning Oracle as well, cause we use both DBs.  It's just that since I
know pgsql pretty well, and know oracle hardly at all, Oracle is taking
up lots more of my time.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly