Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Tomas Vondra
Dne 21.12.2010 20:03, Jeremy Harris napsal(a):
> On 2010-12-21 18:50, Tomas Vondra wrote:
>>> Then the index you just built gets automatically dropped, as I said
>>> above.
>>
>> I'm a bit confused. Should the indexes be dropped automatically (as you
>> state here) or kept for the future. Because if they should be dropped,
>> then it does not make sense to do this magic just for a limited time
>> after the DB goes live.
> 
> Here what I said:
> "track those that actually get re-used and remove the rest".
> 
> Which part is confusing?

As I described, identifying which indexes are actually used is a very
tricky task. And it's not difficult to come up with scenarios where this
causes significantly more harm than good.

Basically the time to keep the indices needs to be long enough that the
indexes that are actually used are not dropped (and the resources spent
creating them actually pays off). But on the other side it needs to be
short so that resources are not wasted because of unused indices.

Which are clearly contradictory requirements.

And the 'limits' you've proposed make that even worse, because when the
unnecessary indices get created first and take most of the resources
(e.g. disk space), then the indexes that are actually needed won't be
created because of those limits.

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 18:50, Tomas Vondra wrote:

Then the index you just built gets automatically dropped, as I said above.


I'm a bit confused. Should the indexes be dropped automatically (as you
state here) or kept for the future. Because if they should be dropped,
then it does not make sense to do this magic just for a limited time
after the DB goes live.


Here what I said:
"track those that actually get re-used and remove the rest".

Which part is confusing?

- Jeremy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Tomas Vondra
Dne 21.12.2010 16:34, Jeremy Harris napsal(a):
>> There really is no automatic way to solve this puzzle using a single
>> query. Indexing strategy is a very tough design discipline, and it
>> requires a complex knowledge of the workload. One slow query does not
>> mean
>> the index should be created - what if that was just an ad-hoc query and
>> will not be executed ever again?
> 
> Then the index you just built gets automatically dropped, as I said above.

I'm a bit confused. Should the indexes be dropped automatically (as you
state here) or kept for the future. Because if they should be dropped,
then it does not make sense to do this magic just for a limited time
after the DB goes live.

>> I really don't want to work with products that try to be smarter than me
>> (and I admit I'm pretty dumb from time to time) and act rather randomly
>> due to this 'artificial intelligence'. I've already been there and I
>> don't
>> want to repeat this experience.
> 
> Then, since you're not a neophyte, leave the feature turned off. But
> don't deny the possibility of using it to actual neophytes.

This is not a question of whether I am a neophyte or not, this is a
question of good practices. My experience is that building indexes
should be done when designing the application, and tested at in a test
environment. At production, log slow queries, analyze the log and add
indexes when needed.

This 'neophyte approach' is a really bad idea from my point of view. It
inevitably leads to a database with a zillion of unnecessary indexes,
missing the right ones. And any limits don't prevent this. In the end
you'll get a slow database, and the neophytes would blame the database
although that state is inevitable.

I've been working with several database products over the years, and
AFAIK none of them does this. In most cases "the whole industry is
wrong" is usually a really bad sign (does not hold if you're Steve Jobs).

Most of them do have 'advisors' though - that's a good idea, and I think
sooner or later that will be in PostgreSQL too.

 as in b): Creating an index is quite expensiv
>>>
>>> How much more so than doing that full-table-scan plus sort, which your
>>> query is doing anyway?
>>
>> A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
>> maintenance_work_mem) etc. So imagine a few users, building indices on a
>> big table simultaneously
> 
> Pffthht.   One simple trylock, used only by the auto-indexer.

Pffthht? Anyway trylock is not an answer. Think about a different table
for each user.

>> Building an index is just one side of the problem - maintenance of the
>> indexes is another thing. Each index has an impact on write operations
>> (INSERT/UPDATE) and may cause that HOT actually does not work (thus
>> causing unnecessary bloat of the index).
> 
> This is worth discussing.How much help does the DBMS currently give
> the DBA in evaluating these tradeoffs?Could we do better, given an
> actual installation and workload?

The database gives you explain plans, plus pg_stat_* and pg_statio_*
views. That's a lot of data, although does not provide a complete view
in many cases. And then there's a log_min_duration, which is the primary
weapon in fighting slow queries after going live.

>> And the indexes may need lot of
>> space on a disk.
> 
> By all means require limits as well as a "don't do that" switch.

As I said, this is not a good approach from my POV. And it's really
really hard to implement this in a way that those who don't want to use
it don't have to pay the price. Because what portion of users would
really use this feature? 0.0001%?

>> But the real show stopper is probably locking. Building an index takes a
>> write lock on a table, effectively blocking writes. Sure, you can use a
>> 'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
>> disadvantages of that (see
>> http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).
>>
> 
> The only disadvantages I see there are a) the leftover "invalid" index -
> which feels like a
> bug; why is it not auto-dropped?   and b) the second scan, which implies
> more total work
> and a wish to background that portion after completing the query triggering
> the auto-index.

Dropping an index automatically is a really bad idea. Actually dropping
indexes in general (no matter if the index was created automatically or
manually) is a very tricky thing.

How do you decide which indexes are really "leftover" from those that
are used only rarely? I've seen application failing spectacularily
because a DBA dropped an index that was not used ... except for a batch
process that runs once a year, to close a fiscal year.

> Don't forget I suggested doing this only for a limited time after DB
> creation.  Possibly another
> reason to turn it off should be any manual index creation, as that
> implies that the DBA
> knows about indexing.
> 
> I see in another thread you suggest merely placing hints in 

Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread bricklen
On Tue, Dec 21, 2010 at 7:34 AM, Jeremy Harris  wrote:
> On 2010-12-21 14:26, t...@fuzzy.cz wrote:
>>>
>>> Why not auto-create indices for some limited period after database load
>>> (copy?  any large number of inserts from a single connection?), track
>>> those
>>> that actually get re-used and remove the rest?   Would this not provide
>>> a better out-of-the-box experience for neophytes?
>>
>> Say you have a table with several columns (A,B,C), and the query is using
>> some of them. What indexes would you create? One index on every column? A
>> multi-column index on all columns? Indexs for each combination of columns?
>
> One single-column index, on the first index-worthy column appearing.
> Keep it simple.    Maybe, while you're doing that full-table-scan. gather
> stats on all the indexable columns for later reference, to guide choice of
> which column to index later.
>
>> There really is no automatic way to solve this puzzle using a single
>> query. Indexing strategy is a very tough design discipline, and it
>> requires a complex knowledge of the workload. One slow query does not mean
>> the index should be created - what if that was just an ad-hoc query and
>> will not be executed ever again?
>
> Then the index you just built gets automatically dropped, as I said above.
>
>> I really don't want to work with products that try to be smarter than me
>> (and I admit I'm pretty dumb from time to time) and act rather randomly
>> due to this 'artificial intelligence'. I've already been there and I don't
>> want to repeat this experience.
>
> Then, since you're not a neophyte, leave the feature turned off.   But don't
> deny the possibility of using it to actual neophytes.
>
>
 as in b): Creating an index is quite expensiv
>>>
>>> How much more so than doing that full-table-scan plus sort, which your
>>> query is doing anyway?
>>
>> A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
>> maintenance_work_mem) etc. So imagine a few users, building indices on a
>> big table simultaneously
>
> Pffthht.   One simple trylock, used only by the auto-indexer.
>
>>
>> Building an index is just one side of the problem - maintenance of the
>> indexes is another thing. Each index has an impact on write operations
>> (INSERT/UPDATE) and may cause that HOT actually does not work (thus
>> causing unnecessary bloat of the index).
>
> This is worth discussing.    How much help does the DBMS currently give
> the DBA in evaluating these tradeoffs?    Could we do better, given an
> actual installation and workload?
>
>> And the indexes may need lot of
>> space on a disk.
>
> By all means require limits as well as a "don't do that" switch.
>
>>
>> But the real show stopper is probably locking. Building an index takes a
>> write lock on a table, effectively blocking writes. Sure, you can use a
>> 'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
>> disadvantages of that (see
>>
>> http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).
>
> The only disadvantages I see there are a) the leftover "invalid" index -
> which feels like a
> bug; why is it not auto-dropped?   and b) the second scan, which implies
> more total work
> and a wish to background that portion after completing the query triggering
> the auto-index.
>
> Don't forget I suggested doing this only for a limited time after DB
> creation.  Possibly another
> reason to turn it off should be any manual index creation, as that implies
> that the DBA
> knows about indexing.
>
> I see in another thread you suggest merely placing hints in the log.
> That's a fine
> first step - but I'll then be wanting to auto-parse that log to
> auto-create
>
> Cheers,
>    Jeremy
>

This thread offers up some interesting possibilities. Expanding on
what has already been discussed, maybe a contrib module for query
tuning/index suggestions?
Some things that came to mind immediately that the module could do
(feasible or not):
- Look at the EXPLAIN ANALYZE
- Examine the index access methods and table scans + costs/rows
- Which indexes were used?
- What were the blocks/tuples hit & read?
- Look at join conditions and WHERE clause filters
- Data types in the joins (mismatched?)
- Churn rate of the tables, eg. the updates/deletes/inserts. This
might allow suggestion of other index types (eg. gist)

Tool then provides feedback on possibly helpful indexes to test, and
why (hypothetical indexes could be applied here). Possibly provided
suggestions on ways to improve the query, eg. data types don't match
in the join, EXISTS vs IN, etc

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 14:26, t...@fuzzy.cz wrote:

Why not auto-create indices for some limited period after database load
(copy?  any large number of inserts from a single connection?), track
those
that actually get re-used and remove the rest?   Would this not provide
a better out-of-the-box experience for neophytes?


Say you have a table with several columns (A,B,C), and the query is using
some of them. What indexes would you create? One index on every column? A
multi-column index on all columns? Indexs for each combination of columns?


One single-column index, on the first index-worthy column appearing.
Keep it simple.Maybe, while you're doing that full-table-scan. gather
stats on all the indexable columns for later reference, to guide choice of
which column to index later.


There really is no automatic way to solve this puzzle using a single
query. Indexing strategy is a very tough design discipline, and it
requires a complex knowledge of the workload. One slow query does not mean
the index should be created - what if that was just an ad-hoc query and
will not be executed ever again?


Then the index you just built gets automatically dropped, as I said above.


I really don't want to work with products that try to be smarter than me
(and I admit I'm pretty dumb from time to time) and act rather randomly
due to this 'artificial intelligence'. I've already been there and I don't
want to repeat this experience.


Then, since you're not a neophyte, leave the feature turned off.   But don't
deny the possibility of using it to actual neophytes.



as in b): Creating an index is quite expensiv


How much more so than doing that full-table-scan plus sort, which your
query is doing anyway?


A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
maintenance_work_mem) etc. So imagine a few users, building indices on a
big table simultaneously


Pffthht.   One simple trylock, used only by the auto-indexer.



Building an index is just one side of the problem - maintenance of the
indexes is another thing. Each index has an impact on write operations
(INSERT/UPDATE) and may cause that HOT actually does not work (thus
causing unnecessary bloat of the index).


This is worth discussing.How much help does the DBMS currently give
the DBA in evaluating these tradeoffs?Could we do better, given an
actual installation and workload?


And the indexes may need lot of
space on a disk.


By all means require limits as well as a "don't do that" switch.



But the real show stopper is probably locking. Building an index takes a
write lock on a table, effectively blocking writes. Sure, you can use a
'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
disadvantages of that (see
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).


The only disadvantages I see there are a) the leftover "invalid" index - which 
feels like a
bug; why is it not auto-dropped?   and b) the second scan, which implies more 
total work
and a wish to background that portion after completing the query triggering
the auto-index.

Don't forget I suggested doing this only for a limited time after DB creation.  
Possibly another
reason to turn it off should be any manual index creation, as that implies that 
the DBA
knows about indexing.

I see in another thread you suggest merely placing hints in the log.   That's a 
fine
first step - but I'll then be wanting to auto-parse that log to auto-create

Cheers,
Jeremy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread tv
> I don't think planner should do things like creating an index. But it
> might hint at doing it in the logs.
> There was a discussion around that sort of feature on -hackers not so
> long time ago. I don't remember what the conclusion was, but probably
> that it just isn't worth wasting planner's cycles when you execute a
> query.

Yes, that would be a much better solution, definitely. Something like
'watch seq-scan nodes and if the number of matching rows is much smaller
than the total number, put a HINT into log.'

I was thinking about doing something like this for correlated columns
(when the number of matching rows is severely underestimated). But that's
far in the future.

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Grzegorz Jaśkiewicz
I don't think planner should do things like creating an index. But it
might hint at doing it in the logs.
There was a discussion around that sort of feature on -hackers not so
long time ago. I don't remember what the conclusion was, but probably
that it just isn't worth wasting planner's cycles when you execute a
query.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread tv
> On 2010-12-21 10:42, Massa, Harald Armin wrote:
>> b) creating an index requires to read the data-to-be-indexed. So, to
>> have an
>> index pointing at the interesting rows for your query, the table has to
>> be
>> read ... which would be the perfect time to allready select the
>> interesting
>> rows. And after having the interesting rows: the index is worthless
>
> ... until another similar query comes along, when suddenly it's a massive
> win.
> Why not auto-create indices for some limited period after database load
> (copy?  any large number of inserts from a single connection?), track
> those
> that actually get re-used and remove the rest?   Would this not provide
> a better out-of-the-box experience for neophytes?

Say you have a table with several columns (A,B,C), and the query is using
some of them. What indexes would you create? One index on every column? A
multi-column index on all columns? Indexs for each combination of columns?

There really is no automatic way to solve this puzzle using a single
query. Indexing strategy is a very tough design discipline, and it
requires a complex knowledge of the workload. One slow query does not mean
the index should be created - what if that was just an ad-hoc query and
will not be executed ever again?

So although this 'automatic index creation' seems nice, it really does not
work in practice.

I really don't want to work with products that try to be smarter than me
(and I admit I'm pretty dumb from time to time) and act rather randomly
due to this 'artificial intelligence'. I've already been there and I don't
want to repeat this experience.

>
> [...]
>> Why is the query planner not allowed to create indexes, but only allowed
>> to
>>> use or not use what's available?
>>>
>>
>> as in b): Creating an index is quite expensiv
>
> How much more so than doing that full-table-scan plus sort, which your
> query is doing anyway?

A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
maintenance_work_mem) etc. So imagine a few users, building indices on a
big table simultaneously - no, I really don't want to see this on a
production server.

Building an index is just one side of the problem - maintenance of the
indexes is another thing. Each index has an impact on write operations
(INSERT/UPDATE) and may cause that HOT actually does not work (thus
causing unnecessary bloat of the index). And the indexes may need lot of
space on a disk.

But the real show stopper is probably locking. Building an index takes a
write lock on a table, effectively blocking writes. Sure, you can use a
'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
disadvantages of that (see
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).

cheers
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 10:42, Massa, Harald Armin wrote:

b) creating an index requires to read the data-to-be-indexed. So, to have an
index pointing at the interesting rows for your query, the table has to be
read ... which would be the perfect time to allready select the interesting
rows. And after having the interesting rows: the index is worthless


... until another similar query comes along, when suddenly it's a massive win.
Why not auto-create indices for some limited period after database load
(copy?  any large number of inserts from a single connection?), track those
that actually get re-used and remove the rest?   Would this not provide
a better out-of-the-box experience for neophytes?

[...]

Why is the query planner not allowed to create indexes, but only allowed to

use or not use what's available?



as in b): Creating an index is quite expensiv


How much more so than doing that full-table-scan plus sort, which your
query is doing anyway?

Cheers,
Jeremy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Dario Beraldi
Ok, thanks a lot to all of you for your answers! (Always impressed by  
the prompt feedback you get on this list!)


Quoting Gabriele Bartolini :


Ciao Dario,

On Tue, 21 Dec 2010 09:14:36 +, Dario Beraldi  
 wrote:

the query run better the planner is not able (allowed?) to create such
index, use it, and drop it once the query is done. Why is it so?


Because it is not its responsibility. This is the simplest and most  
rational answer.


I would reply with some questions that hopefully will give you the  
answer. How can you define a 'good' query from a 'bad' query?  
Consider the case when an user launches a wrong query and the  
planner you propose starts creating an index. I believe that would  
be a mess.


Then ... how could you project this scenario in a concurrent context  
where multiple users launch queries that 'need' an index?


I suggest that you look at the documentation for more information.  
Otherwise, I strongly suggest that you read the chapter on the  
planner from Greg's book on High Performance (which you can find  
from here: http://www.postgresql.org/docs/books/)


Merry Christmas to you too!

Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it






--

Dr. Dario Beraldi
Institute of Evolutionary Biology
University of Edinburgh
West Mains Road
Edinburgh EH9 3JT
Scotland, UK


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Gabriele Bartolini

Hi Harald,

On Tue, 21 Dec 2010 11:42:40 +0100, "Massa, Harald Armin" 
 wrote:

a) There is a proposal (and, at the time being) also some code on
pgfoundry creating "hypothetical indexes"

http://postgresql.1045698.n5.nabble.com/Hypothetical-Indexes-PostgreSQL-extension-PGCON-2010-td3288085.html
[1]
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00087.php [2]


I totally missed this. Thanks for posting it.

Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Massa, Harald Armin
Hello Dario,

When an index is available for a query, the planner decides whether to use
> it or not depending on whether it would make the query perform better,
> right? However if an index, which does not exist, would make the query run
> better the planner is not able (allowed?) to create such index, use it, and
> drop it once the query is done. Why is it so?
>

>From my knowledge there is more then one answer:

a) There is a proposal (and, at the time being) also some code on pgfoundry
creating "hypothetical indexes"
http://postgresql.1045698.n5.nabble.com/Hypothetical-Indexes-PostgreSQL-extension-PGCON-2010-td3288085.html
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00087.php

The idea is: to play with "what-would-be-if-there-would-be-an-index".

With keywords "hypothetical index", "Index advisor" and "virtual index"
there is some research googleable

b) creating an index requires to read the data-to-be-indexed. So, to have an
index pointing at the interesting rows for your query, the table has to be
read ... which would be the perfect time to allready select the interesting
rows. And after having the interesting rows: the index is worthless

c) PostgreSQL is in fact doing something quite similiar like "temporary
indexes during the query"; only for the selected rows. The words to google
for is "hash maps", "bitmap access".

Why is the query planner not allowed to create indexes, but only allowed to
> use or not use what's available?
>

as in b): Creating an index is quite expensiv

additionally: having multiple possible plans is also creating a new decision
problem: which of the possible plans will lead to the better result; again
with some meanings of "better": faster result or less processor usage or
less memory usage or less disk accesses. So adding additional indices during
planning would worsen this problem; which has to be balanced against
possible gains.

So: hypothetical indizes are a good idea, BUT current limitations would most
likely force them to be made outside the life query process.

Best wishes,

Harald

-- 
GHUM GmbH
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607

Amtsgericht Stuttgart, HRB 734971
-
persuadere.
et programmare


Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Gabriele Bartolini

Ciao Dario,

On Tue, 21 Dec 2010 09:14:36 +, Dario Beraldi 
 wrote:
the query run better the planner is not able (allowed?) to create 
such

index, use it, and drop it once the query is done. Why is it so?


Because it is not its responsibility. This is the simplest and most 
rational answer.


I would reply with some questions that hopefully will give you the 
answer. How can you define a 'good' query from a 'bad' query? Consider 
the case when an user launches a wrong query and the planner you propose 
starts creating an index. I believe that would be a mess.


Then ... how could you project this scenario in a concurrent context 
where multiple users launch queries that 'need' an index?


I suggest that you look at the documentation for more information. 
Otherwise, I strongly suggest that you read the chapter on the planner 
from Greg's book on High Performance (which you can find from here: 
http://www.postgresql.org/docs/books/)


Merry Christmas to you too!

Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general