Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-18 Thread Eduardo Morras
On Wed, 16 Jan 2013 23:42:23 +0100
"T. E. Lawrence"  wrote:

> 
> On 15.01.2013, at 17:32, Jeff Janes  wrote:
> > T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
> > etc.), the default server log settings will log both the cancel and
> > the command triggering the cancel.  So if you are running an up to
> > date server, you can just look in the logs to see what is happening.
> > 
> > Cheers,
> > 
> > Jeff
> 
> That's interesting, I'll check it. Thank you.

And now the million dollars question, do you have any transaction in 'IDLE IN 
TRANSACTION' state? 

If yes, there's one possible problem. For example, we used (note the past) a 
message queue middleware which uses a postgres db for message passing, but it 
keeps the transaction in 'IDLE IN TRANSACTION' state, so if i do a select * 
from message_tbl i get 1-5 rows, but the tbl used (note again the past) several 
Gb of hd space because autovacuum couldn't clean. Of course once discovered the 
problem the middleware was changed by another (home-built) one which don't 
keeps the IDLE IN TRANSACTION state.

> T.
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


---   ---
Eduardo Morras 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-16 Thread T. E. Lawrence

On 15.01.2013, at 17:32, Jeff Janes  wrote:
> T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
> etc.), the default server log settings will log both the cancel and
> the command triggering the cancel.  So if you are running an up to
> date server, you can just look in the logs to see what is happening.
> 
> Cheers,
> 
> Jeff

That's interesting, I'll check it. Thank you.
T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-16 Thread T. E. Lawrence

On 15.01.2013, at 16:36, Tom Lane  wrote:
> "T. E. Lawrence"  
> > So, apparently, we need to interrupt the heavy imports on some reasonable 
>> intervals and do manual VACUUM ANALYZE?
> 
> Data import as such, no matter how "heavy", shouldn't be a problem.
> The question is what are you doing that takes access-exclusive table
> locks frequently, and do you really need to do that?
> 
> A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
> would be the most likely candidates for taking exclusive table locks.
> 
>   regards, tom lane

Thank you for this.

We will have to research into this, it will take a while.

T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-15 Thread Jeff Janes
On Tue, Jan 15, 2013 at 7:36 AM, Tom Lane  wrote:
> "T. E. Lawrence"  
>  writes:
>> On 15.01.2013, at 05:45, Jeff Janes  wrote:
 Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?
>
>>> No.  For example, if you constantly do things that need an access exclusive 
>>> lock, then autovac will keep getting interrupted and never finish.
>
>> I see.
>
>> So, apparently, we need to interrupt the heavy imports on some reasonable 
>> intervals and do manual VACUUM ANALYZE?
>
> Data import as such, no matter how "heavy", shouldn't be a problem.
> The question is what are you doing that takes access-exclusive table
> locks frequently, and do you really need to do that?
>
> A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
> would be the most likely candidates for taking exclusive table locks.

But that isn't an exhaustive list--weaker locks will also cancel
autovacuum, for example I think the SHARE lock taken by CREATE INDEX
will and the even weaker one taken by CREATE INDEX CONCURRENTLY will
too.

But will all of those cancel auto-analyze as well as auto-vac?  I
guess they will because they use the same lock level.

T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
etc.), the default server log settings will log both the cancel and
the command triggering the cancel.  So if you are running an up to
date server, you can just look in the logs to see what is happening.

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-15 Thread Tom Lane
"T. E. Lawrence"  
 writes:
> On 15.01.2013, at 05:45, Jeff Janes  wrote:
>>> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

>> No.  For example, if you constantly do things that need an access exclusive 
>> lock, then autovac will keep getting interrupted and never finish.

> I see.

> So, apparently, we need to interrupt the heavy imports on some reasonable 
> intervals and do manual VACUUM ANALYZE?

Data import as such, no matter how "heavy", shouldn't be a problem.
The question is what are you doing that takes access-exclusive table
locks frequently, and do you really need to do that?

A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
would be the most likely candidates for taking exclusive table locks.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-15 Thread T. E. Lawrence

On 15.01.2013, at 05:45, Jeff Janes  wrote:

>> Which makes me think that, as we grew the database more than 250 times in 
>> size over a 2-3 months period, relying on autovacuum (some tables grew from 
>> 200k to 50m records, other from 1m to 500m records), the autovacuum has 
>> either let us down or something has happen to the ANALYZE.
> 
> What do pg_stat_user_tables tell you about last_vacuum, last_autovacuum, 
> last_analyze, last_autoanalyze?

  relname   |  last_vacuum  |
last_autovacuum| last_analyze  |   last_autoanalyze 
   
+---+---+---+---
 elements   | 2013-01-14 16:14:48.963573+00 |   
| 2013-01-14 16:19:48.651155+00 | 2012-12-12 
12:23:31.308877+00

This is the problematic table. I think it is clear. Last autovacuum has been 
never and last autoanalyze has been mid-December.

Thank you!

>> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?
> 
> No.  For example, if you constantly do things that need an access exclusive 
> lock, then autovac will keep getting interrupted and never finish.

I see.

So, apparently, we need to interrupt the heavy imports on some reasonable 
intervals and do manual VACUUM ANALYZE?

> Cheers,
> 
> Jeff


Thank you very much,
T.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-14 Thread Jeff Janes
On Monday, January 14, 2013, T. E. Lawrence wrote:

> RESOLVED
> --
> Dear all,
>
> Thank you for your great help and multiple advices.
>
> I discovered the problem and I have to say that it is very stupid and
> strange.
>
> Here is what happened.
>
>
...


> So I decided to try the whole thing properly from the beginning. Dropped
> the index, did again VACUUM ANALYZE and tried the queries, in order to
> measure them without and with index. Surprise! - the slow query was blazing
> fast. The previous indexes (not the dropped partial index) were properly
> used. All was fine.
>
> Which makes me think that, as we grew the database more than 250 times in
> size over a 2-3 months period, relying on autovacuum (some tables grew from
> 200k to 50m records, other from 1m to 500m records), the autovacuum has
> either let us down or something has happen to the ANALYZE.
>

What do pg_stat_user_tables tell you
about last_vacuum, last_autovacuum, last_analyze, last_autoanalyze ?


> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?
>

No.  For example, if you constantly do things that need an access exclusive
lock, then autovac will keep getting interrupted and never finish.

Cheers,

Jeff


Re: [GENERAL] reducing number of ANDs speeds up query RESOLVED

2013-01-14 Thread T. E. Lawrence
RESOLVED
--
Dear all,

Thank you for your great help and multiple advices.

I discovered the problem and I have to say that it is very stupid and strange.

Here is what happened.

>From all advices I tried first partial index. The index was built and there 
>was no change in the speed of the slow query. Which depressed me greatly. In 
>the midst of my depression I ran VACUUM ANALYZE which took about 10 hours (the 
>db is about 170 GB and has more than 500m rows in some tables, running on a 4 
>core, 8 GB RAM dedicated PostgreSQL cloud server). Towards the end of VACUUM 
>ANALYZE I was playing with some queries and suddenly the slow query became 
>fast! (which partially defeated the notion that one does not need ANALYZE upon 
>CREATE INDEX) And I said "Aha!".

So I decided to try the whole thing properly from the beginning. Dropped the 
index, did again VACUUM ANALYZE and tried the queries, in order to measure them 
without and with index. Surprise! - the slow query was blazing fast. The 
previous indexes (not the dropped partial index) were properly used. All was 
fine.

Which makes me think that, as we grew the database more than 250 times in size 
over a 2-3 months period, relying on autovacuum (some tables grew from 200k to 
50m records, other from 1m to 500m records), the autovacuum has either let us 
down or something has happen to the ANALYZE.

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

Tank you and all the best,
T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread Alban Hertroys
On 12 January 2013 12:41, T. E. Lawrence  wrote:

> Hi and thank you for your notes!
>
> > You really ought to include the output of EXPLAIN ANALYZE in cases such
> as these (if it doesn't already point you to the culprit).
>
> I'll do so, it takes quite long...
>
> > Most likely you'll find that the last condition added a sequential scan
> to the query plan,
>
> Exactly! EXPLAIN says so.
>
> > which can have several causes/reasons. Are the estimated #rows close to
> the actual #rows?
>
> Yes, this is the problem. I read that in such cases indexes are not read.
> However if the previous conditions are executed first, the result is zero
> or just a few rows and there is no need seq scan the whole values column.
>

You mean they don't match, do you?

The database doesn't know what you know and its making the wrong decision
based on incorrect data.

The database won't use an index if it thinks that there aren't many rows to
check against a condition. Most likely (the results from explain analyze
would tell) the database thinks there are much fewer rows in table b than
there actually are.

You'll probably want to read about database maintenance for Postgres and
how to keep its statistics up to date. Autovacuum may need some tuning or
you need to run manual VACUUM more frequently.
In fact, run VACUUM now and see if the problem goes away.

You'll usually also want to run VACUUM after a large batch job.


> > Meanwhile, it looks like most of your AND's are involved in joining
> tables a and b. Perhaps it helps to use an explicit join instead of an
> implicit one?
>
> I am not quite sure what this means, but will read about it.
>

You're currently using implicit joins by combining your join conditions in
the WHERE clause of your query, like this:
SELECT *
FROM a, b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND b.value = 'yadayada';

You can also explicitly put your join conditions with the joins, like so:
SELECT *
FROM a INNER JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2)
WHERE b.value = 'yadayada';

You explicitly tell the database that those are the conditions to be joined
on and that the remaining conditions are filters on the result set. With
just two tables the need for such isn't that obvious, but with more tables
it quickly becomes difficult to see what condition in an implicit join is
part of the joins and which is the result set filter. With explicit joins
that's much clearer.
It wouldn't be the first time that I rewrite a query to use explicit joins,
only to find that the original query was incorrect.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread Tony Theodore

On 12/01/2013, at 12:47 PM, T. E. Lawrence  wrote:

> Hello,
> 
> I have a pretty standard query with two tables:
> 
> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;
> 
> With the last "AND b.value=..." the query is extremely slow (did not wait for 
> it to end, but more than a minute), because the value column is not indexed 
> (contains items longer than 8K).
> 
> However the previous conditions "WHERE ... AND ... AND" should have already 
> reduced the candidate rows to just a few (table_b contains over 50m rows). 
> And indeed, removing the last "AND b.value=..." speeds the query to just a 
> millisecond.
> 
> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND 
> ... AND" and then the last "AND b.value=..." on the (very small) result?

Have you looked at the WITH clause [1,2]:

WITH filtered as (SELECT table_a.id, b.value as val FROM table_a a, table_b b 
WHERE … AND …)
SELECT * FROM filtered WHERE filtered.val=…

It evaluates the the first SELECT once, then applies the second SELECT to the 
first in memory (at least that's the way I think about them).

Cheers,

Tony


[1] http://www.postgresql.org/docs/9.2/static/queries-with.html
[2] http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-WITH



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread T. E. Lawrence
Hi and thank you!

On 12.01.2013, at 11:52, Eduardo Morras  wrote:

>> With the last "AND b.value=..." the query is extremely slow (did not wait 
>> for it to end, but more than a minute), because the value column is not 
>> indexed (contains items longer than 8K).
> 
> You can construct your own home made index, add a new column in table b, with 
> the first 8-16 bytes/chars of b.value, use this column on your query and 
> refine to a complete b.value. Don't forget tocCreate an index for it too. You 
> can keep this column updated with a trigger.

Yes, I have been considering this in a slightly different way. value contains 
short and long values (mixed). Only the short values are queried directly. The 
long values are queried in a tsearch column or in an external Sphinx Search. So 
probably I should split the short and long values and then index the short 
values.

It is nevertheless slightly annoying that I cannot make the query do the value 
thing last...

> Perhaps you can use a partial index for b.value column, i never used that 
> feature so documentation/others can point you how to do it.

Did not know of them, reading. Thank you!
T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread T. E. Lawrence
Hi and thank you for your notes!

> You really ought to include the output of EXPLAIN ANALYZE in cases such as 
> these (if it doesn't already point you to the culprit).

I'll do so, it takes quite long...

> Most likely you'll find that the last condition added a sequential scan to 
> the query plan,

Exactly! EXPLAIN says so.

> which can have several causes/reasons. Are the estimated #rows close to the 
> actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read. 
However if the previous conditions are executed first, the result is zero or 
just a few rows and there is no need seq scan the whole values column.

> Is b.value indexed?

No, because it contains too long values for indexing.

> How selective is the value you're matching it against (is it uncommon or 
> quite common)? Etc, etc.

Zero to a few.

> Meanwhile, it looks like most of your AND's are involved in joining tables a 
> and b. Perhaps it helps to use an explicit join instead of an implicit one?

I am not quite sure what this means, but will read about it.

There were 2 more suggestions, I'll try now everything and write back.

Thank you very much for your help!
T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread T. E. Lawrence
On 12.01.2013, at 07:10, Amit kapila  wrote:
> You can try once with below query:
> Select * from (SELECT a.id,b.value FROM table_a a, table_b b WHERE ... AND 
> ... ) X where X.value=...;
> 
> If this doesn't work can you send the Explain .. output for both queries(the 
> query you are using and the query I have suggested)
> 
> With Regards,
> Amit Kapila.

Hi and thank you!

I will try this and let you know!

T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread Eduardo Morras
On Sat, 12 Jan 2013 02:47:26 +0100
"T. E. Lawrence"  wrote:

> Hello,
> 
> I have a pretty standard query with two tables:
> 
> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;
> 
> With the last "AND b.value=..." the query is extremely slow (did not wait for 
> it to end, but more than a minute), because the value column is not indexed 
> (contains items longer than 8K).
 
You can construct your own home made index, add a new column in table b, with 
the first 8-16 bytes/chars of b.value, use this column on your query and refine 
to a complete b.value. Don't forget tocCreate an index for it too. You can keep 
this column updated with a trigger.

Perhaps you can use a partial index for b.value column, i never used that 
feature so documentation/others can point you how to do it.


> However the previous conditions "WHERE ... AND ... AND" should have already 
> reduced the candidate rows to just a few (table_b contains over 50m rows). 
> And indeed, removing the last "AND b.value=..." speeds the query to just a 
> millisecond.
> 
> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND 
> ... AND" and then the last "AND b.value=..." on the (very small) result?
> 
> Thank you and kind regards,
> T.

---   ---
Eduardo Morras 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-12 Thread Alban Hertroys
You really ought to include the output of EXPLAIN ANALYZE in cases such as
these (if it doesn't already point you to the culprit).

Most likely you'll find that the last condition added a sequential scan to
the query plan, which can have several causes/reasons. Are the estimated
#rows close to the actual #rows? Is b.value indexed? How selective is the
value you're matching it against (is it uncommon or quite common)? Etc, etc.

Meanwhile, it looks like most of your AND's are involved in joining tables
a and b. Perhaps it helps to use an explicit join instead of an implicit
one?


On 12 January 2013 02:47, T. E. Lawrence  wrote:

> Hello,
>
> I have a pretty standard query with two tables:
>
> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND
> b.value=...;
>
> With the last "AND b.value=..." the query is extremely slow (did not wait
> for it to end, but more than a minute), because the value column is not
> indexed (contains items longer than 8K).
>
> However the previous conditions "WHERE ... AND ... AND" should have
> already reduced the candidate rows to just a few (table_b contains over 50m
> rows). And indeed, removing the last "AND b.value=..." speeds the query to
> just a millisecond.
>
> Is there a way to instruct PostgreSQL to do first the initial "WHERE ...
> AND ... AND" and then the last "AND b.value=..." on the (very small) result?
>
> Thank you and kind regards,
> T.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: [GENERAL] reducing number of ANDs speeds up query

2013-01-11 Thread Amit kapila

On Saturday, January 12, 2013 7:17 AM T. E. Lawrence wrote:
> Hello,

> I have a pretty standard query with two tables:

> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

> With the last "AND b.value=..." the query is extremely slow (did not wait for 
> it to end, but more than a minute), because the value column is not indexed 
> (contains items longer than 8K).

> However the previous conditions "WHERE ... AND ... AND" should have already 
> reduced the candidate rows to just a few (table_b contains over 50m rows). 
> And indeed, removing the last "AND b.value=..." speeds the query to just a 
> millisecond.

> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND 
> ... AND" and then the last "AND b.value=..." on the (very small) result?

You can try once with below query:
Select * from (SELECT a.id,b.value FROM table_a a, table_b b WHERE ... AND ... 
) X where X.value=...;

If this doesn't work can you send the Explain .. output for both queries(the 
query you are using and the query I have suggested)


With Regards,
Amit Kapila.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] reducing number of ANDs speeds up query

2013-01-11 Thread T. E. Lawrence
Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for 
it to end, but more than a minute), because the value column is not indexed 
(contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already 
reduced the candidate rows to just a few (table_b contains over 50m rows). And 
indeed, removing the last "AND b.value=..." speeds the query to just a 
millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND 
... AND" and then the last "AND b.value=..." on the (very small) result?

Thank you and kind regards,
T.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general