Re: [PERFORM] Sorted group by

2010-08-10 Thread Jonathan Blitz
 
Another couple of possible ways:

Select groupfield,value
>From tbl x1
Where number = (select max(number) from tbl x2 where x2.groupfield=
x1.groupfield)



Select groupfield,value
>From tbl x1
Where (groupfield,number) in (select groupfield,max(number) from tbl group
by groupfield)

Which is quickest?
Probably best to try out and see.

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kevin Grittner
Sent: Tuesday, August 10, 2010 7:38 PM
To: Matthew Wakeling; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sorted group by

Matthew Wakeling  wrote:
 
> I'm trying to eke a little bit more performance out of an application
 
In addition to the suggestion from Thomas Kellerer, it would be interesting
to try the following and see how performance compares using real data.
 
select group, value from tbl x
  where not exists
(select * from tbl y
  where y.group = x.group and y.number > x.number);
 
We have a lot of code using this general technique, and I'm curious whether
there are big gains to be had by moving to the windowing functions.  (I
suspect there are.)
 
-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.851 / Virus Database: 271.1.1/3061 - Release Date: 08/09/10
21:35:00


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is the query not using the index for sorting?

2009-11-23 Thread Jonathan Blitz
Definitely after.

Jonathan 

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Matthew
Wakeling
Sent: Monday, November 23, 2009 1:00 PM
To: Jonathan Blitz
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is the query not using the index for sorting?

On Sun, 22 Nov 2009, Jonathan Blitz wrote:
> I have a table with a number of columns.
>  
> I perform
>  
> Select *
> from table
> order by a,b
>  
> There is an index on a,b which is clustered (as well as indexes on a and b
alone).
> I have issued the cluster and anyalze commands.

Did you analyse *after* creating the index and clustering, or before?

Matthew

--
 [About NP-completeness] These are the problems that make efficient use of
 the Fairy Godmother.-- Computer Science Lecturer
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.709 / Virus Database: 270.14.76/2517 - Release Date: 11/22/09
21:40:00


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is the query not using the index for sorting?

2009-11-22 Thread Jonathan Blitz
Many thanks.
I'll give it a try and see what happens. 

-Original Message-
From: Craig Ringer [mailto:cr...@postnewspapers.com.au] 
Sent: Sunday, November 22, 2009 3:25 PM
To: Jonathan Blitz
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is the query not using the index for sorting?

On 22/11/2009 8:50 PM, Jonathan Blitz wrote:
> I have a table with a number of columns.
>  
> I perform
>  
> Select *
> from table
> order by a,b
>  
> There is an index on a,b which is clustered (as well as indexes on a 
> and b alone).
> I have issued the cluster and anyalze commands.
>  
> Nevertheless, PostgreSQL performs a Sequential Scan on the table and 
> then performs a sort.

PostgreSQL's query planner probably thinks it'll be faster to read the pages
off the disk sequentially then sort them in memory. To use an index instead,
Pg would have to read the whole index from disk
(sequentially) then fetch all the pages off the disk in a probably
near-random order. So it'd be doing more disk I/O, and much more of it would
be random I/O, which is a LOT slower.

So Pg does it the fast way, reading the table into memory then sorting it
there.

The most important thing to understand is that sometimes, a sequential scan
is just the fastest way to do the job.

I suspect you're working on the assumption that Pg can get all the data it
needs from the index, so it doesn't need to read the tables proper.
In some other database systems this *might* be possible if you had an index
on fields "a" and "b" and issued a "select a,b from table" instead of a
"select *". PostgreSQL, though, can not do this. PostgreSQL's indexes do not
contain all the information required to return values from queries, only
enough information to find the places in the main tables where those values
are to be found.

If you want to know more and understand why that's the case, search for the
phrase "covered index" and the words "index visibility". Suffice it to say
that there are pretty good reasons why it works how it does, and there would
be very large downsides to changing how it works as well as large technical
problems to solve to even make it possible. It's to do with the trade-off
between update/insert/delete speeds and query speeds, the cost of "fatter"
indexes taking longer to read from disk, and lots more.

By the way, if you want to test out different query plans for a query to see
which way is faster, you can use the "enable_" parameters like
"enable_seqscan", "enable_hashjoin" etc to control how PostgreSQL performs
queries. There's *LOTS* to be learned about this in the mailing list
archives. You should also read the following page:

 http://www.postgresql.org/docs/current/static/runtime-config-query.html

but understand that the planner method configuration parameters are intended
mostly for testing and performance analysis, not for production use.

If you find a query that's lots faster with a particular enable_ parameter
set to "off", try increasing your statistics targets on the tables / columns
of interest, re-ANALYZEing, and re-testing. See these pages re statistics:

http://www.postgresql.org/docs/current/static/using-explain.html
http://www.postgresql.org/docs/current/static/planner-stats.html
http://www.postgresql.org/docs/current/static/planner-stats-details.html

If after increasing your stats targets the planner still picks a vastly
slower plan, consider posting to the mailing list with the full output of
"EXPLAIN ANALYZE SELECT myquery", the full exact text of your query, and
your table schema as shown by "\d tablename" in psql. Someone may be able to
help you or at least explain why it's happening.

--
Craig Ringer
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.709 / Virus Database: 270.14.76/2517 - Release Date: 11/21/09
21:41:00


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Why is the query not using the index for sorting?

2009-11-22 Thread Jonathan Blitz
I have a table with a number of columns.
 
I perform 
 
Select * 
from table
order by a,b
 
There is an index on a,b which is clustered (as well as indexes on a and b
alone).
I have issued the cluster and anyalze commands.
 
Nevertheless, PostgreSQL performs a Sequential Scan on the table and then
performs a sort.

Am I missing something?
 
Jonathan Blitz


Re: [PERFORM] Propagating outer join conditions

2006-12-03 Thread Jonathan Blitz
How about trying:

Select *
From
(Select * from t28 where t28.0='spec')  t28a
Left out join (t1 JOIN t11 ON
> (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28a.s = t1.s

In this way, I think, the where clause on t28 would be performed before the
join rather than after.

Jonathan Blitz


> -Original Message-
> From: Aaron Birkland [mailto:[EMAIL PROTECTED]
> Sent: Sunday, December 03, 2006 5:12 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Propagating outer join conditions
> 
> The following left outer join plan puzzles me:
> 
> EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON
> (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28.s = t1.s
> WHERE t28.o = '"spec"';
> 
> t28, t1, and t11 all have indexed columns named 's' and 'o' that contain
'text';
> 
>  Nested Loop Left Join  (cost=794249.26..3289704.61 rows=1 width=301)
> (actual time=581293.390..581293.492 rows=1 loops=1)
>Join Filter: (t28.s = t1.s)
>->  Index Scan using t28_o on t28  (cost=0.00..9.22 rows=1
> width=89) (actual time=0.073..0.077 rows=1 loops=1)
>  Index Cond: (o = '"spec"'::text)
>->  Merge Join  (cost=794249.26..3267020.66 rows=1813979 width=212)
> (actual time=230365.522..577078.266 rows=1894969 loops=1)
>  Merge Cond: (t1.o = t11.s)
>  ->  Index Scan using t1_o on t1  (cost=0.00..2390242.10
> rows=5696 width=109) (actual time=0.209..162586.801 rows=3925
> loops=1)
>  ->  Sort  (cost=794249.26..798784.21 rows=1813979 width=103)
> (actual time=230365.175..237409.474 rows=1894969 loops=1)
>Sort Key: t11.s
>->  Bitmap Heap Scan on t11  (cost=78450.82..605679.55
> rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969
> loops=1)
>  Recheck Cond: (o = '<http://example.org>'::text)
>  ->  Bitmap Index Scan on t11_o
> (cost=0.00..78450.82 rows=1813979 width=0) (actual
> time=2445.422..2445.422 rows=1894969 loops=1)
>Index Cond: (o = '<http://example.org>'::text)
> 
> 
> It seems to me that this plan is not very desirable, since the outer
> part of the nested loop left join (the merge join node) is very
> expensive. Is is possible to generate a plan that looks like this:
> 
>  Nested Loop Left Join  (cost=???)
>->  Index Scan using t28_o on t28  (cost=0.00..9.11 rows=1 width=89)
>  Index Cond: (o = '"spec"'::text)
>->  Nested Loop  (cost=???)
>  ->  Index Scan using t1_s on t1  (cost=???)
>Index Cond: (s = t28.s)
>  ->  Bitmap Heap Scan on t11  (cost=???)
>Recheck Cond: (t11.s = t1.o)
>Filter: (o = '<http://example.org>'::text)
>->  Bitmap Index Scan on t11_s  (cost=??? )
>  Index Cond: (t11.s = t1.o)
> 
> I *think* this plan is equivalent to the above if I'm assuming the
> behaviour of the 'nested loop left join' node correctly.  So far, I
> have been tweaking the statistics, cost estimates, and
> enabling.disabling certain plans to see if I can get it to propagate
> the join condition t1.s = t28.s to the outer node of the left join..
> but so far, I cannot.  So, my questions are:
> 
> 1) Is my 'desired' query plan logic correct
> 2) Can the executor execute a plan such as my 'desired' plan
> 3) If (1) and (2) are 'yes', then how may I get the planner to
> generate such a plan, or do I just need to look harder into tweaking
> the statistics and cost estimates
> 
>   -Aaron
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
> 
> 
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/2/2006
> 
> 
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006
> 

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> 
> Most laptop drives are only 5,400 RPM which would make a transaction like
> you are describing likely take a while.

Not sure what my one is but it is new(ish).

> 
> No desktop at home you could try it on?
> I think the problem with the laptop is likely it's drive.

I suppose I could do but I need to install PostgreSQL there and then copy
over the database.
Maybe I will give it a try.

Jonathan

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> 
> 
> Could you do the updates in batches instead of trying to do them all at
> once?

Nope. Didn't think it would make any difference.
> 
> Have you done a vacuum full on this table ever?

Many times

> 
> What hardware?
> I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA).
> Doing an update on a 5 million record table took quite a while, but it did
> fininish. :-)

I am using a laptop :).
Pentium 4 (not 4M) with 1GB of memory - 2 MHZ

Must do it on  that since the program is aimed for use at home.

Jonathan

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> > So, I have tried to run the following command. The command never
finishes
> > (I gave up after about and hour and a half!).
> 
> Did you ever find what was the problem?
> Perhaps you needed to run a vacuum full on the table?

Nope.
I just gave up in the end and left it with NULL as the default value.
There were, in fact, over 2 million rows in the table rather than 1/4 of a
million so that was part of the problem.

Jonathan
 

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] slow query using sub select

2006-05-22 Thread Jonathan Blitz


> -Original Message-
> From: Tim Jones [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 12:11 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] slow query using sub select
> 
> Hi,
>   I am having a problem with a sub select query being kinda slow.  The
> query is as follows:
> 
> select batterycode, batterydescription, observationdate from Battery t1
> where patientidentifier=611802158 and observationdate = (select
> max(observationdate) from Battery t2 where t2.batterycode=t1.batterycode
> and patientidentifier=611802158) order by batterydescription.


How about changing it into a standard join:


select t1.batterycode, t1.batterydescription, t2.observationdate
from Battery t1, 
(Select batterycode ,max(observationdate) from Battery t2 where
patientidentifier=611802158 group by batterycode) AS T2
where t1. batterycode = t2. batterycode

Jonathan Blitz
AnyKey Limited
Israel

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.6.1/344 - Release Date: 05/19/2006
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Adding and filling new column on big table

2006-05-16 Thread Jonathan Blitz








I have a table of about 500,000 rows. 

 

I need to add a new column and populate it.

 

So, I have tried to run the following command. The command
never finishes (I gave up after about and hour and a half!).

Note that none of the columns have indexes.

 

Update mytable set new_column =    

case when column_1 = column_2                        then
1   

when column_1+column_3= column_2 and column_3 > 0              then
2   

when column_1+column_3+column_4 = column_2 and column_4 >
0         then 3   

when column_1+column_3+column_4+column_5 = column_2 and
column_5 > 0     then 4   

else     0


end

 

 

My computer is a Pentium 4 – 2.4 GHZ and 1G RAM – so it
should be fast enough.

 

Any ideas?

 

Jonathan Blitz

 








--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.5.6/340 - Release Date: 05/15/2006