Re: [GENERAL] Tuning to speed select

2006-08-11 Thread Joshua D. Drake

Roman Neuhauser wrote:

# [EMAIL PROTECTED] / 2006-08-11 10:12:40 -0400:
I think my Dell Precision 650 has SATA on the motherboard. The boss says 
I can order one drive, so what should I get? How much faster is RAID 0+1 
than a single drive?


If you can order one drive, get a drive that is twice the size of what 
you have. Use one partition of that drive to create a raid 1. Use the 
other partition to push off nightly rsyncs :)


Sincerely,

Joshua D. Drake




You need 4 disks for 0+1 (or 1+0, also called 10).
 



--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tuning to speed select

2006-08-11 Thread Roman Neuhauser
# [EMAIL PROTECTED] / 2006-08-11 10:12:40 -0400:
> I think my Dell Precision 650 has SATA on the motherboard. The boss says 
> I can order one drive, so what should I get? How much faster is RAID 0+1 
> than a single drive?

You need 4 disks for 0+1 (or 1+0, also called 10).
 
-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tuning to speed select

2006-08-11 Thread Merlin Moncure

On 8/11/06, Tom Laudeman <[EMAIL PROTECTED]> wrote:

Merlin,
The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec
(I ran that in single user mode so there was nothing interfering). A WD
Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent
system at runlevel 3. What kind of values does hdparm give for a SATA
Raptor?


i dont have one handy on a linux box to test, but all raptors are 10k
drives with 16 mb cache. they feel about twice as fast as 7200 rpm ide
drives in general use.


I think my Dell Precision 650 has SATA on the motherboard. The boss says
I can order one drive, so what should I get? How much faster is RAID 0+1
than a single drive?


depends on a lot of factors, near 100% improvement is realistic even
with software raid.  I would tell your boss that you could buy 2 36g
raptors (110$ each) do a simple raid 0.  just be aware that either
drive failing will take you out.  or, you could do raid 1 for
redundancy.


Aside from size, I can't see much difference between these drives (WD
Raptors at NewEgg):
http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0


the retail parts are more expensive as is the silly drive that you can
look into.  buy the cheapest part at the size level you need.


CLUSTER certainly helped. Each of the following queries would have
returned roughly 50,000 records. Note that selecting a single record
from blast_result using an index is plenty fast ( ~ 50 ms), so my
primary concern is pulling back larger subsets of data.


maybe. you may have had table bloat as well, cluster does a full table
rebuild like vacuum fuul.


It appears that count(*) on a CLUSTERed table uses the index (as opposed
to the old way of doing a sequential scan). Count on the table after
CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we
shouldn't count, but we've been too lazy to keep the record counts in
another table, and our customers occasionally want to know how many
records are in a certain subset.


no, afaik count(*) uses the table still (try explain analyze). you
just compacted and optimized the table for efficient sequential scans.
are you vacuuming regulary?

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tuning to speed select

2006-08-11 Thread Tom Laudeman

Merlin,
The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec 
(I ran that in single user mode so there was nothing interfering). A WD 
Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent 
system at runlevel 3. What kind of values does hdparm give for a SATA 
Raptor?


I think my Dell Precision 650 has SATA on the motherboard. The boss says 
I can order one drive, so what should I get? How much faster is RAID 0+1 
than a single drive?


Aside from size, I can't see much difference between these drives (WD 
Raptors at NewEgg):

http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0


CLUSTER certainly helped. Each of the following queries would have 
returned roughly 50,000 records. Note that selecting a single record 
from blast_result using an index is plenty fast ( ~ 50 ms), so my 
primary concern is pulling back larger subsets of data.


It appears that count(*) on a CLUSTERed table uses the index (as opposed 
to the old way of doing a sequential scan). Count on the table after 
CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we 
shouldn't count, but we've been too lazy to keep the record counts in 
another table, and our customers occasionally want to know how many 
records are in a certain subset.


Before CLUSTER:
explain analyze select * from blast_result where si_fk=11843253;
Total runtime: 16334.539 ms

explain analyze select * from blast_result where si_fk=11843248;
Total runtime: 31406.999 ms

explain analyze select * from blast_result where si_fk=11218929;
Total runtime: 15319.440 ms


After CLUSTER and vacuum analyze:
explain analyze select * from blast_result where si_fk=11843253;
Total runtime: 2343.893 ms

explain analyze select * from blast_result where si_fk=11843248;
Total runtime: 2158.395 ms

explain analyze select * from blast_result where si_fk=11218929;
Total runtime: 1880.586 ms

explain analyze select * from blast_result where si_fk=11843250;
Total runtime: 2085.253 ms


Thanks,
Tom





Are your data structures normalized?  Performance problems queying a
single giganto table is  usually (but not necessirly in your case) a
sign of a poorly designed table structure.

otherwise it's pretty clear you get the most bang for the buck with
hardware.  consider upping ram and/or buying better disks.  you could
buy cheap sata controller and 4 raptors in raid 0+1 configuration for
<1000$ and you will feel like you have supercomputer relative to what
you have now :)

merlin



--
Tom Laudeman
[EMAIL PROTECTED]
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Tuning to speed select

2006-08-10 Thread Tom Lane
Michael Fuhr <[EMAIL PROTECTED]> writes:
> Excerpt from the 8.1 Release Notes:
>   * Allow nonconsecutive index columns to be used in a multicolumn
> index (Tom)

> For example, this allows an index on columns a,b,c to be used in
> a query with WHERE a = 4 and c = 10. 

> If you're querying only on c then an index on (a,b,c) will probably
> be slower than an index on (c), but if you have only (a,b,c) then
> the planner will consider using it.

Note that if your query doesn't have a constraint on the first index
column, it's unlikely that that index will get chosen, because the
entire index would have to be scanned.  Before 8.1 it was guaranteed
not to be chosen.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Tuning to speed select

2006-08-10 Thread Michael Fuhr
On Thu, Aug 10, 2006 at 11:00:00AM -0400, Tom Laudeman wrote:
> As far as I can tell (from running actual tests) Postgres will not use a 
> multi-column index when the SELECT is constraining on only one of the 
> columns in the index.

Excerpt from the 8.1 Release Notes:

  * Allow nonconsecutive index columns to be used in a multicolumn
index (Tom)

For example, this allows an index on columns a,b,c to be used in
a query with WHERE a = 4 and c = 10. 

If you're querying only on c then an index on (a,b,c) will probably
be slower than an index on (c), but if you have only (a,b,c) then
the planner will consider using it.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Tuning to speed select

2006-08-10 Thread Tom Laudeman




Reece,
We have some multi-column indexes. Speed of insert, update, and delete
are not an issue since this data is essentially static: write-once,
read-many.

As far as I can tell (from running actual tests) Postgres will not use
a multi-column index when the SELECT is constraining on only one of the
columns in the index. If I need a single column index, I create one. If
I need three columns in an index, I create a specific 3 column index.
Granted, my memory is fuzzy, and some of my testing was done on version
7.x and there may be improvements on version 8.x

-Tom


...
  
I found multi-column indexes and clustering to be extremely beneficial
in 7.4. I still use them in 8.1, but I haven't compared them
extensively with equivalent queries that use bitmap index scans. The
obvious downside of having more indexes is the additional time and
space overhead during insert, update, or delete.
  
...
  
-Reece
  
  

  

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


  

  


-- 
Tom Laudeman
[EMAIL PROTECTED]
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/





Re: [GENERAL] Tuning to speed select

2006-08-10 Thread Merlin Moncure

On 8/9/06, Tom Laudeman <[EMAIL PROTECTED]> wrote:


 The speed of the query is (as Michael implies) limited to the rate at which
the disk can seek and read.  I have done experiments with views and cursors;
there was no improvement in speed. I've also tried only pulling back
primary keys in the hope that a smaller amount of data would more quickly be
read into memory. No speed increase. I have also raised all the usual memory
limits, with the expected results (slight speed improvements).



Are your data structures normalized?  Performance problems queying a
single giganto table is  usually (but not necessirly in your case) a
sign of a poorly designed table structure.

otherwise it's pretty clear you get the most bang for the buck with
hardware.  consider upping ram and/or buying better disks.  you could
buy cheap sata controller and 4 raptors in raid 0+1 configuration for
<1000$ and you will feel like you have supercomputer relative to what
you have now :)

merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Tuning to speed select

2006-08-10 Thread Alban Hertroys

Richard Broersma Jr wrote:

Views certainly help in managing complexity. They do nothing to improve
query-speed.

Querying a view gets rewritten to queries to the underlying tables on the
fly.
(as long as there are no materialized views, which are still on a the TODO
list)


Would partial indexs on the most queried regions of the table help in query 
speed?


They would, as long as they can be used in the client application 
queries. If all the data is often used (partially or not), then there's 
little point to partial indices. But that's a border-case. In common, 
they can make major differences.


Also, make sure you add enough constraints to your query so that the 
partial index constraints match your query constraints, otherwise the 
index will not be used.


Another point to check; make sure you don't cause type casts over 
indexed columns. That might (under certain conditions) cause a type cast 
on the column value of every indexed row, which does slow down things 
significantly. I believe this was solved in PostgreSQL 8.something.


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Reece Hart




On Wed, 2006-08-09 at 16:54 -0400, Tom Laudeman wrote:

Great suggestion. I've read about CLUSTER, but never had a chance to use it. The only problem is that this table with 9 million records has 5 or 6 indexes. It is hard to pick the most used, but I'll bet CLUSTER will make at least one of the queries run very fast, especially for an index with a small number of distinct values.


Tom-

I found multi-column indexes and clustering to be extremely beneficial in 7.4. I still use them in 8.1, but I haven't compared them extensively with equivalent queries that use bitmap index scans. The obvious downside of having more indexes is the additional time and space overhead during insert, update, or delete.

The approach I took to design multi-column indexes was to run explain on representative queries and look for seq scans. The seq scans indicate which columns /might/ be well-served by indexes. In 7.4, the order of indexed columns was important. (I saw something in the 8.1 release notes that made me think that this was no longer true, but I haven't verified that.) 

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0








Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Michael Fuhr
On Wed, Aug 09, 2006 at 04:54:00PM -0400, Tom Laudeman wrote:
> I'll try CLUSTER (I'm looking forward to that test), but if we really 
> need speed, it will probably be necessary to create copies of the table, 
> or copy portions of the table elsewhere (essentially creating 
> materialized views, I suppose). I'm still trying to get my science 
> compatriot here to tell me which index he most wants to improve, then 
> I'll CLUSTER the table on that index.

If you enable statistics collection then you could use those
statistics to see which indexes are used the most.  Those indexes
might be good candidates for clustering.

http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html

-- 
Michael Fuhr

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Tom Laudeman




Michael,
Great suggestion. I've read about CLUSTER, but never had a chance to
use it. The only problem is that this table with 9 million records has
5 or 6 indexes. It is hard to pick the most used, but I'll bet CLUSTER
will make at least one of the queries run very fast, especially for an
index with a small number of distinct values.

The speed of the query is (as Michael implies) limited to the rate at
which the disk can seek and read.  I have done experiments with views
and cursors; there was no improvement in speed. I've also tried only
pulling back  primary keys in the hope that a smaller amount of data
would more quickly be read into memory. No speed increase. I have also
raised all the usual memory limits, with the expected results (slight
speed improvements). 

I'll try CLUSTER (I'm looking forward to that test), but if we really
need speed, it will probably be necessary to create copies of the
table, or copy portions of the table elsewhere (essentially creating
materialized views, I suppose). I'm still trying to get my science
compatriot here to tell me which index he most wants to improve, then
I'll CLUSTER the table on that index.

Thanks!
Tom

Michael Fuhr wrote:

  On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote:
  
  
On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:


  Is there a tuning parameter I can change to increase speed of selects? 
Clearly, there's already some buffering going on since selecting an 
indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 
seconds on the second try (from pgsql).
  

Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
try increasing the shared_buffers parameter, but if the delay is
getting data from the disk, that won't really help you.

  
  
If most of your queries use the same index then clustering on that
index might speed up initial (i.e., not-cached) queries by reducing
the number of disk pages that need to be read.  See the documentation
for more information.

http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html

  


-- 
Tom Laudeman
[EMAIL PROTECTED]
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/





Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Scott Marlowe
Title: Re: [GENERAL] Tuning to speed select






On Wed, 2006-08-09 at 14:58, louis gonzales wrote:
> I'm not so sure about that, when you create a view on a table - at least
> with Oracle - which is a subset(the trivial or 'proper' subset is the
> entire table view) of the information on a table, when a select is
> issued against a table, Oracle at least, determines if there is a view
> already on a the table which potentially has a smaller amount of
> information to process - as long as the view contains the proper
> constraints that meet your 'select' criteria, the RDBMS engine will have
> fewer records to process - which I'd say, certainly constitutes a time
> benefit, in terms of 'performance gain.'
>
> Hence my reasoning behind determining IF there is a subset of the 'big
> table' that is frequented, I'd create a view on this, assuming
> postgresql does this too?  Maybe somebody else can answer that for the
> pgsql-general's general information?
>
> query-speed itself is going to be as fast/slow as your system is
> configured for, however my point was to shave some time off of a 1M+
> record table, but implementing views of 'frequently' visisted/hit
> records meeting the same specifications.

There are basically two ways to do views.  The simple way, is to have a
view represent a query that gets run everytime you call it.  The more
complex way is to "materialize" the view data, and put it into a new
table, and then update that table whenever the source table changes.

PostgreSQL has native support for the first type.  They're cheap and
easy, and work for most of the things people need views for (i.e. hiding
complexity).

PostgreSQL is extensible, and therefore you can institute the second
type (i.e. materialized views) on your own.  Thanksfully, someone else
has already done most of the work for us, by the name of Jonathan
Gardner, and you can find his nifty guide by typing "materialized views
postgresql" into google.

Gardner's materialized views support several update methods depending on
what you need from your mat views.  It's also a danged fine tutorial on
how to write some simple plpgsql functions.






Re: [GENERAL] Tuning to speed select

2006-08-09 Thread louis gonzales
I'm not so sure about that, when you create a view on a table - at least 
with Oracle - which is a subset(the trivial or 'proper' subset is the 
entire table view) of the information on a table, when a select is 
issued against a table, Oracle at least, determines if there is a view 
already on a the table which potentially has a smaller amount of 
information to process - as long as the view contains the proper 
constraints that meet your 'select' criteria, the RDBMS engine will have 
fewer records to process - which I'd say, certainly constitutes a time 
benefit, in terms of 'performance gain.'


Hence my reasoning behind determining IF there is a subset of the 'big 
table' that is frequented, I'd create a view on this, assuming 
postgresql does this too?  Maybe somebody else can answer that for the 
pgsql-general's general information?


query-speed itself is going to be as fast/slow as your system is 
configured for, however my point was to shave some time off of a 1M+ 
record table, but implementing views of 'frequently' visisted/hit 
records meeting the same specifications.


Harald Armin Massa wrote:


Louis,

Views certainly help in managing complexity. They do nothing to 
improve query-speed.


Querying a view gets rewritten to queries to the underlying tables on 
the fly.
(as long as there are no materialized views, which are still on a the 
TODO list)


--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all. 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Richard Broersma Jr
> Views certainly help in managing complexity. They do nothing to improve
> query-speed.
> 
> Querying a view gets rewritten to queries to the underlying tables on the
> fly.
> (as long as there are no materialized views, which are still on a the TODO
> list)

Would partial indexs on the most queried regions of the table help in query 
speed?

Regards,

Richard Broersma Jr.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Harald Armin Massa
Louis,Views certainly help in managing complexity. They do nothing to improve query-speed. Querying a view gets rewritten to queries to the underlying tables on the fly.(as long as there are no materialized views, which are still on a the TODO list)
-- GHUM Harald Massapersuadere et programmareHarald Armin MassaReinsburgstraße 202b70197 Stuttgart0173/9409607-Let's set so double the killer delete select all.


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread louis gonzales
What about creating views on areas of the table that are queried often?  
I don't know if you have access or the ability to find what type of 
trends the table has, in terms of queries, but if you create some views 
on frequently visited information, this could also help.


Tom Laudeman wrote:


Hi,

I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM 
and an IDE hard drive. My big table has around 9 million records.


Is there a tuning parameter I can change to increase speed of selects? 
Clearly, there's already some buffering going on since selecting an 
indexed ~50,000 records takes 17 seconds on the first try, and only 
0.5 seconds on the second try (from pgsql).


cowpea=> explain analyze select bs_fk from blast_result where 
si_fk=11843254;
QUERY 
PLAN 
 

Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87 
rows=58118 width=4) (actual time=112.249..17472.935 rows=50283 loops=1)

  Index Cond: (si_fk = 11843254)
Total runtime: 17642.522 ms
(3 rows)

cowpea=>  explain analyze select bs_fk from blast_result where 
si_fk=11843254;
  QUERY 
PLAN   
 

Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87 
rows=58118 width=4) (actual time=0.178..341.643 rows=50283 loops=1)

  Index Cond: (si_fk = 11843254)
Total runtime: 505.011 ms
(3 rows)

cowpea=> 



Thanks,
Tom





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Michael Fuhr
On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote:
> On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:
> > Is there a tuning parameter I can change to increase speed of selects? 
> > Clearly, there's already some buffering going on since selecting an 
> > indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 
> > seconds on the second try (from pgsql).
> 
> Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
> try increasing the shared_buffers parameter, but if the delay is
> getting data from the disk, that won't really help you.

If most of your queries use the same index then clustering on that
index might speed up initial (i.e., not-cached) queries by reducing
the number of disk pages that need to be read.  See the documentation
for more information.

http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html

-- 
Michael Fuhr

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Tuning to speed select

2006-08-09 Thread Martijn van Oosterhout
On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:
> Hi,
> 
> I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM 
> and an IDE hard drive. My big table has around 9 million records.
> 
> Is there a tuning parameter I can change to increase speed of selects? 
> Clearly, there's already some buffering going on since selecting an 
> indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 
> seconds on the second try (from pgsql).

Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
try increasing the shared_buffers parameter, but if the delay is
getting data from the disk, that won't really help you.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature