Re: [GENERAL] Rapid Seek Devices (feature request)

2009-08-17 Thread Jeremy Harris

On 08/17/2009 03:24 AM, Craig Ringer wrote:

On 16/08/2009 9:06 PM, NTPT wrote:

So I suggest we should have random_page_cost and
Sequential_page_cost configurable on per tablespace basis.


That strikes me as a REALLY good idea, personally, though I don't know
enough about the planner to factor in implementation practicalities and
any cost for people _not_ using the feature.


Could not pgsql *measure* these costs (on a sampling basis, and with long
time-constants)?

- Jeremy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rapid Seek Devices (feature request)

2009-08-17 Thread NTPT
AFAIK postgresql measure characteristic of the data distribution in the 
tables and indexes (that is what vacuum ANALYSE does) , but results of 
that  measures  are **weighted by** random_page_cost and 
sequential_page_cost. So  measurements are correct,  but costs (weight) 
should reflect a real speed for sequentional and random operation of the 
storage device(s) (tablespaces) involved.


Jeremy Harris napsal(a):

On 08/17/2009 03:24 AM, Craig Ringer wrote:

On 16/08/2009 9:06 PM, NTPT wrote:

So I suggest we should have random_page_cost and
Sequential_page_cost configurable on per tablespace basis.


That strikes me as a REALLY good idea, personally, though I don't know
enough about the planner to factor in implementation practicalities and
any cost for people _not_ using the feature.


Could not pgsql *measure* these costs (on a sampling basis, and with long
time-constants)?

- Jeremy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rapid Seek Devices (feature request)

2009-08-17 Thread Greg Stark
2009/8/17 Jeremy Harris j...@wizmail.org:
 Could not pgsql *measure* these costs (on a sampling basis, and with long
 time-constants)?

In theory, sure. In practice, well, there are some engineering
challenges to solve.

1) The cost model isn't perfect so the it's not clear exactly what to
measure to get the best result. As the cost model stands now you would
have to include some of the cpu time as well as the i/o time.

2) Timing is expensive. On many platforms getting the current time
takes a long time and would make a big difference to the runtime of
the query.



-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Rapid Seek Devices (feature request)

2009-08-16 Thread NTPT

Hi all

I have some idea/feature request.

Now, there are several devices available, that can be called rapid seek 
devices (RSD in future text). I mean SSD disks, some devices like 
gigabyte I-RAM and other (semi)profesional ram disk like solutions for 
example Acard ANS-9010 . Rapid seek because there are not moving parts, 
thus non-sequential access (seeks) have no penalty (no moving heads 
time). I think  it would be cool to have explicitly support for that 
kind of devices.



1: Postgresql has its tablespaces, so creating a tablespce on RSD and 
put some speed critical indexes/tables there, is optimal from the costs 
point (RSD are not cheap, so use it for the speed critical  tasks within 
single database is reasonable)  but it is only a half of the needed. 
Query planner act strongly pro sequentional reads while it try to 
guess best query plan. This bias is fine for classic disk based storage, 
but is not necesery in RSD. So, for tables/indexes on RSD, a suboptimal 
query plan is likely to be generated  (fix me if I am wrong).


So I suggest we should have random_page_cost and 
Sequential_page_cost configurable  on per tablespace basis. And query 
planner that is aware of it, include situation where tables and 
corresponding indexes are on different speed tablespaces.


Imagine this scenario: Default tablespace on sata disk, 
random_page_cost=8, sequential_page_cost=3 (from config file),


CREATE TABLESPACE fast /path/to/multiple_15k_rpm_SCSI_hw_RAID 
random_page_cost=4 sequential_page_cost=2
CREATE TABLESPACE lightspeed /path/to/SSD_OR_I-RAM random_page_cost=1 
sequential_page_cost=1.2


and now scatter your tables / indexes around new tablespaces :-)


2: Many of that RSD devices are not so much reliable (power outage in 
ramdisk, weak auxillarity battery in i-ram like devices, block wear 
out in SSD). While moving only an indexes to this device (  I found 
this article showing there IS a big difference having only an indexes on 
SSD -  http://linux.com/archive/feature/142658.) may be appropriate, and 
just reindex in worst case, this is not suitable in high availability 
enviroment.


So I suggest to have something like this to solve reliability problems 
on some RSD:


CREATE TABLESPACE lightspeed /path/to/SSD_OR_i-RAM random_page_cost=1 
sequential_page_cost=1.2 TEE name_of_the_slow_tablespace


ie read from fast tablespace, write to both fast and slow, reconstruct 
fast from slow if appropriate.


Thanx for your attention.

PS: Execuse my wrong english


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rapid Seek Devices (feature request)

2009-08-16 Thread Craig Ringer

On 16/08/2009 9:06 PM, NTPT wrote:


So I suggest we should have random_page_cost and
Sequential_page_cost configurable on per tablespace basis.


That strikes me as a REALLY good idea, personally, though I don't know 
enough about the planner to factor in implementation practicalities and 
any cost for people _not_ using the feature.



2: Many of that RSD devices are not so much reliable (power outage in
ramdisk, weak auxillarity battery in i-ram like devices, block wear
out in SSD).


[snip]


ie read from fast tablespace, write to both fast and slow, reconstruct
fast from slow if appropriate.


This can probably be done as well or better at the OS block layer, using 
device-mapper or the `md' driver (on Linux).


What'd be interesting, though, would be if Pg had support for 
auto-rebuilding indexes and (if/when explicit support is added) 
materialized views if it finds the backing files are missing. This would 
be helpful for such transient devices as RAM disks if you didn't want to 
bother having physical storage backing it.


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general