Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-12 Thread Michael March
As a postscript to these tests: I just tried the 500GB Momentus XT hybrid
SSD/HDD drive. I had this fantasy that it would at least do better than the
7200 RPM desktop drive.

Oh lord, my gut was wrong.  Performance was inconsistent and never reached
two-thirds that of the slowest desktop drive.

On Sat, Aug 7, 2010 at 4:47 PM, Michael March  wrote:

> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
>
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200 RPM HDD under a pgbench load.
>
>
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
>
> Thanks!
>
> --
> mma...@gmail.com
>



-- 


Michael F. March - mma...@gmail.com


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-12 Thread Arjen van der Meijden

On 13-8-2010 1:40 Scott Carey wrote:

Agreed.  There is a HUGE gap between "ooh ssd's are fast, look!" and
engineering a solution that uses them properly with all their
strengths and faults.  And as 'gnuoytr' points out, there is a big
difference between an Intel SSD and say, this thing:
http://www.nimbusdata.com/products/s-class_overview.html


From the description it sounds as if it's either FreeBSD or OpenSolaris 
with ZFS plus some web-interface layer. That's not a bad thing per se, 
but the site suggests it's 'only' $25k for the smallest (2.5TB?) 
device. That makes it very likely that these are "off the shelf" MLC flash 
drives. Given the design of the device and the pricing, they are probably 
your average 2.5" drives with 100, 200 or 400GB capacity (maybe OCZ 
Vertex 2 Pro, which do have such a capacitor?), similar to the Intel SSD 
you compared it to.
And then we're basically back to square one, unless the devices have a 
capacitor or ZFS works better with SSDs to begin with (it will at 
least detect that silent data corruption occurred).


There are of course devices that are not built on top of normal 
disk-form-factor SSDs, like the RamSan devices or Sun's F5100.


Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-12 Thread Scott Carey

On Aug 11, 2010, at 9:30 PM, Greg Smith wrote:

> Scott Carey wrote:
>> What is the likelihood that your RAID card fails, or that the battery that 
>> reported 'good health' only lasts 5 minutes and you lose data before power 
>> is restored?   What is the likelihood of human error?
>> 
> 
> These are all things that happen sometimes, sure.  The problem with the 
> cheap SSDs is that they happen downright often if you actually test for 
> it.  If someone is aware of the risk and makes an informed decision, 
> fine.  But most of the time I see articles like the one that started 
> this thread that are oblivious to the issue, and that's really bad.
> 

Agreed.  There is a HUGE gap between "ooh ssd's are fast, look!" and 
engineering a solution that uses them properly with all their strengths and 
faults.  And as 'gnuoytr' points out, there is a big difference between an 
Intel SSD and say, this thing: 
http://www.nimbusdata.com/products/s-class_overview.html 

> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 




[PERFORM] Very bad plan when using VIEW and IN (SELECT...*)

2010-08-12 Thread Carlo Stonebanks

Ref these two queries against a view:

-- QUERY 1, executes < 0.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (13083101)

-- QUERY 2, executes > 13.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (SELECT 13083101)

I am using the simple IN (SELECT n) in QUERY 2 to simplify the problem. I 
noticed the oddity of the behaviour when I used a proper "IN (SELECT myId 
FROM myTable)", but the planner shows the same behaviour even when not 
selecting from a table - just the SELECT keyword is enough.
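
In case it helps anyone searching the archives, one workaround sketch
(untested against this schema; my_table/my_id are placeholders, not names
from my database) is to collapse the subquery into a scalar array, so the
planner sees an ordinary filter condition it can push into the view's
UNION branches:

```sql
-- Hypothetical rewrite: ARRAY(subquery) is evaluated once as an initplan,
-- so provider_id = ANY (...) behaves like a plain qual on the view.
SELECT *
FROM mdx_core.vw_provider AS p
WHERE p.provider_id = ANY (ARRAY(SELECT my_id FROM my_table));
```

Whether this actually avoids the bad plan here would need an EXPLAIN
ANALYZE to confirm.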


Plans are below. The view has an internal UNION.
Any explanation as to why this happens?

The actual view is listed at the very bottom, if relevant.

Carlo



QUERY 1 PLAN
"Unique  (cost=25.48..25.69 rows=2 width=417) (actual time=0.180..0.190
rows=2 loops=1)"
"  ->  Sort  (cost=25.48..25.48 rows=2 width=417) (actual time=0.179..0.180
rows=2 loops=1)"
"Sort Key: "*SELECT* 1".provider_id, (NULL::integer), "*SELECT*
1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin, "*SELECT*
1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT* 1".npi,
"*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT* 1".aoa_id,
"*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT* 1".middle_name,
"*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT* 1".display_title,
"*SELECT* 1".nickname, "*SELECT* 1".familiar_name, "*SELECT* 1".pubmed_name,
"*SELECT* 1".master_name, "*SELECT* 1".display_name_orig, (NULL::text),
"*SELECT* 1".gender, "*SELECT* 1".birth_year, "*SELECT* 1".birth_month,
"*SELECT* 1".birth_day, "*SELECT* 1".clinical_interest, "*SELECT*
1".research_interest, "*SELECT* 1".summary, "*SELECT* 1".comments, "*SELECT*
1".degree_types, "*SELECT* 1".provider_type_ids, "*SELECT*
1".provider_status_code, "*SELECT* 1".provider_status_year, "*SELECT*
1".created, "*SELECT* 1".unique_flag, "*SELECT* 1".is_locked, "*SELECT*
1".provider_standing_code, "*SELECT* 1".impt_source_date, "*SELECT*
1".input_resource_id, "*SELECT* 1".input_source_ids"
"Sort Method:  quicksort  Memory: 27kB"
"->  Append  (cost=0.00..25.47 rows=2 width=417) (actual
time=0.078..0.143 rows=2 loops=1)"
"  ->  Subquery Scan "*SELECT* 1"  (cost=0.00..8.59 rows=1
width=408) (actual time=0.078..0.079 rows=1 loops=1)"
"->  Index Scan using provider_provider_id_idx on
provider p  (cost=0.00..8.58 rows=1 width=408) (actual time=0.076..0.077
rows=1 loops=1)"
"  Index Cond: (provider_id = 13083101)"
"  ->  Subquery Scan "*SELECT* 2"  (cost=0.00..16.87 rows=1
width=417) (actual time=0.061..0.062 rows=1 loops=1)"
"->  Nested Loop  (cost=0.00..16.86 rows=1 width=417)
(actual time=0.055..0.056 rows=1 loops=1)"
"  ->  Index Scan using
provider_name_pid_rec_stat_idx on provider_alias pa  (cost=0.00..8.27 rows=1
width=32) (actual time=0.047..0.047 rows=1 loops=1)"
"Index Cond: (provider_id = 13083101)"
"  ->  Index Scan using provider_provider_id_idx on
provider p  (cost=0.00..8.58 rows=1 width=389) (actual time=0.005..0.006
rows=1 loops=1)"
"Index Cond: (p.provider_id = 13083101)"
"Total runtime: 0.371 ms"

QUERY 2 PLAN
"Merge IN Join  (cost=2421241.80..3142039.99 rows=30011 width=2032) (actual
time=13778.400..13778.411 rows=2 loops=1)"
"  Merge Cond: ("*SELECT* 1".provider_id = (13083101))"
"  ->  Unique  (cost=2421241.77..3066486.33 rows=6002275 width=417) (actual
time=13778.119..13778.372 rows=110 loops=1)"
"->  Sort  (cost=2421241.77..2436247.46 rows=6002275 width=417)
(actual time=13778.118..13778.163 rows=110 loops=1)"
"  Sort Key: "*SELECT* 1".provider_id, (NULL::integer),
"*SELECT* 1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin,
"*SELECT* 1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT*
1".npi, "*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT*
1".aoa_id, "*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT*
1".middle_name, "*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT*
1".display_title, "*SELECT* 1".nickname, "*SELECT* 1".familiar_name,
"*SELECT* 1".pubmed_name, "*SELECT* 1".master_name, "*SELECT*
1".display_name_orig, (NULL::text), "*SELECT* 1".gender, "*SELECT*
1".birth_year, "*SELECT* 1".birth_month, "*SELECT* 1".birth_day, "*SELECT*
1".clinical_interest, "*SELECT* 1".research_interest, "*SELECT* 1".summary,
"*SELECT* 1".comments, "*SELECT* 1".degree_types, "*SELECT*
1".provider_type_ids, "*SELECT* 1".provider_status_code, "*SELECT*
1".provider_status_year, "*SELECT* 1".created, "*SELECT* 1".unique_flag,
"*SELECT* 1".is_locked, "*SELECT* 1".provider_standing_code, "*SELECT*
1".impt_source_date, "*SELECT* 1".input_resource_id, "*SELECT*
1".input_source_ids"
"  Sort Method:  external merge  Disk: 423352kB"
"  ->  Append  (cost=0.00..596598.30 rows=6002275 width=417)
(actual time=0.039..7879.715 rows=131263

Re: [PERFORM] Slow function in queries SELECT clause.

2010-08-12 Thread Davor J.
> "Tom Lane"  wrote in message 
> news:25116.1277047...@sss.pgh.pa.us...
>> "Davor J."  writes:
>>> Suppose 2 functions: factor(int,int) and offset(int, int).
>>> Suppose a third function: convert(float,int,int) which simply returns
>>> $1*factor($2,$3)+offset($2,$3)
>>> All three functions are IMMUTABLE.
>>
>> You should write the third function as a SQL function, which'd allow it
>> to be inlined.
>>
>>> VERY FAST (half a second):
>>> 
>>> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;
>>
>> In this case both factor() calls are folded to constants, hence executed
>> only once.
>>
>>> VERY SLOW (a minute):
>>> 
>>> SELECT convert(data, 1, 2) FROM tbl_data;
>>
>> Without inlining, there's no hope of any constant-folding here.
>> The optimizer just sees the plpgsql function as a black box and
>> can't do anything with it.
>>
> Your concepts of "inlining" and "black box" really cleared things up for 
> me. With fnc_unit_convert() written in SQL and declared as STABLE I indeed 
> have fast performance now.

A note on performance here: if I declare the fast SQL function 
fnc_unit_convert() as STRICT or as SECURITY DEFINER, then I suddenly get 
slow performance again (i.e. no apparent inlining).
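
For readers finding this in the archives, a minimal sketch of the
inlinable shape discussed above (the function body and signature are
illustrative, not Davor's actual code; note that OFFSET is a reserved
word and has to be quoted when used as a function name):

```sql
-- An SQL-language wrapper the planner can inline and constant-fold.
-- It must NOT be declared STRICT or SECURITY DEFINER, or inlining
-- is disabled and it is treated as a black box again.
CREATE FUNCTION fnc_unit_convert(val float8, a int, b int)
RETURNS float8 AS $$
  SELECT $1 * factor($2, $3) + "offset"($2, $3);
$$ LANGUAGE sql STABLE;
```

With this shape, SELECT fnc_unit_convert(data, 1, 2) FROM tbl_data can be
expanded by the planner into the same expression as the hand-written fast
query.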





Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-12 Thread Brad Nicholson

 On 10-08-12 03:22 AM, Arjen van der Meijden wrote:

On 12-8-2010 2:53 gnuo...@rcn.com wrote:

- The value of SSD in the database world is not as A Faster HDD(tm).
Never was, despite the naïve who assert otherwise.  The value of SSD
is to enable BCNF datastores.  Period.  If you're not going to do
that, don't bother.  Silicon storage will never reach equivalent
volumetric density, ever.  SSD will never be useful in the byte bloat
world of xml and other flat file datastores (resident in databases or
not).  Industrial strength SSD will always be more expensive/GB, and
likely by a lot.  (Re)factoring to high normalization strips out an
order of magnitude of byte bloat, increases native data integrity by
as much, reduces much of the redundant code, and puts the ACID where
it belongs.  All good things, but not effortless.


It is actually quite common to under-utilize (short stroke) hard 
drives in the enterprise world, simply because 'they' need more IOps 
per amount of data than a fully utilized disk can offer.
As such the expense/GB can be much higher than simply dividing the 
capacity by its price (and if you're looking at Fibre Channel disks, 
that price is quite high already). And then it is relatively easy to 
find enterprise SSDs with better pricing for the whole system as soon 
as the IOps matter more than the capacity.


And when you compare the ongoing operational costs of rack space, 
power and cooling for big arrays full of spinning disks against 
flash-based solutions, the price comparison evens out even more.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.





Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-12 Thread Justin Pitts
>>> As others said, RAID6 is RAID5 + a hot spare.
>>
>> No. RAID6 is NOT RAID5 plus a hot spare.
>
> The original phrase was that RAID 6 was like RAID 5 with a hot spare
> ALREADY BUILT IN.

Built-in or not - it is neither. It is more than that, actually. RAID
6 is like RAID 5 in that it uses parity for redundancy and pays a
write cost for maintaining those parity blocks, but it will maintain data
integrity in the face of two simultaneous drive failures.

In terms of storage cost, it IS like paying for RAID 5 + a hot spare,
but the protection is better.

A RAID 5 with a hot spare built in could not survive two simultaneous
drive failures.
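
Since this comparison keeps coming up, the storage arithmetic works out
identically for both layouts; a quick illustration (8 drives of 2TB each
is my example, not a figure from this thread):

```sql
-- RAID 5 + dedicated spare: 8 - 1 parity - 1 spare = 6 data drives.
-- RAID 6:                   8 - 2 parity           = 6 data drives.
-- Same usable capacity, but only RAID 6 survives two simultaneous
-- drive failures.
SELECT (8 - 1 - 1) * 2 AS raid5_plus_spare_usable_tb,
       (8 - 2) * 2     AS raid6_usable_tb;
```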



Re: [PERFORM] 32 vs 64 bit build on Solaris Sparc

2010-08-12 Thread Dimitri
Hi Joe,

the general rule on Solaris SPARC is:
  - if you need to address a large amount of memory (over 4GB): compile in 64bit
  - otherwise: compile in 32bit ;-)

It's true that 32bit code will run faster compared to 64bit on the
64bit SPARC - you'll operate with addresses half as long, and in
some cases SPARC will be able to execute 2 operations in parallel on
32bit code, while it'll still be one operation on 64bit code. But
it's all about the code, because once you start to do I/O requests, all
that instruction-level optimization will be lost to I/O latency
;-))
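
For the archives, the difference between the two builds comes down to the
compiler flags passed to configure; a hedged sketch (gcc shown, since GCC4
comes up below; Sun Studio flags vary by version, so treat these lines as
illustrative rather than exact):

```
# 32-bit build (the historical default on SPARC):
./configure

# 64-bit build, needed to address more than 4GB:
./configure CC="gcc -m64"
```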

So, as usual, a real answer in each case may be obtained only by a real test.
Just test both versions and you'll see for yourself what is valid in
your case :-))

Same thing regarding compilers: in some cases GCC4 will give a
better result, in some cases Sun Studio will be better (there are many
blog posts about optimal compiler options to use) - don't
hesitate to try, and don't forget to share your results here with others :-))

Rgds,
-Dimitri


On 8/11/10, Joseph Conway  wrote:
> With a 16 CPU, 32 GB Solaris Sparc server, is there any conceivable
> reason to use a 32 bit build rather than a 64 bit build? Apparently the
> Sun PostgreSQL package includes a README that indicates you might want
> to think twice about using 64 bit because it is slower -- this seems
> like outdated advice, but I was looking for confirmation one way or the
> other.
>
> Also semi-related question: when building from source, using gcc,
> enabling debug (but *not* assert) is normally not much of a performance
> hit. Is the same true if you build with the Sun CC?
>
> Thanks in advance for any thoughts/experiences.
>
> Joe
>
>
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
>
>
>



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-12 Thread Arjen van der Meijden

On 12-8-2010 2:53 gnuo...@rcn.com wrote:

- The value of SSD in the database world is not as A Faster HDD(tm).
Never was, despite the naïve who assert otherwise.  The value of SSD
is to enable BCNF datastores.  Period.  If you're not going to do
that, don't bother.  Silicon storage will never reach equivalent
volumetric density, ever.  SSD will never be useful in the byte bloat
world of xml and other flat file datastores (resident in databases or
not).  Industrial strength SSD will always be more expensive/GB, and
likely by a lot.  (Re)factoring to high normalization strips out an
order of magnitude of byte bloat, increases native data integrity by
as much, reduces much of the redundant code, and puts the ACID where
it belongs.  All good things, but not effortless.


It is actually quite common to under-utilize (short stroke) hard drives 
in the enterprise world, simply because 'they' need more IOps per amount 
of data than a fully utilized disk can offer.
As such the expense/GB can be much higher than simply dividing the 
capacity by its price (and if you're looking at Fibre Channel disks, 
that price is quite high already). And then it is relatively easy to 
find enterprise SSDs with better pricing for the whole system as soon 
as the IOps matter more than the capacity.


So in the current market you may already be better off, price-wise, 
with (expensive) SSDs if you need IOps rather than huge amounts of 
storage. And while in both cases you're not comparing individual disks to 
SSDs, you're replacing a 'disk-based storage system' with a '(flash) 
memory-based storage system', and it basically becomes 'A Faster HDD' ;)
But you're right that for data-heavy applications, completely replacing 
HDDs with some form of SSD is not going to happen soon, maybe never.


Best regards,

Arjen
