Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-10 Thread Merlin Moncure
On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri rtah...@vmware.com wrote:
 Just to be clear, we have a number of people from different companies working 
 on the kit. This is not a VMware project, it is a TPC project. But I hear you 
 regarding coming in from the cold and asking for a major db engine feature. I 
 know that I have caused a lot of rolling eyes. Believe me, I have had the 
 same (no, worse!) reaction from every one of the commercial database 
 companies in response to similar requests over the past 25 years.

No rolling of eyes from me.  Clustered indexes work and if your table
access mainly hits the table through that index you'll see enormous
reductions in i/o.  Index only scans naturally are a related
optimization in the same vein.  Denying that is just silly.  BTW,
putting postgres through a standard non trivial benchmark suite over
reasonable hardware, reporting results, identifying bottlenecks, etc.
is incredibly useful.  Please keep it up, and don't be afraid to ask
for help here.  (one thing I'd love to see is side by side results
comparing 8.4 to 9.1 to 9.2).

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-10 Thread Reza Taheri
Hi Merlin,
We are moving up to a larger testbed, and are planning to use 9.2. But the 
results will not comparable to our 8.4 results due to differences in hardware. 
But that comparison is a useful one. I'll try for a quick test on the new 
hardware with 8.4 before moving to 9.2.

Thanks,
Reza

 -Original Message-
 From: Merlin Moncure [mailto:mmonc...@gmail.com]
 Sent: Tuesday, July 10, 2012 12:06 PM
 To: Reza Taheri
 Cc: Greg Smith; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri rtah...@vmware.com wrote:
  Just to be clear, we have a number of people from different companies
 working on the kit. This is not a VMware project, it is a TPC project. But I
 hear you regarding coming in from the cold and asking for a major db engine
 feature. I know that I have caused a lot of rolling eyes. Believe me, I have
 had the same (no, worse!) reaction from every one of the commercial
 database companies in response to similar requests over the past 25 years.
 
 No rolling of eyes from me.  Clustered indexes work and if your table access
 mainly hits the table through that index you'll see enormous reductions in
 i/o.  Index only scans naturally are a related optimization in the same vein.
 Denying that is just silly.  BTW, putting postgres through a standard non
 trivial benchmark suite over reasonable hardware, reporting results,
 identifying bottlenecks, etc.
 is incredibly useful.  Please keep it up, and don't be afraid to ask for help
 here.  (one thing I'd love to see is side by side results comparing 8.4 to 
 9.1 to
 9.2).
 
 merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Robert Klemme
On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:

 Checking online, the subject of clustered indexes for PostgreSQL comes up
 often. PGSQL does have a concept called “clustered table”, which means a
 table has been organized in the order of an index. This would help with
 sequential accesses to a table, but has nothing to do with this problem.
 PGSQL folks sometimes refer to what we want as “integrated index”.

I do understand this correctly that we are speaking about the concept
which is known under the term index organized table (IOT) in Oracle
land, correct?

http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBBJEBIH

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Daniel,
Yes, it sounds like GIT will take us half the way there by getting rid of much 
of the index I/O if we cluster the tables. We can set the fillfactor parameter 
to keep tables sorted after updates. I am not sure what impact inserts will 
have since the primary key keeps growing with new inserts, so perhaps the table 
will maintain the cluster order and the benefits of GIT for new rows, too. GIT 
won't save CPU cycles the way a clustered/integrated index would, and actually 
adds to the CPU cost since the data page has to be searched for the desired 
tuple.

Thanks,
Reza

 -Original Message-
 From: Daniel Farina [mailto:dan...@heroku.com]
 Sent: Wednesday, July 04, 2012 6:40 AM
 To: Craig Ringer
 Cc: Reza Taheri; pgsql-performance@postgresql.org; Robert Haas
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer ring...@ringerc.id.au wrote:
  On 07/04/2012 07:13 AM, Reza Taheri wrote:
 
  Following the earlier email introducing the TPC-V benchmark, and that
  we are developing an industry standard benchmarking kit for TPC-V
  using PostgreSQL, here is a specific performance issue we have run into.
 
 
  Which version of PostgreSQL are you using?
 
  How has it been tuned beyond the defaults - autovacuum settings,
  shared_buffers, effective_cache_size, WAL settings, etc?
 
  How much RAM is on the blade? What OS and version are on the blade?
 
 
  Comparing the table sizes, we are close to 2X larger (more on this in
  a later note). But the index size is what stands out. Our overall
  index usage (again, after accounting for different numbers of rows) is
  4.8X times larger. 35% of our I/Os are to the index space. I am
  guessing that the 4.8X ballooning has something to do with this, and
  that in itself explains a lot about our high I/O rate, as well as
  higher CPU/tran cycles compared to MS SQL (we are  2.5-3 times slower).
 
  This is making me wonder about bloat issues and whether proper
  vacuuming is being done. If the visibility map and free space map
  aren't maintained by proper vaccum operation everything gets messy,
 fast.
 
  Well, MS SQL used a clustered index for CT, i.e., the data is held
  in the leaf pages of the index B-Tree. The data and index are in one
  data structure. Once you lookup the index, you also have the data at
  zero additional cost.
 
  [snip]
 
 
 
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 
  while PostgreSQL doesn't support covering indexes or clustered indexes
  at this point, 9.2 has added support for index-only scans, which are a
  half-way point of sorts. See:
 
http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-
 em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
 
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9
  337a21f98ac4ce850bb4145acf47ca27
 
  If at all possible please see how your test is affected by this
  PostgreSQL
  9.2 enhancement. It should make a big difference, and if it doesn't
  it's important to know why.
 
  (CC'd Robert Haas)
 
  I'm not sure what the best option for getting a 9.2 beta build for
  Windows is.
 
 
  As for the invest side - that's really a matter for EnterpriseDB,
  Command Prompt, Red Hat, and the other backers who're employing
 people
  to work on the DB. Consider asking on pgsql-hackers, too; if nothing
  else you'll get a good explanation of the current state and progress toward
 clustered indexes.
 
  Some links that may be useful to you are:
 
http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
  Surprisingly, covering/clustered indexes aren't on there or at least aren't
 easily found.
  It's certainly a much-desired feature despite its apparent absence
  from the TODO.
 
 I think there is, deservingly, a lot of hesitation to implement a strictly
 ordered table construct.  A similar feature that didn't quite get finished --
 but maybe can be beaten into shape -- is the grouped-index-tuple
 implementation:
 
 http://community.enterprisedb.com/git/
 
 It is mentioned on the TODO page.  It's under the category that is perhaps
 poorly syntactically overloaded in the world cluster.
 
 --
 fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Thanks,
Reza

 -Original Message-
 From: Robert Klemme [mailto:shortcut...@googlemail.com]
 Sent: Thursday, July 05, 2012 5:30 AM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:
 
  Checking online, the subject of clustered indexes for PostgreSQL comes
  up often. PGSQL does have a concept called clustered table, which
  means a table has been organized in the order of an index. This would
  help with sequential accesses to a table, but has nothing to do with this
 problem.
  PGSQL folks sometimes refer to what we want as integrated index.
 
 I do understand this correctly that we are speaking about the concept which
 is known under the term index organized table (IOT) in Oracle land,
 correct?
 
 http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBB
 JEBIH
 
 Kind regards
 
 robert
 
 --
 remember.guy do |as, often| as.you_can - without end
 http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri rtah...@vmware.com wrote:

 Hi Robert,
 Yes, the same concept. Oracle's IOT feature is used often with TPC
 benchmarks.


Reza, it would be very helpful if you were to provide the list with a lot
more information about your current software and hardware configuration
before coming to the conclusion that the only possible way forward is with
a significant architectural change to the db engine itself.  Not only is it
not at all clear that you are extracting maximum performance from your
current hardware and software, but I doubt anyone is particularly
interested in doing a bunch of development purely to game a benchmark.
 There has been significant discussion of the necessity and viability of
the feature you are requesting in the past, so you should probably start
where those discussions left off rather than starting the discussion all
over again from the beginning.  Of course, if vmware were to sponsor
development of the feature in question, it probably wouldn't require nearly
as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that
the MS SQL benchmark was set up and run by people who were intimately
familiar with MS SQL performance tuning.  You stated in your earlier email
that your team doesn't have significant postgresql-specific experience, so
it isn't necessarily surprising that your first attempt at tuning didn't
get the results that you are looking for. You stated that you have 14 SSDs
and 90 spinning drives, but you don't specify how they are combined and how
the database is laid out on top of them.  There is no mention of how much
memory is available to the system. We don't know how you've configured
postgresql's memory allocation or how your config weights the relative
costs of index lookups, sequential scans, etc.  The guidelines for this
mailing list include instructions for what information should be provided
when asking about performance improvements.
http://archives.postgresql.org/pgsql-performance/  Let's start by
ascertaining how your benchmark results can be improved without engaging in
a significant development effort on the db engine itself.


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Samuel,
The SSDs were used as a cache for the spinning drives. Here is a 30-second 
iostat sample representative of the whole run:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  24.870.00   12.54   62.390.000.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sdd   0.00   137.37 3058.40  106.17 34691.60   974.1322.54
15.754.98   0.32 100.00
sde   0.00   136.07 3063.37  107.70 35267.07   975.0722.86
15.584.92   0.32 100.00
sdf   0.00   135.37 3064.23  109.53 35815.60   979.6023.19
15.824.99   0.32 100.00
sdg   0.00   136.97 3066.57  116.67 35196.53  1014.5322.75
15.874.99   0.31 100.00
sdi   0.00  2011.030.00   87.90 0.00  8395.73   191.03 
0.131.45   1.42  12.51
sdk   0.00   136.63 3066.83  107.53 35805.07   976.6723.17
16.015.04   0.32 100.00
sdm   0.00   138.50 3054.40  111.10 34674.27   998.4022.54
15.524.91   0.32 100.00
sdj   0.00   136.73 3058.70  118.20 35227.20  1019.7322.82
15.814.98   0.31 100.00
sdl   0.00   137.53 3044.97  109.33 34448.00   987.4722.47
15.785.00   0.32 100.00

The data and index tablespaces were striped across the 8 LUNs, and saw an 
average 5ms response. We can beef up the storage to handle more I/Os so that 
our utilization doesn't stay below 40%, but that misses the point: we have an 
I/O rate twice the commercial database because they used clustered indexes.

I provided more config details in an earlier email.

As for asking for development to game a benchmark, no one is asking for 
benchmark specials. The question of enhancements in response to benchmark needs 
is an age old question. We can get into that, but it's really a different 
discussion. Let me just expose the flip side of it: are we willing to watch 
people use other databases to run benchmarks but feel content that no features 
were developed specifically in response to benchmark results?

I am trying to engage with the community. We can drown the mailing list with 
details. So I decided to open the discussion with the high level points, and we 
will give you all the details that you want as we move forward.

Thanks,
Reza

From: Samuel Gendler [mailto:sgend...@ideasculptor.com]
Sent: Thursday, July 05, 2012 12:46 PM
To: Reza Taheri
Cc: Robert Klemme; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance


On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri 
rtah...@vmware.commailto:rtah...@vmware.com wrote:
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Reza, it would be very helpful if you were to provide the list with a lot more 
information about your current software and hardware configuration before 
coming to the conclusion that the only possible way forward is with a 
significant architectural change to the db engine itself.  Not only is it not 
at all clear that you are extracting maximum performance from your current 
hardware and software, but I doubt anyone is particularly interested in doing a 
bunch of development purely to game a benchmark.  There has been significant 
discussion of the necessity and viability of the feature you are requesting in 
the past, so you should probably start where those discussions left off rather 
than starting the discussion all over again from the beginning.  Of course, if 
vmware were to sponsor development of the feature in question, it probably 
wouldn't require nearly as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that the 
MS SQL benchmark was set up and run by people who were intimately familiar with 
MS SQL performance tuning.  You stated in your earlier email that your team 
doesn't have significant postgresql-specific experience, so it isn't 
necessarily surprising that your first attempt at tuning didn't get the results 
that you are looking for. You stated that you have 14 SSDs and 90 spinning 
drives, but you don't specify how they are combined and how the database is 
laid out on top of them.  There is no mention of how much memory is available 
to the system. We don't know how you've configured postgresql's memory 
allocation or how your config weights the relative costs of index lookups, 
sequential scans, etc.  The guidelines for this mailing list include 
instructions for what information should be provided when asking about 
performance improvements.  http://archives.postgresql.org/pgsql-performance/  
Let's start by ascertaining how your benchmark results can be improved without 
engaging in a significant development effort on the db engine itself.





Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com wrote:


 I provided more config details in an earlier email.

 ** **



I hate to disagree, but unless I didn't get a message sent to the list, you
haven't provided any details about your postgresql config or otherwise
adhered to the guidelines for starting a discussion of a performance
problem around here.  I just searched my mailbox and no email from you has
any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or
4th request for such details and they have not yet been forthcoming.


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer
First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out 
as a final release soon enough, and index only scans may make a /big/ 
difference for the problem you're currently having.


Looking at your configuration I have a few comments, but it's worth 
noting that I don't work with hardware at that scale, and I'm more used 
to tuning I/O bottlenecked systems with onboard storage rather than 
CPU-bottlenecked ones on big SANs. Hopefully now that you've posted your 
configuration and setup there might be interest from others.


If you're able to post an EXPLAIN ANALYZE or two for a query you feel is 
slow that certainly won't hurt. Using http://explain.depesz.com/ saves 
you the hassle of dealing with word-wrapping when posting them, btw.


As for your config:

I notice that your autovacuum settings are at their defaults. With heavy 
UPDATE / DELETE load this'll tend to lead to table and index bloat, so 
the DB has to scan more useless data to get what it needs. It also means 
table stats won't be maintained as well, potentially leading to poor 
planner decisions. The following fairly scary query can help identify 
bloat, as the database server doesn't currently have anything much built 
in to help you spot such issues:


http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and 
effective_io_concurrency so Pg has more idea of the scale of your 
hardware. The defaults are very conservative - it's supposed to be easy 
for people to use for simple things without melting their systems, and 
it's expected that anyone doing bigger work will tune the database.


http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big 
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I 
lack the big hardware experience to know if they're appropriate, but 
they're not the extremely conservative defaults, which is a start.


Your random_page_cost and seq_page_cost are probably dead wrong for a 
SAN with RAM and SSD cache in front of fast disks. Their defaults are 
for local uncached spinning HDD media where seeks are expensive. The 
typical advice on such hardware is to set them to something more like 
seq_page_cost = 0.1  random_page_cost  = 0.15 - ie cheaper relative to 
the cpu cost, and with random I/O only a little more expensive than 
sequential I/O. What's right for your situation varies a bit based on DB 
size vs hardware size, etc; Greg discusses this more in his book.


What isolation level do your transactions use? This is significant 
because of the move to true serializable isolation with predicate 
locking in 9.0; it made serializable transactions a bit slower in some 
circumstances in exchange for much stronger correctness guarantees. The 
READ COMMITTED default was unchanged.




It also looks like you might not have seen the second part of my earlier 
reply:



while PostgreSQL doesn't support covering indexes or clustered indexes 
at this point, 9.2 has added support for index-only scans, which are a 
half-way point of sorts. See:


http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this 
PostgreSQL 9.2 enhancement. It should make a big difference, and if it 
doesn't it's important to know why.


(CC'd Robert Haas)



As for the invest side - that's really a matter for EnterpriseDB, 
Command Prompt, Red Hat, and the other backers who're employing people 
to work on the DB. Consider asking on pgsql-hackers, too; if nothing 
else you'll get a good explanation of the current state and progress 
toward clustered indexes.


Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. 
Surprisingly, covering/clustered indexes aren't on there or at least 
aren't easily found. It's certainly a much-desired feature despite its 
apparent absence from the TODO.


http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer






Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 04:52 AM, Samuel Gendler wrote:



On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com 
mailto:rtah...@vmware.com wrote:



I provided more config details in an earlier email.



I hate to disagree, but unless I didn't get a message sent to the list


It looks like that might be the case. I got a message with Message-ID 
66ce997fb523c04e9749452273184c6c137cb88...@exch-mbx-113.vmware.com sent 
at Thu, 5 Jul 2012 11:33:46 -0700 that contained the basic info, 
postgresql.conf, etc. Belated, but it was sent. I can't find this 
message in the archives and the copy I got came direct to me via cc, so 
I suspect our friendly mailing list system has silently held it for 
moderation due to size/attachment.



I'll reproduce the content below, followed by an inline copy of the 
postgresql.conf with only changed lines:


On 07/06/2012 02:33 AM, Reza Taheri wrote:


OK, some config details.

We are using:

·Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel 
E5520 (Nehalem-EP) processors and 48GB of memory per blade


o8 cores, 16 threads per blade

o48GB of RAM per blade

·Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives

·The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of 
memory, 4 virtual drives with various RAID levels


·The driver and Tier A VMs were on the second blade

oSo we set PGHOST on the client system to point to the server

·RHEL 6.1

·PostgreSQL 8.4

·unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with 
RHEL 6. I am building a new, larger testbed, and will switch to PGSQL 
9 with that.





postgresql.conf:

[craig@ayaki ~]$ egrep -v '(^\s*#)|(^\s*$)' /tmp/postgresql2.conf  | cut 
-d '#' -f 1

listen_addresses = '*'
max_connections = 320
shared_buffers = 28GB
temp_buffers = 200MB
work_mem = 10MB
maintenance_work_mem = 10MB
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 20
wal_buffers = 16MB
checkpoint_segments = 128
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
default_statistics_target = 1
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'




Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Well, I keep failing to send an email with an attachment. Do I need a 
moderator's approval?

Yes, running on VMs and a lower bin processor. With the virtualization 
overhead, etc., I figure we would be running right around 2/3 of the Dell 
throughput if we were running the same DBMS.

I sent the following message twice today with attachments (postgresql.conf, 
etc.), and it hasn't been posted yet. Here it is without an attachment.



From: Reza Taheri 
Sent: Thursday, July 05, 2012 11:34 AM
To: 'Craig Ringer'
Cc: pgsql-performance@postgresql.org; Robert Haas
Subject: RE: [PERFORM] The need for clustered indexes to boost TPC-V performance

OK, some config details.
We are using:

*   Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel E5520 
(Nehalem-EP) processors and 48GB of memory per blade
o   8 cores, 16 threads per blade
o   48GB of RAM per blade
*   Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives
*   The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of 
memory, 4 virtual drives with various RAID levels
*   The driver and Tier A VMs were on the second blade
o   So we set PGHOST on the client system to point to the server
*   RHEL 6.1
*   PostgreSQL 8.4
*   unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am 
building a new, larger testbed, and will switch to PGSQL 9 with that.

Postgres.conf is attached.

Thanks,
Reza

 -Original Message-
 From: Andy Colson [mailto:a...@squeakycode.net]
 Sent: Thursday, July 05, 2012 5:42 PM
 To: Samuel Gendler
 Cc: Reza Taheri; Robert Klemme; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/05/2012 03:52 PM, Samuel Gendler wrote:
 
 
  On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com
 mailto:rtah...@vmware.com wrote:
 
 
  I provided more config details in an earlier email.
 
  __ __
 
 
 
  I hate to disagree, but unless I didn't get a message sent to the list, you
 haven't provided any details about your postgresql config or otherwise
 adhered to the guidelines for starting a discussion of a performance
 problem around here.  I just searched my mailbox and no email from you
 has any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or 4th
 request for such details and they have not yet been forthcoming.
 
 
 Reza, I went back and looked myself.  I see no specs on OS, or hardware
 unless you mean this:
 
 
  http://bit.ly/QeWXhE. This was run on a similar server, and the database
 size is close to ours.
 
 
 You're running on windows then?  Server is 96Gig ram, 8 cores, (dell
 poweredge T610).
 with two powervault MD1120 NAS's?
 
 But then I assume you were not running on that, were you.  You were
 running vmware on it, probably?
 
 
 -Andy

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 08:41 AM, Andy Colson wrote:

You're running on windows then?  Server is 96Gig ram, 8 cores, (dell 
poweredge T610).

with two powervault MD1120 NAS's?


Thankfully they're running Pg on Linux (RHEL 6) . It seems that tests to 
date have been run against 8.4 which is pretty elderly, but hopefully 
it'll be brought up to 9.1 or 9.2beta soon.


While the original poster should've given a reasonable amount of 
information to start with when asking performance questions - as per the 
mailing list guidance and plain common sense - more info /was/ sent 
later on /but the lists.postgresql.org mailman ate it /- or held it for 
moderation, anyway. The OP can't be blamed when Pg's mailing list 
manager eats mesages with attachments! Also, remember that not everyone 
uses community mailing lists regularly; it takes a little learning to 
get used to keeping track of conversations, to inline reply style, etc.


--
Craig Ringer


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Greg Smith

On 07/03/2012 07:13 PM, Reza Taheri wrote:

Is the PGSQL community willing to invest in a feature that a) has been
requested by many others already; and b) can make a huge difference in a
benchmark that can lend substantial credibility to PGSQL performance?


Larger PostgreSQL features usually get built because companies sponsor 
their development, they pass review as both useful  correct, and then 
get committed.  Asking the community to invest in a new feature isn't 
quite the right concept.  Yes, everyone would like one of the smaller 
index representations.  I'm sure we can find reviewers willing to look 
at such a feature and committers who would also be interested enough to 
commit it, on a volunteer basis.  But a feature this size isn't going to 
spring to life based just on volunteer work.  The most useful questions 
would be who would be capable of writing that feature? and how can we 
get them sponsored to focus on it?  I can tell from your comments yet 
what role(s) in that process VMWare wants to take on internally, and 
which it's looking for help with.  The job of convincing people it's a 
useful feature isn't necessary--we know that's true.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 09:00 AM, Reza Taheri wrote:

Well, I keep failing to send an email with an attachment. Do I need a 
moderator's approval?


Probably. If so, it's really annoying that mailman isn't telling you 
this via a held for moderation auto-reply. It should be.



We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am 
building a new, larger testbed, and will switch to PGSQL 9 with that.


Just so you know, as per PostgreSQL versioning policy major releases are 
x.y, eg 8.4, 9.0 and 9.1 are distinct major releases.


http://www.postgresql.org/support/versioning/

I've always found that pretty odd and wish major releases would just 
increment the first version part, but the policy states how it's being 
done. It's important to realize this when you're talking about Pg 
releases, because 8.4, 9.0, 9.1 and 9.2 are distinct releases with 
different feature sets, so postgresql 9 doesn't mean much.


--
Craig Ringer



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Just to be clear, we have a number of people from different companies working 
on the kit. This is not a VMware project, it is a TPC project. But I hear you 
regarding coming in from the cold and asking for a major db engine feature. I 
know that I have caused a lot of rolling eyes. Believe me, I have had the same 
(no, worse!) reaction from every one of the commercial database companies in 
response to similar requests over the past 25 years.

We have our skin in the game, and as long as the community values the benchmark 
and wants to support us, we will figure out the details as we go forward.

Thanks,
Reza

 -Original Message-
 From: Greg Smith [mailto:g...@2ndquadrant.com]
 Sent: Thursday, July 05, 2012 6:42 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/03/2012 07:13 PM, Reza Taheri wrote:
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 Larger PostgreSQL features usually get built because companies sponsor
 their development, they pass review as both useful  correct, and then get
 committed.  Asking the community to invest in a new feature isn't quite the
 right concept.  Yes, everyone would like one of the smaller index
 representations.  I'm sure we can find reviewers willing to look at such a
 feature and committers who would also be interested enough to commit it,
 on a volunteer basis.  But a feature this size isn't going to spring to life 
 based
 just on volunteer work.  The most useful questions would be who would
 be capable of writing that feature? and how can we get them sponsored
 to focus on it?  I can tell from your comments yet what role(s) in that
 process VMWare wants to take on internally, and which it's looking for help
 with.  The job of convincing people it's a useful feature isn't necessary--we
 know that's true.
 
 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Craig,
I used the tool at depesz.com extensively during our early prototyping. It 
helped uncover ~10 problems that we solved by fixing issues in the code, adding 
or changing indexes, etc. Right now, I believe all our query plans look like 
what I would expect.

Yes, you are right, I did miss the link to the index-only scans. From what I 
can tell, it will do exactly what we want, but only as long as the index has 
all the columns in the query. I don't know what percentage of our queries have 
this property. But it does help.

The two main kit developers are out this week. We'll put our heads together 
next week to see what version to use when I switch to a larger testbed I am 
preparing.

Thanks,
Reza

From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, July 05, 2012 5:46 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out as a 
final release soon enough, and index only scans may make a big difference for 
the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth noting that 
I don't work with hardware at that scale, and I'm more used to tuning I/O 
bottlenecked systems with onboard storage rather than CPU-bottlenecked ones on 
big SANs. Hopefully now that you've posted your configuration and setup there 
might be interest from others.

If you're able to post an EXPLAIN ANALYZE or two for a query you feel is slow 
that certainly won't hurt. Using http://explain.depesz.com/ saves you the 
hassle of dealing with word-wrapping when posting them, btw.

As for your config:

I notice that your autovacuum settings are at their defaults. With heavy UPDATE 
/ DELETE load this'll tend to lead to table and index bloat, so the DB has to 
scan more useless data to get what it needs. It also means table stats won't be 
maintained as well, potentially leading to poor planner decisions. The 
following fairly scary query can help identify bloat, as the database server 
doesn't currently have anything much built in to help you spot such issues:

   http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and effective_io_concurrency so 
Pg has more idea of the scale of your hardware. The defaults are very 
conservative - it's supposed to be easy for people to use for simple things 
without melting their systems, and it's expected that anyone doing bigger work 
will tune the database.

http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big 
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I lack 
the big hardware experience to know if they're appropriate, but they're not the 
extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a SAN with 
RAM and SSD cache in front of fast disks. Their defaults are for local uncached 
spinning HDD media where seeks are expensive. The typical advice on such 
hardware is to set them to something more like seq_page_cost = 0.1  
random_page_cost  = 0.15 - ie cheaper relative to the cpu cost, and with random 
I/O only a little more expensive than sequential I/O. What's right for your 
situation varies a bit based on DB size vs hardware size, etc; Greg discusses 
this more in his book.

What isolation level do your transactions use? This is significant because of 
the move to true serializable isolation with predicate locking in 9.0; it made 
serializable transactions a bit slower in some circumstances in exchange for 
much stronger correctness guarantees. The READ COMMITTED default was unchanged.



It also looks like you might not have seen the second part of my earlier reply:

while PostgreSQL doesn't support covering indexes or clustered indexes at this 
point, 9.2 has added support for index-only scans, which are a half-way point 
of sorts. See:

  http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
  http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
  
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 
enhancement. It should make a big difference, and if it doesn't it's important 
to know why.

(CC'd Robert Haas)



As for the invest side - that's really a matter for EnterpriseDB, Command 
Prompt, Red Hat, and the other backers who're employing people to work on the 
DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good 
explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

  http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. Surprisingly, 
covering/clustered 

Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-04 Thread Thomas Kellerer

Craig Ringer, 04.07.2012 07:43:


I'm not sure what the best option for getting a 9.2 beta build for Windows is.


Download the ZIP from here:

http://www.enterprisedb.com/products-services-training/pgbindownload

Unzip, initdb, pg_ctl start

Regards
Thomas




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-04 Thread Craig Ringer

On 07/04/2012 03:57 PM, Thomas Kellerer wrote:

Craig Ringer, 04.07.2012 07:43:

I'm not sure what the best option for getting a 9.2 beta build for 
Windows is.


Download the ZIP from here:

http://www.enterprisedb.com/products-services-training/pgbindownload


Gah, I'm blind. I looked at that page twice and failed to see the 
entries for the beta. Sorry.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-04 Thread Daniel Farina
On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/04/2012 07:13 AM, Reza Taheri wrote:

 Following the earlier email introducing the TPC-V benchmark, and that we are
 developing an industry standard benchmarking kit for TPC-V using PostgreSQL,
 here is a specific performance issue we have run into.


 Which version of PostgreSQL are you using?

 How has it been tuned beyond the defaults - autovacuum settings,
 shared_buffers, effective_cache_size, WAL settings, etc?

 How much RAM is on the blade? What OS and version are on the blade?


 Comparing the table sizes, we are close to 2X larger (more on this in a
 later note). But the index size is what stands out. Our overall index usage
 (again, after accounting for different numbers of rows) is 4.8X times
 larger. 35% of our I/Os are to the index space. I am guessing that the 4.8X
 ballooning has something to do with this, and that in itself explains a lot
 about our high I/O rate, as well as higher CPU/tran cycles compared to MS
 SQL (we are  2.5-3 times slower).

 This is making me wonder about bloat issues and whether proper vacuuming is
 being done. If the visibility map and free space map aren't maintained by
 proper vaccum operation everything gets messy, fast.

 Well, MS SQL used a “clustered index” for CT, i.e., the data is held in the
 leaf pages of the index B-Tree. The data and index are in one data
 structure. Once you lookup the index, you also have the data at zero
 additional cost.

 [snip]



 Is the PGSQL community willing to invest in a feature that a) has been
 requested by many others already; and b) can make a huge difference in a
 benchmark that can lend substantial credibility to PGSQL performance?


 while PostgreSQL doesn't support covering indexes or clustered indexes at
 this point, 9.2 has added support for index-only scans, which are a half-way
 point of sorts. See:

   http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
   http://rhaas.blogspot.com.au/2010/11/index-only-scans.html

 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

 If at all possible please see how your test is affected by this PostgreSQL
 9.2 enhancement. It should make a big difference, and if it doesn't it's
 important to know why.

 (CC'd Robert Haas)

 I'm not sure what the best option for getting a 9.2 beta build for Windows
 is.


 As for the invest side - that's really a matter for EnterpriseDB, Command
 Prompt, Red Hat, and the other backers who're employing people to work on
 the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a
 good explanation of the current state and progress toward clustered indexes.

 Some links that may be useful to you are:

   http://wiki.postgresql.org/wiki/Todo
   Things that it'd be good to support/implement at some point. Surprisingly,
 covering/clustered indexes aren't on there or at least aren't easily found.
 It's certainly a much-desired feature despite its apparent absence from the
 TODO.

I think there is, deservingly, a lot of hesitation to implement a
strictly ordered table construct.  A similar feature that didn't quite
get finished -- but maybe can be beaten into shape -- is the
grouped-index-tuple implementation:

http://community.enterprisedb.com/git/

It is mentioned on the TODO page.  It's under the category that is
perhaps poorly syntactically overloaded in the world cluster.

-- 
fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-03 Thread Craig Ringer

On 07/04/2012 07:13 AM, Reza Taheri wrote:


Following the earlier email introducing the TPC-V benchmark, and that 
we are developing an industry standard benchmarking kit for TPC-V 
using PostgreSQL, here is a specific performance issue we have run into.




Which version of PostgreSQL are you using?

How has it been tuned beyond the defaults - autovacuum settings, 
shared_buffers, effective_cache_size, WAL settings, etc?


How much RAM is on the blade? What OS and version are on the blade?

Comparing the table sizes, we are close to 2X larger (more on this in 
a later note). But the index size is what stands out. Our overall 
index usage (again, after accounting for different numbers of rows) is 
4.8X times larger. 35% of our I/Os are to the index space. I am 
guessing that the 4.8X ballooning has something to do with this, and 
that in itself explains a lot about our high I/O rate, as well as 
higher CPU/tran cycles compared to MS SQL (we are  2.5-3 times slower).


This is making me wonder about bloat issues and whether proper vacuuming 
is being done. If the visibility map and free space map aren't 
maintained by proper vaccum operation everything gets messy, fast.


Well, MS SQL used a clustered index for CT, i.e., the data is held 
in the leaf pages of the index B-Tree. The data and index are in one 
data structure. Once you lookup the index, you also have the data at 
zero additional cost.


[snip]

Is the PGSQL community willing to invest in a feature that a) has been 
requested by many others already; and b) can make a huge difference in 
a benchmark that can lend substantial credibility to PGSQL performance?




while PostgreSQL doesn't support covering indexes or clustered indexes 
at this point, 9.2 has added support for index-only scans, which are a 
half-way point of sorts. See:


http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this 
PostgreSQL 9.2 enhancement. It should make a big difference, and if it 
doesn't it's important to know why.


(CC'd Robert Haas)

I'm not sure what the best option for getting a 9.2 beta build for 
Windows is.



As for the invest side - that's really a matter for EnterpriseDB, 
Command Prompt, Red Hat, and the other backers who're employing people 
to work on the DB. Consider asking on pgsql-hackers, too; if nothing 
else you'll get a good explanation of the current state and progress 
toward clustered indexes.


Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. 
Surprisingly, covering/clustered indexes aren't on there or at least 
aren't easily found. It's certainly a much-desired feature despite its 
apparent absence from the TODO.


http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer