Re: [PERFORM] Slow functional indexes?

2006-10-24 Thread Stuart Bishop
Tom Lane wrote:
 Stuart Bishop [EMAIL PROTECTED] writes:
 I would like to understand what causes some of my indexes to be slower to
 use than others with PostgreSQL 8.1.
 
 I was about to opine that it was all about different levels of
 correlation between the index order and physical table order ... but
 your experiments with freshly clustered indexes seem to cast doubt
 on that idea.  Are you sure your function is really immutable?  A buggy
 function could possibly lead to a clustered index not being in
 physical order.

Definitely immutable. Here is the function definition:


CREATE OR REPLACE FUNCTION person_sort_key(displayname text, name text)
RETURNS text
LANGUAGE plpythonu IMMUTABLE RETURNS NULL ON NULL INPUT AS
$$
# NB: If this implementation is changed, the person_sort_idx needs to be
# rebuilt along with any other indexes using it.
import re

try:
    strip_re = SD["strip_re"]
except KeyError:
    strip_re = re.compile(r"(?:[^\w\s]|[\d_])", re.U)
    SD["strip_re"] = strip_re

displayname, name = args

# Strip noise out of displayname. We do not have to bother with
# name, as we know it is just plain ascii.
displayname = strip_re.sub('', displayname.decode('UTF-8').lower())
return ("%s, %s" % (displayname.strip(), name)).encode('UTF-8')
$$;
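
As a sanity check on the clustering itself, here's the kind of thing I can run
(a rough sketch; the table and column names are illustrative). Right after a
CLUSTER, scanning in index order should return ctids in roughly increasing
physical order:

SELECT ctid, person_sort_key(displayname, name) AS sort_key
FROM   person
ORDER  BY person_sort_key(displayname, name)
LIMIT  20;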


-- 
Stuart Bishop [EMAIL PROTECTED]
http://www.stuartbishop.net/





Re: [PERFORM] Index on two columns not used

2006-10-24 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

 You're wrong.  An UPDATE always writes a new version of the row (if it
 overwrote the row in-place, it wouldn't be rollback-able).  The new
 version has a different TID and therefore the index entry must change.
 To support MVCC, our approach is to always insert a new index entry
 pointing at the new TID --- the old one remains in place so that the old
 version can still be found by transactions that need it.

OK, good that you corrected me.

I had the weird impression that both row versions have the same tuple ID
(as they are different versions of the same tuple), and so an index
change is not necessary when both versions fit on the same page.
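
For the archives, a minimal sketch that makes the TID change visible
(throwaway table, names invented for illustration):

CREATE TEMP TABLE t (id int PRIMARY KEY, val text);
INSERT INTO t VALUES (1, 'before');
SELECT ctid FROM t WHERE id = 1;            -- e.g. (0,1)
UPDATE t SET val = 'after' WHERE id = 1;
SELECT ctid FROM t WHERE id = 1;            -- a different ctid, e.g. (0,2)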

Thanks,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



Re: [PERFORM] Copy database performance issue

2006-10-24 Thread Jignesh Shah

Steve,

Are you using the latest update release of Solaris 10?

When you are doing the copy, did you check with prstat -amL to see whether 
it is saturating any CPU?


If it is saturating a CPU, then at least that narrows it down: you need to 
improve the CPU efficiency of the copy process.


Brendan Gregg's hotuser script, which uses DTrace with Perl post-processing, 
will help you figure out which functions are causing the high CPU 
utilization, and then maybe somebody from the PostgreSQL team can figure 
out what's happening that is causing the slow copy.


If none of the cores show up as near 100%, then the next step is to 
figure out whether any disk is 100% utilized via iostat -xczmP.


With this information it should be easier to figure out the next steps in 
your case.


Regards,
Jignesh


Steve wrote:

Hello there;

I've got an application that has to copy an existing database to a new 
database on the same machine.


I used to do this with a pg_dump command piped to psql to perform the 
copy; however the database is 18 gigs large on disk and this takes a 
LONG time to do.


So I read up, found some things in this list's archives, and learned 
that I can use createdb --template=old_database_name to do the copy in 
a much faster way since people are not accessing the database while 
this copy happens.



The problem is, it's still too slow.  My question is, is there any way 
I can use 'cp' or something similar to copy the data, and THEN after 
that's done modify the database system files/system tables to 
recognize the copied database?


For what it's worth, I've got fsync turned off, and I've read every 
tuning thing out there and my settings there are probably pretty 
good.  It's a Solaris 10 machine (V440, 2 processor, 4 Ultra320 
drives, 8 gig ram) and here's some stats:


shared_buffers = 30
work_mem = 102400
maintenance_work_mem = 1024000

bgwriter_lru_maxpages=0
bgwriter_lru_percent=0

fsync = off
wal_buffers = 128
checkpoint_segments = 64


Thank you!


Steve Conley



Re: [PERFORM] Optimizing disk throughput on quad Opteron

2006-10-24 Thread Bucky Jordan
 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of John Philips
 Sent: Monday, October 23, 2006 8:17 AM
 To: Ben Suffolk
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Optimizing disk throughput on quad Opteron
 
  The thing I would ask is would you not be better
  with SAS drives?
 
  Since the comments on Dell, and the highlighted
  issues I have been
  looking at HP and the Smart Array P600
  controller with 512 MB BBWC.
  Although I am looking to stick with the 8 internal
  disks, rather than
  use external ones.
 
  The HP Smart Array 50 is the external array for SAS
  drives. Not
  really looked into it much though.
 
 Ben,
 
 The Smart Array 50 supports a maximum of 10 disks and
 has a single I/O module, while the Smart Array 30
 supports up to 14 disks and can be configured with a
 dual I/O module.
 
 I was under the assumption that SAS runs at the same
 speed as Ultra320, in which case the Smart Array 30 is
 a better bet...
 
 Thanks for your feedback.

The drives might be about the same speed, but SAS is a completely
different bus architecture from parallel SCSI. U320 is a parallel interface
limited to 320 MB/s for the total bus (160 MB/s per channel, so be
careful here), whereas SAS is a 3.0 Gbps serial interface direct to each
drive. So after 5-6 drives, SAS will definitely start to pay off. Take a
look at Dell's MD1000 external enclosure vs. the previous generation: the
MD1000 offers much better performance (not saying to go with Dell, just
giving an example of SCSI vs. SAS from a vendor I'm familiar with). Oh, and
if you're not completely against Dell, you can daisy-chain three of the
MD1000 enclosures together off one of their new 6850 (quad Woodcrest) or
6950 (quad Opteron) servers.

At the moment, the Woodcrests seem to be outperforming the Opterons in
server benchmarks. I have a quad-core (dual CPU) 2950 and would be happy to
run some pgbench runs (or another preferred benchmark) if someone has a
similar Opteron box so we can get some relevant comparisons on the list.

Also, here's a link that was posted a while back on opteron vs.
woodcrest:
http://tweakers.net/reviews/646

HTH,

Bucky



Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Worky Workerson

Markus,


Could you COPY one of your tables out to disk via psql, and then COPY it
back into the database, to reproduce this measurement with your real data?


$ psql -c "COPY my_table TO STDOUT" > my_data
$ ls my_data
2018792 edgescape_pg_load
$ time cat my_data | psql -c "COPY mytable FROM STDIN"
real    5m43.194s
user    0m35.412s
sys     0m9.567s


Also, how much is the disk load, and CPU usage?


When I am loading via the perl script (which I've established is a
bottleneck), one CPU core is at 99% for the perl and another is at
30% for a postmaster, vs. about 90% for the postmaster when going
through psql.

The disk load is where I start to get a little fuzzy, as I haven't
played with iostat to figure out what is normal.  The local drives
contain PG_DATA as well as all the log files, but there is a
tablespace on the FibreChannel SAN that contains the destination
table.  The disk usage pattern that I see is that there is a ton of
consistent activity on the local disk, with iostat reporting an
average of 30K Blk_wrtn/s, which I assume is the log files.  Every
several seconds there is a massive burst of activity on the FC
partition, to the tune of 250K Blk_wrtn/s.


On a table with no indices, triggers or constraints, we managed to
COPY about 7-8 megabytes/second with psql over our 100 MBit network, so
here the network was the bottleneck.


hmm, this makes me think that either my PG config is really lacking,
or that the SAN is badly misconfigured, as I would expect it to
outperform a 100Mb network.  As it is, with a straight pipe to psql
COPY, I'm only working with a little over 5.5 MB/s.  Could this be due
to the primary key index updates?
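
Maybe I should test that directly. A rough sketch of what I have in mind
(the constraint name, key column and file path are placeholders):

ALTER TABLE mytable DROP CONSTRAINT mytable_pkey;
COPY mytable FROM '/path/to/my_data';
ALTER TABLE mytable ADD PRIMARY KEY (id);
-- if the COPY speeds up dramatically, index maintenance is the bottleneck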

Thanks!



Re: [PERFORM] Is ODBC that slow?

2006-10-24 Thread Carlo Stonebanks
 Try Command Prompt's ODBC driver.  Lately it has been measured to be
 consistently faster than psqlODBC.

 http://projects.commandprompt.com/public/odbcng

Thanks,

I tried this, but via Access it always reports a login (username/password) 
failure to the db.  However, this is an alpha - is there an official release I 
should be waiting for? It's not clear to me whether this is a commercial 
product or not.

Carlo 





Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Jim C. Nasby
On Mon, Oct 23, 2006 at 03:37:47PM -0700, Craig A. James wrote:
 Jim C. Nasby wrote:
 http://stats.distributed.net used to use a perl script to do some
 transformations before loading data into the database. IIRC, when we
 switched to using C we saw 100x improvement in speed, so I suspect that
 if you want performance perl isn't the way to go. I think you can
 compile perl into C, so maybe that would help some.
 
 I use Perl extensively, and have never seen a performance problem.  I 
 suspect the perl-to-C 100x improvement was due to some other factor, like 
 a slight change in the schema, indexes, or the fundamental way the client 
 (C vs Perl) handled the data during the transformation, or just plain bad 
 Perl code.
 
 Modern scripting languages like Perl and Python make programmers far, far 
 more productive than the bad old days of C/C++.  Don't shoot yourself in 
 the foot by reverting to low-level languages like C/C++ until you've 
 exhausted all other possibilities.  I only use C/C++ for intricate 
 scientific algorithms.
 
 In many cases, Perl is *faster* than C/C++ code that I write, because I 
 can't take the time (for example) to write the high-performance string 
 manipulation routines that have been fine-tuned and extensively optimized in Perl.

Well, the code is all at
http://cvs.distributed.net/viewcvs.cgi/stats-proc/hourly/ (see logmod
directory and logmod_*.pl). There have been changes made to the C code
since we changed over, but you can find the appropriate older versions
in there. IIRC, nothing in the database changed when we went from perl
to C (it's likely that was the *only* change that happened anywhere
around that time).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Jim C. Nasby
On Tue, Oct 24, 2006 at 09:17:08AM -0400, Worky Workerson wrote:
 http://stats.distributed.net used to use a perl script to do some
 transformations before loading data into the database. IIRC, when we
 switched to using C we saw 100x improvement in speed, so I suspect that
 if you want performance perl isn't the way to go. I think you can
 compile perl into C, so maybe that would help some.
 
 Like Craig mentioned, I have never seen those sorts of improvements
 going from perl to C, and developer efficiency is primo for me.  I've
 profiled most of the stuff, and have used XS modules and Inline::C on
 the appropriate, often used functions, but I still think that it comes
 down to my using CSV and Text::CSV_XS.  Even though it's XS, CSV is
 still a pain in the ass.
 
 Ultimately, you might be best off using triggers instead of rules for the
 partitioning since then you could use copy. Or go to raw insert commands
 that are wrapped in a transaction.
 
 Eh, I've put the partition loading logic in the loader, which seems to
 work out pretty well, especially since I keep things sorted and am the
 only one inserting into the DB and do so with bulk loads.  But I'll
 keep this in mind for later use.

Well, given that perl is using an entire CPU, it sounds like you should
start looking either at ways to remove some of the overhead from perl,
or to split that perl into multiple processes.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Problems using a function in a where clause

2006-10-24 Thread Jim C. Nasby
On Mon, Oct 23, 2006 at 04:54:00PM -0300, Mara Dalponte wrote:
 Hello,
 
 I have a query with several join operations that applies the same
 filter condition to each involved table. This condition is a complex
 predicate over an indexed timestamp field, depending on some
 parameters.
 To factor the code out, I wrote the filter into a plpgsql function, but
 the resulting query is much slower than the first one!

A view would probably be a better idea... or create some code that
generates the code for you.
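
For instance, something along these lines (the table name and constants are
placeholders) keeps the range test visible to the planner, so the index on
_timestamp can still be used:

CREATE VIEW filtered_events AS
SELECT *
FROM   some_table
WHERE  _timestamp >= '2006-02-23 03:00:00'
AND    _timestamp <= '2006-02-27 20:00:00';

A view like that can't take parameters, of course, which is why generating
the query (or the view) from your parameters may be the more practical route.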

 The explain command over the original query gives the following info
 for the WHERE clause that uses the filter:
 
 ...
 Index Cond: ((_timestamp >= '2006-02-23 03:00:00'::timestamp without
 time zone) AND (_timestamp <= '2006-02-27 20:00:00.98'::timestamp
 without time zone))
 ...
 
 The explain command for the WHERE clause using the filtering function is:
 
 ...
 Filter: include_time_date('2006-02-23'::date, '2006-02-27'::date,
 '03:00:00'::time without time zone, '20:00:00'::time without time
 zone, (_timestamp)::timestamp without time zone)
 ...
 
 It seems not to be using the index, and I think this is the reason for
 the performance gap between the two solutions.
 
Well, it looks like include_time_date just returns a boolean, so how
could it use the index?

 How can I make the query use this index? Which type of function should I
 use (VOLATILE | IMMUTABLE | STABLE)?

That depends on what exactly the function does. There's a pretty good
description in the CREATE FUNCTION docs.
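
For what it's worth, if the function depends only on its arguments it can
probably be declared STABLE. A sketch (I'm guessing at the signature and
semantics from your EXPLAIN output, so treat the body as illustrative):

CREATE OR REPLACE FUNCTION include_time_date(
    d1 date, d2 date, t1 time, t2 time, ts timestamp)
RETURNS boolean AS $$
BEGIN
    -- guessed semantics: ts falls in the window [d1 + t1, d2 + t2]
    RETURN ts >= (d1 + t1) AND ts <= (d2 + t2);
END;
$$ LANGUAGE plpgsql STABLE;

Note that even then, a plpgsql call in the WHERE clause won't become an index
condition by itself; the range comparison has to be visible to the planner,
as in the view above.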
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Copy database performance issue

2006-10-24 Thread Jim C. Nasby
On Mon, Oct 23, 2006 at 05:51:40PM -0400, Steve wrote:
 Hello there;
 
 I've got an application that has to copy an existing database to a new 
 database on the same machine.
 
 I used to do this with a pg_dump command piped to psql to perform the 
 copy; however the database is 18 gigs large on disk and this takes a LONG 
 time to do.
 
 So I read up, found some things in this list's archives, and learned that 
 I can use createdb --template=old_database_name to do the copy in a much 
 faster way since people are not accessing the database while this copy 
 happens.
 
 
 The problem is, it's still too slow.  My question is, is there any way I 
 can use 'cp' or something similar to copy the data, and THEN after that's 
 done modify the database system files/system tables to recognize the 
 copied database?
 
AFAIK, that's what createdb with a template already does... it copies the
template database's files directly, essentially doing what cp does.
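
In SQL terms, with placeholder database names, that's just:

CREATE DATABASE new_database TEMPLATE old_database;

which copies old_database's directory file by file, so it shouldn't be much
slower than a plain cp of the same data.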

 For what it's worth, I've got fsync turned off, and I've read every tuning 
 thing out there and my settings there are probably pretty good.  It's a 
 Solaris 10 machine (V440, 2 processor, 4 Ultra320 drives, 8 gig ram) and 
 here's some stats:

I don't think any of the postgresql.conf settings will really come into
play when you're doing this...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Worky Workerson

http://stats.distributed.net used to use a perl script to do some
transformations before loading data into the database. IIRC, when we
switched to using C we saw 100x improvement in speed, so I suspect that
if you want performance perl isn't the way to go. I think you can
compile perl into C, so maybe that would help some.


Like Craig mentioned, I have never seen those sorts of improvements
going from perl to C, and developer efficiency is primo for me.  I've
profiled most of the stuff, and have used XS modules and Inline::C on
the appropriate, often used functions, but I still think that it comes
down to my using CSV and Text::CSV_XS.  Even though it's XS, CSV is
still a pain in the ass.


Ultimately, you might be best off using triggers instead of rules for the
partitioning since then you could use copy. Or go to raw insert commands
that are wrapped in a transaction.


Eh, I've put the partition loading logic in the loader, which seems to
work out pretty well, especially since I keep things sorted and am the
only one inserting into the DB and do so with bulk loads.  But I'll
keep this in mind for later use.
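
For the archives, here's roughly what I understand the trigger approach to
look like. This is only a sketch -- the parent/child table names, the date
column and the routing logic are all made up for illustration:

CREATE OR REPLACE FUNCTION route_insert() RETURNS trigger AS $$
BEGIN
    -- send each row to the appropriate monthly child table
    IF NEW.logdate < DATE '2006-11-01' THEN
        INSERT INTO mytable_2006_10 (logdate, payload)
        VALUES (NEW.logdate, NEW.payload);
    ELSE
        INSERT INTO mytable_2006_11 (logdate, payload)
        VALUES (NEW.logdate, NEW.payload);
    END IF;
    RETURN NULL;  -- the row lives in the child table, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER route_insert_trig
    BEFORE INSERT ON mytable
    FOR EACH ROW EXECUTE PROCEDURE route_insert();

The point being that COPY into the parent fires the trigger for every row,
whereas rules don't apply to COPY at all.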

Thanks!



Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Craig A. James

Jim C. Nasby wrote:

Well, given that perl is using an entire CPU, it sounds like you should
start looking either at ways to remove some of the overhead from perl,
or to split that perl into multiple processes.


I use Perl for big database copies (usually with some processing/transformation 
along the way) and I've never seen 100% CPU usage except for brief periods, 
even when copying BLOBS and such.  My typical copy divides operations into 
blocks, for example doing

 N = 0
 while (more rows to go) {
    begin transaction
    select ... where primary_key > N order by primary_key limit 1000
    while (fetch a row)
       insert into ...
    N = (highest value found in last block)
    commit
 }

Doing it like this in Perl should keep Postgres busy, with Perl using only 
moderate resources.  If you're seeing high Perl CPU usage, I'd look first at 
the Perl code.

Craig
