Re: [GENERAL] PostgreSQL vs. InnoDB performance
Scott Marlowe <[EMAIL PROTECTED]> writes:

> Also, I wonder how well both databases will survive having power removed
> while under heavy load...

It depends more on the underlying hardware setup (disk/RAID array) than on
any other aspect (like the OS), assuming you have fsync enabled, of course.

There is a very interesting test you can do (if you have two machines) to
see what happens when one of them suddenly loses power. You can read about
it here: http://www.livejournal.com/users/brad/2116715.html

Most of today's IDE disks come with write caching turned on by default, and
if you lose power you will certainly lose some unwritten data. Turn the
cache off, or, if you're using ext3 on Linux 2.6, mount your partitions with
the barrier=1 option, which makes your fsyncs safe while still letting you
get some of the benefit of write caching.

Of course, your production-quality database would sit on a powerful SCSI
disk array behind a good RAID controller. The question there is: have you
bought the (often optional) battery backup for your RAID card? If not, turn
write caching off, once again, or you WILL lose data and corrupt your
database if you suddenly lose power.
--
Zlatko
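A quick way to tell whether a given disk setup is being honest about fsync
is to time single-row committed inserts: a lone 7200 rpm spindle can only
complete on the order of 120 real fsyncs per second, so a dramatically
higher rate means something in the write path is acknowledging writes from
cache. Below is a minimal sketch of such a test; the DSN, credentials and
the fsync_test table are made up for the example, and it assumes fsync is
enabled in postgresql.conf.

#!/usr/bin/perl
# Minimal sketch of an "is my disk lying about fsync?" check.
# The DSN, credentials and table name are placeholders for this example;
# it assumes fsync = on in postgresql.conf.
use DBI;
use strict;
use warnings;
use Time::HiRes qw(time);

my $dbh = DBI->connect('dbi:Pg:dbname=test', 'someuser', 'somepass',
                       {RaiseError => 1, AutoCommit => 1});
eval { $dbh->do("CREATE TABLE fsync_test (i integer)") };  # ignore "already exists"

my $n     = 1000;
my $start = time();
# With AutoCommit on, every INSERT is its own transaction and has to be
# flushed to stable storage before the next one can start.
$dbh->do("INSERT INTO fsync_test VALUES ($_)") for 1 .. $n;
printf "%.0f commits/second\n", $n / (time() - $start);
# A single 7200 rpm spindle can complete roughly 120 true fsyncs per second;
# a number far above that usually means a write cache is acknowledging
# writes before they reach the platter.
$dbh->disconnect;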
Re: [GENERAL] PostgreSQL vs. InnoDB performance
Christopher Browne <[EMAIL PROTECTED]> writes:

> After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Marco Colombo)
> belched out:
>> On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
>>> On Friday, 3 June 2005 at 00:36, Peter Eisentraut wrote:
>>> > On a particular system, loading 1 million rows (100 bytes, nothing
>>> > fancy) into PostgreSQL one transaction at a time takes about 90
>>> > minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
>>> > is supposed to have a similar level of functionality as far as the
>>> > storage manager is concerned, so I'm puzzled about how this can be.
>>> > Does anyone know whether InnoDB is taking some kind of questionable
>>> > shortcuts it doesn't tell me about?
>>>
>>> So here's another little gem about our friends from Uppsala: If you
>>> create a table with InnoDB storage and your server does not have InnoDB
>>> configured, it falls back to MyISAM without telling you.
>>
>> Silently falling back to something unexpected seems to be quite common
>> there. For sure it's not the only case. :-|
>>
>>> As it turns out, the test done with PostgreSQL vs. real InnoDB results
>>> in just about identical timings (90 min). The test done using PostgreSQL
>>> with fsync off vs. MyISAM also results in about identical timings (3 min).
>>
>> The hardware seems to be the bottleneck. Try improving the performance
>> of your disk systems. It's very unlikely to get _exactly_ the same
>> figures from two such different RDBMS. You expect them to be close, but
>> not identical.
>
> If the bottleneck is in the identical place, and they are otherwise
> well-tuned, it is actually *not* that surprising that the timings for
> "PostgreSQL vs real InnoDB" would be pretty close.
>
> If both are being bottlenecked by the same notion of "how fast does
> the disk spin," then the differences in performance won't be dramatic.

Yes, I also think so. One transaction is one transaction, so if neither
database is lying, they really should come out with similar results.

Having said that, I'm getting much better speed doing very simple
transactions, and that is on low-end hardware (dual PIII 1 GHz, 7200 rpm IDE
disk, Linux 2.6, ext3 with the barrier=1 mount option, so the disk cache can
safely be left turned on). I'm getting around 950 transactions per second
with the attached app.

Also, watching the output of the iostat utility, you can see that the disk
is quite busy and that the number of writes per second is comparable to the
number of transactions (and the average size of one write operation is near
8 KB, which is PostgreSQL's default block size).

                          extended device statistics
device  mgr/s   mgw/s    r/s     w/s    kr/s    kw/s  size  queue  wait  svc_t  %b
hda         0  9950.4          951.3     1.7  7785.3   8.2    4.2   4.4    0.6  59

zcalusic=# \d words
        Table "public.words"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 word   | character varying(256) |

#! /usr/bin/perl

use DBI;
use strict;
use warnings;

$| = 1;

my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic', 'zcalusic', 'useyours',
                       {PrintError => 1, RaiseError => 1, AutoCommit => 1});
my $sth = $dbh->prepare("INSERT INTO words VALUES (?)");

$dbh->do("TRUNCATE TABLE words");

# The input file name and the variable declarations did not survive the
# archiving of this message; a system word list is a reasonable guess here.
open(WORDS, "</usr/share/dict/words") or die "can't open word list: $!";

my $sofar   = 0;
my $start   = time();
my $oldtime = $start;
my $time    = $start;

while (defined(my $word = <WORDS>)) {
    chomp $word;
    $sth->execute($word);
    $sofar++;
    if (($time = time()) > $oldtime) {
        print int($sofar / ($time - $start)), " inserts/second \r";
        $oldtime = $time;
    }
}
# Guard against division by zero on runs shorter than one second.
print int($sofar / (($time - $start) || 1)), " inserts/second\n";

close(WORDS);
exit 0;
--
Zlatko
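Incidentally, most of the per-row cost in a loop like the one above is the
per-commit fsync rather than the INSERT itself, which is what the 90-minute
vs. 3-minute figures earlier in the thread come down to. Below is a hedged
sketch of the same loop with AutoCommit turned off and one commit per 1000
rows; the batch size is arbitrary and the word-list path is a guess.

#!/usr/bin/perl
# Sketch of a batched-commit variant of the script above: same insert loop,
# but with one commit (and therefore one fsync) per $batch rows rather than
# one per row.  The word-list path is an assumption; adjust for your system.
use DBI;
use strict;
use warnings;

my $batch = 1000;
my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic', 'zcalusic', 'useyours',
                       {PrintError => 1, RaiseError => 1, AutoCommit => 1});
my $sth = $dbh->prepare("INSERT INTO words VALUES (?)");

$dbh->do("TRUNCATE TABLE words");
$dbh->{AutoCommit} = 0;          # start batching commits from here on

open(WORDS, "</usr/share/dict/words") or die "can't open word list: $!";
my $sofar = 0;
while (defined(my $word = <WORDS>)) {
    chomp $word;
    $sth->execute($word);
    $dbh->commit if ++$sofar % $batch == 0;   # one fsync per $batch rows
}
$dbh->commit;                    # flush the final, partial batch
close(WORDS);
$dbh->disconnect;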
Re: [GENERAL] Large selects handled inefficiently?
"Andrew Snow" <[EMAIL PROTECTED]> writes: > > I believe I can work around this problem using cursors (although I > > don't know how well DBD::Pg copes with cursors). However, that > > doesn't seem right -- cursors should be needed to fetch a large query > > without having it all in memory at once... > Yes, I have noticed that particular bad behaviour, too. With DBD::Pg and DBD::mysql. At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as expected. Rows are fetched with fetchrow...() functions instead of all being sucked up into memory at the time execute() is called. Anybody know why is that happening? > Actually, I think thats why cursors were invented in the first place ;-) A > cursor is what you are using if you're not fetching all the results of a > query. > What bothers me is different behaviour of different DBD drivers. But, yes, I have just subscribed to dbi-users list which is the right place to ask that question. -- Zlatko
Re: [GENERAL] Indexes not working (bug in 7.0.2?)
Tom Lane <[EMAIL PROTECTED]> writes:

> Zlatko Calusic <[EMAIL PROTECTED]> writes:
>> It is now NOT using the index, and I don't understand why? Queries are
>> practically the same, tables are practically the same, why is postgres
>> using indexes in the first case and not in the second?
>
> Because it has substantially different ideas about the sizes of the
> two tables --- notice the different estimated row counts. If you
> haven't "vacuum analyzed" these tables recently, do so to bring the
> planner's statistics up-to-date, and then see what you get. You may
> also care to read the user's manual chapter about EXPLAIN,
> http://www.postgresql.org/users-lounge/docs/7.0/postgres/c4888.htm

Yes, thanks to all who helped. 'vacuum analyze' was the magical incantation
that helped.

I still have one uncertainty. Is it possible that after some time postgres
once again decides not to use indices (assuming I haven't run 'vacuum
analyze' again)?
--
Zlatko
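As for that remaining uncertainty: the planner's statistics only change when
VACUUM ANALYZE is run, so as the tables keep changing the estimates drift
again and the planner may well fall back to sequential scans. The usual
practice on releases of that era was simply to re-run it periodically, for
example from cron. A minimal sketch, reusing the database and table names
from the original post (the user name and password are placeholders):

#!/usr/bin/perl
# Sketch of a periodic statistics refresh, e.g. run from cron.
# Database and table names are taken from the post below; the user name
# and password are placeholders.
use DBI;
use strict;
use warnings;

my $dbh = DBI->connect('dbi:Pg:dbname=imenik', 'someuser', 'somepass',
                       {RaiseError => 1, AutoCommit => 1});
# VACUUM cannot run inside a transaction block, hence AutoCommit => 1.
$dbh->do("VACUUM ANALYZE subscriber");
$dbh->disconnect;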
[GENERAL] Indexes not working (bug in 7.0.2?)
Hi!

This is my first post (not counting those that failed because I posted them
from the other email address), so please be gentle. I have recently started
playing with PostgreSQL and found what I think is a bug in postgres. I'm
using 7.0.2 from the Debian Linux package (woody), if that makes any
difference.

I have a table like this:

imenik=# \d subscriber
       Table "subscriber"
 Attribute |     Type     | Modifier
-----------+--------------+----------
 id        | integer      | not null
 prefix    | char(2)      | not null
 name      | varchar(128) | not null
 number    | varchar(8)   | not null
 title     | integer      |
 street    | integer      | not null
 num       | varchar(8)   | not null
 city      | integer      | not null
Index: idx_number

First I populated the table, then I created an index on the number field
with "CREATE INDEX idx_number ON subscriber(number);". The trouble is that
when I do a SELECT specifying a number, postgres reads the whole table,
i.e. it is not using the index. EXPLAIN reveals it is indeed doing a
sequential scan.

test=# EXPLAIN SELECT * from subscriber where number = '123456';
NOTICE:  QUERY PLAN:

Seq Scan on subscriber  (cost=0.00..38677.28 rows=15564 width=64)

EXPLAIN

What am I doing wrong???

Second example: today I stumbled upon a similar problem with a completely
different set of data, but even more confusing. Consider two tables
'filenew' and 'fileold' that have the same fields and indexes:

filedb=# \d fileold
        Table "fileold"
 Attribute |     Type      | Modifier
-----------+---------------+----------
 file      | varchar(1024) | not null
 mode      | integer       | not null
 uid       | integer       | not null
 gid       | integer       | not null
 size      | bigint        | not null
 mtime     | integer       | not null
 ctime     | integer       | not null
Index: fileold_file_key

s/fileold/filenew/ and you know the schema for filenew. The idea is to
populate fileold once with filesystem information and then later, when
things on the FS change, populate filenew with the new data and search for
the differences. As you see, the tables are almost the same, but...

*** Looking for new files:

filedb=# explain select file from filenew where not exists (select 1 from fileold where filenew.file = fileold.file);
NOTICE:  QUERY PLAN:

Seq Scan on filenew  (cost=0.00..0.00 rows=1 width=12)
  SubPlan
    ->  Index Scan using fileold_file_key on fileold  (cost=0.00..2935.96 rows=1329 width=4)

EXPLAIN

*** Looking for deleted files:

filedb=# explain select file from fileold where not exists (select 1 from filenew where fileold.file = filenew.file);
NOTICE:  QUERY PLAN:

Seq Scan on fileold  (cost=0.00..3155.26 rows=1 width=12)
  SubPlan
    ->  Seq Scan on filenew  (cost=0.00..0.00 rows=1 width=4)

EXPLAIN

It is now NOT using the index, and I don't understand why? Queries are
practically the same, tables are practically the same, why is postgres
using indexes in the first case and not in the second?

TIA,
--
Zlatko