Re: [GENERAL] PostgreSQL vs. InnoDB performance

2005-06-04 Thread Zlatko Calusic
Christopher Browne [EMAIL PROTECTED] writes:

 After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Marco Colombo) 
 belched out:
 On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
 Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
  On a particular system, loading 1 million rows (100 bytes, nothing
  fancy) into PostgreSQL one transaction at a time takes about 90
  minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
  is supposed to have a similar level of functionality as far as the
  storage manager is concerned, so I'm puzzled about how this can be.
  Does anyone know whether InnoDB is taking some kind of questionable
  shortcuts it doesn't tell me about?
 
 So here's another little gem about our friends from Uppsala: If you create a
 table with InnoDB storage and your server does not have InnoDB configured,
 it falls back to MyISAM without telling you.

 Silently falling back to something unexpected seems to be quite common
 there. For sure it's not the only case.  :-|

 As it turns out, the test done with PostgreSQL vs. real InnoDB results in
 just about identical timings (90 min).  The test done using PostgreSQL with
 fsync off vs. MyISAM also results in about identical timings (3 min).

 The hardware seems to be the bottleneck. Try improving the performance
 of your disk systems. It's very unlikely to get _exactly_ the same
 figures from two such different RDBMSs. You expect them to be close, but
 not identical.

 If the bottleneck is in the identical place, and they are otherwise
 well-tuned, it is actually *not* that surprising that the timings for
 PostgreSQL vs real InnoDB would be pretty close.

 If both are being bottlenecked by the same notion of "how fast does
 the disk spin", then the differences in performance won't be dramatic.

Yes, I also think so. One transaction is one transaction, so if
neither database is lying, they really should come out with similar
results.

Having said that, I'm getting much better speed doing very simple
transactions, and that is on low-end hardware (dual PIII 1GHz, 7200rpm
IDE disk, Linux 2.6, ext3fs with the barrier=1 mount option - so the
disk write cache can safely be left turned on). I'm getting around 950
transactions per second with the attached app.

Also, observing the output of the iostat utility, you can see that the
disk is quite busy and that the number of writes is comparable to the
number of transactions (and the average size of one write operation is
near 8KB, which is PostgreSQL's default block size).

 extended device statistics   
device mgr/s mgw/sr/sw/skr/skw/s   size queue   wait svc_t  %b 
hda0   9950.4  951.3 1.7  7785.38.2   4.24.4   0.6  59 

zcalusic=# \d words
Table public.words
 Column |  Type  | Modifiers 
++---
 word   | character varying(256) | 

#! /usr/bin/perl

use strict;
use warnings;
use DBI;

$| = 1;

my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic',
                       'zcalusic', 'useyours',
                       {PrintError => 1, RaiseError => 1, AutoCommit => 1});

my $sth = $dbh->prepare('INSERT INTO words VALUES (?)');

$dbh->do('TRUNCATE TABLE words');

open(WORDS, '<', '/usr/share/dict/words')
  or die "can't open words file for reading: $!\n";

my $sofar = 0;
my $start = time();
my $time = $start;
my $oldtime = $start;

while (my $word = <WORDS>) {
  chomp $word;
  $sth->execute($word);
  $sofar++;
  if (($time = time()) > $oldtime) {
    print int($sofar / ($time - $start)), " inserts/second \r";
    $oldtime = $time;
  }
}
print int($sofar / (($time - $start) || 1)), " inserts/second\n";

close(WORDS);

exit 0;

-- 
Zlatko

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] PostgreSQL vs. InnoDB performance

2005-06-04 Thread Zlatko Calusic
Scott Marlowe [EMAIL PROTECTED] writes:

 Also, I wonder how well both databases will survive having power removed
 while under heavy load...

It depends more on the underlying hardware setup (disk/RAID array)
than on any other aspect (like the OS). Assuming you have fsync
enabled, of course.

There is a very interesting test that you can do (if you have two
machines) to see what happens if one of your machines suddenly loses
power. You can read about that here:
   http://www.livejournal.com/users/brad/2116715.html

Most of today's IDE disks come with write caching turned on by
default, and if you lose power, you'll lose some unwritten data for
sure. Turn it off, or if you're using ext3 on Linux 2.6, you can mount
your partitions with the barrier=1 option, which will make your fsyncs
safe and still let you get some benefit from write caching.
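For example, something like this (a sketch only - the device names and
mount point are examples, and hdparm flags can vary between drives):

```shell
# Turn the IDE disk's write cache off entirely (safe with any filesystem):
hdparm -W0 /dev/hda

# ...or keep the cache on, but remount ext3 with write barriers so that
# fsync() is honoured all the way down to the platter:
mount -o remount,barrier=1 /dev/hda1 /var/lib/postgres

# To make it permanent, add barrier=1 to the options field in /etc/fstab:
# /dev/hda1  /var/lib/postgres  ext3  defaults,barrier=1  0  2
```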

Of course, your production-quality database would be on a powerful
SCSI disk array behind a good RAID controller. The question there is
whether you have bought the (often optional) battery backup for your
RAID card. If not, turn the write caching off, once again, or you WILL
lose data and corrupt your database if you suddenly lose power.
-- 
Zlatko



Re: [GENERAL] Large selects handled inefficiently?

2000-09-06 Thread Zlatko Calusic

"Andrew Snow" [EMAIL PROTECTED] writes:

  I believe I can work around this problem using cursors (although I
  don't know how well DBD::Pg copes with cursors).  However, that
  doesn't seem right -- cursors shouldn't be needed to fetch a large
  query without having it all in memory at once...


Yes, I have noticed that particular bad behaviour, too.
With DBD::Pg and DBD::mysql.

At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as
expected. Rows are fetched with fetchrow...() functions instead of all
being sucked up into memory at the time execute() is called.

Anybody know why that is happening?

 Actually, I think that's why cursors were invented in the first place ;-)  A
 cursor is what you are using if you're not fetching all the results of a
 query.
 

What bothers me is the different behaviour of the different DBD
drivers. But, yes, I have just subscribed to the dbi-users list, which
is the right place to ask that question.
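In the meantime, the cursor workaround with DBD::Pg looks roughly like
this (just a sketch - the table name and cursor name are made up, and
the cursor needs a transaction to live in, hence AutoCommit => 0):

```perl
#! /usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic', 'zcalusic', 'useyours',
                       {RaiseError => 1, AutoCommit => 0});

# Declare a server-side cursor instead of letting execute() pull the
# whole result set into client memory at once.
$dbh->do('DECLARE csr CURSOR FOR SELECT word FROM words');

my $sth = $dbh->prepare('FETCH 1000 FROM csr');
while (1) {
  $sth->execute;
  last if $sth->rows == 0;        # cursor exhausted, nothing fetched
  while (my ($word) = $sth->fetchrow_array) {
    # process one row at a time; at most 1000 rows are in memory here
  }
}

$dbh->do('CLOSE csr');
$dbh->commit;
$dbh->disconnect;
```

The batch size (1000 here) is a trade-off between memory use and the
number of FETCH round trips.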
-- 
Zlatko



Re: [GENERAL] Indexes not working (bug in 7.0.2?)

2000-09-04 Thread Zlatko Calusic

Tom Lane [EMAIL PROTECTED] writes:

 Zlatko Calusic [EMAIL PROTECTED] writes:
  It is now NOT using the index, and I don't understand why. The queries are
  practically the same, the tables are practically the same, so why is postgres
  using an index in the first case and not in the second?
 
 Because it has substantially different ideas about the sizes of the
 two tables --- notice the different estimated row counts.  If you
 haven't "vacuum analyzed" these tables recently, do so to bring the
 planner's statistics up-to-date, and then see what you get.  You may
 also care to read the user's manual chapter about EXPLAIN,
 http://www.postgresql.org/users-lounge/docs/7.0/postgres/c4888.htm
 

Yes, thanks to all who helped. 'vacuum analyze' was the magical
incantation.
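For the archives, the before/after boiled down to something like this
(paraphrasing my psql session; the actual plan costs will of course
differ):

```
imenik=# VACUUM ANALYZE subscriber;
VACUUM
imenik=# EXPLAIN SELECT * FROM subscriber WHERE number = '123456';
NOTICE:  QUERY PLAN:

Index Scan using idx_number on subscriber  (cost=... rows=... width=...)

EXPLAIN
```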

I still have one uncertainty. Is it possible that after some time
postgres once again decides not to use indices (assuming I haven't run
'vacuum analyze' again)?

-- 
Zlatko



[GENERAL] Indexes not working (bug in 7.0.2?)

2000-09-03 Thread Zlatko Calusic

Hi!

This is my first post (not counting the ones that failed because I
posted them from another email address), so please be gentle.

I have recently started playing with PostgreSQL and found what I think
is a bug in postgres. I'm using 7.0.2 from the Debian Linux package
(woody), if it makes any difference.

I have a table like this:

imenik=# \d subscriber
 Table "subscriber"
 Attribute | Type | Modifier 
---+--+--
 id| integer  | not null
 prefix| char(2)  | not null
 name  | varchar(128) | not null
 number| varchar(8)   | not null
 title | integer  | 
 street| integer  | not null
 num   | varchar(8)   | not null
 city  | integer  | not null
Index: idx_number

First I populated the table, then I created index on the number field
with "CREATE INDEX idx_number on subscriber(number);"

The trouble is that when I do a SELECT specifying a number, postgres
reads the whole table, i.e. it is not using the index. EXPLAIN reveals
it is indeed doing a sequential scan.

test=# EXPLAIN SELECT * from subscriber where number = '123456';
NOTICE:  QUERY PLAN:

Seq Scan on subscriber  (cost=0.00..38677.28 rows=15564 width=64)

EXPLAIN

What am I doing wrong???

Second example:

Today I stumbled upon a similar problem with a completely different
set of data, but even more confusing.

Consider two tables, 'filenew' and 'fileold', that have the same
fields and indices:

filedb=# \d fileold
   Table "fileold"
 Attribute | Type  | Modifier 
---+---+--
 file  | varchar(1024) | not null
 mode  | integer   | not null
 uid   | integer   | not null
 gid   | integer   | not null
 size  | bigint| not null
 mtime | integer   | not null
 ctime | integer   | not null
Index: fileold_file_key

s/fileold/filenew/ and you know the schema for filenew. The idea is to
populate fileold once with filesystem information and then later, when
things on the FS change, populate filenew with new data and search for
the differences.

As you see, tables are almost the same, but...

*** Looking for new files:

filedb=# explain select file from filenew where not exists (select 1 from fileold where filenew.file = fileold.file);
NOTICE:  QUERY PLAN:

Seq Scan on filenew  (cost=0.00..0.00 rows=1 width=12)
  SubPlan
    ->  Index Scan using fileold_file_key on fileold  (cost=0.00..2935.96 rows=1329 width=4)

EXPLAIN

*** Looking for deleted files:

filedb=# explain select file from fileold where not exists (select 1 from filenew where fileold.file = filenew.file);
NOTICE:  QUERY PLAN:

Seq Scan on fileold  (cost=0.00..3155.26 rows=1 width=12)
  SubPlan
    ->  Seq Scan on filenew  (cost=0.00..0.00 rows=1 width=4)

EXPLAIN

It is now NOT using the index, and I don't understand why. The queries are
practically the same, the tables are practically the same, so why is postgres
using an index in the first case and not in the second?

TIA,
-- 
Zlatko