[PERFORM] Insert performance

2003-08-16 Thread Shridhar Daithankar
Hi all,

Couple of days ago, one of my colleague, Rahul Iyer posted a query regarding 
insert performance of 5M rows. A common suggestion was to use copy.

Unfortunately he can not use copy due to some constraints. I was helping him to 
get maximum out of it. We were playing with a data set of 500K rows on SunOS5.6 
and postgresql 7.3.3

Best we could get was 500K records in 355 sec. That's roughly 1400 inserts per 
sec. This was with default settings and around 10K inserts per transaction. 
Postgresql was hogging all the CPU during load.

Troubled by this, I set up a similar database at home. That is a Athlon 
XP2000+/512MB machine with a 40GB seagate disk. It is running slackware 9.0 
with 2.4.20(IIRC).

I have attached the results of my experiements and the code I used to 
benchmark. It was a simple table with an integer and a varchar(30) field.

I was really amazed to see the numbers. First of all, it beat the sunOS machine 
 left and right. Bruce posted some numbers of 9K inserts/sec. Here we see the 
same.

Secondly I also played with filesystems. Ext3 does not seem to be performing as 
good. Reiser and ext2 did fine. Unfortunately the kernel didn't support XFS/JFS 
so could not test them.

I have also attached the raw benchmark data in kspread format, for the curious. 
Didn't exported to anything else because kspread had troubles with exporting 
formula values.

I also noticed that reiser takes hell lot more CPU than ext2 and ext3. It 
nearly peaks out all CPU capacity. Not so with ext2.

Comments? One thing I can't help to notice is sunOs is not on same scale. The 
sunOS machine is a 1GB RAM machine. It has oracle and mysql running on it and 
have 300MB swap in use but I am sure it has SCSI disk and in all respect I 
would rather expect a RISC machine to perform better than an athlon XP machine, 
at least for an IO.

If you want me to post details of sparc machine, please let me know how do I 
find it. I have never worked with sparcs earlier and have no intention of doing 
this again..:-)

Bye
 Shridhar

--
Fourth Law of Applied Terror:   The night before the English History mid-term, 
your Biologyinstructor will assign 200 pages on planaria.Corollary: Every 
instructor assumes that you have nothing else to do except  study for that 
instructor's course.



insert.c
Description: Binary data


Results.zip
Description: Zip archive


pgbenchmark.zip
Description: Zip archive

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Insert performance

2003-08-16 Thread Josh Berkus
Shridhar,

> Unfortunately he can not use copy due to some constraints. 

Why not use COPY to load the table, and then apply the constraints by query 
afterwords?  It might not be faster, but then again it might.

> I was really amazed to see the numbers. First of all, it beat the sunOS
> machine left and right. Bruce posted some numbers of 9K inserts/sec. Here
> we see the same.

> Comments? One thing I can't help to notice is sunOs is not on same scale.
> The sunOS machine is a 1GB RAM machine. It has oracle and mysql running on
> it and have 300MB swap in use but I am sure it has SCSI disk and in all
> respect I would rather expect a RISC machine to perform better than an
> athlon XP machine, at least for an IO.

It's been reported on this list several times that Solaris is the *worst* of 
the *nixes for PostgreSQL performance.   No analysis has been posted as to 
why; my own thoughts are:
- Solaris' multi-threaded architecture which imposes a hefty per-process 
overhead, about triple that of Linux, slowing new connections and large 
multi-user activity;
- Poor filesystem management; Sun simply hasn't kept up with IBM, Reiser, Red 
Hat and BSD in developing filesystems.
... but that's based on inadequate experimentation, just a few tests on 
Bonnie++ on a Netra running Solaris 8.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Insert performance

2003-08-16 Thread Christopher Browne
Martha Stewart called it a Good Thing [EMAIL PROTECTED] (Josh Berkus)wrote:
> Shridhar,
>> Unfortunately he can not use copy due to some constraints. 

> Why not use COPY to load the table, and then apply the constraints
> by query afterwords?  It might not be faster, but then again it
> might.

If you can transform the information into COPYable form, that's
certainly a good thing.  Whether it will be workable or not is another
question.

> ... but that's based on inadequate experimentation, just a few tests
> on Bonnie++ on a Netra running Solaris 8.

As far as the filesystem issues are concerned, you're probably using
an "old, standard" version of UFS.  The "high performance" option on
Solaris involves using third-party Veritas software.

The persistence of this is somewhat surprising; I'd somewhat have
expected Sun to have bought out Veritas or some such thing, as it's a
pretty vital technology that _isn't_ totally under their control.
Actually, an entertaining option would be for them to buy out SGI, as
that would get them control of XFS and a number of other interesting
technologies.

My expectations of a Netra wouldn't be terribly high, either; they
seem to exist as a product so that people that need a cheap Sun box
have an option.  They are mostly running IDE disk, and the latest
IA-32 hardware is likely to have newer faster interface options.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/advocacy.html
"Windows NT was designed to be administered by an idiot and usually
is..."  -- Chris Adams" <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Insert performance

2003-08-17 Thread Shridhar Daithankar
On 16 Aug 2003 at 11:40, Josh Berkus wrote:

> Shridhar,
> 
> > Unfortunately he can not use copy due to some constraints. 
> 
> Why not use COPY to load the table, and then apply the constraints by query 
> afterwords?  It might not be faster, but then again it might.

Lol.. The constraints I mentioned weren't database constraints.. It need to 
interface with another database module.. The data import can not be text.. 
that's a project requirement..

I should have been clearer in first place..

Bye
 Shridhar

--
O'Toole's commentary on Murphy's Law:   Murphy was an optimist.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Insert performance

2003-08-18 Thread Shridhar Daithankar
On 18 Aug 2003 at 18:52, Hannu Krosing wrote:
> My own experimentation also got numbers in 9k/sec range (on a quad
> 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> ~1000 inserts/transaction.
> 
> Performance dropped to ~300/sec (at about 60M rows) when I added an
> index (primary key) - as I did random inserts, the hit rates for index
> pages were probably low.

I was loading a geographic data couple of months back.. It was 3GB data when 
loaded in postgresql.

I tried loading data first and creating index later. It ran out of available 
9GB space. So I created index on an empty table and started loading it. It was 
slow but at least finished after 3 hours... Co-incidentally oracle had same 
problems as well. So creating index beforehand remains only option at times, it 
seems. Tom remarked that it shouldn't have made difference but apparently it 
does..

You mentioned parallel runs and still getting 9K/sec. Was that overall 9K or 
per connection? If it is former, probably WAL is hit too hard. You could do 
some additional testing by having WALit's own disk.

I was plannning to do the multiple writers test. But since objective was to 
find out why postgresql was so slow and it tunred out to be slowaris in first 
place, didn't have any enthu. left.

I recommended my colleague to move to linux. But apparently this product is a 
part of suit which runs on some HUGE solaris machines. So if it has to run 
postgresql, it has to run sunos. I hope they are faster with SCSI..

Bye
 Shridhar

--
On my planet, to rest is to rest -- to cease using energy.  To me, itis quite 
illogical to run up and down on green grass, using energy,instead of saving it. 
-- Spock, "Shore Leave", stardate 3025.2


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Insert performance

2003-08-18 Thread Josh Berkus
Chris,

> My expectations of a Netra wouldn't be terribly high, either; they
> seem to exist as a product so that people that need a cheap Sun box
> have an option.  They are mostly running IDE disk, and the latest
> IA-32 hardware is likely to have newer faster interface options.

Neither are ours ... we bought an array of 1U Netras as a dot-com closeout.  
At $330 per machine including OS, they were a darned good deal, and with 4 of 
them the redundancy makes up for the lack of individual server performance.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Insert performance

2003-08-18 Thread Hannu Krosing
Shridhar Daithankar kirjutas E, 18.08.2003 kell 09:21:
> On 16 Aug 2003 at 11:40, Josh Berkus wrote:
> 
> > Shridhar,
> > 
> > > Unfortunately he can not use copy due to some constraints. 
> > 
> > Why not use COPY to load the table, and then apply the constraints by query 
> > afterwords?  It might not be faster, but then again it might.
> 
> Lol.. The constraints I mentioned weren't database constraints.. 

Can't you still apply them later ;)

My own experimentation also got numbers in 9k/sec range (on a quad
1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
~1000 inserts/transaction.

Performance dropped to ~300/sec (at about 60M rows) when I added an
index (primary key) - as I did random inserts, the hit rates for index
pages were probably low.

--
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Insert performance

2003-08-18 Thread Hannu Krosing
Shridhar Daithankar kirjutas E, 18.08.2003 kell 19:02:
> On 18 Aug 2003 at 18:52, Hannu Krosing wrote:
> > My own experimentation also got numbers in 9k/sec range (on a quad
> > 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> > ~1000 inserts/transaction.
> > 
> > Performance dropped to ~300/sec (at about 60M rows) when I added an
> > index (primary key) - as I did random inserts, the hit rates for index
> > pages were probably low.
> 
> I was loading a geographic data couple of months back.. It was 3GB data when 
> loaded in postgresql.

With or without indexes ?

> I tried loading data first and creating index later. It ran out of available 
> 9GB space. So I created index on an empty table and started loading it. It was 
> slow but at least finished after 3 hours... Co-incidentally oracle had same 
> problems as well. So creating index beforehand remains only option at times, it 
> seems. Tom remarked that it shouldn't have made difference but apparently it 
> does..

Tom just fixed some memory leaks on array indexing the other day. Could
there be something like that on geographic types ?

> You mentioned parallel runs and still getting 9K/sec. Was that overall 9K or 
> per connection?

Overall. But notice that my setup was (a little) slower per processor.

>  If it is former, probably WAL is hit too hard. You could do 
> some additional testing by having WALit's own disk.

I guess that todays IDE disks are about the same speed (~50MB/sec) as my
test RAID was.

I run multiple parallel runs to have a chance to use all 4 processors
(but IIRC it was heavyly IO-bound) as well as to better use writing time
on WAL platters (not to wait for full rotation on each platter)

--
Hannu





---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Insert performance

2003-08-18 Thread Shridhar Daithankar
On 18 Aug 2003 at 9:57, Josh Berkus wrote:

> Chris,
> 
> > My expectations of a Netra wouldn't be terribly high, either; they
> > seem to exist as a product so that people that need a cheap Sun box
> > have an option.  They are mostly running IDE disk, and the latest
> > IA-32 hardware is likely to have newer faster interface options.
> 
> Neither are ours ... we bought an array of 1U Netras as a dot-com closeout.  
> At $330 per machine including OS, they were a darned good deal, and with 4 of 
> them the redundancy makes up for the lack of individual server performance.

I am sure they would run much better with linux on them rather than solaris..

Bye
 Shridhar

--
Shannon's Observation:  Nothing is so frustrating as a bad situation that is 
beginning toimprove.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Insert performance

2003-08-18 Thread Shridhar Daithankar
On 19 Aug 2003 at 1:16, Hannu Krosing wrote:

> Shridhar Daithankar kirjutas E, 18.08.2003 kell 19:02:
> > I was loading a geographic data couple of months back.. It was 3GB data when 
> > loaded in postgresql.
> 
> With or without indexes ?

Without index. Index was another 3 GB with 50% utilisation. It was 81M rows 
with 3 floats each..

> 
> > I tried loading data first and creating index later. It ran out of available 
> > 9GB space. So I created index on an empty table and started loading it. It was 
> > slow but at least finished after 3 hours... Co-incidentally oracle had same 
> > problems as well. So creating index beforehand remains only option at times, it 
> > seems. Tom remarked that it shouldn't have made difference but apparently it 
> > does..
> 
> Tom just fixed some memory leaks on array indexing the other day. Could
> there be something like that on geographic types ?

Dunno.. This was 7.3.2 or earlier.. Later the project abandoned all types of 
databases and went to in memory structures since flat data was of the order of 
200MB. Now they aer returning to databases as flat data is approaching excess 
of 3 GB..

God knows what will they do next. It's an ideal example what schedule pressure 
can do to architecture design of a software..


Bye
 Shridhar

--
Air Force Inertia Axiom:Consistency is always easier to defend than 
correctness.


---(end of broadcast)---
TIP 8: explain analyze is your friend