Re: [HACKERS] Improving speed of copy
On Fri, 20 Sep 2002, Shridhar Daithankar wrote:

> On 20 Sep 2002 at 21:22, Shridhar Daithankar wrote:
> > Mysql takes 221 sec. v/s 1121 sec. for postgres. For postgresql, that
> > is around 11.5K rows per second. Each tuple has 23 fields with fixed
> > length of around 100 bytes.

Yes, postgres is much slower than MySQL for doing bulk loading of data.
There's not much, short of hacking on the code, that can be done about
this.

> One more issue is time taken for composite index creation. It's 4341
> sec. as opposed to 436 sec. for mysql. These are three non-unique
> character fields where the combination itself is not unique as well.

Setting sort_mem appropriately makes a big difference here. I generally
bump it up to 2-8 MB for everyone, and when I'm building a big index, I
set it to 32 MB or so just for that session. But make sure you don't set
it so high you drive your system into swapping, or it will kill your
performance. Remember also that in 7.2.x, postgres will actually use
almost three times the value you give sort_mem (i.e., sort_mem of 32 MB
will actually allocate close to 96 MB of memory for the sort).

cjs
--
Curt Sampson  [EMAIL PROTECTED]  +81 90 7737 2974  http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
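For reference, the per-session trick Curt describes looks like this in
psql. sort_mem is measured in kilobytes in 7.2, and the table and column
names below are made up:

    -- Raise sort memory for this session only, build the index, then
    -- fall back to the postgresql.conf default:
    SET sort_mem = 32768;      -- roughly 32 MB
    CREATE INDEX big_table_abc_idx
        ON big_table (field_a, field_b, field_c);
    RESET sort_mem;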
Re: [HACKERS] Improving speed of copy
> Have you tried this with Oracle or similar commercial database?

I have timed COPY/LOAD times for Postgresql/Mysql/Oracle/Db2; the rough
comparison is:

  Db2 and Mysql fastest (Db2 slightly faster)
  Oracle approx twice as slow as Db2
  Postgresql about 3.5-4 times slower than Db2

However, Postgresql can sometimes create indexes faster than Mysql, so
the total time of COPY + CREATE INDEX can be smaller for Postgresql than
for Mysql. Oracle and Db2 seemed similar to Postgresql with respect to
CREATE INDEX.

regards

Mark
Re: [HACKERS] Improving speed of copy
On 20 Sep 2002 at 18:41, Nigel J. Andrews wrote:

> On Fri, 20 Sep 2002, Shridhar Daithankar wrote:
> > In a select test where approx. 15 rows were reported by a query on an
> > index field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad
> > but other issues eclipse the result..
>
> I don't know about anyone else but I find this aspect strange. That's 1
> second (approx.) per row retrieved. That is pretty dire for an index
> scan. The data/index must be very non-unique.

Sorry for the late reply.. The numbers were off by a scale: my friend
forgot to add units to them. The actual numbers are 140ms for mysql and
175ms for postgresql. Further, since the results were obtained via 'time
psql', the higher overhead of postgres connection establishment is
factored in. Neck and neck, I would say..

Bye
Shridhar

--
Steele's Law: There exist tasks which cannot be done by more than ten
men or fewer than one hundred.
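A quick way to separate the connection overhead from the query time when
benchmarking with 'time psql' (the database and table names here are
made up):

    # Bare connection plus round trip, no real query work:
    time psql -c "SELECT 1;" testdb

    # The query under test, measured the same way; subtracting the
    # first timing gives a rough cost for the query itself:
    time psql -c "SELECT * FROM big_table WHERE field_a = 'xyz';" testdb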
Re: [HACKERS] Improving speed of copy
On 20 Sep 2002 at 10:27, Mike Benoit wrote:

> On Fri, 2002-09-20 at 08:52, Shridhar Daithankar wrote:
> > Besides there is issue of space. Mysql takes 1.4GB space for 1.2GB
> > text data and postgresql takes 3.2GB of space. Even with 40 bytes per
> > row overhead mentioned in FAQ, that should come to around 1.7GB,
> > counting for 40% increase in size. Vacuum was run on database.
>
> How did you calculate the size of database? If you used du make sure
> you do it in the data/base directory so as to not include the WAL
> files.

OK, latest experiments: I set the number of buffers to 15K and fsync is
disabled.. Load time is now 1250 sec. I noticed lots of notices in the
log saying "XLogWrite: new log files created", so I am pushing
wal_buffers to 1000 and wal_files to 40 to test again.. I hope it gives
me the required boost..

And BTW, about disk space usage: it's 2.6G for base, with pg_xlog taking
65M. Still not good..

Will keep you guys updated..

Bye
Shridhar

--
It is necessary to have purpose.  -- Alice #1, I, Mudd, stardate 4513.3
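For reference, the knobs being turned here live in postgresql.conf; a
sketch with the values from this thread (these are 7.2-era settings,
wal_files was removed in later releases, and the numbers are what was
tried in the thread, not recommendations):

    shared_buffers = 15000    # 8 KB pages, roughly 120 MB
    wal_buffers = 1000        # 8 KB pages for the WAL write buffer
    wal_files = 40            # preallocated 16 MB WAL segments
    fsync = off               # no flush at commit; unsafe outside
                              # throwaway bulk loads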
[HACKERS] Improving speed of copy
Hi all,

While testing for large databases, I am trying to load 12.5M rows of
data from a text file and it takes a lot longer than mysql, even with
copy. Mysql takes 221 sec. v/s 1121 sec. for postgres. For postgresql,
that is around 11.5K rows per second. Each tuple has 23 fields with a
fixed length of around 100 bytes.

I wrote a program which does inserts in batches, but none of them
reaches the performance of copy. I tried 1K/5K/10K/100K rows in a
transaction, but it can not cross 2.5K rows/sec.

The machine is 800MHz P-III/512MB/IDE disk. Postmaster is started with
30K buffers, i.e. around 235MB of buffers. Kernel caching parameters are
defaults.

Besides, there is the issue of space. Mysql takes 1.4GB of space for
1.2GB of text data and postgresql takes 3.2GB. Even with the 40 bytes
per row overhead mentioned in the FAQ, that should come to around 1.7GB,
counting for a 40% increase in size. Vacuum was run on the database.

Any further help? Especially if batch inserts could be sped up, that
would be great..

Bye
Shridhar

--
Alone, adj.:  In bad company.
    -- Ambrose Bierce, The Devil's Dictionary
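For concreteness, the two loading strategies being compared look roughly
like this (the table definition and file path are made up, standing in
for the real 23-column table):

    -- A small stand-in for the real table:
    CREATE TABLE big_table (field_a char(40), field_b char(30),
                            field_c char(30));

    -- Bulk load from a server-side text file, tab-delimited by default:
    COPY big_table FROM '/tmp/data.txt';

    -- Batched inserts: one transaction per batch pays the commit cost
    -- once per batch instead of once per row:
    BEGIN;
    INSERT INTO big_table VALUES ('a1', 'b1', 'c1');
    INSERT INTO big_table VALUES ('a2', 'b2', 'c2');
    -- ...several thousand more rows per batch...
    COMMIT;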
Re: [HACKERS] Improving speed of copy
Are you using copy within a transaction?

I don't know how to explain the size difference tho. I have never seen
an overhead difference that large. What type of MySQL tables were you
using, and what version? Have you tried this with Oracle or similar
commercial database?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Shridhar
Daithankar
Sent: Friday, September 20, 2002 9:52 AM
To: [EMAIL PROTECTED]
Subject: [HACKERS] Improving speed of copy

Hi all,

While testing for large databases, I am trying to load 12.5M rows of
data from a text file and it takes a lot longer than mysql, even with
copy. [...]
Re: [HACKERS] Improving speed of copy
On 20 Sep 2002 at 21:22, Shridhar Daithankar wrote:

> Mysql takes 221 sec. v/s 1121 sec. for postgres. For postgresql, that
> is around 11.5K rows per second. Each tuple has 23 fields with fixed
> length of around 100 bytes.
>
> I wrote a program which does inserts in batches, but none of them
> reaches the performance of copy. I tried 1K/5K/10K/100K rows in a
> transaction, but it can not cross 2.5K rows/sec.

1121 sec. was the time with the postgres default of 64 buffers. With 30K
buffers it has degraded to 1393 sec.

One more issue is the time taken for composite index creation: 4341 sec.
as opposed to 436 sec. for mysql. These are three non-unique character
fields where the combination itself is not unique as well. Would an
R-Tree index be a better choice?

In a select test where approx. 15 rows were reported by a query on an
index field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad,
but other issues eclipse the result..

TIA once again..

Bye
Shridhar

--
revolutionary, adj.:  Repackaged.
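On the R-Tree question: R-Tree indexes are for geometric/spatial types,
not character columns, so the default B-tree is the right access method
here. The composite index in question would be, with made-up names:

    -- "USING btree" is the default and can be omitted; shown here only
    -- to make the access method explicit:
    CREATE INDEX big_table_abc_idx ON big_table
        USING btree (field_a, field_b, field_c);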
Re: [HACKERS] Improving speed of copy
Also, did you disable fsync?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jonah H.
Harris
Sent: Friday, September 20, 2002 10:15 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [HACKERS] Improving speed of copy

Are you using copy within a transaction?

I don't know how to explain the size difference tho. I have never seen
an overhead difference that large. What type of MySQL tables were you
using, and what version? Have you tried this with Oracle or similar
commercial database?

[...]
Re: [HACKERS] Improving speed of copy
On Fri, 2002-09-20 at 08:52, Shridhar Daithankar wrote:

> Besides there is issue of space. Mysql takes 1.4GB space for 1.2GB text
> data and postgresql takes 3.2GB of space. Even with 40 bytes per row
> overhead mentioned in FAQ, that should come to around 1.7GB, counting
> for 40% increase in size. Vacuum was run on database.

How did you calculate the size of the database? If you used du, make
sure you do it in the data/base directory so as to not include the WAL
files.

--
Best Regards,

Mike Benoit
NetNation Communication Inc.
Systems Engineer
Tel: 604-684-6892 or 888-983-6600

Disclaimer: Opinions expressed here are my own and not necessarily those
of my employer
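Concretely, the measurement Mike suggests looks something like this
(paths assume a stock $PGDATA layout; adjust for your install):

    # Table and index data only, excluding WAL:
    du -sh $PGDATA/base

    # WAL segments, for comparison:
    du -sh $PGDATA/pg_xlog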
Re: [HACKERS] Improving speed of copy
On Fri, 20 Sep 2002, Shridhar Daithankar wrote:

> In a select test where approx. 15 rows were reported by a query on an
> index field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad
> but other issues eclipse the result..

I don't know about anyone else, but I find this aspect strange. That's 1
second (approx.) per row retrieved. That is pretty dire for an index
scan. The data/index must be very non-unique.

--
Nigel J. Andrews
Re: [HACKERS] Improving speed of copy
Nigel J. Andrews wrote:

> On Fri, 20 Sep 2002, Shridhar Daithankar wrote:
> > In a select test where approx. 15 rows were reported by a query on an
> > index field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad
> > but other issues eclipse the result..
>
> I don't know about anyone else but I find this aspect strange. That's 1
> second (approx.) per row retrieved. That is pretty dire for an index
> scan. The data/index must be very non-unique.

Yeah, I'd agree that is strange. Can we see EXPLAIN ANALYZE for that
query?

Also, in one of your earlier posts you mentioned a slowdown after
raising shared buffers from the default 64 to 30000. You might have
driven the machine into swapping. Maybe try something more like
10000 - 15000.

HTH,

Joe
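The output being asked for comes from EXPLAIN ANALYZE (new in 7.2),
which shows the chosen plan together with actual execution times; with
made-up names:

    -- An unexpected sequential scan, or an estimated row count wildly
    -- different from the actual one, would show up here immediately:
    EXPLAIN ANALYZE
    SELECT * FROM big_table WHERE field_a = 'some value';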