Doing it at low scales is not attractive.

Commercial databases are publishing at scale factor of 1000(about 1TB) to 10000(10TB) with one in 30TB space. So ideally right now tuning should start at 1000 scale factor.

Unfortunately I have tried that before with PostgreSQL the few of the problems are as follows:

Single stream loader of PostgreSQL takes hours to load data. (Single stream load... wasting all the extra cores out there)

Multiple table loads ( 1 per table) spawned via script is bit better but hits wal problems.

To avoid wal problems, I had created tables and load statements within the same transaction, faster but cannot create index before load or it starts writing to wal... AND if indexes are created after load, it takes about a day or so to create all the indices required. (Its single threaded and creating multiple indexes/indices at the same time could result in running out of temporary "DISK" space since the tables are so big. Which means 1 thread at a time is the answer for creating tables that are really big. It is slow.

Boy, by this time most people doing TPC-H in high end give up on PostgreSQL.

I have not even started Partitioning of tables yet since with the current framework, you have to load the tables separately into each tables which means for the TPC-H data you need "extra-logic" to take that table data and split it into each partition child table. Not stuff that many people want to do by hand.

Then for the power run that is essentially running one query at a time should essentially be able to utilize the full system (specially multi-core systems), unfortunately PostgreSQL can use only one core. (Plus since this is read only and there is no separate disk reader all other processes are idle) and system is running at 1/Nth capacity (where N is the number of cores/threads)

(I am not sure here with Partitioned tables, do you get N processes running in the system when you scan the partitioned table?)

Even off-loading work like "fetching the data into bufferpool" into separate processes will go big time with this type of workloads.

I would be happy to help out if folks here want to do work related to it. Infact if you have time, I can request a project in one of the Sun Benchmarking center to see what we can learn with community members interested in understanding where PostgreSQL performs and fails.

Regards,
Jignesh

Greg Smith wrote:
On Mon, 4 Feb 2008, Simon Riggs wrote:

Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3)

Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to