Re: [PERFORM] Benchmark Data requested

Jignesh K. Shah Mon, 04 Feb 2008 15:02:11 -0800

Doing it at low scales is not attractive.

Commercial databases are publishing at scale factors of 1000 (about 1TB) to 10000 (10TB), with one in the 30TB space. So ideally right now tuning should start at scale factor 1000.

Unfortunately I have tried that before with PostgreSQL. A few of the problems are as follows:

The single-stream loader of PostgreSQL takes hours to load the data. (Single-stream load... wasting all the extra cores out there.)

Multiple table loads (one per table) spawned via a script are a bit better, but hit WAL problems.
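As an illustration of the "one loader per table" approach, here is a minimal Python sketch. It is not the script used in the tests described here: the database name `tpch`, the directory `/data/tpch`, and the helper names `copy_command` and `load_all` are all assumptions, and the `.tbl` files are assumed to be already in a form that `\copy` accepts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# The eight TPC-H tables; each gets its own loader process.
TPCH_TABLES = ["region", "nation", "supplier", "customer",
               "part", "partsupp", "orders", "lineitem"]


def copy_command(table, data_dir="/data/tpch", dbname="tpch"):
    """Build a psql invocation that loads one table from its data file.

    data_dir and dbname are hypothetical defaults for this sketch.
    """
    meta = "\\copy {t} from '{d}/{t}.tbl' with delimiter '|'".format(
        t=table, d=data_dir)
    return ["psql", "-d", dbname, "-c", meta]


def load_all(tables=TPCH_TABLES, run=subprocess.call):
    """Start one loader per table concurrently; return exit codes by table."""
    with ThreadPoolExecutor(max_workers=len(tables)) as pool:
        codes = list(pool.map(lambda t: run(copy_command(t)), tables))
    return dict(zip(tables, codes))
```

Calling `load_all()` would start eight `psql` processes at once. Because lineitem is by far the largest table, the total load time is still bounded by that single stream.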

To avoid the WAL problems, I created the tables and ran the load statements within the same transaction. That is faster, but you cannot create an index before the load or it starts writing to WAL... AND if the indexes are created after the load, it takes about a day or so to create all the indexes required. (Index creation is single-threaded, and creating multiple indexes at the same time could result in running out of temporary "DISK" space since the tables are so big. Which means one thread at a time is the answer for tables that are really big. It is slow.)
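The ordering described above (create and load in one transaction, indexes only afterwards and one at a time) can be sketched as a small Python helper that emits the SQL script. The helper name `load_script` and all table, column, index, and file names are hypothetical. As far as I understand the PostgreSQL documentation, COPY can skip WAL for a table created in the same transaction only when WAL archiving is not enabled, and a server-side `COPY ... FROM 'file'` needs a path readable by the server.

```python
def load_script(table, columns_ddl, data_file, index_ddl=()):
    """Return SQL text that creates and loads a table in one transaction.

    Index statements are placed after COMMIT so that the load itself
    runs against a table that has no indexes yet.
    """
    lines = [
        "BEGIN;",
        "CREATE TABLE {0} ({1});".format(table, columns_ddl),
        "COPY {0} FROM '{1}' WITH DELIMITER '|';".format(table, data_file),
        "COMMIT;",
    ]
    # Indexes are built serially, one statement after another.
    lines.extend(index_ddl)
    return "\n".join(lines)


if __name__ == "__main__":
    print(load_script(
        "region",
        "r_regionkey integer, r_name char(25), r_comment varchar(152)",
        "/data/tpch/region.tbl",
        ["CREATE INDEX region_key_idx ON region (r_regionkey);"],
    ))
```

The generated text could be fed to `psql -f`. Running the index statements serially matches the "one thread at a time" constraint, at the cost of the long index build time noted above.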

Boy, by this time most people doing TPC-H at the high end give up on PostgreSQL.

I have not even started partitioning of tables yet, since with the current framework you have to load the data separately into each table, which means for the TPC-H data you need "extra logic" to take that table data and split it into each partition child table. Not stuff that many people want to do by hand.
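To make the "extra logic" concrete, here is a minimal Python sketch that routes the rows of a '|'-delimited data file to per-year child tables. The partitioning scheme (by year of the ship date), the child table names such as `lineitem_1995`, and the function name `split_by_year` are assumptions for illustration only; the default column index 10 assumes the standard TPC-H lineitem column order, where l_shipdate is the eleventh field.

```python
from collections import defaultdict


def split_by_year(lines, date_col=10, parent="lineitem"):
    """Group '|'-delimited rows by the year of a YYYY-MM-DD column.

    Returns a dict mapping a hypothetical child table name
    (parent_YYYY) to the list of raw lines that belong to it.
    """
    parts = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        year = fields[date_col][:4]
        parts["{0}_{1}".format(parent, year)].append(line)
    return dict(parts)


if __name__ == "__main__":
    rows = ["1|1995-03-01|x\n", "2|1996-01-15|y\n", "3|1995-12-31|z\n"]
    for child, child_rows in sorted(split_by_year(rows, date_col=1, parent="t").items()):
        print(child, len(child_rows))
```

At terabyte scale the rows would have to be streamed to one output file per child table rather than held in memory, but the routing step is the same.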

Then there is the power run, which essentially runs one query at a time and should be able to utilize the full system (especially multi-core systems). Unfortunately PostgreSQL can use only one core. (Plus, since this is read-only and there is no separate disk reader, all other processes are idle.) So the system is running at 1/Nth capacity (where N is the number of cores/threads).

(I am not sure here: with partitioned tables, do you get N processes running in the system when you scan the partitioned table?)

Even off-loading work like "fetching the data into the buffer pool" into separate processes would help big time with this type of workload.

I would be happy to help out if folks here want to do work related to it. In fact, if you have time, I can request a project in one of the Sun Benchmarking centers to see what we can learn with community members interested in understanding where PostgreSQL performs and fails.


Regards,
Jignesh

Greg Smith wrote:

On Mon, 4 Feb 2008, Simon Riggs wrote:

Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3)?

Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

