>From: Josh Berkus <josh@agliodbs.com> >Sent: Sep 29, 2005 12:54 PM >Subject: Re: [HACKERS] [PERFORM] A Better External Sort? > >The biggest single area where I see PostgreSQL external >sort sucking is on index creation on large tables. For >example, for free version of TPCH, it takes only 1.5 hours to >load a 60GB Lineitem table on OSDL's hardware, but over 3 >hours to create each index on that table. This means that >over all our load into TPCH takes 4 times as long to create >the indexes as it did to bulk load the data. > Hmmm. 60GB/5400secs= 11MBps. That's ssllooww. So the first problem is evidently our physical layout and/or HD IO layer sucks.
Creating the table and then creating the indexes on the table is going to require more physical IO than if we created the table and the indexes concurrently in chunks and then combined the indexes on the chunks into the overall indexes for the whole table, so there's a potential speed-up. The method I've been talking about is basically a recipe for creating indexes as fast as possible with as few IO operations, HD or RAM, as possible and nearly no random ones, so it could help as well. OTOH, HD IO rate is the fundamental performance metric. As long as our HD IO rate is pessimal, so will the performance of everything else be. Why can't we load a table at closer to the peak IO rate of the HDs? >Anyone restoring a large database from pg_dump is in the >same situation. Even worse, if you have to create a new >index on a large table on a production database in use, >because the I/O from the index creation swamps everything. > Fix for this in the works ;-) >Following an index creation, we see that 95% of the time >required is the external sort, which averages 2mb/s. > Assuming decent HD HW, this is HORRIBLE. What's kind of instrumenting and profiling has been done of the code involved? >This is with seperate drives for the WAL, the pg_tmp, the table >and the index. I've confirmed that increasing work_mem >beyond a small minimum (around 128mb) had no benefit on >the overall index creation speed. > No surprise. The process is severely limited by the abyssmally slow HD IO. Ron ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend