Also, thanks for the advice on batching my queries. I am now using a very efficient bulk data read and write methods for Postgres.
My program bulk reads 100,000 rows, processes those rows (during which it does a few SELECTs), and then writes 100,000 rows at a time. It cycles through this until it has processed all 12,000,000 rows. This, plus the parallelism fix, will probably convert this 30 hour program to a <2 hour program. Aren On Sun, May 22, 2011 at 9:08 AM, Aren Cambre <a...@arencambre.com> wrote: > Just wanted to again say thanks for everyone's help. > > The main problem was that my program was running in serial, not parallel, > even though I thought I used a textbook example of PLINQ. Your assistance > helped me get to the point where I could conclusively determine everything > was running in serial. It was more obvious than I realized. > > Thanks to help through > http://stackoverflow.com/questions/6086111/plinq-on-concurrentqueue-isnt-multithreading, > I have switched to the .NET Framework's Task Parallel Library, and it's > slamming the 8 cores hard now! And there's a bunch of concurrent connections > to Postgres. :-) > > Aren > > On Mon, May 9, 2011 at 4:23 PM, Aren Cambre <a...@arencambre.com> wrote: > >> I have a multi-threaded app. It uses ~22 threads to query Postgres. >> >> Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% >> CPU utilization because it's mostly waiting on Postgres. >> >> Here's the details: >> >> The app has a "main" thread that reads table A's 11,000,000 rows, one at a >> time. The main thread spawns a new thread for each row in table A's data. >> This new thread: >> >> 1. Opens a connection to the DB. >> 2. Does some calculations on the data, including 1 to 102 SELECTs on >> table B. >> 3. With an INSERT query, writes a new row to table C. >> 4. Closes the connection. >> 5. Thread dies. Its data is garbage collected eventually. >> >> Physical/software details: >> >> - Core i7 processor--4 physical cores, but OS sees 8 cores >> via hyper-threading >> - 7200 RPM 500 GB HDD >> - About 1/2 total RAM is free during app execution >> - Windows 7 x64 >> - Postgres 9.0.4 32-bit (32-bit required for PostGIS) >> - App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres >> connection tool. >> >> At first, the app pounds all 8 cores. But it quickly tapers off, and only >> 1 core that's busy. The other 7 cores are barely doing a thing. >> >> Postgres has 9 open processes. 1 process was slamming that 1 busy core. >> The other 8 Postgres processes were alive but idle. >> >> Each thread creates its own connection. It's not concurrently shared with >> the main thread or any other threads. I haven't disabled connection pooling; >> when a thread closes a connection, it's technically releasing it into a pool >> for later threads to use. >> >> Disk utilization is low. The HDD light is off much more than it is on, and >> a review of total HDD activity put it between 0% and 10% of total capacity. >> The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds. >> >> The app runs 2 different queries on table B. The 1st query is run once, >> the 2nd query can be run up to 101 times. Table C has redundant indexes: >> every column referenced in the SQL WHERE clauses for both queries are >> indexed separately and jointly. E.g., if query X references columns Y and Z, >> there are 3 indexes: >> >> 1. An index for Y >> 2. An index for Z >> 3. An index for Y and Z >> >> Table C is simple. It has four columns: two integers, a varchar(18), and a >> boolean. It has no indexes. A primary key on the varchar(18) column is its >> only constraint. >> >> A generalized version of my INSERT command for table C is: >> *INSERT INTO raw.C VALUES (:L, :M, :N, :P)* >> >> I am using parameters to fill in the 4 values. >> >> I have verified table C manually, and correct data is being stored in it. >> >> Several Google searches suggest Postgres should use multiple cores >> automatically. I've consulted with Npgsql's developer, and he didn't see how >> Npgsql itself could force Postgres to one core. (See >> http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.) >> >> What can I do to improve this? Could I be inadvertently limiting Postgres >> to one core? >> >> Aren Cambre >> > >