Also, thanks for the advice on batching my queries. I am now using a very
efficient bulk data read and write methods for Postgres.

My program bulk reads 100,000 rows, processes those rows (during which it
does a few SELECTs), and then writes 100,000 rows at a time.

It cycles through this until it has processed all 12,000,000 rows.

This, plus the parallelism fix, will probably convert this 30 hour program
to a <2 hour program.

Aren

On Sun, May 22, 2011 at 9:08 AM, Aren Cambre <a...@arencambre.com> wrote:

> Just wanted to again say thanks for everyone's help.
>
> The main problem was that my program was running in serial, not parallel,
> even though I thought I used a textbook example of PLINQ. Your assistance
> helped me get to the point where I could conclusively determine everything
> was running in serial. It was more obvious than I realized.
>
> Thanks to help through
> http://stackoverflow.com/questions/6086111/plinq-on-concurrentqueue-isnt-multithreading,
> I have switched to the .NET Framework's Task Parallel Library, and it's
> slamming the 8 cores hard now! And there's a bunch of concurrent connections
> to Postgres. :-)
>
> Aren
>
> On Mon, May 9, 2011 at 4:23 PM, Aren Cambre <a...@arencambre.com> wrote:
>
>> I have a multi-threaded app. It uses ~22 threads to query Postgres.
>>
>> Postgres won't use more than 1 CPU core. The 22-threaded app only has 3%
>> CPU utilization because it's mostly waiting on Postgres.
>>
>> Here's the details:
>>
>> The app has a "main" thread that reads table A's 11,000,000 rows, one at a
>> time. The main thread spawns a new thread for each row in table A's data.
>> This new thread:
>>
>>    1. Opens a connection to the DB.
>>    2. Does some calculations on the data, including 1 to 102 SELECTs on
>>    table B.
>>    3. With an INSERT query, writes a new row to table C.
>>    4. Closes the connection.
>>    5. Thread dies. Its data is garbage collected eventually.
>>
>> Physical/software details:
>>
>>    - Core i7 processor--4 physical cores, but OS sees 8 cores
>>    via hyper-threading
>>    - 7200 RPM 500 GB HDD
>>    - About 1/2 total RAM is free during app execution
>>    - Windows 7 x64
>>    - Postgres 9.0.4 32-bit (32-bit required for PostGIS)
>>    - App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres
>>    connection tool.
>>
>> At first, the app pounds all 8 cores. But it quickly tapers off, and only
>> 1 core that's busy. The other 7 cores are barely doing a thing.
>>
>> Postgres has 9 open processes. 1 process was slamming that 1 busy core.
>> The other 8 Postgres processes were alive but idle.
>>
>> Each thread creates its own connection. It's not concurrently shared with
>> the main thread or any other threads. I haven't disabled connection pooling;
>> when a thread closes a connection, it's technically releasing it into a pool
>> for later threads to use.
>>
>> Disk utilization is low. The HDD light is off much more than it is on, and
>> a review of total HDD activity put it between 0% and 10% of total capacity.
>> The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.
>>
>> The app runs 2 different queries on table B. The 1st query is run once,
>> the 2nd query can be run up to 101 times. Table C  has redundant indexes:
>> every column referenced in the SQL WHERE clauses for both queries are
>> indexed separately and jointly. E.g., if query X references columns Y and Z,
>> there are 3 indexes:
>>
>>    1. An index for Y
>>    2. An index for Z
>>    3. An index for Y and Z
>>
>> Table C is simple. It has four columns: two integers, a varchar(18), and a
>> boolean. It has no indexes. A primary key on the varchar(18) column is its
>> only constraint.
>>
>> A generalized version of my INSERT command for table C is:
>> *INSERT INTO raw.C VALUES (:L, :M, :N, :P)*
>>
>> I am using parameters to fill in the 4 values.
>>
>> I have verified table C manually, and correct data is being stored in it.
>>
>> Several Google searches suggest Postgres should use multiple cores
>> automatically. I've consulted with Npgsql's developer, and he didn't see how
>> Npgsql itself could force Postgres to one core. (See
>> http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)
>>
>> What can I do to improve this? Could I be inadvertently limiting Postgres
>> to one core?
>>
>> Aren Cambre
>>
>
>

Reply via email to