Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-03 Thread Anne Rosset
We saw a little improvement by increasing min_pool_size, but I still see
a big variation in the time the query takes to execute. Here is the query:

EXPLAIN ANALYZE SELECT
    artifact.id AS id,
    artifact.priority AS priority,
    project.path AS projectPathString,
    project.title AS projectTitle,
    folder.project_id AS projectId,
    folder.title AS folderTitle,
    item.folder_id AS folderId,
    item.title AS title,
    item.name AS name,
    field_value2.value AS status,
    field_value3.value AS category,
    sfuser.username AS submittedByUsername,
    sfuser.full_name AS submittedByFullname,
    sfuser2.username AS assignedToUsername,
    sfuser2.full_name AS assignedToFullname,
    item.version AS version,
    CASE WHEN ((SELECT
                    mntr_subscription.user_id AS userId
                FROM
                    mntr_subscription mntr_subscription
                WHERE
                    artifact.id = mntr_subscription.object_key
                    AND mntr_subscription.user_id = 'user1439'
               ) = 'user1439') THEN 'user1439' ELSE NULL END AS monitoringUserId,
    tracker.icon AS trackerIcon,
    tracker.remaining_effort_disabled AS remainingEffortDisabled,
    tracker.actual_effort_disabled AS actualEffortDisabled,
    tracker.estimated_effort_disabled AS estimatedEffortDisabled
FROM
    field_value field_value2,
    field_value field_value,
    sfuser sfuser2,
    field_value field_value3,
    field_value field_value4,
    item item,
    project project,
    relationship relationship,
    tracker tracker,
    artifact artifact,
    sfuser sfuser,
    folder folder
WHERE
    artifact.id = item.id
    AND item.folder_id = folder.id
    AND folder.project_id = project.id
    AND artifact.group_fv = field_value.id
    AND artifact.status_fv = field_value2.id
    AND artifact.category_fv = field_value3.id
    AND artifact.customer_fv = field_value4.id
    AND item.created_by_id = sfuser.id
    AND relationship.is_deleted = false
    AND relationship.relationship_type_name = 'ArtifactAssignment'
    AND relationship.origin_id = sfuser2.id
    AND artifact.id = relationship.target_id
    AND item.is_deleted = false
    AND ((artifact.priority = 3))
    AND (project.path = 'projects.psr-pub-13')
    AND item.folder_id = tracker.id;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..272.62 rows=1 width=181) (actual time=805.934..1792.596 rows=177 loops=1)
   ->  Nested Loop  (cost=0.00..263.87 rows=1 width=167) (actual time=707.739..1553.348 rows=177 loops=1)
         ->  Nested Loop  (cost=0.00..263.58 rows=1 width=1

Re: [PERFORM] Hardware suggestions for maximum read performance

2013-05-03 Thread Julien Cigar

On 05/03/2013 01:11, Mike McCann wrote:

Hello,

We are in the fortunate situation of having more money than time to
help solve our PostgreSQL 9.1 performance problem.

Our server hosts databases that are about 1 GB in size, with the
largest tables having on the order of 10 million 20-byte indexed
records. The data are loaded once and then read from a web app and
other client programs. Some of the queries execute ORDER BY on the
results. There are typically fewer than a dozen read-only concurrent
connections to any one database.




I would first check for spurious queries; 10 million rows isn't that
huge. Perhaps you could paste your queries and an EXPLAIN ANALYZE of
them? You could also log slow queries and use the auto_explain module.
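
A minimal sketch of that suggestion, at the session level (the 1s
thresholds here are illustrative assumptions, not recommendations):

    -- log the plan of anything running longer than 1s via auto_explain
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '1s';
    SET auto_explain.log_analyze = on;   -- include actual timings and row counts
    -- plain slow-query logging works as well
    SET log_min_duration_statement = '1s';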


SELECTs for data are taking tens of seconds. We'd like to reduce this
to web-app-acceptable response times (less than 1 second). If this is
successful, then the size of the database will grow by a factor of ten
- we will still want sub-second response times. We are in the process
of going through the excellent suggestions in the "PostgreSQL 9.0 High 
Performance" book to identify the bottleneck (we have reasonable 
suspicions that we are I/O bound), but would also like to place an 
order soon for the dedicated server which will host the production 
databases. Here are the specs of a server that we are considering with 
a budget of $13k US:


HP ProLiant DL360p Gen 8
Dual Intel Xeon 2.4GHz 4-core E5-2609 CPUs
64GB RAM
2x146GB 15K SAS hard drives
3x200GB SATA SLC SSDs
+ the usual accessories (optical drive, rail kit, dual power supplies)

Opinions?

Thanks in advance for any suggestions you have.

-Mike

--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769  Fax: 831.775.1736 http://www.mbari.org




--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-03 Thread Simon Riggs
On 2 May 2013 23:19,   wrote:
>> On 2 May 2013 01:49, Mark Kirkwood  wrote:
>>
>> I think we need a problem statement before we attempt a solution,
>> which is what Tom is alluding to.
>>
>
> Actually no - I think Tom (quite correctly) was saying that the patch was
> not a viable solution. With which I agree.
>
> I believe the title of this thread is the problem statement.
>
>> ISTM that you've got a case where the plan is very sensitive to a
>> table load. Which is a pretty common situation and one that can be
>> solved in various ways. I don't see much that Postgres can do because
>> it can't know ahead of time you're about to load rows. We could
>> imagine an optimizer that set thresholds on plans that caused the
>> whole plan to be recalculated half way thru a run, but that would be a
>> lot of work to design and implement and even harder to test. Having
>> static plans at least allows us to discuss what it does after the fact
>> with some ease.
>>
>> The plan is set using stats that are set when there are very few
>> non-NULL rows, and those increase massively on load. The way to cope
>> is to run the ANALYZE immediately after the load and then don't allow
>> auto-ANALYZE to reset them later.
>
> No. We do run analyze immediately after the load. The surprise was that
> this was not sufficient - the (small) amount of time where non optimal
> plans were being used due to the in progress row activity was enough to
> cripple the system - that is the problem. The analysis of why not led to
> the test case included in the original email. And sure it is deliberately
> crafted to display the issue, and is therefore open to criticism for being
> artificial. However it was purely meant to make it easy to see what I was
> talking about.
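
For reference, a minimal sketch of that kind of test scenario
(hypothetical table t; the actual test case is in the original email):

    -- session 1: bulk load left uncommitted
    CREATE TABLE t (x int);
    BEGIN;
    INSERT INTO t (x) SELECT generate_series(1, 100000);
    -- no COMMIT yet

    -- session 2: every scan now pays a per-row visibility check
    -- against the open transaction's xid
    EXPLAIN ANALYZE SELECT count(*) FROM t;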

I had another look at this and see that I read the second explain incorrectly.

The amount of data examined and returned is identical in both plans.
The only difference is the number of in-progress rows seen by the
second query. Looking at the numbers some more, it looks like 6000
in-progress rows are examined in addition to the data. It might be
worth an EXPLAIN patch to put instrumentation in to show that, but it's
not that interesting.

It would be useful to force the index scan into a bitmap scan, to check
that the cost isn't attributable to the plan but to other overheads.
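
A minimal sketch of how to force that comparison (assumes the planner
currently picks the index scan):

    SET enable_indexscan = off;   -- planner falls back to a bitmap (or seq) scan
    -- EXPLAIN ANALYZE <the test query from this thread>;
    RESET enable_indexscan;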

What appears to be happening is we're spending a lot of time in
TransactionIdIsInProgress() so we can set hints and then when we find
it is still in progress we then spend more time in XidIsInSnapshot()
while we check that it is still invisible to us. Even if the
transaction we see repeatedly ends, we will still pay the cost in
XidIsInSnapshot repeatedly as we execute.

Given that code path, I would expect it to suck worse on a live system
with many sessions, and even worse with many subtransactions.

(1) A proposed fix is attached, but it's only a partial one and barely tested.

Deeper fixes might be

(2)  to sort the xid array if we call XidIsInSnapshot too many times
in a transaction. I don't think that is worth it, because a long
running snapshot may be examined many times, but is unlikely to see
multiple in-progress xids repeatedly. Whereas your case seems
reasonably common.

(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.

I think we need both (1) and (3) but the attached patch does just (1).

This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as-yet-incomplete xids, and to bulk concurrent
transactions.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


xid_in_snapshot_cache.v1.patch
Description: Binary data



Re: [PERFORM] Hardware suggestions for maximum read performance

2013-05-03 Thread Scott Marlowe
Note that with Linux (and a few other OSes) you can use RAID-1E
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#RAID_1E
with an odd number of drives.

On Fri, May 3, 2013 at 12:16 AM, Arjen van der Meijden wrote:
> 3x200GB suggests you want to use RAID5?
>
> Perhaps you should just pick 2x200GB and set them to RAID1. With roughly
> 200GB of storage, that should still easily house your "potentially
> 10GB" database with ample room to allow the SSDs to balance the writes.
> But you save the investment, and it's probably a bit faster with writes
> (although your RAID card may reduce or remove the differences with your
> workload).
>
> You can then either keep the money or invest in faster CPUs. With few
> concurrent connections, the E5-2643 (also a quad core, but with 3.3GHz cores
> rather than 2.4GHz) may be interesting.
> It's obviously a bit of speculation to see whether that would help, but it
> should speed up sorts and other in-memory/CPU operations (even if you're not
> - and never will be - CPU-bound right now).
>
> Best regards,
>
> Arjen
>
>
> On 3-5-2013 1:11 Mike McCann wrote:
>>
>> [snip: message quoted in full above]
>>



--
To understand recursion, one must first understand recursion.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance