Re: [PERFORM] Hardware suggestions for maximum read performance

2013-05-06 Thread Yuri Levinsky
Mike,

According to your budget the following or similar might be useful for
you:

HP 365GB Multi Level Cell G2 PCIe ioDrive2 for ProLiant Servers

 

This PCIe card-based direct-attach solid-state storage technology is a
solution for application performance enhancement. I believe you can
find cheaper solutions on the market that provide the same performance
characteristics (935,000 write IOPS, up to 892,000 read IOPS, up to 3
GB/s bandwidth).

 

 

Sincerely yours,

 

 

Yuri Levinsky, DBA

Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel

Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

 

From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mike McCann
Sent: Friday, May 03, 2013 2:11 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Hardware suggestions for maximum read performance

 

Hello,

 

We are in the fortunate situation of having more money than time to help
solve our PostgreSQL 9.1 performance problem.

 

Our server hosts databases that are about 1 GB in size, with the largest
tables having on the order of 10 million 20-byte indexed records. The data are
loaded once and then read from a web app and other client programs.
Some of the queries execute ORDER BY on the results. There are typically
less than a dozen read-only concurrent connections to any one database.
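
A minimal sketch of how we inspect an individual query's I/O behaviour
(table and column names here are illustrative only): EXPLAIN's BUFFERS
option reports how many blocks a query reads and how many come from cache.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM samples ORDER BY sample_value LIMIT 1000;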

 

SELECTs for data are taking 10s of seconds. We'd like to reduce this to
web app acceptable response times (less than 1 second). If this is
successful then the size of the database will grow by a factor of ten -
we will still want sub-second response times.  We are in the process of
going through the excellent suggestions in the PostgreSQL 9.0 High
Performance book to identify the bottleneck (we have reasonable
suspicions that we are I/O bound), but would also like to place an order
soon for the dedicated server which will host the production databases.
Here are the specs of a server that we are considering with a budget of
$13k US:

 

HP ProLiant DL360p Gen 8

Dual Intel Xeon 2.4GHz 4-core E5-2609 CPUs

64GB RAM

2x146GB 15K SAS hard drives

3x200GB SATA SLC SSDs

+ the usual accessories (optical drive, rail kit, dual power
supplies)

 

Opinions?

 

Thanks in advance for any suggestions you have.


-Mike

 

--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769  Fax: 831.775.1736 http://www.mbari.org 

 




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Simon Riggs
On 6 May 2013 02:51, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
 On 05/05/13 00:49, Simon Riggs wrote:

 On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:

 (3) to make the check on TransactionIdIsInProgress() into a heuristic,
 since we don't *need* to check that, so if we keep checking the same
 xid repeatedly we can reduce the number of checks or avoid xids that
 seem to be long running. That's slightly more coding than my quick
 hack here but seems worth it.

 I think we need both (1) and (3) but the attached patch does just (1).

 This is a similar optimisation to the one I introduced for
 TransactionIdIsKnownCompleted(), except this applies to repeated
 checking of as yet-incomplete xids, and to bulk concurrent
 transactions.


 ISTM we can improve performance of TransactionIdIsInProgress() by
 caching the procno of our last xid.

 Mark, could you retest with both these patches? Thanks.


 Thanks Simon, will do and report back.

OK, here's an easily reproducible test...

Prep:
DROP TABLE IF EXISTS plan;
CREATE TABLE plan
(
  id INTEGER NOT NULL,
  typ INTEGER NOT NULL,
  dat TIMESTAMP,
  val TEXT NOT NULL
);
insert into plan select generate_series(1,10), 0,
current_timestamp, 'some texts';
CREATE UNIQUE INDEX plan_id ON plan(id);
CREATE INDEX plan_dat ON plan(dat);

testcase.pgb
select count(*) from plan where dat is null and typ = 3;

Session 1:
pgbench -n -f testcase.pgb -t 100

Session 2:
BEGIN; insert into plan select 100 + generate_series(1, 10),
3, NULL, 'b';

Transaction rate in Session 1: (in tps)
(a) before we run Session 2:
Current: 5600tps
Patched: 5600tps

(b) after Session 2 has run, yet before transaction end
Current: 56tps
Patched: 65tps

(c) after Session 2 has aborted
Current/Patched: 836, 1028, 5400tps
VACUUM improves timing again

New version of patch attached which fixes a few bugs.

Patch works and improves things, but we're still swamped by the block
accesses via the index.

Which brings me back to Mark's original point, which is that we are
x100 times slower in this case and it *is* because the choice of
IndexScan is a bad one for this situation.

After some thought on this, I do think we need to do something about
it directly, rather than by tuning infrastructure (as I just
attempted). The root cause here is that IndexScan plans are sensitive
to mistakes in data distribution, much more so than other plan types.

The two options, broadly, are to either

1. avoid IndexScans in the planner unless they have a *significantly*
better cost. At the moment we use IndexScans if cost is lowest, even
if that is only by a whisker.

2. make IndexScans adaptive so that they switch to other plan types
mid-way through execution.

(2) seems fairly hard generically, since we'd have to keep track of
the tids returned from the IndexScan to allow us to switch to a
different plan and avoid re-issuing rows that we've already returned.
But maybe if we adapted the IndexScan plan type so that it adopted a
more page oriented approach internally, it could act like a
bitmapscan. Anyway, that would need some proof that it would work and
sounds like a fair task.

(1) sounds more easily possible and plausible. At the moment we have
enable_indexscan = off. If we had something like
plan_cost_weight_indexscan = N, we could selectively increase the cost
of index scans so that they would be less likely to be selected. i.e.
plan_cost_weight_indexscan = 2 would mean an indexscan would need to
be half the cost of any other plan before it was selected. (parameter
name selected so it could apply to all plan types). The reason to
apply this weighting would be to calculate risk adjusted cost not
just estimated cost.
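
As a sketch of how that might be used against the earlier test case (this
GUC is purely hypothetical and does not exist in any released PostgreSQL):

set plan_cost_weight_indexscan = 2;  -- hypothetical parameter, illustration only
explain select count(*) from plan where dat is null and typ = 3;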

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cache_TransactionIdInProgress.v2.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: Anne Rosset [mailto:aros...@collab.net]
 Sent: Friday, May 03, 2013 4:52 PM
 To: Igor Neyman; k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: RE: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 We saw a little bit of improvement by increasing the min_pool_size but
 again I see a big variation in the time the query is executed. Here is
 the query:
 
 psrdb=> explain analyze SELECT
 psrdb-> artifact.id AS id,
 psrdb-> artifact.priority AS priority,
 psrdb-> project.path AS projectPathString,
 psrdb-> project.title AS projectTitle,
 psrdb-> folder.project_id AS projectId,
 psrdb-> folder.title AS folderTitle,
 psrdb-> item.folder_id AS folderId,
 psrdb-> item.title AS title,
 psrdb-> item.name AS name,
 psrdb-> field_value2.value AS status,
 psrdb-> field_value3.value AS category,
 psrdb-> sfuser.username AS submittedByUsername,
 psrdb-> sfuser.full_name AS submittedByFullname,
 psrdb-> sfuser2.username AS assignedToUsername,
 psrdb-> sfuser2.full_name AS assignedToFullname,
 psrdb-> item.version AS version,
 psrdb-> CASE when ((SELECT
 psrdb(> mntr_subscription.user_id AS userId
 psrdb(> FROM
 psrdb(> mntr_subscription mntr_subscription
 psrdb(> WHERE
 psrdb(> artifact.id=mntr_subscription.object_key
 psrdb(> AND mntr_subscription.user_id='user1439'
 psrdb(> )= 'user1439') THEN 'user1439' ELSE null END AS monitoringUserId,
 psrdb-> tracker.icon AS trackerIcon,
 psrdb-> tracker.remaining_effort_disabled AS remainingEffortDisabled,
 psrdb-> tracker.actual_effort_disabled AS actualEffortDisabled,
 psrdb-> tracker.estimated_effort_disabled AS estimatedEffortDisabled
 psrdb-> FROM
 psrdb-> field_value field_value2,
 psrdb-> field_value field_value,
 psrdb-> sfuser sfuser2,
 psrdb-> field_value field_value3,
 psrdb-> field_value field_value4,
 psrdb-> item item,
 psrdb-> project project,
 psrdb-> relationship relationship,
 psrdb-> tracker tracker,
 psrdb-> artifact artifact,
 psrdb-> sfuser sfuser,
 psrdb-> folder folder
 psrdb-> WHERE
 psrdb-> artifact.id=item.id
 psrdb-> AND item.folder_id=folder.id
 psrdb-> AND folder.project_id=project.id
 psrdb-> AND artifact.group_fv=field_value.id
 psrdb-> AND artifact.status_fv=field_value2.id
 psrdb-> AND artifact.category_fv=field_value3.id
 psrdb-> AND artifact.customer_fv=field_value4.id
 psrdb-> AND item.created_by_id=sfuser.id
 psrdb-> AND relationship.is_deleted=false
 psrdb-> AND relationship.relationship_type_name='ArtifactAssignment'
 psrdb-> AND relationship.origin_id=sfuser2.id
 psrdb-> AND artifact.id=relationship.target_id
 psrdb-> AND item.is_deleted=false
 psrdb-> AND ((artifact.priority=3))
 psrdb-> AND (project.path='projects.psr-pub-13')
 psrdb-> AND item.folder_id=tracker.id
 psrdb-> ;
 
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..272.62 rows=1 width=181) (actual time=805.934..1792.596 rows=177 loops=1)
   ->  Nested Loop  (cost=0.00..263.87 rows=1 width=167) (actual time=707.739..1553.348 rows=177 loops=1)
         ->  Nested Loop  (cost=0.00..263.58 rows=1 width=153) (actual time=653.053..1496.839 rows=177 loops=1)
               ->  Nested Loop  (cost=0.00..262.50 rows=1 width=154) (actual time=565.627..1385.667 rows=177 loops=1)
                     ->  Nested Loop  (cost=0.00..262.08 rows=1 width=163) (actual time=565.605..1383.686 rows=177 loops=1)
                           ->  Nested Loop  (cost=0.00..261.67 rows=1 width=166) (actual time=530.928..1347.053 rows=177 loops=1)
                                 ->  Nested Loop  (cost=0.00..261.26 rows=1 width=175) (actual time=530.866..1345.032 rows=177 loops=1)
                                       ->  Nested Loop  (cost=0.00..260.84 rows=1 width=178) (actual time=372.825..1184.668 rows=177 loops=1)
                                             ->  Nested Loop  (cost=0.00..250.33 rows=29 width=128) (actual time=317.897..534.645 rows=1011 loops=1)
                                                   ->  Nested Loop  (cost=0.00..207.56 rows=3 width=92) (actual time=251.014..408.868 rows=10 loops=1)
                                                         ->  Nested Loop  (cost=0.00..163.54 rows=155 width=65) (actual time=146.176..382.023 rows=615 loops=1)
                                                               ->  Index Scan using project_path on project  (cost=0.00..8.27 rows=1 width=42) (actual time=76.581..76.583 rows=1 loops=1)
                                                                     Index Cond: ((path)::text = 

Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Merlin Moncure
On Fri, May 3, 2013 at 3:52 PM, Anne Rosset aros...@collab.net wrote:
 We saw a little bit of improvement by increasing the min_pool_size but again I 
 see a big variation in the time the query is executed. Here is the query:

 psrdb=> explain analyze SELECT
 psrdb-> artifact.id AS id,
 psrdb-> artifact.priority AS priority,
 psrdb-> project.path AS projectPathString,
 psrdb-> project.title AS projectTitle,
 psrdb-> folder.project_id AS projectId,
 psrdb-> folder.title AS folderTitle,
 psrdb-> item.folder_id AS folderId,
 psrdb-> item.title AS title,
 psrdb-> item.name AS name,
 psrdb-> field_value2.value AS status,
 psrdb-> field_value3.value AS category,
 psrdb-> sfuser.username AS submittedByUsername,
 psrdb-> sfuser.full_name AS submittedByFullname,
 psrdb-> sfuser2.username AS assignedToUsername,
 psrdb-> sfuser2.full_name AS assignedToFullname,
 psrdb-> item.version AS version,
 psrdb-> CASE when ((SELECT
 psrdb(> mntr_subscription.user_id AS userId
 psrdb(> FROM
 psrdb(> mntr_subscription mntr_subscription
 psrdb(> WHERE
 psrdb(> artifact.id=mntr_subscription.object_key
 psrdb(> AND mntr_subscription.user_id='user1439'
 psrdb(> )= 'user1439') THEN 'user1439' ELSE null END AS monitoringUserId,
 psrdb-> tracker.icon AS trackerIcon,
 psrdb-> tracker.remaining_effort_disabled AS remainingEffortDisabled,
 psrdb-> tracker.actual_effort_disabled AS actualEffortDisabled,
 psrdb-> tracker.estimated_effort_disabled AS estimatedEffortDisabled
 psrdb-> FROM
 psrdb-> field_value field_value2,
 psrdb-> field_value field_value,
 psrdb-> sfuser sfuser2,
 psrdb-> field_value field_value3,
 psrdb-> field_value field_value4,
 psrdb-> item item,
 psrdb-> project project,
 psrdb-> relationship relationship,
 psrdb-> tracker tracker,
 psrdb-> artifact artifact,
 psrdb-> sfuser sfuser,
 psrdb-> folder folder
 psrdb-> WHERE
 psrdb-> artifact.id=item.id
 psrdb-> AND item.folder_id=folder.id
 psrdb-> AND folder.project_id=project.id
 psrdb-> AND artifact.group_fv=field_value.id
 psrdb-> AND artifact.status_fv=field_value2.id
 psrdb-> AND artifact.category_fv=field_value3.id
 psrdb-> AND artifact.customer_fv=field_value4.id
 psrdb-> AND item.created_by_id=sfuser.id
 psrdb-> AND relationship.is_deleted=false
 psrdb-> AND relationship.relationship_type_name='ArtifactAssignment'
 psrdb-> AND relationship.origin_id=sfuser2.id
 psrdb-> AND artifact.id=relationship.target_id
 psrdb-> AND item.is_deleted=false
 psrdb-> AND ((artifact.priority=3))
 psrdb-> AND (project.path='projects.psr-pub-13')
 psrdb-> AND item.folder_id=tracker.id
 psrdb-> ;


(*please* stop top-posting).

What is the CPU profile of the machine while you are threading the
query out?  If all CPUs are pegged at or near 100%, you could be seeing
spinlock contention on some of the key index buffers -- but that's a
long shot.  More likely it's planner malfeasance.  Are you running
this *exact* query across all threads, or are the specific parameters
changing (and if so, maybe the problem is instead that specific
argument sets are producing bad plans)?

This is a classic case of surrogate key design run amok, leading to
bad performance via difficult to plan queries and/or poorly utilized
indexes.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1

2013-05-06 Thread Christoph Berg
Hi,

this is more of a report than a question, because we thought this
would be interesting to share.

We recently (finally) migrated a Request Tracker 3.4 database running
on 8.1.19 to 9.2.4. The queries used by rt3.4 are sometimes weird, but
8.1 coped without too much tuning. The schema looks like this:

http://bestpractical.com/rt/3.4-schema.png

One query that took about 80ms on 8.1.19 took 8s on 9.2.4:

SELECT DISTINCT main.* FROM Users main , Principals Principals_1, ACL ACL_2, 
Groups Groups_3, CachedGroupMembers CachedGroupMembers_4
  WHERE ((ACL_2.RightName = 'OwnTicket'))
AND ((CachedGroupMembers_4.MemberId = Principals_1.id))
AND ((Groups_3.id = CachedGroupMembers_4.GroupId))
AND ((Principals_1.Disabled = '0') OR (Principals_1.Disabled = '0'))
AND ((Principals_1.id != '1'))
AND ((main.id = Principals_1.id))
AND  (
  ( ACL_2.PrincipalId = Groups_3.id AND ACL_2.PrincipalType = 'Group'
  AND (   Groups_3.Domain = 'SystemInternal' OR Groups_3.Domain = 
'UserDefined' OR Groups_3.Domain = 'ACLEquivalence'))
 OR ( ( (Groups_3.Domain = 'RT::Queue-Role' AND Groups_3.Instance = 10) 
 OR ( Groups_3.Domain = 'RT::Ticket-Role' AND Groups_3.Instance = 999028)  )  
AND Groups_3.Type = ACL_2.PrincipalType)
)
AND (ACL_2.ObjectType = 'RT::System' OR  (ACL_2.ObjectType = 'RT::Queue' 
AND ACL_2.ObjectId = 10) )
ORDER BY main.Name ASC;


8.1 plan: (http://explain.depesz.com/s/gZ6)

 Unique  (cost=1117.67..1118.46 rows=9 width=1115) (actual time=82.646..85.695 rows=439 loops=1)
   ->  Sort  (cost=1117.67..1117.70 rows=9 width=1115) (actual time=82.645..82.786 rows=1518 loops=1)
         Sort Key: main.name, main.id, main.password, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.emailencoding, main.webencoding, main.externalcontactinfoid, main.contactinfosystem, main.externalauthid, main.authsystem, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.pgpkey, main.creator, main.created, main.lastupdatedby, main.lastupdated
         ->  Nested Loop  (cost=10.51..1117.53 rows=9 width=1115) (actual time=0.205..23.688 rows=1518 loops=1)
               ->  Nested Loop  (cost=10.51..1087.81 rows=9 width=1119) (actual time=0.193..13.495 rows=1600 loops=1)
                     ->  Nested Loop  (cost=10.51..1060.15 rows=9 width=4) (actual time=0.175..3.307 rows=1635 loops=1)
                           ->  Nested Loop  (cost=10.51..536.13 rows=4 width=4) (actual time=0.161..1.057 rows=23 loops=1)
                                 Join Filter: ((("outer".principalid = "inner".id) AND (("outer".principaltype)::text = 'Group'::text) AND ((("inner".domain)::text = 'SystemInternal'::text) OR (("inner".domain)::text = 'UserDefined'::text) OR (("inner".domain)::text = 'ACLEquivalence'::text))) OR ((((("inner".domain)::text = 'RT::Queue-Role'::text) AND ("inner".instance = 10)) OR ((("inner".domain)::text = 'RT::Ticket-Role'::text) AND ("inner".instance = 999028))) AND (("inner".type)::text = ("outer".principaltype)::text)))
                                 ->  Bitmap Heap Scan on acl acl_2  (cost=4.24..61.15 rows=33 width=13) (actual time=0.107..0.141 rows=22 loops=1)
                                       Recheck Cond: ((((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text)) OR (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 10)))
                                       ->  BitmapOr  (cost=4.24..4.24 rows=34 width=0) (actual time=0.097..0.097 rows=0 loops=1)
                                             ->  Bitmap Index Scan on acl1  (cost=0.00..2.13 rows=22 width=0) (actual time=0.054..0.054 rows=8 loops=1)
                                                   Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text))
                                             ->  Bitmap Index Scan on acl1  (cost=0.00..2.11 rows=13 width=0) (actual time=0.041..0.041 rows=14 loops=1)
                                                   Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 10))
                                 ->  Bitmap Heap Scan on groups groups_3  (cost=6.27..14.32 rows=2 width=36) (actual time=0.036..0.036 rows=1 loops=22)
                                       Recheck Cond: (("outer".principalid = groups_3.id) OR ((((groups_3.type)::text = ("outer".principaltype)::text) AND (groups_3.instance = 10) AND ((groups_3.domain)::text = 'RT::Queue-Role'::text)) OR (((groups_3.type)::text = ("outer".principaltype)::text) AND (groups_3.instance = 999028) AND ((groups_3.domain)::text = 'RT::Ticket-Role'::text))))
                                       Filter: (((domain)::text = 

Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Gavin Flower

On 03/05/13 00:27, Simon Riggs wrote:

On 2 May 2013 01:49, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

On 02/05/13 02:06, Tom Lane wrote:

Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:

I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking... incomplete problem spec...ignore... when that is not that
case - yes I should have muttered 9.2 in the original email, but we
have covered that now.

No, I think it's more that we're trying to get to beta, and so anything
that looks like new development is getting shuffled to folks' to
look at later queues.  The proposed patch is IMO a complete nonstarter
anyway; but I'm not sure what a less bogus solution would look like.


Yeah, I did think that beta might be consuming everyone's attention (of
course immediately *after* sending the email)!

And yes, the patch was merely to illustrate the problem rather than any
serious attempt at a solution.

I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.

ISTM that you've got a case where the plan is very sensitive to a
table load. Which is a pretty common situation and one that can be
solved in various ways. I don't see much that Postgres can do because
it can't know ahead of time you're about to load rows. We could
imagine an optimizer that set thresholds on plans that caused the
whole plan to be recalculated half way thru a run, but that would be a
lot of work to design and implement and even harder to test. Having
static plans at least allows us to discuss what it does after the fact
with some ease.

The plan is set using stats that are set when there are very few
non-NULL rows, and those increase massively on load. The way to cope
is to run the ANALYZE immediately after the load and then don't allow
auto-ANALYZE to reset them later.
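
A rough sketch of that recipe, using the test table from upthread (per-table
autovacuum storage parameters exist since 8.4; the values are illustrative):

ANALYZE plan;
-- make auto-ANALYZE effectively never re-trigger for this table:
ALTER TABLE plan SET (autovacuum_analyze_scale_factor = 1.0,
                      autovacuum_analyze_threshold = 2000000000);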

--
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
Would it be practicable to have a facility for telling Postgres in advance 
what you intend to do, so it can create plans accordingly?


I won't try and invent syntax, but it would be good to tell the system 
that you intend to:

insert a million rows sequentially
or
insert ten million rows randomly
or
update two million rows in the primary key range 'AA0' to 'PP88877'
etc.

Though sometimes it may be useful to give a little more detail, especially 
if you have a good estimate of the distribution of primary keys, e.g.:

AA00 .. AA000456: 20%
AA000456 .. KE700999:  0%
KE700999 .. NN45: 30%
NN45 .. PA01: 35%
PA01 .. PP808877: 15%


I figure that if the planner had more information about what one intends 
to do, then it could combine that with the statistics it knows, to come 
up with a more realistic plan.



Cheers,
Gavin





[PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Anne Rosset
Hi all,
We are running a stress test that executes one select query with multiple 
threads.
The query executes very fast (10ms). It returns 100 rows.  I see deterioration 
in the performance when we have multiple threads executing the query. With 100 
threads, the query takes between 3s and 8s.

I suppose there is a way to tune our database. What are the parameters I should 
look into? (shared_buffers? wal_buffers?)
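
For reference, a quick way to read the current values is the pg_settings
system view, e.g.:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'wal_buffers',
               'effective_cache_size', 'max_connections');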


psrdb=> explain analyze SELECT
psrdb-> artifact.id AS id,
psrdb-> artifact.priority AS priority,
psrdb-> project.path AS projectPathString,
psrdb-> project.title AS projectTitle,
psrdb-> folder.project_id AS projectId,
psrdb-> folder.path AS folderPathString,
psrdb-> folder.title AS folderTitle,
psrdb-> item.folder_id AS folderId,
psrdb-> item.planning_folder_id AS planningFolderId,
psrdb-> item.title AS title,
psrdb-> item.name AS name,
psrdb-> artifact.description AS description,
psrdb-> field_value.value AS artifactGroup,
psrdb-> field_value2.value AS status,
psrdb-> field_value2.value_class AS statusClass,
psrdb-> field_value3.value AS category,
psrdb-> field_value4.value AS customer,
psrdb-> sfuser.username AS submittedByUsername,
psrdb-> sfuser.full_name AS submittedByFullname,
psrdb-> item.date_created AS submittedDate,
psrdb-> artifact.close_date AS closeDate,
psrdb-> sfuser2.username AS assignedToUsername,
psrdb-> sfuser2.full_name AS assignedToFullname,
psrdb-> item.date_last_modified AS lastModifiedDate,
psrdb-> artifact.estimated_effort AS estimatedEffort,
psrdb-> artifact.actual_effort AS actualEffort,
psrdb-> artifact.remaining_effort AS remainingEffort,
psrdb-> artifact.points AS points,
psrdb-> artifact.autosumming AS autosumming,
psrdb-> item.version AS version
psrdb-> FROM
psrdb-> field_value field_value2,
psrdb-> sfuser sfuser2,
psrdb-> field_value field_value3,
psrdb-> field_value field_value,
psrdb-> field_value field_value4,
psrdb-> item item,
psrdb-> project project,
psrdb-> relationship relationship,
psrdb-> artifact artifact,
psrdb-> sfuser sfuser,
psrdb-> folder folder
psrdb-> WHERE
psrdb-> artifact.id=item.id
psrdb-> AND item.folder_id=folder.id
psrdb-> AND folder.project_id=project.id
psrdb-> AND artifact.group_fv=field_value.id
psrdb-> AND artifact.status_fv=field_value2.id
psrdb-> AND artifact.category_fv=field_value3.id
psrdb-> AND artifact.customer_fv=field_value4.id
psrdb-> AND item.created_by_id=sfuser.id
psrdb-> AND relationship.is_deleted=false
psrdb-> AND relationship.relationship_type_name='ArtifactAssignment'
psrdb->
psrdb->   AND relationship.origin_id=sfuser2.id
psrdb-> AND artifact.id=relationship.target_id
psrdb-> AND item.is_deleted=false
psrdb-> AND folder.is_deleted=false
psrdb-> AND folder.project_id='proj1032'
psrdb-> AND item.folder_id='tracker1213'
psrdb-> AND folder.path='tracker.trackerName';

  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..117.32 rows=3 width=1272) (actual time=7.003..9.684 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..116.69 rows=2 width=1271) (actual time=6.987..8.820 rows=100 loops=1)
         Join Filter: ((item.created_by_id)::text = (sfuser.id)::text)
         ->  Seq Scan on sfuser  (cost=0.00..7.65 rows=65 width=30) (actual time=0.013..0.053 rows=65 loops=1)
         ->  Materialize  (cost=0.00..107.10 rows=2 width=1259) (actual time=0.005..0.100 rows=100 loops=65)
               ->  Nested Loop  (cost=0.00..107.09 rows=2 width=1259) (actual time=0.307..5.667 rows=100 loops=1)
                     ->  Nested Loop  (cost=0.00..106.45 rows=2 width=1263) (actual time=0.294..4.841 rows=100 loops=1)
                           ->  Nested Loop  (cost=0.00..105.82 rows=2 width=1267) (actual time=0.281..3.988 rows=100 loops=1)
                                 ->  Nested Loop  (cost=0.00..105.18 rows=2 width=1271) (actual time=0.239..3.132 rows=100 loops=1)
                                       ->  Nested Loop  (cost=0.00..104.61 rows=2 width=1259) (actual time=0.223..2.457 rows=100 loops=1)
                                             ->  Nested Loop  (cost=0.00..16.55 rows=1 width=1099) (actual time=0.095..0.096 rows=1 loops=1)
                                                   ->  Index Scan using project_pk on project  (cost=0.00..8.27 rows=1 width=1114) (actual time=0.039..0.039 rows=1 loops=1)
                                                         Index Cond: ((id)::text = 'proj1032'::text)

Re: [PERFORM] RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1

2013-05-06 Thread Tom Lane
Christoph Berg christoph.b...@credativ.de writes:
 We recently (finally) migrated an Request Tracker 3.4 database running
 on 8.1.19 to 9.2.4. The queries used by rt3.4 are sometimes weird, but
 8.1 coped without too much tuning. The schema looks like this:

The newer rowcount estimates are much further away from reality:

  Unique  (cost=1117.67..1118.46 rows=9 width=1115) (actual time=82.646..85.695 rows=439 loops=1)

  Unique  (cost=784205.94..796940.08 rows=145533 width=1061) (actual time=9710.683..9713.175 rows=439 loops=1)

Has the new DB been analyzed?  Maybe you had custom stats targets in
the old DB that didn't get copied to the new one?
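
A quick way to check is to look for non-default per-column targets and then
re-ANALYZE; the table and column names below are just an example from the RT
schema:

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'users'::regclass AND attstattarget <> -1;

-- e.g., restore a custom target and refresh the statistics:
ALTER TABLE users ALTER COLUMN name SET STATISTICS 1000;
ANALYZE users;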

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Anne Rosset
Hi Igor,
The explain analyze is from when there was no load.

Artifact table: 251831 rows
Field_value table: 77378 rows
Mntr_subscription: 929071 rows
Relationship: 270478 row
Folder: 280356 rows
Item: 716465 rows
Sfuser: 5733 rows
Project: 1817 rows

8CPUs
RAM: 8GB

Postgres version: 9.0.13

 And no, we haven't switched or tested yet with pgbouncer. We would like to do 
a bit more analysis before trying this.

Thanks for your help,
Anne


-Original Message-
From: Igor Neyman [mailto:iney...@perceptron.com] 
Sent: Monday, May 06, 2013 7:06 AM
To: Anne Rosset; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in 
multi threads



 -Original Message-
 From: Anne Rosset [mailto:aros...@collab.net]
 Sent: Friday, May 03, 2013 4:52 PM
 To: Igor Neyman; k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: RE: [PERFORM] Deterioration in performance when query 
 executed in multi threads
 
 We saw a little bit of improvement by increasing the min_pool_size but 
 again I see a big variation in the time the query is executed. Here is 
 the query:
 
 psrdb=> explain analyze SELECT
 psrdb-> artifact.id AS id,
 psrdb-> artifact.priority AS priority,
 psrdb-> project.path AS projectPathString,
 psrdb-> project.title AS projectTitle,
 psrdb-> folder.project_id AS projectId,
 psrdb-> folder.title AS folderTitle,
 psrdb-> item.folder_id AS folderId,
 psrdb-> item.title AS title,
 psrdb-> item.name AS name,
 psrdb-> field_value2.value AS status,
 psrdb-> field_value3.value AS category,
 psrdb-> sfuser.username AS submittedByUsername,
 psrdb-> sfuser.full_name AS submittedByFullname,
 psrdb-> sfuser2.username AS assignedToUsername,
 psrdb-> sfuser2.full_name AS assignedToFullname,
 psrdb-> item.version AS version,
 psrdb-> CASE when ((SELECT
 psrdb(> mntr_subscription.user_id AS userId
 psrdb(> FROM
 psrdb(> mntr_subscription mntr_subscription
 psrdb(> WHERE
 psrdb(> artifact.id=mntr_subscription.object_key
 psrdb(> AND mntr_subscription.user_id='user1439'
 psrdb(> )= 'user1439') THEN 'user1439' ELSE null END AS monitoringUserId,
 psrdb-> tracker.icon AS trackerIcon,
 psrdb-> tracker.remaining_effort_disabled AS remainingEffortDisabled,
 psrdb-> tracker.actual_effort_disabled AS actualEffortDisabled,
 psrdb-> tracker.estimated_effort_disabled AS estimatedEffortDisabled
 psrdb-> FROM
 psrdb-> field_value field_value2,
 psrdb-> field_value field_value,
 psrdb-> sfuser sfuser2,
 psrdb-> field_value field_value3,
 psrdb-> field_value field_value4,
 psrdb-> item item,
 psrdb-> project project,
 psrdb-> relationship relationship,
 psrdb-> tracker tracker,
 psrdb-> artifact artifact,
 psrdb-> sfuser sfuser,
 psrdb-> folder folder
 psrdb-> WHERE
 psrdb-> artifact.id=item.id
 psrdb-> AND item.folder_id=folder.id
 psrdb-> AND folder.project_id=project.id
 psrdb-> AND artifact.group_fv=field_value.id
 psrdb-> AND artifact.status_fv=field_value2.id
 psrdb-> AND artifact.category_fv=field_value3.id
 psrdb-> AND artifact.customer_fv=field_value4.id
 psrdb-> AND item.created_by_id=sfuser.id
 psrdb-> AND relationship.is_deleted=false
 psrdb-> AND relationship.relationship_type_name='ArtifactAssignment'
 psrdb-> AND relationship.origin_id=sfuser2.id
 psrdb-> AND artifact.id=relationship.target_id
 psrdb-> AND item.is_deleted=false
 psrdb-> AND ((artifact.priority=3))
 psrdb-> AND (project.path='projects.psr-pub-13')
 psrdb-> AND item.folder_id=tracker.id ;
 
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..272.62 rows=1 width=181) (actual time=805.934..1792.596 rows=177 loops=1)
   ->  Nested Loop  (cost=0.00..263.87 rows=1 width=167) (actual time=707.739..1553.348 rows=177 loops=1)
         ->  Nested Loop  (cost=0.00..263.58 rows=1 width=153) (actual time=653.053..1496.839 rows=177 loops=1)
               ->  Nested Loop  (cost=0.00..262.50 rows=1 width=154) (actual time=565.627..1385.667 rows=177 loops=1)
                     ->  Nested Loop  (cost=0.00..262.08 rows=1 width=163) (actual time=565.605..1383.686 rows=177 loops=1)
                           ->  Nested Loop  (cost=0.00..261.67 rows=1 width=166) (actual time=530.928..1347.053 rows=177 loops=1)
                                 ->  Nested Loop  (cost=0.00..261.26 rows=1 width=175) (actual time=530.866..1345.032 rows=177 loops=1)
                                       ->  Nested Loop  (cost=0.00..260.84 rows=1 width=178) (actual

Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: Anne Rosset [mailto:aros...@collab.net]
 Sent: Monday, May 06, 2013 1:01 PM
 To: Igor Neyman; k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: RE: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 Hi Igor,
 The explain analyze is from when there was no load.
 
 Artifact table: 251831 rows
 Field_value table: 77378 rows
 Mntr_subscription: 929071 rows
 Relationship: 270478 row
 Folder: 280356 rows
 Item: 716465 rows
 Sfuser: 5733 rows
 Project: 1817 rows
 
 8CPUs
 RAM: 8GB
 
 Postgres version: 9.0.13
 
  And no we haven't switched or tested yet  with pgbouncer. We would
 like to do a bit more analysis before trying this.
 
 Thanks for your help,
 Anne
 
 


Anne,

Just as a quick test, try changing the enable_nestloop setting locally in your 
psql session/connection and run your query:

set enable_nestloop = off;
explain analyze your_query;

just to see if a different execution plan would be better and whether the 
optimizer needs to be convinced to use this different plan.
Please post what you get with the modified setting.

Also, what is the setting for effective_cache_size in postgresql.conf?

Regards,
Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Thomas Kellerer
 Sent: Monday, May 06, 2013 1:12 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 Anne Rosset, 06.05.2013 19:00:
  Postgres version: 9.0.13
 
  Work_mem is set to 64MB
  Shared_buffer to 240MB
  Segment_size is 1GB
  Wal_buffer is 10MB
 
  Artifact table: 251831 rows
  Field_value table: 77378 rows
  Mntr_subscription: 929071 rows
  Relationship: 270478 row
  Folder: 280356 rows
  Item: 716465 rows
  Sfuser: 5733 rows
  Project: 1817 rows
 
  8CPUs
  RAM: 8GB
 
 
 With 8GB RAM you should be able to increase shared_buffer to 1GB or
 maybe even higher especially if this is a dedicated server.
 240MB is pretty conservative for a server with that amount of RAM
 (unless you have many other applications running on that box)
 
 Also what are the values for
 
 cpu_tuple_cost
 seq_page_cost
 random_page_cost
 effective_cache_size
 
 What kind of harddisk is in the server? SSD? Regular ones (spinning
 disks)?
 
 
 


Also, with 8 CPUs, your max connection_pool size shouldn't be much bigger than 20.
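
For example, a minimal pgbouncer configuration along those lines might look
like this (all values illustrative and to be tuned for your workload):

[databases]
srdb = host=127.0.0.1 port=5432 dbname=srdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200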

Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Matt Clarkson

 
 (2) seems fairly hard generically, since we'd have to keep track of
 the tids returned from the IndexScan to allow us to switch to a
 different plan and avoid re-issuing rows that we've already returned.
 But maybe if we adapted the IndexScan plan type so that it adopted a
 more page oriented approach internally, it could act like a
 bitmapscan. Anyway, that would need some proof that it would work and
 sounds like a fair task.
 
 (1) sounds more easily possible and plausible. At the moment we have
 enable_indexscan = off. If we had something like
 plan_cost_weight_indexscan = N, we could selectively increase the cost
 of index scans so that they would be less likely to be selected. i.e.
 plan_cost_weight_indexscan = 2 would mean an indexscan would need to
 be half the cost of any other plan before it was selected. (parameter
 name selected so it could apply to all plan types). The reason to
 apply this weighting would be to calculate risk adjusted cost not
 just estimated cost.
 
 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Another option would be for the bulk insert/update/delete to track the
distribution stats as the operation progresses and if it detects that it
is changing the distribution of data beyond a certain threshold it would
update the pg stats accordingly.

-- 
Matt Clarkson
Catalyst.Net Limited




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Anne Rosset
Hi Thomas,
It is not a dedicated box (we have JBoss running too).

cpu_tuple_cost  | 0.01
seq_page_cost   | 1
random_page_cost| 4
effective_cache_size| 512MB

We have the data directory on nfs 
(rw,intr,hard,tcp,rsize=32768,wsize=32768,nfsvers=3,tcp). Note that we have 
also tested putting the data directory on local disk and didn't find a big 
improvement.

Thanks,
Anne



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Thomas Kellerer
Sent: Monday, May 06, 2013 10:12 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Deterioration in performance when query executed in 
multi threads

Anne Rosset, 06.05.2013 19:00:
 Postgres version: 9.0.13

 Work_mem is set to 64MB
 Shared_buffer to 240MB
 Segment_size is 1GB
 Wal_buffer is 10MB

 Artifact table: 251831 rows
 Field_value table: 77378 rows
 Mntr_subscription: 929071 rows
 Relationship: 270478 row
 Folder: 280356 rows
 Item: 716465 rows
 Sfuser: 5733 rows
 Project: 1817 rows

 8CPUs
 RAM: 8GB


With 8GB RAM you should be able to increase shared_buffer to 1GB or maybe even 
higher especially if this is a dedicated server.
240MB is pretty conservative for a server with that amount of RAM (unless you 
have many other applications running on that box)
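
For instance, a conservative starting point in postgresql.conf for an 8GB
box that also runs other services might be (illustrative values only):

shared_buffers = 1GB
effective_cache_size = 4GB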

Also what are the values for

cpu_tuple_cost
seq_page_cost
random_page_cost
effective_cache_size

What kind of harddisk is in the server? SSD? Regular ones (spinning disks)?





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Anne Rosset
Hi Igor,
Result with enable_nestloop off:




------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49946.49..58830.02 rows=1 width=181) (actual time=2189.474..2664.888 rows=180 loops=1)
   Hash Cond: ((item.created_by_id)::text = (sfuser.id)::text)
   ->  Hash Join  (cost=49470.50..58345.53 rows=1 width=167) (actual time=1931.870..2404.745 rows=180 loops=1)
         Hash Cond: ((relationship.origin_id)::text = (sfuser2.id)::text)
         ->  Hash Join  (cost=48994.51..57869.52 rows=1 width=153) (actual time=1927.603..2400.334 rows=180 loops=1)
               Hash Cond: ((relationship.target_id)::text = (artifact.id)::text)
               ->  Seq Scan on relationship  (cost=0.00..7973.38 rows=240435 width=19) (actual time=0.036..492.442 rows=241285 loops=1)
                     Filter: ((NOT is_deleted) AND ((relationship_type_name)::text = 'ArtifactAssignment'::text))
               ->  Hash  (cost=48994.49..48994.49 rows=1 width=154) (actual time=1858.350..1858.350 rows=180 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     ->  Hash Join  (cost=47260.54..48994.49 rows=1 width=154) (actual time=1836.495..1858.151 rows=180 loops=1)
                           Hash Cond: ((field_value4.id)::text = (artifact.customer_fv)::text)
                           ->  Seq Scan on field_value field_value4  (cost=0.00..1443.78 rows=77378 width=9) (actual time=22.104..30.694 rows=77378 loops=1)
                           ->  Hash  (cost=47260.52..47260.52 rows=1 width=163) (actual time=1814.005..1814.005 rows=180 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                 ->  Hash Join  (cost=45526.57..47260.52 rows=1 width=163) (actual time=1790.908..1813.780 rows=180 loops=1)
                                       Hash Cond: ((field_value3.id)::text = (artifact.category_fv)::text)
                                       ->  Seq Scan on field_value field_value3  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..9.262 rows=77378 loops=1)
                                       ->  Hash  (cost=45526.55..45526.55 rows=1 width=166) (actual time=1790.505..1790.505 rows=180 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                             ->  Hash Join  (cost=43792.60..45526.55 rows=1 width=166) (actual time=1768.362..1790.304 rows=180 loops=1)
                                                   Hash Cond: ((field_value.id)::text = (artifact.group_fv)::text)
                                                   ->  Seq Scan on field_value  (cost=0.00..1443.78 rows=77378 width=9) (actual time=0.002..8.687 rows=77378 loops=1)
                                                   ->  Hash  (cost=43792.58..43792.58 rows=1 width=175) (actual time=1767.928..1767.928 rows=180 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                         ->  Hash Join  (cost=42058.63..43792.58 rows=1 width=175) (actual time=1499.822..1767.734 rows=180 loops=1)
                                                               Hash Cond: ((field_value2.id)::text = (artifact.status_fv)::text)
                                                               ->  Seq Scan on field_value field_value2  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..261.082 rows=77378 loops=1)
                                                               ->  Hash  (cost=42058.61..42058.61 rows=1 width=178) (actual time=1492.707..1492.707 rows=180 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                                     ->  Hash Join  (cost=18039.59..42058.61 rows=1 width=178) (actual time=1175.659..1492.482 rows=180 loops=1)
                                                                           Hash Cond: ((item.id)::text = (artifact.id)::text)
                                                                           ->  Hash Join  (cost=12112.31..36130.95 rows=30 width=128) (actual time=304.035..702.745 rows=1015 loops=1)
                                                                                 Hash Cond: ((item.folder_id)::text = (folder.id)::text)
                                                                                 ->  Seq Scan on item  (cost=0.00..21381.10 rows=703322 width=58) (actual time=0.020..382.847 rows=704018 loops=1)
                                                                                       Filter: (NOT is_deleted)
                                                                                 ->  
Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Gavin Flower

Anne, please read the comment at the bottom of this post!


On 07/05/13 09:46, Anne Rosset wrote:

Hi Thomas,
It is not a dedicated box (we have Jboss running too).

cpu_tuple_cost  | 0.01
seq_page_cost   | 1
random_page_cost| 4
effective_cache_size| 512MB

We have the data directory on nfs 
(rw,intr,hard,tcp,rsize=32768,wsize=32768,nfsvers=3,tcp). Note that we have 
also tested putting the data directory on local disk and didn't find a big 
improvement.

Thanks,
Anne



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Thomas Kellerer
Sent: Monday, May 06, 2013 10:12 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Deterioration in performance when query executed in 
multi threads

Anne Rosset, 06.05.2013 19:00:

Postgres version: 9.0.13


Work_mem is set to 64MB
Shared_buffer to 240MB
Segment_size is 1GB
Wal_buffer is 10MB

Artifact table: 251831 rows
Field_value table: 77378 rows
Mntr_subscription: 929071 rows
Relationship: 270478 row
Folder: 280356 rows
Item: 716465 rows
Sfuser: 5733 rows
Project: 1817 rows

8CPUs
RAM: 8GB


With 8GB RAM you should be able to increase shared_buffer to 1GB or maybe even 
higher especially if this is a dedicated server.
240MB is pretty conservative for a server with that amount of RAM (unless you 
have many other applications running on that box)

Also what are the values for

cpu_tuple_cost
seq_page_cost
random_page_cost
effective_cache_size

What kind of harddisk is in the server? SSD? Regular ones (spinning disks)?


The policy on this list is to add comments at the bottom, so people can 
first read what you are replying to.


Though you can intersperse comments where that is appropriate.


Cheers,
Gavin


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread mark . kirkwood
 Simon Riggs wrote:

 Patch works and improves things, but we're still swamped by the block
 accesses via the index.

Which *might* be enough to stop it making the server go unresponsive;
we'll look at the effect of this in the next few days. Nice work!


 Which brings me back to Mark's original point, which is that we are
 x100 times slower in this case and it *is* because the choice of
 IndexScan is a bad one for this situation.

 After some thought on this, I do think we need to do something about
 it directly, rather than by tuning infrastructire (as I just
 attempted). The root cause here is that IndexScan plans are sensitive
 to mistakes in data distribution, much more so than other plan types.

 The two options, broadly, are to either

 1. avoid IndexScans in the planner unless they have a *significantly*
 better cost. At the moment we use IndexScans if cost is lowest, even
 if that is only by a whisker.

 2. make IndexScans adaptive so that they switch to other plan types
 mid-way through execution.

 (2) seems fairly hard generically, since we'd have to keep track of
 the tids returned from the IndexScan to allow us to switch to a
 different plan and avoid re-issuing rows that we've already returned.
 But maybe if we adapted the IndexScan plan type so that it adopted a
 more page oriented approach internally, it could act like a
 bitmapscan. Anyway, that would need some proof that it would work and
 sounds like a fair task.

 (1) sounds more easily possible and plausible. At the moment we have
 enable_indexscan = off. If we had something like
 plan_cost_weight_indexscan = N, we could selectively increase the cost
 of index scans so that they would be less likely to be selected. i.e.
 plan_cost_weight_indexscan = 2 would mean an indexscan would need to
 be half the cost of any other plan before it was selected. (parameter
 name selected so it could apply to all parameter types). The reason to
 apply this weighting would be to calculate risk adjusted cost not
 just estimated cost.


I'm thinking that a variant of (2) might be simpler to implement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in progress numbers to the
stats collector. The key change here is the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when the number of changed rows exceeds the
autovacuum_analyze_scale_factor proportion of the relation concerned, or
similar.

regards

Mark



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


From: Anne Rosset [aros...@collab.net]
Sent: Monday, May 06, 2013 5:51 PM
To: Igor Neyman; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in  
multi threads

Hi Igor,
Result with enable_nestloop off:




------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49946.49..58830.02 rows=1 width=181) (actual time=2189.474..2664.888 rows=180 loops=1)
   Hash Cond: ((item.created_by_id)::text = (sfuser.id)::text)
   ->  Hash Join  (cost=49470.50..58345.53 rows=1 width=167) (actual time=1931.870..2404.745 rows=180 loops=1)
         Hash Cond: ((relationship.origin_id)::text = (sfuser2.id)::text)
         ->  Hash Join  (cost=48994.51..57869.52 rows=1 width=153) (actual time=1927.603..2400.334 rows=180 loops=1)
               Hash Cond: ((relationship.target_id)::text = (artifact.id)::text)
               ->  Seq Scan on relationship  (cost=0.00..7973.38 rows=240435 width=19) (actual time=0.036..492.442 rows=241285 loops=1)
                     Filter: ((NOT is_deleted) AND ((relationship_type_name)::text = 'ArtifactAssignment'::text))
               ->  Hash  (cost=48994.49..48994.49 rows=1 width=154) (actual time=1858.350..1858.350 rows=180 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     ->  Hash Join  (cost=47260.54..48994.49 rows=1 width=154) (actual time=1836.495..1858.151 rows=180 loops=1)
                           Hash Cond: ((field_value4.id)::text = (artifact.customer_fv)::text)
                           ->  Seq Scan on field_value field_value4  (cost=0.00..1443.78 rows=77378 width=9) (actual time=22.104..30.694 rows=77378 loops=1)
                           ->  Hash  (cost=47260.52..47260.52 rows=1 width=163) (actual time=1814.005..1814.005 rows=180 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                 ->  Hash Join  (cost=45526.57..47260.52 rows=1 width=163) (actual time=1790.908..1813.780 rows=180 loops=1)
                                       Hash Cond: ((field_value3.id)::text = (artifact.category_fv)::text)
                                       ->  Seq Scan on field_value field_value3  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..9.262 rows=77378 loops=1)
                                       ->  Hash  (cost=45526.55..45526.55 rows=1 width=166) (actual time=1790.505..1790.505 rows=180 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                             ->  Hash Join  (cost=43792.60..45526.55 rows=1 width=166) (actual time=1768.362..1790.304 rows=180 loops=1)
                                                   Hash Cond: ((field_value.id)::text = (artifact.group_fv)::text)
                                                   ->  Seq Scan on field_value  (cost=0.00..1443.78 rows=77378 width=9) (actual time=0.002..8.687 rows=77378 loops=1)
                                                   ->  Hash  (cost=43792.58..43792.58 rows=1 width=175) (actual time=1767.928..1767.928 rows=180 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                         ->  Hash Join  (cost=42058.63..43792.58 rows=1 width=175) (actual time=1499.822..1767.734 rows=180 loops=1)
                                                               Hash Cond: ((field_value2.id)::text = (artifact.status_fv)::text)
                                                               ->  Seq Scan on field_value field_value2  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..261.082 rows=77378 loops=1)
                                                               ->  Hash  (cost=42058.61..42058.61 rows=1 width=178) (actual time=1492.707..1492.707 rows=180 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                                     ->  Hash Join  (cost=18039.59..42058.61 rows=1 width=178) (actual time=1175.659..1492.482 rows=180 loops=1)
                                                                           Hash Cond: ((item.id)::text = (artifact.id)::text)
                                                                           ->  Hash Join  (cost=12112.31..36130.95 rows=30 width=128) (actual time=304.035..702.745 rows=1015 loops=1)
                                                                                 Hash Cond: ((item.folder_id)::text = (folder.id)::text)
                                                                                 ->  Seq Scan on item