[PERFORM] Can postgresql plan a query using multiple CPU cores?

2017-05-22 Thread Clemens Eisserer
Hi,

I have a letancy-sensitive legacy application, where the time consumed
by query planning was always causing some headaches.
Currently it is running on postgresql-8.4 - will postgresql-10 support
generating plans using multiple CPU cores to reduce the time required
to generate a single plan?

Thank you in advance and best regards, Clemens


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-13 Thread Clemens Eisserer
Hello Noah,

Thanks a lot for your feedback and explanations.

 Since you have 15+ tables at the top level, the genetic query optimizer should
 be kicking in and delivering a plan in reasonable time, albeit with plan
 quality hazards.  There's a danger zone when the deterministic planner is
 still in effect but {from,join}_collapse_limit have limited the scope of its
 investigation.  If you're in that zone and have not hand-tailored your
 explicit join order, poor plans are unsurprising.  What exact configuration
 changes are you using?

Basically only the changes, suggested here a year ago,  which made the
problem go away for less complex queries:

geqo_threshold = 20
from_collapse_limit = 13
join_collapse_limit = 13


 Hundreds of rows, no.  Consider this example:
 IN(...):
  Total runtime: 2200.767 ms

 ANY(ARRAY(...)):
  Total runtime: 11748.348 ms

In case there is an index on C, the resulting index scan is, even with
1000 elements, 3 times faster on my Notebook.
However, both queries execute in next-to-no time (15 vs 5ms).

 Filing a bug report with the content you've already posted would not add much,
 but a self-contained test case could prove useful.  Many of the deficiencies
 that can make ANY(ARRAY(...)) win do represent unimplemented planner
 intelligence more than bugs.

 Incidentally, you can isolate whether ANY(ARRAY(...))'s advantage comes solely
 from suppressing the subquery collapse.  Keep IN but tack OFFSET 0 onto
 the subquery.  If this gives the same performance as ANY(ARRAY(...)), then the
 subquery-collapse suppression was indeed the source of advantage.

I see your point, some dumb logic to replace IN with ANY(ARRAY
wouldn't always yield better results.
I'll try to come up with a self-containing testcase.

Thanks again, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-09 Thread Clemens Eisserer
Hi,

I would be really grateful for feedback regardding this issue. Tom?

Should Ifile a bug-report about the optimizer trying too hard to
collapse the subquery and therefor generating a bad plan?
Its my understanding that a IN shouldn't perform any worse than ANY on
an ARRAY, right?

Thank you in advance, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-04 Thread Clemens Eisserer
Hi again,

 That doesn't sound like a tremendously good idea to me.
 Could you elaborate on the downsides of this approach a bit?

Any other thoughts about the pro/cons replacing IN(subquery) with
=ANY(ARRAY(subquery))?
Are there patological cases, except when the subquery returns a huge
amount of rows?

Should Ifile a bug-report about the optimizer trying too hard to
collapse the subquery and therefor generating a bad plan?

Thank you in advance, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-01 Thread Clemens Eisserer
Hi,

I am using postgresql as database for a hibernate based java oltp
project and as in previous projects am totally impressed by
postgresql's robustness, performance and feature-richness. Thanks for
this excellent piece of software.

Quite often Hibernate ends up generating queries with a lot of joins
which usually works well, except for queries which load some
additional data based on a previous query (SUBSELECT collections),
which look like:

select . from table1 ... left outer join table 15  WHERE
table1.id IN (select id  join table16 ... join table20 WHERE
table20.somevalue=?)

Starting with some amount of joins, the optimizer starts to do quite
suboptimal things like hash-joining huge tables where selctivity would
very low.
I already raised join_collapse_limit and from_collapse_limit, but
after a certain point query planning starts to become very expensive.

However, when using  =ANY(ARRAY(select ...)) instead of IN the
planner seems to do a lot better, most likely because it treats the
subquery as a black-box that needs to be executed independently. I've
hacked hibernate a bit to use ANY+ARRAY, and it seems to work a lot
better than using IN.

However, I am a bit uncertain:
- Is it safe to use ANY(ARRAY(select ...)) when I know the sub-query
will only return a small amount (0-100s) of rows?
- Shouldn't the optimizer be a bit smarter avoiding optimizing this
case in the first place, instead of bailing out later? Should I file a
bug-report about this problem?

Thank you in advance, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-01 Thread Clemens Eisserer
Hi Tom,

Thanks for your reply.

 What PG version are we talking about here?
For development I use 9.1.3, on the production server is 8.4.7 -
happens with both cases.

 That doesn't sound like a tremendously good idea to me.
Could you elaborate on the downsides of this approach a bit?

 But with
 so few details, it's hard to comment intelligently.
 Can you provide a concrete test case?

A self contained testcase would take some time to create (and list
members willing to configure and run), so I hope a query as well as an
explain-analyze run will provide more information (done with 9.1.3):
http://pastebin.com/BGRdAPg2

Its kind of the worst-worst case which I will improve later (way too
much relations loaded through join-fetching), but its quite a good way
to show the issue. Replacing the IN with a ANY(ARRAY()) already yields
a way better plan.

Thank you in advance, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash semi join caused by IN (select ...)

2011-05-18 Thread Clemens Eisserer
Hi,

Does anybody know why the planner treats = ANY(ARRAY(select ...))
differently than IN(select ...)?
Which one is preferable, when I already have a lot of joins?

Thanks, Clemens

2011/5/17 Clemens Eisserer linuxhi...@gmail.com:
 Hi,

 select  from t1 left join t2  WHERE id IN (select )

 Does it work as expected with one less join?  If so, try increasing
 join_collapse_limit ...

 That did the trick - thanks a lot. I only had to increase
 join_collapse_limit a bit and now get an almost perfect plan.
 Instead of hash-joining all the data, the planner generates
 nested-loop-joins with index only on the few rows I fetch.

 Using = ANY(array(select... )) also seems to work, I wonder which one
 works better. Does ANY(ARRAY(...)) force the optimizer to plan the
 subquery seperated from the main query?

 Thanks, Clemens


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash semi join caused by IN (select ...)

2011-05-17 Thread Clemens Eisserer
Hi,

 select  from t1 left join t2  WHERE id IN (select )

 Does it work as expected with one less join?  If so, try increasing
 join_collapse_limit ...

That did the trick - thanks a lot. I only had to increase
join_collapse_limit a bit and now get an almost perfect plan.
Instead of hash-joining all the data, the planner generates
nested-loop-joins with index only on the few rows I fetch.

Using = ANY(array(select... )) also seems to work, I wonder which one
works better. Does ANY(ARRAY(...)) force the optimizer to plan the
subquery seperated from the main query?

Thanks, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] hash semi join caused by IN (select ...)

2011-05-16 Thread Clemens Eisserer
Hi,

I have a quite complex, performance sensitive query in a system with a
few (7) joins:
select  from t1 left join t2  WHERE id IN (select )

For this query the planner evaluates the IN with a hash semi join last,
and all the joining is done by hash joins for all rows contained in t1.

However when I specify the ids manually (IN (1, 2, 3, 4, 5) the
planner first does an index lookup on the primary key column id,
and subsequently does nested loop joins using an index on t2 - which
gives way better results.

Is there any way to guide the planner to evaluate the IN condition
first, instead of last?
Why is the planner behaving this way? (postgresql 8.4.??)

Thank you in advance, Clemens


Query plan with IN(select):

Sort  (cost=165.77..165.77 rows=2 width=16974) (actual
time=13.459..13.460 rows=2 loops=1)
   Sort Key: this_.id
   Sort Method:  quicksort  Memory: 26kB
   -  Hash Semi Join  (cost=123.09..165.76 rows=2 width=16974)
(actual time=12.741..13.432 rows=2 loops=1)
 Hash Cond: (this_.id = kladdenent0_.id)
 -  Hash Left Join  (cost=119.17..160.90 rows=348
width=16974) (actual time=8.765..13.104 rows=342 loops=1)
   Hash Cond: (flugzeug2_.flugzeugtyp_id = flugzeugty3_.id)
   -  Hash Left Join  (cost=118.10..155.08 rows=348
width=16454) (actual time=8.724..12.412 rows=342 loops=1)
 Hash Cond: (flugzeug2_.zaehlertyp_id = bmintype4_.id)
 -  Hash Left Join  (cost=117.06..152.71 rows=348
width=15934) (actual time=8.660..11.786 rows=342 loops=1)
   Hash Cond: (this_.lehrerid = pilot5_.id)
   -  Hash Left Join  (cost=96.66..130.46
rows=348 width=8912) (actual time=6.395..8.899 rows=342 loops=1)
 Hash Cond: (this_.nachid = flugplatz6_.id)
 -  Hash Left Join
(cost=93.89..122.90 rows=348 width=8370) (actual time=6.354..8.429
rows=342 loops=1)
   Hash Cond: (this_.flugzeugid =
flugzeug2_.id)
   -  Hash Left Join
(cost=23.17..47.04 rows=348 width=7681) (actual time=1.992..3.374
rows=342 loops=1)
 Hash Cond: (this_.pilotid
= pilot7_.id)
 -  Hash Left Join
(cost=2.78..22.04 rows=348 width=659) (actual time=0.044..0.548
rows=342 loops=1)
   Hash Cond:
(this_.vonid = flugplatz8_.id)
   -  Seq Scan on
startkladde this_  (cost=0.00..14.48 rows=348 width=117) (actual
time=0.004..0.074 rows=342 loops=1)
   -  Hash
(cost=1.79..1.79 rows=79 width=542) (actual time=0.032..0.032 rows=79
loops=1)
 -  Seq Scan
on flugplatz flugplatz8_  (cost=0.00..1.79 rows=79 width=542) (actual
time=0.003..0.010 rows=79 loops=1)
 -  Hash
(cost=15.73..15.73 rows=373 width=7022) (actual time=1.938..1.938
rows=375 loops=1)
   -  Seq Scan on
pilot pilot7_  (cost=0.00..15.73 rows=373 width=7022) (actual
time=0.006..0.769 rows=375 loops=1)
   -  Hash  (cost=51.43..51.43
rows=1543 width=689) (actual time=4.351..4.351 rows=1543 loops=1)
 -  Seq Scan on flugzeug
flugzeug2_  (cost=0.00..51.43 rows=1543 width=689) (actual
time=0.006..1.615 rows=1543 loops=1)
 -  Hash  (cost=1.79..1.79 rows=79
width=542) (actual time=0.031..0.031 rows=79 loops=1)
   -  Seq Scan on flugplatz
flugplatz6_  (cost=0.00..1.79 rows=79 width=542) (actual
time=0.003..0.011 rows=79 loops=1)
   -  Hash  (cost=15.73..15.73 rows=373
width=7022) (actual time=2.236..2.236 rows=375 loops=1)
 -  Seq Scan on pilot pilot5_
(cost=0.00..15.73 rows=373 width=7022) (actual time=0.005..0.781
rows=375 loops=1)
 -  Hash  (cost=1.02..1.02 rows=2 width=520)
(actual time=0.005..0.005 rows=2 loops=1)
   -  Seq Scan on bmintype bmintype4_
(cost=0.00..1.02 rows=2 width=520) (actual time=0.003..0.004 rows=2
loops=1)
   -  Hash  (cost=1.03..1.03 rows=3 width=520) (actual
time=0.004..0.004 rows=3 loops=1)
 -  Seq Scan on flugzeugtype flugzeugty3_
(cost=0.00..1.03 rows=3 width=520) (actual time=0.002..0.002 rows=3
loops=1)
 -  Hash  (cost=3.90..3.90 rows=2 width=4) (actual
time=0.239..0.239 rows=2 loops=1)
   -  Limit  (cost=0.00..3.88 rows=2 width=4) (actual
time=0.202..0.236 rows=2 loops=1)
 -  Index Scan using startkladde_pkey on
startkladde kladdenent0_  (cost=0.00..56.24 rows=29 width=4) (actual
time=0.200..0.233 rows=2 loops=1)

Re: [PERFORM] Slow query execution over high latency network

2011-02-21 Thread Clemens Eisserer
Hi Andrej,

Thanks a lot for taking a loot at the tcpdump data.

 I just had a brief glance over your tcpdump data ... are you sure
 hibernate isn't using a cursor to fetch each row individually?

Pretty sure, yes. I get the same performance when executing the
hibernate-generated query using JDBC,
even setting a large fetch-size doesn't improve the situation:

   st.setFetchSize(100);
   st.setFetchDirection(ResultSet.FETCH_FORWARD);

Could it be jdbc driver struggles with the huge number of columns (~500)?

Thanks, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow query execution over high latency network

2011-02-19 Thread Clemens Eisserer
Hello,

When executing huge (10kb), hibernate-generated queries I noticed that
when executed remotly over  high-latency network (ping to server
200-400ms), the query takes a lot longer to complete.

When the query is executed remotly (psql or jdbc) it takes 1800ms to
execute, when I issue the query in an ssh terminal, I see the results
almost immediatly.
So although I should see the same latency over ssh , its way faster over ssh.
The transmitted data is small (the wireshard-file has 22kb, attached),
and even though the umts-network is high-latency its relativly high
bandwith (~512kbit/s up, ~2mbit/s down).

Any idea whats causing this? Maybe too small buffers somewhere?
For me it poses problem, because I am working on a 2-Tier java
application which should connect to postgres remotly - however with
every more complex query taking 2s its almost unuseable over wireless
networks (umts).

Thank you in advance, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query execution over high latency network

2011-02-19 Thread Clemens Eisserer
Hi Pierre,

Thanks a lot for your reply.

 Your attached file didn't come through.
Hmm, ok.
I uploaded the wireshark-log to: http://93.190.88.182/psql_large_query.bin

 - different execution plan between your app and ssh+psql, which can happen
 if the planning uses/doesn't use your specific parameters,

Its both times (ssh and remote psql) exactly the same query - I copied
the SQL generated by hibernate and executed it in psql. And although
it has many columns (~210) the result-set is only about 5 rows and am
sure not larger than a few kb.

 - dumb client versus smart client :
 smart client : use the protocol which sends the query text + parameters +
 prepare + execute in 1 TCP message, 1 ping, postgres works, 1 ping, get
 reply

So are both psql and the jdbc driver dumb clients?
Or are there only buffers somewhere too small and therefor data is
sent in many smal batches.
I thought one query would more or less equal to one roundtrip, right?
Maybe I should ask on the pgsql-jdbc list.

 If you want to ensure the fastest response time you need to ensure than one
 user action (click) needs one and only one roundtrip to the server before
 all the results are displayed
 One solution could be to put the database handling stuff inside an
 appserver, make your app communicate to it with a low-overhead RPC protocol
 (ie, not raw uncompressed XML) that minimizes the number of roudtrips, and
 compresses data thoroughly.

I use well tuned hibernate fetch profiles to ensure fewest possible roundtrips,
however I am not getting paid well enough to create an appserver tier ;)

Thanks, Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance on new 64bit server compared to my 32bit desktop

2010-08-30 Thread Clemens Eisserer
Hi,

 This isn't an older Opteron, its 6 core, 6MB L3 cache Istanbul.  Its not
 the newer stuff either.

 Everything before Magny Cours is now an older Opteron from my perspective.

The 6-cores are identical to Magny Cours (except that Magny Cours has
two of those beast in one package).

- Clemens

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Are Indices automatically generated for primary keys?

2010-08-18 Thread Clemens Eisserer
Hi,

Are indices for columns marked with PRIMARY KEY automatically generated by
postgresql, or do I have to do it manually?
The question might seem dumb, I ask because I remember from working with
MySQL it generates indices automatically in this case.

Thank you in advance, Clemens


Re: [PERFORM] Are Indices automatically generated for primary keys?

2010-08-18 Thread Clemens Eisserer
Hi,

 they are generated automatically.

Thanks depesz!
The reason why I asked was because pgAdmin doesn't display the
automatically created indices, which confused me.

Thanks, Clemens

PS:

 If you look at the documentation page for CREATE TABLE, you'll see
 the following . but if you use your
 browser to search for PRIMARY KEY within the page, it's not too hard
 to find.
Its quite harsh to imply I didn't look for documentation.
I looked at the Indexes and ORDER BY which doesn't mention it, or
I've overlook it.
Doesn't make a difference anyway.

 Also, if you create a primary key or a unique constraint on a table,
 you should see a notice informing you of the creation of the index,
 and its name.
I use Hibernate, and it generates the DDL for me.
Even with debug/DDL/SQL-output enabled, I don't get any hint that an
index was created.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Question about explain-command...

2006-05-10 Thread Clemens Eisserer

Hello,

I just discovered the explain command and well ... have some (for you
of course very stupid) questions.

I do a quite large (for my taste) join, the query looks like the following:
SELECT DISTINCT customer.email AS cemail, customer.key AS ckey,
customer.anrede AS canrede, customer.strasse AS cstrasse, customer.plz
AS cplz, customer.ort AS cort, customer.vorname AS cvorname,
customer.nachname AS cnachname , custtype.name AS tname, customer.land
AS cland, customer.datanotvalid AS cdatanvalid FROM customer LEFT JOIN
sells ON customer.key=sells.custid LEFT JOIN goods ON
sells.goodsid=goods.key LEFT JOIN custtype ON
customer.custgroup=custtype.key LEFT JOIN prodtype ON
prodtype.key=goods.prodgroup WHERE customer.nachname LIKE  '%name%';

All primary keys are indixed, and this is what explain tells me:
Unique  (cost=15.67..16.69 rows=34 width=115)
  -  Sort  (cost=15.67..15.75 rows=34 width=115)
Sort Key: customer.email, customer.key, customer.anrede, customer.str
asse, customer.plz, customer.ort, customer.vorname, customer.nachname, custtype.
name, customer.land, customer.datanotvalid
-  Hash Left Join  (cost=6.16..14.80 rows=34 width=115)
  Hash Cond: (outer.prodgroup = inner.key)
  -  Hash Left Join  (cost=4.97..13.10 rows=34 width=119)
Hash Cond: (outer.custgroup = inner.key)
-  Hash Left Join  (cost=3.88..11.49 rows=34 width=111)
  Hash Cond: (outer.goodsid = inner.key)
  -  Hash Left Join  (cost=1.98..9.08
rows=34 width=111)
Hash Cond: (outer.key = inner.custid)
-  Seq Scan on customer 
(cost=0.00..6.10 rows=34 width=107)

  Filter: ((nachname)::text ~~
'%au%'::text)
-  Hash  (cost=1.78..1.78 rows=78 width=8)
  -  Seq Scan on sells 
(cost=0.00..1.78 rows=78 width=8)

  -  Hash  (cost=1.72..1.72 rows=72 width=8)
-  Seq Scan on goods 
(cost=0.00..1.72 rows=72 width=8)

-  Hash  (cost=1.08..1.08 rows=8 width=16)
  -  Seq Scan on custtype  (cost=0.00..1.08
rows=8 width=16)
  -  Hash  (cost=1.15..1.15 rows=15 width=4)
-  Seq Scan on prodtype  (cost=0.00..1.15 rows=15 width=4)


What does the hash-lines mean, does that mean my query does not use
the indices at all?
Why are some table-names and some column-names surrounded by '  '?
Are they threated as text-columns?
I have to admit that the tables are just filled with test-data so the
analyzer may take just a very simple way since almost no data is in...

lg Clemens

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings