Re: [PERFORM] Maximum number of sequences that can be created

2012-05-13 Thread Віталій Тимчишин
2012/5/11 Robert Klemme shortcut...@googlemail.com

 On Fri, May 11, 2012 at 12:50 PM, Vidhya Bondre meetvbon...@gmail.com
 wrote:
  Is there any max limit set on sequences that can be created on the
  database? Also would like to know: if we create millions of sequences in
  a single db, what is the downside of it.


Sequences AFAIK are accounted as relations. A large list of relations may
slow down various system utilities such as vacuuming (or may not, depending
on the queries and indexes on pg_class).
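
Each sequence shows up as a row in pg_class, so you can see how many you
have with a catalog query like this (a sketch; relkind 'S' marks sequences):

  SELECT count(*) FROM pg_class WHERE relkind = 'S';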



 On the contrary: what would be the /advantage/ of being able to create
 millions of sequences?  What's the use case?


We are using sequences as statistics counters - they produce almost no
performance impact and we can tolerate their non-transactional nature. I can
imagine someone who wants to have a sequence per user or per row of some
other relation.
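
For illustration, a minimal sketch of that counter pattern (object names
are hypothetical):

  CREATE SEQUENCE stats_page_views;          -- one counter
  SELECT nextval('stats_page_views');        -- count one event
  SELECT last_value FROM stats_page_views;   -- read the current total

nextval() is never rolled back, which is exactly the non-transactional
behavior we tolerate.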

-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Maximum number of sequences that can be created

2012-05-13 Thread Robert Klemme
On Sun, May 13, 2012 at 10:12 AM, Віталій Тимчишин tiv...@gmail.com wrote:
 2012/5/11 Robert Klemme shortcut...@googlemail.com

 On the contrary: what would be the /advantage/ of being able to create
 millions of sequences?  What's the use case?

 We are using sequences as statistics counters - they produce almost no
 performance impact and we can tolerate their non-transactional nature. I
 can imagine someone who wants to have a sequence per user or per row of
 some other relation.

I can almost see the point. But my natural choice in that case would
be a table with two columns.  Would that actually be so much less
efficient? Of course you'd have fully transactional behavior and thus
locking.
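
Something along these lines (a sketch, hypothetical names):

  CREATE TABLE counters (
      name  text   PRIMARY KEY,
      value bigint NOT NULL DEFAULT 0
  );
  UPDATE counters SET value = value + 1 WHERE name = 'page_views';

Each UPDATE holds a row lock until commit, which is the locking trade-off
mentioned above.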

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-13 Thread Clemens Eisserer
Hello Noah,

Thanks a lot for your feedback and explanations.

 Since you have 15+ tables at the top level, the genetic query optimizer should
 be kicking in and delivering a plan in reasonable time, albeit with plan
 quality hazards.  There's a danger zone when the deterministic planner is
 still in effect but {from,join}_collapse_limit have limited the scope of its
 investigation.  If you're in that zone and have not hand-tailored your
 explicit join order, poor plans are unsurprising.  What exact configuration
 changes are you using?

Basically only the changes suggested here a year ago, which made the
problem go away for less complex queries:

geqo_threshold = 20
from_collapse_limit = 13
join_collapse_limit = 13


 Hundreds of rows, no.  Consider this example:
 IN(...):
  Total runtime: 2200.767 ms

 ANY(ARRAY(...)):
  Total runtime: 11748.348 ms

When there is an index on C, the resulting index scan is, even with 1000
elements, 3 times faster on my notebook. However, both queries execute in
next to no time (15 ms vs. 5 ms).
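
For reference, the two shapes under discussion look like this, schematically
(table and column names are hypothetical):

  SELECT * FROM t WHERE c IN (SELECT id FROM sub WHERE flag);
  SELECT * FROM t WHERE c = ANY(ARRAY(SELECT id FROM sub WHERE flag));

ARRAY(...) materializes the subquery result into an array first, which keeps
the planner from collapsing the subquery into the outer query.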

 Filing a bug report with the content you've already posted would not add much,
 but a self-contained test case could prove useful.  Many of the deficiencies
 that can make ANY(ARRAY(...)) win do represent unimplemented planner
 intelligence more than bugs.

 Incidentally, you can isolate whether ANY(ARRAY(...))'s advantage comes solely
 from suppressing the subquery collapse.  Keep IN but tack OFFSET 0 onto
 the subquery.  If this gives the same performance as ANY(ARRAY(...)), then the
 subquery-collapse suppression was indeed the source of advantage.
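
Schematically, applying that to the original query (a sketch, keeping the
elisions from the original mail):

  SELECT ... FROM table1 ... LEFT OUTER JOIN table15 ...
  WHERE table1.id IN (SELECT id ... JOIN table16 ... JOIN table20
                      WHERE table20.somevalue = ? OFFSET 0);

OFFSET 0 does not change the result but acts as an optimization fence,
keeping the subquery from being collapsed into the outer query.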

I see your point: naive logic that blindly replaces IN with ANY(ARRAY(...))
wouldn't always yield better results.
I'll try to come up with a self-contained test case.

Thanks again, Clemens



Re: [PERFORM] Maximum number of sequences that can be created

2012-05-13 Thread Craig James
On Sun, May 13, 2012 at 1:12 AM, Віталій Тимчишин tiv...@gmail.com wrote:



 2012/5/11 Robert Klemme shortcut...@googlemail.com

 On Fri, May 11, 2012 at 12:50 PM, Vidhya Bondre meetvbon...@gmail.com
 wrote:
  Is there any max limit set on sequences that can be created on the
  database? Also would like to know: if we create millions of sequences in
  a single db, what is the downside of it.


 Sequences AFAIK are accounted as relations. A large list of relations may
 slow down various system utilities such as vacuuming (or may not, depending
 on the queries and indexes on pg_class).


Not "may slow down".  Change that to "will slow down and possibly corrupt
your system".

In my experience (PG 8.4.x), the system can handle in the neighborhood of
100,000 relations pretty well.  Somewhere over 1,000,000 relations, the
system becomes unusable.  It's not that it stops working -- day-to-day
operations such as querying your tables and running your applications
continue to work.  But system operations that have to scan for table
information seem to freeze (maybe they run out of memory, or hit an O(N^2)
operation and simply fail to complete).

For example, pg_dump fails altogether.  After 24 hours, it won't even start
writing to its output file.  Auto-completion of table and column names in
psql freezes the system.  It takes minutes to drop one table.
Stuff like that. You'll have a system that works, but can't be backed up,
dumped, repaired or managed.
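
If you want to check where a database stands, a quick catalog query (a
sketch) shows the relation counts that drive this behavior:

  SELECT relkind, count(*) FROM pg_class GROUP BY relkind ORDER BY 2 DESC;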

As I said, this was 8.4.x. Things may have changed in 9.x.

Craig


Re: [PERFORM] Any disadvantages of using =ANY(ARRAY()) instead of IN?

2012-05-13 Thread Noah Misch
On Tue, May 01, 2012 at 04:34:10PM +0200, Clemens Eisserer wrote:
 select ... from table1 ... left outer join table15 ... WHERE
 table1.id IN (select id ... join table16 ... join table20 WHERE
 table20.somevalue=?)
 
 Starting with some number of joins, the optimizer starts to do quite
 suboptimal things like hash-joining huge tables where selectivity would be
 very low.
 I already raised join_collapse_limit and from_collapse_limit, but
 after a certain point query planning starts to become very expensive.

On Sun, May 13, 2012 at 04:35:30PM +0200, Clemens Eisserer wrote:
  Since you have 15+ tables at the top level, the genetic query optimizer
  should be kicking in and delivering a plan in reasonable time, albeit with
  plan quality hazards.  There's a danger zone when the deterministic
  planner is still in effect but {from,join}_collapse_limit have limited the
  scope of its investigation.  If you're in that zone and have not
  hand-tailored your explicit join order, poor plans are unsurprising.  What
  exact configuration changes are you using?
 
 Basically only the changes suggested here a year ago, which made the
 problem go away for less complex queries:
 
 geqo_threshold = 20
 from_collapse_limit = 13
 join_collapse_limit = 13

Given those settings and the query above, the planner will break the 15
top-level tables into lists of 13 and 2 tables, and the 20 subquery tables
into lists of 13 and 7 tables.  The split points arise from the order of
appearance in the query text.  The planner then optimizes join order within
each list but not across lists.  That perfectly explains your observation of
"hash-joining huge tables where selectivity would be very low".

If it were me, I would try two things.  First, set from_collapse_limit = 100,
join_collapse_limit = 100, geqo_threshold = 8.  This will let the planner
consider all join orders for the 35 tables; it will use the genetic query
optimizer to choose one.  See if the plan time and resulting plan are decent.
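
In session form, that first experiment would look roughly like this (the
SELECT stands in for whatever 35-table query Hibernate generates):

  SET from_collapse_limit = 100;
  SET join_collapse_limit = 100;
  SET geqo_threshold = 8;
  EXPLAIN ANALYZE SELECT ...;   -- your 35-table query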

Second, set from_collapse_limit = 100, join_collapse_limit = 100, geqo = off
and EXPLAIN the query.  This will take forever and might exhaust all system
memory.  If it does complete, you'll see the standard planner's opinion of an
optimal join order.  You can then modify the textual join order in your query
to get the same plan despite returning to a lower join_collapse_limit.  Since
Hibernate is generating your queries, that may prove inconvenient.  It's the
remaining escape hatch if the genetic query optimizer does not suffice.
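
A sketch of that second experiment:

  SET from_collapse_limit = 100;
  SET join_collapse_limit = 100;
  SET geqo = off;
  EXPLAIN SELECT ...;           -- may run very long and use much memory

Once you have the join order it settles on, writing the joins in that
textual order and setting join_collapse_limit = 1 makes the planner
preserve it.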
