Re: [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres

2005-04-28 Thread a3a18850
Well, this guy has it nailed. He cites Flajolet and Martin, which I had thought 
was as good as you could get with only a reasonable amount of memory per 
statistic. Unfortunately, their hash table is a one-shot deal; there's no way 
to maintain it once the table changes. His incremental update doesn't degrade 
as the table changes. If there isn't the same patent wrangle as with the 
ARC algorithm, and if the existing stats collector process can stand the extra 
traffic, then this one is a winner. 
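
For anybody following along, the statistic at stake is what ANALYZE stores and 
the planner reads back as n_distinct. A quick way to eyeball the current 
estimate for a column (table and column names here are just placeholders): 

    SELECT tablename, attname, n_distinct
    FROM   pg_stats
    WHERE  tablename = 'mytable' AND attname = 'mycol';
    -- positive n_distinct is an absolute count; negative means
    -- "minus this fraction of the row count"

A distinct-sampling estimator would feed a better number into exactly that slot. 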
 
Many thanks to the person who posted this reference in the first place; so 
sorry I canned your posting and can't recall your name. 
 
Now, if we can come up with something better than the ARC algorithm ... 




Re: [PERFORM] Index bloat problem?

2005-04-22 Thread a3a18850
Quoting Bill Chandler <[EMAIL PROTECTED]>:

> Running PostgreSQL 7.4.2, Solaris.
> Client is reporting that the size of an index is
> greater than the number of rows in the table (1.9
> million vs. 1.5 million).  Index was automatically
> created from a 'bigserial unique' column.

> We have been running 'VACUUM ANALYZE' very regularly. 
> In fact, our vacuum schedule has probably been
> overkill.  We have been running on a per-table basis
> after every update (many per day, only inserts
> occurring) and after every purge (one per day,
> deleting a day's worth of data).  
> 
> What about if an out-of-the-ordinary number of rows
> were deleted (say 75% of rows in the table, as opposed
> to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
> things get out of whack because of that situation?

I gather you mean, out-of-the-ordinary for most apps, but not for this client?

In case nobody else has asked: is your max_fsm_pages big enough to handle all
the deleted pages, across ALL tables hit by the purge? If not, you're
haemorrhaging pages, and VACUUM is probably warning you about exactly that.
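
Quick way to check (these are just the stock GUC names, nothing specific to 
your setup): 

    SHOW max_fsm_pages;
    SHOW max_fsm_relations;
    VACUUM VERBOSE;   -- watch the tail end of the output for the
                      -- free-space-map numbers / warnings I mentioned above

Note that max_fsm_pages can only be raised in postgresql.conf, followed by a 
postmaster restart. 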

If that's not a problem, you might want to consider partitioning the data.
Take a look at inherited tables. For me, they're a good approximation of
clustered indexes (sigh, miss 'em) and roughly equivalent to tablespaces.
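
A minimal sketch of the setup, with invented names (one child per day of data): 

    CREATE TABLE log_parent (ts timestamptz, payload text);
    CREATE TABLE log_20050421 () INHERITS (log_parent);
    CREATE TABLE log_20050422 () INHERITS (log_parent);
    -- SELECTs against log_parent see all the children's rows;
    -- the daily purge only ever has to touch one child.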

My app is in a similar boat to yours: up to 1/3 of a 10M-row table goes away
every day. For each child table that is a candidate to be dropped, there is a
big prologue txn which moves (INSERT then DELETE) the rows worth keeping into a
child table that is NOT to be dropped. Then BANG, pull the plug on the tables you
don't want. MUCH faster than DELETE: the dropped tables' disk space goes
away in one shot, too.
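
Roughly, reusing the invented names from above (a sketch of the pattern, not my 
actual schema; "keep_this" stands in for whatever predicate marks the good rows): 

    BEGIN;
    INSERT INTO log_keep SELECT * FROM log_20050421 WHERE keep_this;
    DELETE FROM log_20050421 WHERE keep_this;
    COMMIT;

    DROP TABLE log_20050421;   -- the file vanishes in one shot

where log_keep is a child table that survives the purge. 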

Just my 2c.




Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-10 Thread a3a18850
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Mischa <[EMAIL PROTECTED]> writes:
> > Quoting Tom Lane <[EMAIL PROTECTED]>:
> >> WHERE a.x > b.y AND a.x < 42
> 
> > Out of curiosity, will the planner induce "b.y < 42" out of this?
> 
> No.  There's some smarts about transitive equality, but none about
> transitive inequalities.  Offhand I'm not sure if it'd be useful to add
> such.  The transitive-equality code pulls its weight [...]
> but I'm less able to think of common use-cases for transitive
> inequality ...

Thanks. My apologies for not just going and looking at the code first.

Equality-transitives: yes, worth their weight in gold.
Inequality transitives: I see them in OLAP queries (usually ranges), and in queries
against big UNION ALL views, where constant-false inequalities are the norm.
"a.x > b.y AND a.x < c.z" comes up in OLAP, too, usually inside an EXISTS(...),
where you are doing something analogous to finding a path.
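
To make that concrete, with throwaway table names: given 

    SELECT a.x, b.y
    FROM   a JOIN b ON a.x > b.y
    WHERE  a.x < 42;

the join condition already implies b.y < 42, but per Tom's reply the planner
won't derive it, so you have to write "AND b.y < 42" by hand if you want the
restriction applied to b (e.g. to let an index on b.y kick in, or to knock out
arms of a UNION ALL view). 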



