Re: [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres

2005-04-28 Thread Josh Berkus
> Now, if we can come up with something better than the ARC algorithm ... Tom already did. His clock-sweep patch is already in the 8.1 source. -- Josh Berkus Aglio Database Solutions San Francisco

Re: [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres

2005-04-28 Thread a3a18850
Well, this guy has it nailed. He cites Flajolet and Martin, which was (I thought) as good as you could get with only a reasonable amount of memory per statistic. Unfortunately, their hash table is a one-shot deal; there's no way to maintain it once the table changes. His incremental update doesn
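For readers who have not seen the Flajolet and Martin technique mentioned above, here is a rough sketch of the basic idea in Python. It is an illustration only, not the Gibbons distinct-sampling method and not anything from the PostgreSQL source. The function names, the use of BLAKE2 as the hash, and the choice of 64 hash functions are all assumptions made for the example.

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis


def trailing_zeros(x: int, width: int = 32) -> int:
    """Number of trailing zero bits in x (width if x is zero)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def fm_estimate(values, num_hashes: int = 64) -> float:
    """Estimate the number of distinct values using one small bitmap per hash.

    Hypothetical helper for illustration: each salted hash function owns a
    bitmap, and every value sets the bit at the position of the lowest
    1-bit of its hash.
    """
    bitmaps = [0] * num_hashes
    for v in values:
        data = repr(v).encode()
        for i in range(num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=4, salt=i.to_bytes(8, "little")
            ).digest()
            h = int.from_bytes(digest, "little")
            bitmaps[i] |= 1 << trailing_zeros(h)

    # R is the index of the lowest bit that is still zero in a bitmap.
    total_r = 0
    for b in bitmaps:
        r = 0
        while b & (1 << r):
            r += 1
        total_r += r

    # Average R across the hash functions, then scale by the constant.
    return (2 ** (total_r / num_hashes)) / PHI
```

Memory use is one machine word per hash function, regardless of table size. Bits are only ever set, never cleared, so repeated values and later inserts are handled, but a deleted value cannot be removed from the bitmaps. That appears to be the maintenance problem the message refers to.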

Re: [PERFORM] index on different types

2005-04-28 Thread Michael Fuhr
On Fri, Apr 29, 2005 at 04:35:13AM +0200, Enrico Weigelt wrote: > > there's often some talk about indices cannot be used if datatypes > don't match. PostgreSQL 8.0 is smarter than previous versions in this respect. It'll use an index if possible even when the types don't match. > On a larger (an

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-28 Thread Andrew Dunstan
Mischa Sandberg wrote: Perhaps I can save you some time (yes, I have a degree in Math). If I understand correctly, you're trying to extrapolate from the correlation between a tiny sample and a larger sample. Introducing the tiny sample into any decision can only produce a less accurate result than

[PERFORM] index on different types

2005-04-28 Thread Enrico Weigelt
Hi folks, there's often some talk about indices cannot be used if datatypes don't match. On a larger (and long-grown) application I tend to use OID for references on new tables while old stuff is using integer. Is the planner smart enough to see both as compatible datatype or is manual c

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-28 Thread Marko Ristola
First I will comment on my original idea. Second I will give another improved suggestion (an idea). I hope that they will be useful for you. (I don't know whether the first one was useful at all because it showed that I and some others of us are not very good with statistics :( ) I haven't looked ab

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-28 Thread Mischa Sandberg
Quoting Josh Berkus: > > >Perhaps I can save you some time (yes, I have a degree in Math). If I > > >understand correctly, you're trying to extrapolate from the correlation > > >between a tiny sample and a larger sample. Introducing the tiny sample > > >into any decision can only produce a less accu

Re: [PERFORM] Suggestions for a data-warehouse migration routine

2005-04-28 Thread Mischa Sandberg
Quoting Richard Rowell <[EMAIL PROTECTED]>: > I've ported enough of my company's database to Postgres to make > warehousing on PG a real possibility. I thought I would toss my > data > migration architecture ideas out for the list to shoot apart.. > [...] Not much feedback required. Yes, droppi

Re: [PERFORM] Why is this system swapping?

2005-04-28 Thread Jeff
On Apr 27, 2005, at 7:46 PM, Greg Stark wrote: In fact I think it's generally superior to having a layer like pgpool having to hand off all your database communication. Having to do an extra context switch to handle every database communication is crazy. I suppose this depends on how many machin

Re: [PERFORM] Final decision

2005-04-28 Thread Dave Page
> -Original Message- > From: Josh Berkus [mailto:[EMAIL PROTECTED] > Sent: 28 April 2005 04:09 > To: Dave Page > Cc: Joshua D. Drake; Joel Fradkin; PostgreSQL Perform > Subject: Re: [PERFORM] Final decision > > Dave, folks, > > > Err, yes. But that's not quite the same as core telling