Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-03 Thread Kim Bisgaard

Are there plans to make a small follow-up patch to make
CREATE UNIQUE INDEX on one column
(and variants in CREATE TABLE ... PRIMARY KEY) automatically do
SET STATISTICS DISTINCT?

It might not be as perfect a solution as teaching the planner to know 
about unique indexes, but it is better than nothing.


Good work!

Regards,
Kim


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-03 Thread Tom Lane
Kim Bisgaard kim...@alleroedderne.adsl.dk writes:
 Are there plans to make a small follow-up patch to make
 CREATE UNIQUE INDEX on one column
 (and variants in CREATE TABLE ... PRIMARY KEY) automatically do
 SET STATISTICS DISTINCT?

No.  It would be pointless for a single column and wrong for
multiple columns.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Robert Haas
On Sat, Aug 1, 2009 at 7:27 AM, Dimitri Fontainedfonta...@hi-media.com wrote:
 Unless you want me to test for impact on planner choices (even if we already
 know it has impact on pg_statistic), I'd say it's ready for commiter.

Great, thanks for the review.  I think if stadistinct is getting
populated that's sufficient in terms of testing; the problems caused
by inaccurate ndistinct estimates have been discussed elsewhere.  I
see you've already marked this Ready for Committer, thanks.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Tom Lane
There wasn't any response to this comment:

marcin mank marcin.m...@gmail.com writes:
 ALTER COLUMN SET DISTINCT
 feels like adding a unique constraint.

 ALTER COLUMN SET STATISTICS DISTINCT ?
 ALTER COLUMN SET STATISTICS NDISTINCT ?

I don't want to add a new keyword, but SET STATISTICS DISTINCT would
be an easy change.  Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Pavel Stehule
2009/8/2 Tom Lane t...@sss.pgh.pa.us:
 There wasn't any response to this comment:

 marcin mank marcin.m...@gmail.com writes:
 ALTER COLUMN SET DISTINCT
 feels like adding a unique constraint.

 ALTER COLUMN SET STATISTICS DISTINCT ?
 ALTER COLUMN SET STATISTICS NDISTINCT ?

 I don't want to add a new keyword, but SET STATISTICS DISTINCT would
 be an easy change.  Comments?

It's much better than SET DISTINCT.

Pavel


                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Joshua Tolley
On Sun, Aug 02, 2009 at 01:23:27PM -0400, Tom Lane wrote:
 There wasn't any response to this comment:
 
 marcin mank marcin.m...@gmail.com writes:
  ALTER COLUMN SET DISTINCT
  feels like adding a unique constraint.
 
  ALTER COLUMN SET STATISTICS DISTINCT ?
  ALTER COLUMN SET STATISTICS NDISTINCT ?
 
 I don't want to add a new keyword, but SET STATISTICS DISTINCT would
 be an easy change.  Comments?

+1

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Tom Lane
Dimitri Fontaine dfonta...@hi-media.com writes:
 Yes, and as I didn't have the time to install filterdiff I've attached  
 a revision of your patch in git format which adresses the problem I  
 mentioned, in a tarball also containing raw notes, tests, and  
 regression.{out,diffs}.

Applied with assorted corrections, plus the just-discussed change of
syntax to SET STATISTICS DISTINCT.

 mqke check is failing on opr_sanity test in what looks like an  
 ordering issue, but I didn't feel confident enough to adapt the .out  
 to force the regression into passing.

Hmm, I saw no such change here, so I left the regression test alone.
It might be a platform-specific behavior, in which case the buildfarm
will soon tell us.  If so, adding an ORDER BY to that query seems
like the right fix.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Robert Haas
On Sun, Aug 2, 2009 at 6:17 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dfonta...@hi-media.com writes:
 Yes, and as I didn't have the time to install filterdiff I've attached
 a revision of your patch in git format which adresses the problem I
 mentioned, in a tarball also containing raw notes, tests, and
 regression.{out,diffs}.

 Applied with assorted corrections, plus the just-discussed change of
 syntax to SET STATISTICS DISTINCT.

Thanks.  The changes all look good - except I'm curious why %g vs. %f?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Thanks.  The changes all look good - except I'm curious why %g vs. %f?

So as not to add .00 unnecessarily.  Positive values for ndistinct
should be treated as integers.  (I considered adding an error check to
reject values like 20.5, but refrained...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-02 Thread Robert Haas
On Sun, Aug 2, 2009 at 8:35 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Thanks.  The changes all look good - except I'm curious why %g vs. %f?

 So as not to add .00 unnecessarily.  Positive values for ndistinct
 should be treated as integers.  (I considered adding an error check to
 reject values like 20.5, but refrained...)

Oh, I see.  That makes sense.

I think we do entirely too much forcing things to integers in the
query planner as it is.  The fact that a value can't truly be
fractional doesn't mean that an estimate of the value can't be
fractional.

Now, in this particular case, it seems hard to imagine that 20.5 is a
very useful value.  But that's not really our problem: we just need to
reject illegal values, not useless ones.

I'm interested to see how useful this proves to be in the field.  I
implemented it mostly on a whim because you and others seemed to think
it could have some value, and because I get an unhealthy amount of
personal satisfaction out of writing code during my spare time.  But
the real test will be to see whether the real users who were getting
bad query plans as a result of poor ndistinct estimates are able to
make practical use of this feature to get better ones.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-08-01 Thread Dimitri Fontaine

Hi,

Le 1 août 09 à 06:08, Robert Haas a écrit :

I'm lost.  I think you're getting the new column attdistinct mixed up
with the existing column stadistinct.  attdistinct is always going to
have whatever value you assign it.  stadistinct will get attdistinct
!= 0 ? attdistinct : result of analyze calculation.


haha!
Sorry about that, I felt like I had to run against time and once again  
I lost.


dim=# alter table foo alter column x set distinct 0.25;
ALTER TABLE
dim=# select stadistinct from pg_statistic where starelid =  
'foo'::regclass;

-[ RECORD 1 ]-
stadistinct | 0.25

dim=# alter table foo alter column x set distinct 0;
ALTER TABLE
dim=# analyze verbose foo;
INFO:  analyzing public.foo
INFO:  foo: scanned 4 of 4 pages, containing 1000 live rows and 0  
dead rows; 1000 rows in sample, 1000 estimated total rows

ANALYZE
dim=# select stadistinct from pg_statistic where starelid =  
'foo'::regclass;

 stadistinct
-
  -0.652
(1 row)

I'm back on track, it seems. Time to further test this, but code  
review is ok for me (except for the strange new error already  
mentioned), doc is ok too, and basic behaviour is sane. I just checked  
pg_dump dim|psql foo and new database (foo) has same explicit distinct  
settings than origin, both for pg_attribute and pg_statistic.


Unless you want me to test for impact on planner choices (even if we  
already know it has impact on pg_statistic), I'd say it's ready for  
commiter.


--
dim
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-31 Thread Dimitri Fontaine

Hi,

Finally some update on the patch.

Le 18 juil. 09 à 20:55, Robert Haas a écrit :

This is one of the things that I hate about the requirement to post
context diffs: filterdiff, at least for me, strips out the git tags
that indicate the base rev of the patch.


Yes, and as I didn't have the time to install filterdiff I've attached  
a revision of your patch in git format which adresses the problem I  
mentioned, in a tarball also containing raw notes, tests, and  
regression.{out,diffs}.


mqke check is failing on opr_sanity test in what looks like an  
ordering issue, but I didn't feel confident enough to adapt the .out  
to force the regression into passing.



 - style issue, convert at PQgetvalue() time


Ah, good catch.


Fixed in the updated version, which uses a float4 variable in pg_dump  
and strtod() rather than atof. This last point is maybe overkill as  
I'm using:
	tbinfo-attdistinct[j] = strtod(PQgetvalue(res, j, i_attdistinct),  
(char **)NULL);


 What about adding the following before the switch, to do like  
surrounding

code?
   Assert(IsA(newValue, Integer) || IsA(newValue, Float));


Not a good plan.  In my experience, gcc doesn't like switch ()
statements over enums that don't list all the values, unless they have
a default clause; it masterminds by giving you a warning that you've
inadvertently left out some values.


I've left this part alone but still wonders about this, which is a new  
user visible error message kind:

default:
elog(ERROR, unrecognized node type: %d,
 (int) nodeTag(newValue));

Given your revised version I'll try to play around with ndistinct  
behavior

and exercise dump and restore, etc, but for now I'll pause my work.


I failed to have 0 to allow for analyze to compute a value again, as  
shown in the raw notes in attachement:


dim=# alter table foo alter column x set distinct 1;
ALTER TABLE
...
dim=# alter table foo alter column x set distinct 0;
ALTER TABLE
dim=# analyze verbose foo;
INFO:  analyzing public.foo
INFO:  foo: scanned 4 of 4 pages, containing 1000 live rows and 0  
dead rows; 1000 rows in sample, 1000 estimated total rows

ANALYZE
dim=# select attname, attdistinct from pg_attribute where attrelid =  
'foo'::regclass and attname = 'x';

 attname | attdistinct
-+-
 x   |   0
(1 row)

What I understand from the doc part of your work contradicts what I  
see here...


Regards,
--
dim

Will mark as Waiting on Author, as I need Robert to tell me what to do  
next.




ndistinct.tgz
Description: Binary data






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-31 Thread Robert Haas
On Fri, Jul 31, 2009 at 11:09 AM, Dimitri
Fontainedfonta...@hi-media.com wrote:
 I failed to have 0 to allow for analyze to compute a value again, as shown
 in the raw notes in attachement:

I'm lost.  I think you're getting the new column attdistinct mixed up
with the existing column stadistinct.  attdistinct is always going to
have whatever value you assign it.  stadistinct will get attdistinct
!= 0 ? attdistinct : result of analyze calculation.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-29 Thread Robert Haas
On Sat, Jul 18, 2009 at 5:54 PM, Dimitri Fontainedfonta...@hi-media.com wrote:
 I'm going to update the status of patch and resume work on it next week.

Any update?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-29 Thread Dimitri Fontaine

Le 29 juil. 09 à 13:07, Robert Haas a écrit :
On Sat, Jul 18, 2009 at 5:54 PM, Dimitri Fontainedfonta...@hi-media.com 
 wrote:
I'm going to update the status of patch and resume work on it next  
week.


Any update?



Sorry about delays, resuming tomorrow evening is the current plan.
--
dim
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-18 Thread Robert Haas
On Fri, Jul 17, 2009 at 11:10 AM, Dimitri Fontainedfonta...@hi- I
couldn't apply it to current head because of bitrot. It applies fine
to
 git commit bcaef8b5a0e2d5c143dabd8516090a09e39b27b8 [1] but

This is one of the things that I hate about the requirement to post
context diffs: filterdiff, at least for me, strips out the git tags
that indicate the base rev of the patch.  In fact that rev is BEFORE
the one I based the patch on, which was
64b2da3f7778bc6fd3d213ceb9e73ff3b6e03890, and it actually applies OK
up until 0e3abe7ec78a3d51032d684598f188b0b0304fe9, the commit
immediately preceding the 8.4 pgindent run.

 Now I've had time to read the code, here are my raw notes:

 pg_dump.c:
            tbinfo-attstattarget[j] = atoi(PQgetvalue(res, j,
 i_attstattarget));
 +           tbinfo-attdistinct[j] = strdup(PQgetvalue(res, j,
 i_attdistinct));
 ...
 +           if (atof(tbinfo-attdistinct[j]) != 0 
 +               !tbinfo-attisdropped[j])

  - style issue, convert at PQgetvalue() time

Ah, good catch.

  - prefer strtod() over atof? Here's what my local man page has to say about
 the case:

     The atof() function has been deprecated by strtod() and should not be
 used in
     new code.

Hmm, I didn't know that (my man page doesn't contain that language).
It looks like strtod() is more widely used in the existing source code
than atof(), so I'll change it (atof() is however used in the
floatVal() macro).

 tablecmds.c:
 +   switch (nodeTag(newValue))
 +   {
 +       case T_Integer:
 ...
 +       case T_Float:
 ...
 +               default:
 +                       elog(ERROR, unrecognized node type: %d,
 +                                (int) nodeTag(newValue));

  What about adding the following before the switch, to do like surrounding
 code?
        Assert(IsA(newValue, Integer) || IsA(newValue, Float));

Not a good plan.  In my experience, gcc doesn't like switch ()
statements over enums that don't list all the values, unless they have
a default clause; it masterminds by giving you a warning that you've
inadvertently left out some values.

 Given your revised version I'll try to play around with ndistinct behavior
 and exercise dump and restore, etc, but for now I'll pause my work.

 I guess I'll have a second look at the code to check that it's all in the
 spirit of surrounding code, which I didn't complete yet (wanted to exercise
 my abilities to apply the patch from a past commit and forward-merge from
 there).

OK.  My one concern about updating this is that Peter is currently
reviewing my patch to autogenerate headers  bki stuff.  If that gets
committed it's going to break this all again, and I'd rather just fix
it once (especially since that patch would reduce the amount of fixing
needed here by 50%).  Also if this gets applied after being fixed it
will break that one.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-18 Thread Dimitri Fontaine

Le 18 juil. 09 à 20:55, Robert Haas a écrit :

that indicate the base rev of the patch.  In fact that rev is BEFORE
the one I based the patch on, which was
64b2da3f7778bc6fd3d213ceb9e73ff3b6e03890, and it actually applies OK
up until 0e3abe7ec78a3d51032d684598f188b0b0304fe9, the commit
immediately preceding the 8.4 pgindent run.


Robert told me on rrreviewers it's all whitespace stuff, which I could  
confirmed after having trained my eyes to recognize the pattern and  
mix and match whitespace and new column in the middle. Patch applied,  
I'll have to update the version check, but the following paste means  
I'm the one having to work on it now:


   Table pg_catalog.pg_attribute
   Column |   Type| Modifiers
---+---+---
...
attstattarget | integer   | not null
attdistinct   | real  | not null
attlen| smallint  | not null


Hmm, I didn't know that (my man page doesn't contain that language).
It looks like strtod() is more widely used in the existing source code
than atof(), so I'll change it (atof() is however used in the
floatVal() macro).


Considering I don't have an updated version when I start again, I'll  
have a try at it: don't rush into it yourself :)


 What about adding the following before the switch, to do like  
surrounding

code?
   Assert(IsA(newValue, Integer) || IsA(newValue, Float));


Not a good plan.  In my experience, gcc doesn't like switch ()
statements over enums that don't list all the values, unless they have
a default clause; it masterminds by giving you a warning that you've
inadvertently left out some values.


Maybe this could still be a supplementary barrier for cassert builds  
in order to match existing code?

--
dim

I'm going to update the status of patch and resume work on it next week.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-17 Thread Dimitri Fontaine

Hi,

Le 3 mai 09 à 22:13, Robert Haas a écrit :

OK, new version of patch, this time with the weird scaling removed and
the datatype changed to float4.


You've been assigning me this patch review, so here it goes :)


I have not changed the minimum value for remoteVersion in pg_dump.c,
as that would make the patch not able to be tested now.  So that line
and the comment two lines following will need to be updated prior to
application.  Also requires catversion bump.


I guess now would be a good time to fix this part of the patch?

I couldn't apply it to current head because of bitrot. It applies fine  
to git commit bcaef8b5a0e2d5c143dabd8516090a09e39b27b8 [1] but from  
there the automatic forward merge of git isn't able to merge  
pg_attribute.h. As I don't have as much time to give to the review as  
I'd like, I'd very much welcome if you could fix this part of your  
patch and I'll resume my work thereafter.
I'll change the patch status to Waiting on Author as soon as I'll  
have this mail id.



Now I've had time to read the code, here are my raw notes:

pg_dump.c:
tbinfo-attstattarget[j] = atoi(PQgetvalue(res, j,  
i_attstattarget));
+   tbinfo-attdistinct[j] = strdup(PQgetvalue(res, j,  
i_attdistinct));

...
+   if (atof(tbinfo-attdistinct[j]) != 0 
+   !tbinfo-attisdropped[j])

 - style issue, convert at PQgetvalue() time

 - prefer strtod() over atof? Here's what my local man page has to  
say about the case:


 The atof() function has been deprecated by strtod() and should  
not be used in

 new code.

tablecmds.c:
+   switch (nodeTag(newValue))
+   {
+   case T_Integer:
...
+   case T_Float:
...
+   default:
+   elog(ERROR, unrecognized node type: %d,
+(int) nodeTag(newValue));

 What about adding the following before the switch, to do like  
surrounding code?

Assert(IsA(newValue, Integer) || IsA(newValue, Float));

Given your revised version I'll try to play around with ndistinct  
behavior and exercise dump and restore, etc, but for now I'll pause my  
work.


I guess I'll have a second look at the code to check that it's all in  
the spirit of surrounding code, which I didn't complete yet (wanted to  
exercise my abilities to apply the patch from a past commit and  
forward-merge from there).


Regards,
--
dim

[1]: 
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=bcaef8b5a0e2d5c143dabd8516090a09e39b27b8
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-07-17 Thread marcin mank
ALTER COLUMN SET DISTINCT

feels like adding a unique constraint.

ALTER COLUMN SET STATISTICS DISTINCT ?
ALTER COLUMN SET STATISTICS NDISTINCT ?

Greetings
Marcin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-05 Thread Robert Haas
On Mon, May 4, 2009 at 10:41 PM, Joshua Tolley eggyk...@gmail.com wrote:
 On Mon, May 04, 2009 at 10:13:31PM -0400, Robert Haas wrote:

 nit
 +       own analysis indicates otherwie).  When set to a negative value, 
 which
 s/otherwie/otherwise
 /nit


 A question: why does attdistinct become entry #5 instead of going at the end?
 I assume it's because the order here controls the column order, and it makes
 sense to have attdistinct next to attstattarget, since they're related. Is
 that right? Thanks in advance...

Yep, that was my thought.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 4, 2009 at 10:41 PM, Joshua Tolley eggyk...@gmail.com wrote:
 A question: why does attdistinct become entry #5 instead of going at the end?
 I assume it's because the order here controls the column order, and it makes
 sense to have attdistinct next to attstattarget, since they're related. Is
 that right? Thanks in advance...

 Yep, that was my thought.

We generally want fixed-size columns before variable-size ones, to ease
accessing them from C code.  So it shouldn't go at the end in any case.
Beyond that it's mostly aesthetics, with maybe some thought for avoiding
unnecessary alignment padding.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-05 Thread Robert Haas
On Tue, May 5, 2009 at 12:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, May 4, 2009 at 10:41 PM, Joshua Tolley eggyk...@gmail.com wrote:
 A question: why does attdistinct become entry #5 instead of going at the 
 end?
 I assume it's because the order here controls the column order, and it makes
 sense to have attdistinct next to attstattarget, since they're related. Is
 that right? Thanks in advance...

 Yep, that was my thought.

 We generally want fixed-size columns before variable-size ones, to ease
 accessing them from C code.  So it shouldn't go at the end in any case.
 Beyond that it's mostly aesthetics, with maybe some thought for avoiding
 unnecessary alignment padding.

I thought about that as well; it should be OK where it is, in that regard.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-04 Thread Joshua Tolley
On Mon, May 04, 2009 at 10:13:31PM -0400, Robert Haas wrote:

nit
 +   own analysis indicates otherwie).  When set to a negative value, which
s/otherwie/otherwise
/nit


A question: why does attdistinct become entry #5 instead of going at the end?
I assume it's because the order here controls the column order, and it makes
sense to have attdistinct next to attstattarget, since they're related. Is
that right? Thanks in advance...

 --- 185,210 
* 
*/
   
 ! #define Natts_pg_attribute  19
   #define Anum_pg_attribute_attrelid  1
   #define Anum_pg_attribute_attname   2
   #define Anum_pg_attribute_atttypid  3
   #define Anum_pg_attribute_attstattarget 4
 ! #define Anum_pg_attribute_attdistinct   5
 ! #define Anum_pg_attribute_attlen6
 ! #define Anum_pg_attribute_attnum7

- Josh / eggyknap


signature.asc
Description: Digital signature


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-03 Thread Jaime Casanova
On Sun, May 3, 2009 at 3:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Apr 6, 2009 at 11:15 AM, Robert Haas robertmh...@gmail.com wrote:
 So based on this comment and Stephen's remarks, I'm going to assume
 that I'm succumbing to a fit of unjustified paranoia and re-implement
 as you suggest.

 OK, new version of patch, this time with the weird scaling removed and
 the datatype changed to float4.


In this paragraph i think you mean: ALTER TABLE ... ALTER COLUMN ...
SET DISTINCT?

para
+The analyzer also estimates the number of distinct values that appear
+in each column.  Because only a subset of pages are scanned, this method
+can sometimes be quite inaccurate, especially for large tables.  If this
+inaccuracy leads to bad query plans, the analyzer can be forced to use
+a more correct value with commandALTER TABLE ... ALTER COLUMN ... SET
+STATISTICS/command (see xref linkend=sql-altertable
+endterm=sql-altertable-title).
+   /para

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Robert Haas
On Sun, Apr 5, 2009 at 11:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 (It's also worth pointing out that the calculations we do with
 ndistinct are pretty approximations anyway.  If the difference between
 stadistinct = -1 x 10^-6 and stadistinct = -1.4^10-6 is the thing
 that's determining whether the planner is picking the correct plan on
 your 4-billion-row table,

 No, it's the loss of ability to set stadistinct to -1e-9 or -1e-12 or
 -1e-15 or so that is bothering me.  In a table with billions of rows
 that could become important.

 Or maybe not; but the real bottom line here is that it is 100% silly to
 use a different representation in this column than is used in the
 underlying stadistinct column.  All you accomplish by that is to impose
 on the user the intersection of the accuracy/range limits of the two
 different representations.

Well, I think I was pretty clear about what I was trying to
accomplish.  I think there are more people who care about pg_dump
output being diffable than there are who need to set ndistinct more
accurately than 1 ppm and yet not as an integer.  Perhaps if any of
those people are reading this thread they could chime in.  Otherwise,
I will implement as you propose.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 Well, I think I was pretty clear about what I was trying to
 accomplish.  I think there are more people who care about pg_dump
 output being diffable than there are who need to set ndistinct more
 accurately than 1 ppm and yet not as an integer.  Perhaps if any of
 those people are reading this thread they could chime in.  Otherwise,
 I will implement as you propose.

I do such diffs pretty often, but I don't think I've *ever* done it on
catalog tables..  Perhaps it'll come up in the future, but I doubt it.

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Robert Haas
On Mon, Apr 6, 2009 at 7:30 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Well, I think I was pretty clear about what I was trying to
 accomplish.  I think there are more people who care about pg_dump
 output being diffable than there are who need to set ndistinct more
 accurately than 1 ppm and yet not as an integer.  Perhaps if any of
 those people are reading this thread they could chime in.  Otherwise,
 I will implement as you propose.

 I do such diffs pretty often, but I don't think I've *ever* done it on
 catalog tables..  Perhaps it'll come up in the future, but I doubt it.

        Stephen

Well the point is when you dump a user table, it will dump this
setting along with it, same as it does now for statistics_target.  So
if you diff the DDL you might see differences in rounding.  If you
only diff the data, it won't matter unless, as you say, you're dumping
pg_attribute itself.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, I think I was pretty clear about what I was trying to
 accomplish.  I think there are more people who care about pg_dump
 output being diffable than there are who need to set ndistinct more
 accurately than 1 ppm and yet not as an integer.

My, you *are* paranoid about float4 aren't you?  Once you've stored
a given value, it's always going to dump the same.  Diffing different
dumps isn't going to be a problem.  I concede that it might look
different from what you put in originally, but we make no effort to
guarantee that for DDL anyway.

(It is true that with pg_dump's default float_extra_digits setting,
outputs from this column might look uglier than they need to.
I don't see anything wrong with having pg_dump forcibly round the
value to five or so digits, though.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Apr 6, 2009 at 7:30 AM, Stephen Frost sfr...@snowman.net wrote:
  I do such diffs pretty often, but I don't think I've *ever* done it on
  catalog tables..  Perhaps it'll come up in the future, but I doubt it.
 
 Well the point is when you dump a user table, it will dump this
 setting along with it, same as it does now for statistics_target.  So
 if you diff the DDL you might see differences in rounding.  If you
 only diff the data, it won't matter unless, as you say, you're dumping
 pg_attribute itself.

The rounding when you dump it out is going to be consistant though, is
it not?  I mean, you might get a difference between what you try to set
it to and the result that you get from pg_dump, but if you compare one
pg_dump to another done later there shouldn't be any change, right?  If
there is an architecture difference then I could maybe see it, but I
thought float-handling was well-defined on systems we run on.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-06 Thread Robert Haas
On Mon, Apr 6, 2009 at 10:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, I think I was pretty clear about what I was trying to
 accomplish.  I think there are more people who care about pg_dump
 output being diffable than there are who need to set ndistinct more
 accurately than 1 ppm and yet not as an integer.

 My, you *are* paranoid about float4 aren't you?

Yes - perhaps unreasonably so.  I've had so many bad experiences with
floating-point arithmetic that I've stopped trying to understand all
of the crazy things that can happen and started just avoiding it like
the plague.  Long live numeric!

 Once you've stored
 a given value, it's always going to dump the same.  Diffing different
 dumps isn't going to be a problem.  I concede that it might look
 different from what you put in originally, but we make no effort to
 guarantee that for DDL anyway.

 (It is true that with pg_dump's default float_extra_digits setting,
 outputs from this column might look uglier than they need to.
 I don't see anything wrong with having pg_dump forcibly round the
 value to five or so digits, though.)

So based on this comment and Stephen's remarks, I'm going to assume
that I'm succumbing to a fit of unjustified paranoia and re-implement
as you suggest.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Robert Haas
On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not thrilled about adding a column to pg_attribute for this.

 What is the specific nature of your concern?

 Actually, I'm more worried about the TupleDesc data structure than
 the catalogs.  There are TupleDescs all over the backend, and I've
 seen evidence in profiles that setting them up is a nontrivial cost.

 You're very possibly right that four more bytes is in the noise,
 though.

 Two other comments now that I've read a little further:

 * This isn't happening for 8.4, so adjust the pg_dump code.

I thought about writing 80500, but the effect of that would have been
to render the patch impossible to test, so I didn't. :-)

I think I'll be very lucky if that's the most bitrot this accumulates
between now and when the tree is open for 8.5 development.  System
catalog changes stink in that regard.  I suppose we could tag and
branch the tree now, but that would just move the work of fixing any
subsequent conflicts from patch authors to committers, which is sort
of a zero-sum game.

 * Using an integer is bogus.  Use a float4 and forget the weird scaling;
 it should have exactly the same interpretation as stadistinct, except
 for 0 meaning unset instead of unknown.

I think there's a pretty good chance that will lead to a complaint
that is some variant of the following: I ran this command and then I
did a pg_dump and the output doesn't match what I put in. Or maybe,
I did a dump and a restore on a different machine with a different
architecture and then another dump and then I diffed them and this
popped out.

I have a deep-seated aversion to storing important values as float,
and we seem to have no other floats anywhere in our DDL, so I was a
little leery about breaking new ground.  There's nothing particularly
special about the scaling that the pg_statistic stuff uses, and it's
basically pretty obscure internal stuff anyway, so I think the
consistency argument is fairly weak.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * Using an integer is bogus.  Use a float4 and forget the weird scaling;
 it should have exactly the same interpretation as stadistinct, except
 for 0 meaning unset instead of unknown.

 I have a deep-seated aversion to storing important values as float,

[ shrug... ]  Precision is not important for this value: we are not
anywhere near needing more than six significant digits for our
statistical estimates.  Range, on the other hand, could be important
when dealing with really large tables.  So I'm much more concerned
about whether the definition is too restrictive than about whether
some uninformed person complains about exactness.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Robert Haas
On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * Using an integer is bogus.  Use a float4 and forget the weird scaling;
 it should have exactly the same interpretation as stadistinct, except
 for 0 meaning unset instead of unknown.

 I have a deep-seated aversion to storing important values as float,

 [ shrug... ]  Precision is not important for this value: we are not
 anywhere near needing more than six significant digits for our
 statistical estimates.  Range, on the other hand, could be important
 when dealing with really large tables.  So I'm much more concerned
 about whether the definition is too restrictive than about whether
 some uninformed person complains about exactness.

I thought about that, and if you think that's better, I can implement
it that way.  Personally, I'm unconvinced.  The use case for
specifying a number of distinct values in excess of 2 billion as an
absolute number rather than as a percentage of the table size seems
pretty weak to me.  I would rather use integers and have it be clean.
But I would rather have it your way than not have it at all.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  Precision is not important for this value: we are not
 anywhere near needing more than six significant digits for our
 statistical estimates.  Range, on the other hand, could be important
 when dealing with really large tables.

 I thought about that, and if you think that's better, I can implement
 it that way.  Personally, I'm unconvinced.  The use case for
 specifying a number of distinct values in excess of 2 billion as an
 absolute number rather than as a percentage of the table size seems
 pretty weak to me.

I was more concerned about the other end of it.  Your patch sets a
not-too-generous lower bound on the percentage that can be represented ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Robert Haas
On Sun, Apr 5, 2009 at 10:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  Precision is not important for this value: we are not
 anywhere near needing more than six significant digits for our
 statistical estimates.  Range, on the other hand, could be important
 when dealing with really large tables.

 I thought about that, and if you think that's better, I can implement
 it that way.  Personally, I'm unconvinced.  The use case for
 specifying a number of distinct values in excess of 2 billion as an
 absolute number rather than as a percentage of the table size seems
 pretty weak to me.

 I was more concerned about the other end of it.  Your patch sets a
 not-too-generous lower bound on the percentage that can be represented ...

Huh?  With a scaling factor of 1 million, you can represent anything
down to about 0.01, which is apparently all you can expect out of
a float4 anyway.

http://archives.postgresql.org/pgsql-bugs/2009-01/msg00039.php

In fact, we could change the scaling factor to 1 billion if you like,
and it would then give you MORE significant digits than you'll get out
of a float4 (and you'll be able to predict the exact number that
you're gonna get).  If someone has billions of rows in the table but
only thousands of distinct values, I would expect them to run a script
to count 'em up and specify the exact number, rather than specifying
some microscopic percentage.  But there's certainly enough range in
int4 to tack on three more decimal places if you think it's warranted.

(It's also worth pointing out that the calculations we do with
ndistinct are pretty approximations anyway.  If the difference between
stadistinct = -1 x 10^-6 and stadistinct = -1.4^10-6 is the thing
that's determining whether the planner is picking the correct plan on
your 4-billion-row table, you probably want to tune some other
parameter as well so as to get further away from that line.  Just
getting the value in the ballpark should be a big improvement over how
things stand now.)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Robert Haas
On Sun, Apr 5, 2009 at 10:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Apr 5, 2009 at 10:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  Precision is not important for this value: we are not
 anywhere near needing more than six significant digits for our
 statistical estimates.  Range, on the other hand, could be important
 when dealing with really large tables.

 I thought about that, and if you think that's better, I can implement
 it that way.  Personally, I'm unconvinced.  The use case for
 specifying a number of distinct values in excess of 2 billion as an
 absolute number rather than as a percentage of the table size seems
 pretty weak to me.

 I was more concerned about the other end of it.  Your patch sets a
 not-too-generous lower bound on the percentage that can be represented ...

 Huh?  With a scaling factor of 1 million, you can represent anything
 down to about 0.01, which is apparently all you can expect out of
 a float4 anyway.

 http://archives.postgresql.org/pgsql-bugs/2009-01/msg00039.php

I guess I'm wrong here - 0.1 is only one SIGNIFICANT digit.  But
the point remains that specifying ndistinct in ppm is probably enough
for most cases, and ppb (which would still fit in int4) even more so.
I don't think we need to worry about people with trillions of rows
(and even they could still specify an absolute number).

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 (It's also worth pointing out that the calculations we do with
 ndistinct are pretty approximations anyway.  If the difference between
 stadistinct = -1 x 10^-6 and stadistinct = -1.4^10-6 is the thing
 that's determining whether the planner is picking the correct plan on
 your 4-billion-row table,

No, it's the loss of ability to set stadistinct to -1e-9 or -1e-12 or
-1e-15 or so that is bothering me.  In a table with billions of rows
that could become important.

Or maybe not; but the real bottom line here is that it is 100% silly to
use a different representation in this column than is used in the
underlying stadistinct column.  All you accomplish by that is to impose
on the user the intersection of the accuracy/range limits of the two
different representations.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Per previous discussion.
 http://archives.postgresql.org/message-id/8066.1229106...@sss.pgh.pa.us
 http://archives.postgresql.org/message-id/603c8f070904021926g92eb55sdfc6814113395...@mail.gmail.com

I'm not thrilled about adding a column to pg_attribute for this.
Isn't there some way of keeping it in pg_statistic?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Robert Haas
On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Per previous discussion.
 http://archives.postgresql.org/message-id/8066.1229106...@sss.pgh.pa.us
 http://archives.postgresql.org/message-id/603c8f070904021926g92eb55sdfc6814113395...@mail.gmail.com

 I'm not thrilled about adding a column to pg_attribute for this.
 Isn't there some way of keeping it in pg_statistic?

I don't like the idea of keeping it in pg_statistic.  Right now, all
of the data in pg_statistic is transient, so you could theoretically
truncate the table at any time without losing anything permanent.
It's true that we don't do that right now, but it seems cleaner to
keep the data generated by the analyzer separate from the stuff we
consider part of the structure of the database.  If we did put the
data in pg_statistic, then we'd have to teach vacuum that when it
writes out new statistics, it also has to copy over this setting from
the previous version of the tuple.  And that means it would have to
lock the tuples against concurrent updates while analyze is running.
Also, if someone happened to run ALTER TABLE SET DISTINCT before the
first run of ANALYZE on that table (for example, during pg_load)
there'd be no existing row in pg_statistic for the DDL command to
update, so we'd need to create and insert a fake row (which,
incidentally, would blow up any concurrent ANALYZE already in progress
when it got and tried to insert the resulting rows into pg_statistic,
violating the unique constraint).  All in all it seems rather messy.

What is the specific nature of your concern?  I thought about the
possibility of a distributed performance penalty that might be
associated with enlarging pg_attribute, but increasing the size of a
structure that is already 112 bytes by another 4 doesn't seem likely
to be significant, especially since we're not crossing a power-of-two
boundary.  It might be possible to reclaim 4 bytes by changing
attstattarget and attndims from int4 to int2, but I'd rather do that
as a separate patch.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Alvaro Herrera
Robert Haas escribió:
 On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Per previous discussion.
  http://archives.postgresql.org/message-id/8066.1229106...@sss.pgh.pa.us
  http://archives.postgresql.org/message-id/603c8f070904021926g92eb55sdfc6814113395...@mail.gmail.com
 
  I'm not thrilled about adding a column to pg_attribute for this.
  Isn't there some way of keeping it in pg_statistic?
 
 I don't like the idea of keeping it in pg_statistic.  Right now, all
 of the data in pg_statistic is transient, so you could theoretically
 truncate the table at any time without losing anything permanent.

Maybe use a new catalog?

 What is the specific nature of your concern?  I thought about the
 possibility of a distributed performance penalty that might be
 associated with enlarging pg_attribute, but increasing the size of a
 structure that is already 112 bytes by another 4 doesn't seem likely
 to be significant, especially since we're not crossing a power-of-two
 boundary.

FWIW it has been said that whoever is concerned about pg_attribute bloat
should be first looking at getting rid of the redundant entries for
system columns, for each and every table.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 FWIW it has been said that whoever is concerned about pg_attribute bloat
 should be first looking at getting rid of the redundant entries for
 system columns, for each and every table.

That's been overtaken by events, unfortunately: we now need those
entries to carry per-column permissions on system columns.  So they're
no longer merely overhead.

Possibly we could hack things so that only system columns with
non-default permissions are actually stored, but that feels like
a kluge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Robert Haas
On Sat, Apr 4, 2009 at 10:31 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Robert Haas escribió:
 On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Per previous discussion.
  http://archives.postgresql.org/message-id/8066.1229106...@sss.pgh.pa.us
  http://archives.postgresql.org/message-id/603c8f070904021926g92eb55sdfc6814113395...@mail.gmail.com
 
  I'm not thrilled about adding a column to pg_attribute for this.
  Isn't there some way of keeping it in pg_statistic?

 I don't like the idea of keeping it in pg_statistic.  Right now, all
 of the data in pg_statistic is transient, so you could theoretically
 truncate the table at any time without losing anything permanent.

 Maybe use a new catalog?

If we go that route, we would probably make sense to move
attstattarget there as well.  Obviously it wouldn't make sense to move
anything that's in the critical path of ordinary database operations,
but maybe attislocal or attinhcount could be moved as well.  But I'm
not sure it's really warranted because, AFAIK, we have no evidence
that this is a real as opposed to a theoretical problem, and even if
we moved all of that stuff, that's only 12 bytes, and now you have
another table that's competing for space in the system cache.  If
someone could demonstrate (say, by reducing NAMEDATALEN) that a
smaller pg_attribute structure would generate a real performance
benefit, then it would be worth spending the time to figure out a way
to make that happen (obviously without actually reducing NAMEDATALEN,
that's only a possible way to measure the impact).

 What is the specific nature of your concern?  I thought about the
 possibility of a distributed performance penalty that might be
 associated with enlarging pg_attribute, but increasing the size of a
 structure that is already 112 bytes by another 4 doesn't seem likely
 to be significant, especially since we're not crossing a power-of-two
 boundary.

 FWIW it has been said that whoever is concerned about pg_attribute bloat
 should be first looking at getting rid of the redundant entries for  BN
 system columns, for each and every table.

That's a different kind of bloat (more rows vs. larger rows) but a
valid point all the same.  I suspect neither type has much practical
impact, and that if we listed all the performance problems that
PostgreSQL has today, neither would be in the top 500.  Bad ndistinct
estimates would be, however.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not thrilled about adding a column to pg_attribute for this.

 What is the specific nature of your concern?

Actually, I'm more worried about the TupleDesc data structure than
the catalogs.  There are TupleDescs all over the backend, and I've
seen evidence in profiles that setting them up is a nontrivial cost.

You're very possibly right that four more bytes is in the noise,
though.

Two other comments now that I've read a little further:

* This isn't happening for 8.4, so adjust the pg_dump code.

* Using an integer is bogus.  Use a float4 and forget the weird scaling;
it should have exactly the same interpretation as stadistinct, except
for 0 meaning unset instead of unknown.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers