Re: [HACKERS] Show sequences owned by

2011-11-05 Thread Magnus Hagander
On Fri, Nov 4, 2011 at 16:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Updated patch attached that does this, and the proper schema qualifications.

 I'd schema-qualify the quote_ident and regclass names too.

 Also, just as a matter of style, I think it'd be better to assign short
 aliases to the table names (pg_catalog.pg_class c etc) and use those.
 I forget what the letter of the SQL standard is about whether an
 un-aliased schema-qualified table name can be referenced in the query
 without schema-qualifying the reference, but I'm pretty sure that not
 doing so is at least frowned on.

Fixed, and applied.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show statistics target in \d+

2011-11-05 Thread Magnus Hagander
On Fri, Nov 4, 2011 at 16:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Would you find it better if we showed blank (NULL) when it was -1?

 Yeah, I would.  Seems less confusing.

Adjusted per this, renamed to Stats target, and applied.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Proposal: casts row to array and array to row

2011-11-05 Thread Noah Misch
On Tue, Oct 11, 2011 at 10:40:26AM +0200, Pavel Stehule wrote:
 What do you think about this idea?

+1, belatedly.  Having inherent casts to/from text since version 8.3 has
smoothed out some aggravating corner cases.  If the patch isn't invasive and the
casts are all explicit-only, I anticipate a similar win.

True, unlike any -> text, not every cast will actually work.  However, the
semantics are well-defined and incompatible choices can be detected just as
readily as we do for incompatible casts among scalars.



[HACKERS] type privileges and default privileges

2011-11-05 Thread Peter Eisentraut
During the closing days of the 9.1 release, we discussed adding privileges
on types (and domains), so that owners can prevent others from using their
types, since such use would prevent the owners from changing the types in
certain ways.  (Collations have similar issues and work quite similarly to
types, so we could include them in this consideration.)

As I'm plotting to write code for this, I wonder about how to handle
default privileges.  For compatibility and convenience, we would still
want to have types with public privileges by default.  Should we
continue to hardcode this, as we have done in the past with functions,
for example, or should we use the new default privileges facility to
register the public default privileges in the template database?





[HACKERS] plpython extension control files installation

2011-11-05 Thread Peter Eisentraut
We only build the language plpython2u or plpython3u, not both, in any
build, but we always install the extension control files for all
variants.  Is there a reason for this, or just an oversight?





Re: [HACKERS] psql expanded auto

2011-11-05 Thread Peter Eisentraut
On fre, 2011-11-04 at 07:34 -0400, Noah Misch wrote:
 For \pset format wrapped, we only use $COLUMNS when the output is a
 tty.  I'm thinking it's best, although not terribly important, to
 apply the same rule to this feature.

I think it does work that way.  There is only one place where the
environment variable is queried, and it's used for both wrapped format
and expanded auto format.




Re: [HACKERS] [GENERAL] Strange problem with create table as select * from table;

2011-11-05 Thread Adrian Klaver
On Friday, November 04, 2011 6:04:02 pm Tom Lane wrote:
 I wrote:
  A different line of thought is that there's something about these
  specific source rows, and only these rows, that makes them vulnerable to
  corruption during INSERT/SELECT.  Do they by any chance contain any
  values that are unusual elsewhere in your table?  One thing I'm
  wondering about right now is the nulls bitmap --- so do these rows have
  nulls (or not-nulls) in any place that's unusual elsewhere?
 
 Hah ... I have a theory.
 

 
 This is trivial to fix, now that we know there's a problem --- the
 function is only using that assumption to save itself a couple lines
 of code.  Penny wise, pound foolish :-(

I killed a few brain cells just reading the explanation :)

 
   regards, tom lane
 
 


-- 
Adrian Klaver
adrian.kla...@gmail.com



Re: [HACKERS] IDLE in transaction introspection

2011-11-05 Thread Greg Smith

On 11/04/2011 05:01 PM, Tom Lane wrote:
 Scott Mead sco...@openscg.com writes:
 I leave the waiting flag in place for posterity.  With this in mind, is
 the consensus:
 RUNNING
  or
 ACTIVE

 Personally, I'd go for lower case.

I was thinking it would be nice if this state looked like the WAL sender 
state values in pg_stat_replication, which are all lower case.  For 
comparison those states are:


startup
backup
catchup
streaming

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] psql expanded auto

2011-11-05 Thread Noah Misch
On Sat, Nov 05, 2011 at 04:53:56PM +0200, Peter Eisentraut wrote:
 On fre, 2011-11-04 at 07:34 -0400, Noah Misch wrote:
  For \pset format wrapped, we only use $COLUMNS when the output is a
  tty.  I'm thinking it's best, although not terribly important, to
  apply the same rule to this feature.
 
 I think it does work that way.  There is only one place where the
 environment variable is queried, and it's used for both wrapped format
 and expanded auto format.

You're correct; given output to a non-tty and no use of \pset columns,
output_columns always becomes zero.  This makes wrapped format never wrap, but
it makes expanded auto mode always expand.  Would it be more consistent to never
expand when output_columns == 0?  That is, make these give the same output:

psql -X -P expanded=auto -c "select 'a' as a"
psql -X -P expanded=auto -c "select 'a' as a" | cat

I just noticed: the help text for \x in slashUsage() will also need an update.
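
To pin the question down, here is a toy model of the decision (the function
and its arguments are invented for illustration; psql's actual logic lives in
print.c, not here):

```python
# toy model of the behaviour being questioned: with expanded=auto,
# output_columns == 0 (non-tty output, no \pset columns) currently means
# "always expand"; the proposal is to treat 0 as "never expand", matching
# how wrapped format never wraps in that case.
def auto_expands(required_width: int, output_columns: int,
                 treat_zero_as_never: bool) -> bool:
    if output_columns == 0:
        return not treat_zero_as_never   # current: True; proposed: False
    return required_width > output_columns

print(auto_expands(80, 0, treat_zero_as_never=False))  # current behaviour: True
print(auto_expands(80, 0, treat_zero_as_never=True))   # proposed: False
```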



Re: [HACKERS] pg_upgrade automatic testing

2011-11-05 Thread Peter Eisentraut
On mån, 2011-09-19 at 07:06 +0300, Peter Eisentraut wrote:
 I found a simpler way to get this working.  Just hack up the catalogs
 for the new path directly.  So I can now run this test suite against
 older versions as well, like this:
 
 contrib/pg_upgrade$ make installcheck oldsrc=somewhere oldbindir=elsewhere

Any comments on how to proceed with this?  I think it has been useful in
detecting pg_upgrade breakage a few times already, so I'd like to commit
it and start using it.





Re: [HACKERS] [GENERAL] Strange problem with create table as select * from table;

2011-11-05 Thread Tom Lane
I wrote:
 A different line of thought is that there's something about these
 specific source rows, and only these rows, that makes them vulnerable to
 corruption during INSERT/SELECT.  Do they by any chance contain any
 values that are unusual elsewhere in your table?  One thing I'm
 wondering about right now is the nulls bitmap --- so do these rows have
 nulls (or not-nulls) in any place that's unusual elsewhere?

Hah ... I have a theory.

I will bet that you recently added some column(s) to the source table
using ALTER TABLE ADD COLUMN and no default value, so that the added
columns were nulls and no table rewrite happened.  And that these
troublesome rows predate that addition, but contained no nulls before
that.  And that they are the only rows that, in addition to the above
conditions, contain data fields wide enough to require out-of-line
toasting.

These conditions together are enough to break the assumption in
toast_insert_or_update that the old and new tuples must have the same
value of t_hoff.  But it can only happen when the source tuple is an
original on-disk tuple, which explains why only INSERT ... SELECT *
causes the problem, not any variants that require projection of a new
column set.  When it does happen, toast_insert_or_update correctly
computes the required size of the new tuple ... but then it tells
heap_fill_tuple to fill the data part at offset olddata->t_hoff, which
is wrong (too small), and so the nulls bitmap that heap_fill_tuple
concurrently constructs will overwrite the first few data bytes.  In
your example, the table contains 49 columns so the nulls bitmap requires
7 bytes, just enough to overwrite the first 6 data bytes as observed.
(In fact, given the values we see being filled in, I can confidently say
that you have two added-since-creation null columns, no more, no less.)

I can reproduce the problem with the attached test case (using the
regression database).  With asserts enabled, the
    Assert(new_len == olddata->t_hoff);
fails.  With asserts off, corrupt data.

This is trivial to fix, now that we know there's a problem --- the
function is only using that assumption to save itself a couple lines
of code.  Penny wise, pound foolish :-(

regards, tom lane


drop table wide;

create table wide as
select
ten as firstc,
unique1 as unique1_1,
unique2 as unique2_1,
two as two_1,
four as four_1,
ten as ten_1,
twenty as twenty_1,
hundred as hundred_1,
thousand as thousand_1,
twothousand as twothousand_1,
fivethous as fivethous_1,
tenthous as tenthous_1,
odd as odd_1,
even as even_1,
stringu1 as stringu1_1,
stringu2 as stringu2_1,
string4 as string4_1,
unique1 as unique1_2,
unique2 as unique2_2,
two as two_2,
four as four_2,
ten as ten_2,
twenty as twenty_2,
hundred as hundred_2,
thousand as thousand_2,
twothousand as twothousand_2,
fivethous as fivethous_2,
tenthous as tenthous_2,
odd as odd_2,
even as even_2,
stringu1 as stringu1_2,
stringu2 as stringu2_2,
string4 as string4_2,
unique1 as unique1_3,
unique2 as unique2_3,
two as two_3,
four as four_3,
ten as ten_3,
twenty as twenty_3,
hundred as hundred_3,
thousand as thousand_3,
twothousand as twothousand_3,
fivethous as fivethous_3,
tenthous as tenthous_3,
odd as odd_3,
even as even_3,
repeat('xyzzyxydlkadlkndvlelfzzy', 2) as widec
from onek limit 10;

alter table wide add column nullc1 int;
alter table wide add column nullc2 int;

drop table widec;

create table widec as select * from wide;

select firstc, to_hex(unique1_1), unique2_1, to_hex(unique1_2) from widec;



[HACKERS] proposal: psql concise mode

2011-11-05 Thread Josh Kupershmidt
Hi all,

The good news is that psql's backslash commands are becoming quite
thorough at displaying all information which could conceivably be of
interest about an object. The bad news is, psql's backslash commands
often produce a lot of noise and wasted output. (There was some
grumbling along these lines re: the recent Stats Target patch).

I'd like to propose a concise mode for psql, which users might turn
on via a \pset option. Concise mode would affect only the output of
psql's backslash commands. For output results which have some all-NULL
columns, as in:

test=# \d+ foo
                         Table "public.foo"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 a      | integer |           | plain   |              |
 b      | integer |           | plain   |              |
Has OIDs: no

Concise mode would simply omit the all-NULL columns, so that the
output would look like this:

test=# \d+ foo
        Table "public.foo"
 Column |  Type   | Storage
--------+---------+---------
 a      | integer | plain
 b      | integer | plain
Has OIDs: no

For actually implementing this: it'd be nice if the changes could be
localized to printQuery(). Unfortunately, there are a few stragglers
such as describeOneTableDetails() which have their own notions about
how to print their output, so I'm not sure how straightforward/small
such a patch would be.
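
The column-dropping rule itself is tiny; here is a rough sketch of the idea
(function and data shapes are invented for illustration — psql's
printQuery() works on printTableContent, not Python lists):

```python
# sketch of the column-dropping rule for concise mode: keep a column
# only if some row has a non-empty value in it.
def concise_columns(headers, rows):
    keep = [i for i in range(len(headers))
            if any(r[i] not in (None, "") for r in rows)]
    return ([headers[i] for i in keep],
            [[r[i] for i in keep] for r in rows])

headers = ["Column", "Type", "Modifiers", "Storage", "Stats target", "Description"]
rows = [["a", "integer", "", "plain", "", ""],
        ["b", "integer", "", "plain", "", ""]]
print(concise_columns(headers, rows)[0])  # ['Column', 'Type', 'Storage']
```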

But I just wanted to throw the idea out there. Any thoughts?

Josh



[HACKERS] Re: [GENERAL] Strange problem with create table as select * from table;

2011-11-05 Thread Martijn van Oosterhout
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote:
 Hah ... I have a theory.
 
 I will bet that you recently added some column(s) to the source table
 using ALTER TABLE ADD COLUMN and no default value, so that the added
 columns were nulls and no table rewrite happened.  And that these
 troublesome rows predate that addition, but contained no nulls before
 that.  And that they are the only rows that, in addition to the above
 conditions, contain data fields wide enough to require out-of-line
 toasting.
 
 These conditions together are enough to break the assumption in
 toast_insert_or_update that the old and new tuples must have the same
 value of t_hoff. 

Wow! Good catch.

 This is trivial to fix, now that we know there's a problem --- the
 function is only using that assumption to save itself a couple lines
 of code.  Penny wise, pound foolish :-(

No doubt the assumption was true when the code was written, but still.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Include commit identifier in version() function

2011-11-05 Thread Jeff Janes
2011/10/28 Robert Haas robertmh...@gmail.com:
 2011/10/28 pasman pasmański pasma...@gmail.com:
 I think it would be useful to include in the version() function a
 hexadecimal commit identifier, for fast checkout in git.

 It's sort of impossible to make this accurate, though.  You may have
 patched your tree but not created a commit with those changes before
 you build; or if you have created a commit it may very well not be one
 that's in the public repo, but just something on your system.

Sure, I may be compiling changes that are not committed, but still
knowing the most recent commit/checkout that did happen would have
better granularity than just 9.2devel.  And if that commit hash is
not in the public repo, that is fine, as I would want such a thing for
my own use.

 Moreover, there's no real guarantee that you're compiling from a git
 tree at all; you could well be compiling from a tarball.

 All in all I feel like this is a solution in search of a problem.
 People shouldn't be using anything other than a released version in
 production,

That is probably true, but 99+% of the compilations and installs I do
are not intended for production use.  I would be happy to see this
feature in my own dev repository, even if it never did see its way
into production.  (It would already be there, if I knew of a way to
make it happen)

Cheers,

Jeff



[HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-05 Thread Tomas Vondra
Hi everyone,

I propose a patch that would allow optional cleaning of queries
tracked in pg_stat_statements, compressing the result and making it more
readable.

The default behavior is that when the same query is run with different
parameter values, it's actually stored as two separate queries (the
query strings do differ).

A small example - when you run pgbench -S you'll get queries like
this

   SELECT abalance FROM pgbench_accounts WHERE aid = 12433
   SELECT abalance FROM pgbench_accounts WHERE aid = 2322
   SELECT abalance FROM pgbench_accounts WHERE aid = 52492

and so on, and each one is listed separately in pg_stat_statements,
which often pollutes the view.

The patch implements a simple cleaning that replaces the parameter
values with generic strings - e.g. numbers are turned to :n, so the
queries mentioned above are turned to

   SELECT abalance FROM pgbench_accounts WHERE aid = :n

and thus tracked as a single query in pg_stat_statements.

The patch provides an enum GUC (pg_stat_statements.clean) with three
options - none, basic and aggressive. The default option is none, the
basic performs the basic value replacement (described above) and
aggressive performs some additional cleaning - for example replaces
multiple spaces with a single one etc.

The parsing is intentionally very simple and cleans the query in a
single pass. It currently handles three literal types:

 a) string (basic, C-style escaped, Unicode-escaped, $-escaped)
 b) numeric (although 1.925e-3 is replaced by :n-:n)
 c) boolean (true/false)

There is probably room for improvement (e.g. handling UESCAPE).
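
For illustration only, the basic cleaning level can be approximated with two
regexes (the actual patch is a hand-written single-pass scanner in C; the
':s' placeholder below is my own invention, and this sketch skips
dollar-quoting, Unicode escapes and booleans):

```python
import re

# rough approximation of the "basic" cleaning level: swap literal
# values for generic placeholders.  ':n' matches the patch's behaviour
# for numbers; note the patch turns 1.925e-3 into :n-:n, while this
# regex happens to collapse it to a single :n.
_string = re.compile(r"'(?:[^']|'')*'")
_number = re.compile(r"\b\d+(\.\d+)?([eE][+-]?\d+)?\b")

def clean_query(q: str) -> str:
    q = _string.sub(":s", q)       # string literals first (no digits left behind)
    return _number.sub(":n", q)    # then numeric literals

print(clean_query("SELECT abalance FROM pgbench_accounts WHERE aid = 12433"))
# SELECT abalance FROM pgbench_accounts WHERE aid = :n
```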


Tomas
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
new file mode 100644
index 8dc3054..a3ace20
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** static const struct config_enum_entry tr
*** 143,152 
--- 143,167 
{NULL, 0, false}
  };
  
+ typedef enum
+ {
+ 	PGSS_CLEAR_NONE,	/* no query cleaning at all */
+ 	PGSS_CLEAR_BASIC,	/* basic parameter value replacement */
+ 	PGSS_CLEAR_AGGRESSIVE	/* more replacements (spaces, comments) */
+ } PGSSClearLevel;
+ 
+ static const struct config_enum_entry clear_options[] = {
+ 	{"none", PGSS_CLEAR_NONE, false},
+ 	{"basic", PGSS_CLEAR_BASIC, false},
+ 	{"aggressive", PGSS_CLEAR_AGGRESSIVE, false},
+ 	{NULL, 0, false}
+ };
+ 
  static int	pgss_max;		/* max # statements to track */
  static int	pgss_track;		/* tracking level */
  static bool	pgss_track_utility;	/* whether to track utility commands */
  static bool	pgss_save;		/* whether to save stats across shutdown */
+ static int	pgss_clean;		/* whether to clean query parameter values */
  
  
  #define pgss_enabled() \
*** static Size pgss_memsize(void);
*** 183,189 
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
  static void entry_reset(void);
! 
  
  /*
   * Module load callback
--- 198,204 
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
  static void entry_reset(void);
! static char * pgss_clean_query(const char * query);
  
  /*
   * Module load callback
*** _PG_init(void)
*** 252,257 
--- 267,284 
 NULL,
 NULL);
  
+ 	DefineCustomEnumVariable("pg_stat_statements.clean",
+ 			"Clean the queries (remove parameter values).",
+ 			NULL,
+ 			&pgss_clean,
+ 			PGSS_CLEAR_NONE,
+ 			clear_options,
+ 			PGC_SIGHUP,
+ 			0,
+ 			NULL,
+ 			NULL,
+ 			NULL);
+ 
	EmitWarningsOnPlaceholders("pg_stat_statements");
  
/*
*** pgss_ExecutorFinish(QueryDesc *queryDesc
*** 583,588 
--- 610,618 
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
+ 
+ 	char *query;
+ 
	if (queryDesc->totaltime && pgss_enabled())
{
/*
*** pgss_ExecutorEnd(QueryDesc *queryDesc)
*** 590,597 
 * levels of hook all do this.)
 */
		InstrEndLoop(queryDesc->totaltime);
  
! 		pgss_store(queryDesc->sourceText,
   

Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-05 Thread Peter Geoghegan
2011/11/6 Tomas Vondra t...@fuzzy.cz:
 Hi everyone,

 The patch implements a simple cleaning that replaces the parameter
 values with generic strings - e.g. numbers are turned to :n, so the
 queries mentioned above are turned to

   SELECT abalance FROM pgbench_accounts WHERE aid = :n

 and thus tracked as a single query in pg_stat_statements.

I'm a couple of days away from posting a much better-principled
implementation of pg_stat_statements normalisation. To normalise, we
perform a selective serialisation of the query tree, which is hashed.
This has the additional benefit of considering queries equivalent even
when, for example, there is a different amount of whitespace, or if
queries differ only in their use of aliases, or if queries differ only
in that one uses a noise word the other doesn't. It also does things
like intelligently distinguishing between queries with different
limit/offset constant values, as these constants are deemed to be
differentiators of queries for our purposes. A guiding principle that
I've followed is that anything that could result in a different plan
is a differentiator of queries.
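
A toy model of that idea: serialise the tree with Const values blanked out,
then hash the result (the node shapes here are invented for illustration and
bear no resemblance to the real parse tree):

```python
import hashlib

# "selective serialisation" sketch: walk a nested-tuple "query tree",
# blank out Const values, and hash the normalised form.  Two queries
# differing only in constants then share a fingerprint.
def normalise(node):
    if isinstance(node, tuple) and node and node[0] == "Const":
        return ("Const", "?")          # constant value is ignored
    if isinstance(node, tuple):
        return tuple(normalise(n) for n in node)
    return node

def query_fingerprint(tree):
    return hashlib.sha1(repr(normalise(tree)).encode()).hexdigest()

t1 = ("OpExpr", "=", ("Var", "aid"), ("Const", 12433))
t2 = ("OpExpr", "=", ("Var", "aid"), ("Const", 2322))
print(query_fingerprint(t1) == query_fingerprint(t2))  # True
```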

I intend to maintain a backwards compatible version, as this can be
expected to work with earlier versions of Postgres.

There will be additional infrastructure added to the parser to support
normalisation of query strings for the patch I'll be submitting (that
obviously won't be supported in the version that builds against
existing Postgres versions that I'll make available). Essentially,
I'll be adding a length field to certain nodes, to go alongside the
existing location field (currently just used to highlight an offending
token in an error message outside the parser). pg_stat_statements will
then use the location and length field of const nodes to swap out
constants in the query string.

It's unfortunate that there was a duplicate effort here. With respect,
the approach that you've taken probably would have turned out to have
been a bit of a tar pit - I'm reasonably sure that had you pursued it,
you'd have found yourself duplicating quite a bit of the parser as new
problems came to light.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

2011-11-05 Thread Tomas Vondra
Dne 6.11.2011 03:16, Peter Geoghegan napsal(a):
 2011/11/6 Tomas Vondra t...@fuzzy.cz:
 Hi everyone,
 
 The patch implements a simple cleaning that replaces the parameter
 values with generic strings - e.g. numbers are turned to :n, so the
 queries mentioned above are turned to

   SELECT abalance FROM pgbench_accounts WHERE aid = :n

 and thus tracked as a single query in pg_stat_statements.
 
 I'm a couple of days away from posting a much better principled
 implementation of pg_stat_statements normalisation. To normalise, we
 perform a selective serialisation of the query tree, which is hashed.

OK, my patch definitely is not the only possible approach, and if there's a
way to get more info from the planner, the results surely might be better. My
goal was to provide a simple patch; if yours solves the problem better, then
I'll be more than happy to remove mine.

 This has the additional benefit of considering queries equivalent even
 when, for example, there is a different amount of whitespace, or if
 queries differ only in their use of aliases, or if queries differ only
 in that one uses a noise word the other doesn't. It also does things

Yup, that's nice. And achieving the same behavior (aliases, noise) would
require a much more complex parser than the one I wrote.

 like intelligently distinguishing between queries with different
 limit/offset constant values, as these constants are deemed to be
 differentiators of queries for our purposes. A guiding principle that
 I've followed is that anything that could result in a different plan
 is a differentiator of queries.

Not sure if I understand it correctly - do you want to keep multiple
records for a query, one for each differentiator (different limit/offset
values, etc.)? That does not make much sense to me - even a different
parameter value may cause a change of plan, so I don't see much
difference between a query parameter, limit/offset constant values, etc.

If it were possible to compare the actual plan (or at least a hash of the
plan), keeping one record for each plan, that'd be extremely nice.
You'd see that the query was executed with 3 different plans, number of
calls, average duration etc.

 I intend to maintain a backwards compatible version, as this can be
 expected to work with earlier versions of Postgres.

 There will be additional infrastructure added to the parser to support
 normalisation of query strings for the patch I'll be submitting (that
 obviously won't be supported in the version that builds against
 existing Postgres versions that I'll make available). Essentially,
 I'll be adding a length field to certain nodes, to go alongside the
 existing location field (currently just used to highlight an offending
 token in an error message outside the parser). pg_stat_statements will
 then use the location and length field of const nodes to swap out
 constants in the query string.
 
 It's unfortunate that there was a duplicate effort here. With respect,
 the approach that you've taken probably would have turned out to have
 been a bit of a tar pit - I'm reasonably sure that had you pursued it,
 you'd have found yourself duplicating quite a bit of the parser as new
 problems came to light.

The duplicate effort is not a problem. I needed to learn something more
about the internals and how to use the executor hooks, and the
pg_stat_statements is a neat place to learn this. The patch is rather a
side effect of this effort, so no waste on my side.

Anyway, you're right that the approach I've taken is not very extensible
without building a parser parallel to the actual one. It was meant as a
simple solution, not solving the problem perfectly, just sufficiently for
what I need.

Tomas



[HACKERS] Measuring relation free space

2011-11-05 Thread Greg Smith
Attached patch adds a new function to the pageinspect extension for 
measuring total free space, in either tables or indexes.  It returns the 
free space as a percentage, so higher numbers mean more bloat.  After 
trying a couple of ways to quantify it, I've found this particular 
measure correlates well with the nastiest bloat issues I've run into in 
production recently.  For example, an index that had swelled to over 5X 
its original size due to autovacuum issues registered at 0.86 on this 
scale.  I could easily see people putting an alert at something like 
0.40 and picking candidates to reindex based on it triggering.  That 
would be about a million times smarter than how I've been muddling 
through this class of problems so far.


Code by Jaime Casanova, based on a prototype by me.  Thanks to attendees 
and sponsors of the PgWest conference for helping to fund some deeper 
exploration of this idea.


Here's a test case showing it in action:

create extension pageinspect;
create table t (k serial,v integer);
insert into t(v) (select generate_series(1,10));
create index t_idx on t(k);
delete from t where k5;
vacuum t;

gsmith=# select relation_free_space('t');
 relation_free_space
---------------------
            0.445466

gsmith=# select relation_free_space('t_idx');
 relation_free_space
---------------------
            0.550946

Some open questions in my mind:

-Given this is doing a full table scan, should it hook into a ring 
buffer to keep from trashing the buffer cache?  Or might it loop over 
the relation in a different way altogether?  I was thinking about 
eyeing the FSM instead at one point, didn't explore that yet.  There's 
certainly a few ways to approach this, we just aimed at the easiest way 
to get a working starter implementation, and associated results to 
compare others against.


-Should there be a non-superuser version of this?  We'd certainly need 
to get a less cache demolishing version before that would seem wise.


-There were related things in the pageinspect module, but a case could 
be made for this being a core function instead.  It's a bit more likely 
to be used in production than the rest of that extension.


-What if anything related to TOAST should this handle?

We're also planning to do a sampling version of this, using the same 
approach ANALYZE does.  Grab a number of blocks, extrapolate from 
there.  It shouldn't take many samples before the accuracy is better 
than how people are estimating this now.  That work is just waiting on 
some better thinking about how to handle the full relation version first.
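
The sampling idea can be sketched in a few lines (the list of per-page free
byte counts is invented input; the real code would read blocks the way
ANALYZE samples them):

```python
import random

# sketch of the sampling version: estimate the free fraction from a
# random subset of pages instead of scanning them all.
def sampled_free_fraction(page_free_bytes, sample_size, blcksz=8192, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(range(len(page_free_bytes)), sample_size)
    return sum(page_free_bytes[i] for i in sample) / (sample_size * blcksz)

pages = [4096] * 100          # pretend every page is half empty
print(sampled_free_fraction(pages, sample_size=10))  # 0.5
```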


And, yes, the explanation in the docs and code should be clear that it's 
returning a percentage, which I just realized when writing this.  At 
least I remembered to document something; still ahead of the average new 
patch...


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index dbb2158..aac9148 100644
*** a/contrib/pageinspect/btreefuncs.c
--- b/contrib/pageinspect/btreefuncs.c
***
*** 34,39 
--- 34,40 
  #include "utils/builtins.h"
  #include "utils/rel.h"
  
+ #include "btreefuncs.h"
  
  extern Datum bt_metap(PG_FUNCTION_ARGS);
  extern Datum bt_page_items(PG_FUNCTION_ARGS);
*** GetBTPageStatistics(BlockNumber blkno, B
*** 155,160 
--- 156,202 
  		stat->avg_item_size = 0;
  }
  
+ /*
+  * GetBTRelationFreeSpace
+  *
+  * Get the free space for a btree index.
+  * This is a helper function for relation_free_space()
+  *
+  */
+ float4
+ GetBTRelationFreeSpace(Relation rel)
+ {
+ 	BTPageStat stat;
+ 
+ 	Buffer		buffer;
+ 	BlockNumber blkno;
+ 	BlockNumber totalBlcksInRelation = RelationGetNumberOfBlocks(rel);
+ 	Size 		free_space = 0;
+ 	double		free_percent = 0;
+ 	
+ 	/* Skip page 0 because it is a metapage */
+ 	for (blkno = 1; blkno < totalBlcksInRelation; blkno++)
+ 	{
+ 		buffer = ReadBuffer(rel, blkno);
+ 		/* 
+ 		 * get the statistics of the indexes and use that info
+ 		 * to determine free space on the page
+ 		 */
+ 		GetBTPageStatistics(blkno, buffer, &stat);
+ 		if (stat.type == 'd')
+ 			free_space += stat.page_size;
+ 		else
+ 			free_space += stat.free_size;		
+ 
+ 		ReleaseBuffer(buffer);
+ 	}
+ 
+ 	if (totalBlcksInRelation > 1)
+ 		free_percent = (free_space * 1.0) / ((totalBlcksInRelation - 1) * BLCKSZ);
+ 	return free_percent;
+ }
+ 
+ 
  /* ---
   * bt_page()
   *
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index fa50655..c1d72ba 100644
*** a/contrib/pageinspect/heapfuncs.c
--- b/contrib/pageinspect/heapfuncs.c
***
*** 28,33 
--- 28,36 
  #include "funcapi.h"
  #include "utils/builtins.h"
  #include "miscadmin.h"
+ #include 

Re: [HACKERS] Include commit identifier in version() function

2011-11-05 Thread Robert Haas
On Sat, Nov 5, 2011 at 2:46 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Moreover, there's no real guarantee that you're compiling from a git
 tree at all; you could well be compiling from a tarball.

 All in all I feel like this is a solution in search of a problem.
 People shouldn't be using anything other than a released version in
 production,

 That is probably true, but 99+% of the compilations and installs I do
 are not intended for production use.  I would be happy to see this
 feature in my own dev repository, even if it never did see its way
 into production.  (It would already be there, if I knew of a way to
 make it happen)

I wouldn't mind having it in my repository either, though I suspect it
would be of only occasional use to me in practice.  I think your
parenthetical note hints at the real problem: right now, there's
nothing in our repository that knows anything about the repository
itself.  To make something like this work, you'd have to know, for
example, that on Bruce's machine, the git you need to use is actually
called pggit, and is probably located at some pathname that would make
the gods of standardization cringe.  Leaving aside the wisdom of such
configurations, upon which all sorts of doubt may be and has been
cast, they cause the rest of us no inconvenience right now, and doing
something like this would change that.  Or you could ignore the git
executables and just look for a .git directory that you could read by
hand; but then you're at the mercy of any changes in the way that git
does things.  Every way I can think of to do something like this would
fail in some situations in which everything now works.

I've taken to installing test builds on my machine under
$HOME/install/$IDENTIFIER, where $IDENTIFIER is a branch name or the
first 10 characters of a commit SHA.  I've found that to be a pretty
handy solution to this problem.  It still doesn't account for the
possibility that different builds were done with different options,
but at least I know what source code I was building from (the tip of
branch X as of whenever I did the install, or that specific commit).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
