Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Johann 'Myrkraverk' Oskarsson
Christopher Browne cbbro...@gmail.com writes:

 On Wed, Dec 21, 2011 at 8:32 PM, Johann 'Myrkraverk' Oskarsson
 joh...@2ndquadrant.com wrote:
 I mean to create a typed hstore, called tstore for now.  I'm open
 to name suggestions.  It'll only support a subset of core Postgres
 types to begin with.  Keys are always text, it's the value that's
 typed.

 JSON is the thing of the day that it would be desirable for this to
 be potent enough to represent, and JSON has the following types:

 1.  Number (in practice, FLOAT)
 2.  String (UTF-8)
 3.  Boolean (t/f)
 4.  Array (not necessarily of uniform type)
 5.  Object (string key, JSON value pairs, unordered)
 6.  NULL

 #4 and #5 are obviously entirely more hairy.

Not so much if I extend tstore to include itself.

Unless I'm mistaken about the Object type.  I'm not a user of JSON myself.
How are people likely to use it?


-- 
   Johann Oskarssonhttp://www.2ndquadrant.com/|[]
   PostgreSQL Development, 24x7 Support, Training and Services  --+--
  |
   Blog: http://my.opera.com/myrkraverk/blog/



Re: [HACKERS] Replication timeout units

2011-12-22 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 from postgresql.conf.sample:
 #replication_timeout = 60s        # in milliseconds; 0 disables

 Seconds or milliseconds? I would suggest we just remove the "in
 milliseconds", and instead say "timeout for replication connections; 0
 disables".

That would take away any information about whether it could be set at
sub-second resolution.  I don't like the current wording either, but
just removing the "in milliseconds" seems wrong.

regards, tom lane



Re: [HACKERS] Cursor behavior

2011-12-22 Thread amit sehas
It seems that the task of fetching the next n results without moving the
cursor is too complicated to implement for any query that has
even a little bit of complexity in it...
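
For reference, a minimal sketch of the kind of cursor usage being discussed
(table and column names are made up for illustration):

  -- Hypothetical example: read a batch, then rewind and re-read it.
  BEGIN;
  DECLARE c SCROLL CURSOR FOR SELECT id, payload FROM big_table ORDER BY id;
  FETCH FORWARD 100 FROM c;   -- returns the next 100 rows and advances the cursor
  MOVE ABSOLUTE 0 IN c;       -- reposition before the first row
  FETCH FORWARD 100 FROM c;   -- the same 100 rows again
  COMMIT;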

--- On Wed, 12/21/11, Robert Haas robertmh...@gmail.com wrote:

 From: Robert Haas robertmh...@gmail.com
 Subject: Re: [HACKERS] Cursor behavior
 To: amit sehas cu...@yahoo.com
 Cc: pgsql-hackers@postgresql.org
 Date: Wednesday, December 21, 2011, 8:43 AM
 On Thu, Dec 15, 2011 at 4:15 PM, amit
 sehas cu...@yahoo.com
 wrote:
  I had a question about the cursor internals
 implementation. When you Fetch next 'n' results without
 moving the cursors, is this kind of functionality
 implemented by firstly executing the whole query and then
 moving the cursor over the results, or are the movements
 done on active database lookups, moving forward and
 backward...
 
 I think it depends on the query.  For example, I
 believe that a query
 involving writeable CTEs will be run to completion before
 returning
 any results, but I believe that a seqscan will not.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] JSON for PG 9.2

2011-12-22 Thread Benedikt Grundmann
Let me mention another lightweight data-interchange format.

At http://www.janestreet.com we have developed a small C module to deal
with S-expressions (sexps) as a way to store arbitrary data.  As we write
most of our code in OCaml, sexps are a natural way for us to store data
(http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert
any OCaml value into a sexp).

The extension is still pretty new but we use it successfully on a daily
basis.  After we have upgraded to 9.x we will package it as an extension
and release it as open source.

API-wise the module at the moment offers the following:

sexp_validate(text) returns boolean
Validates that the passed-in text is a valid s-expression.

create domain sexp as text check (sexp_validate(value));

BTW: It is a PITA that arrays of domains are not valid types.

And several functions to take apart or modify sexps
using a path into the sexp (similar to what XPath does for XML).

Such as:

sexp_get(sexp, text) returns sexp
Get the sub-sexp of sexp identified by the path.
Returns NULL if path is not a valid path in sexp.
Example:
   path = .a space.b.[1].x
   ((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))
   -> "The Value"

And sexp_get_atom(sexp, text) returns text
Get the sub-atom of sexp identified by the path.
Returns NULL if path is not a valid path in sexp or
does not identify an atom.
Example:
   path = .a space.b.[1].x
   ((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))
   -> The Value
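
A hedged usage sketch, assuming the function names above and a hypothetical
table; the details may differ from the actual module:

  -- Hypothetical usage of the sexp functions described above.
  create domain sexp as text check (sexp_validate(value));

  create table instrument_meta (
      id      serial primary key,
      payload sexp
  );

  insert into instrument_meta (payload) values
    ('((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))');

  -- Extract a sub-expression and an atom by path.
  select sexp_get(payload, '.a space.b.[1].x')      as sub_sexp,
         sexp_get_atom(payload, '.a space.b.[1].x') as atom
  from instrument_meta;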

Cheers,

Bene

On 20/12/11 19:39, Claes Jakobsson wrote:
 On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
  On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
  
  My understanding is that JSON is a subset of ECMAscript
  
  Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
  derived from JavaScript syntax, but it is not a programming language, so I 
  wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
  
   http://json.org/
 
 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?
 
 if b) then this might get a lot more interesting
 
 Cheers,
 Claes



Re: [HACKERS] Page Checksums

2011-12-22 Thread Leonardo Francalanci

Agreed.

I do agree with Heikki that it really ought to be the OS problem, but
then we thought that about dtrace and we're still waiting for that or
similar to be usable on all platforms (+/- 4 years).



My point is that it looks like this is going to take 1-2 years in
PostgreSQL, so it looks like a huge job... but at the same time I
understand we can't just hope that other filesystems will catch up!



I guess this feature will be tunable (off/on)?



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Florian Weimer
* David Fetter:

 The issue is that double writes needs a checksum to work by itself,
 and page checksums more broadly work better when there are double
 writes, obviating the need to have full_page_writes on.

How desirable is it to disable full_page_writes?  Doesn't it cut down
recovery time significantly because it avoids read-modify-write cycles
with a cold cache?

-- 
Florian Weimer                fwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [HACKERS] Real-life range datasets

2011-12-22 Thread Benedikt Grundmann
Hello,

We have a table in a postgres 8.4 database that would make use of date
ranges and exclusion constraints if they were available.  Sadly I cannot
give you the data as it is based on data we are paying for, and as part
of the relevant licenses we are obliged to not give the data to third
parties.

Basically the tables keep meta data about financial instruments on
a given day.  Thanks to corporate actions (companies merging, splitting
up, etc...) that meta data can be different from one day to the next.

One way to model such a table is:

identifier, date, payload columns...


unique index on (date, identifier)

(and in fact some of the payload columns are unique per day indices
as well).

But because there are a large number of rows per day and most don't
change this is a very wasteful representation.

Instead we use this

identifier, effective_from, effective_until, payload columns...

And we have some clever plpgsql functions that merge a day's snapshot
into that representation.  That happens only a few times per day and
is currently quite slow, mostly because a lot of time is spent in
validation triggers that check that there are no overlapping
effective_from/effective_until entries for jane_symbol and a few other
identifiers.
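
For context, a rough sketch (assuming 9.2 range types and made-up column
types) of how an exclusion constraint could replace those validation
triggers:

  -- Hypothetical declaration; btree_gist is assumed for the equality part.
  create extension btree_gist;

  create table instruments (
      jane_symbol     text not null,
      effective_from  date not null,
      effective_until date not null,
      -- payload columns ...
      exclude using gist (
          jane_symbol with =,
          daterange(effective_from, effective_until, '[]') with &&
      )
  );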

The most common operations are: 

  Get all or most rows of a given day 

(select ... from instruments where :date between effective_from and 
effective_until)

left join of the instruments (again in the normal case constrained to
one day, but in some cases periods of a week or a few months)

select ... from t left join instruments on
  t.jane_symbol = instruments.jane_symbol
  and t.date between instruments.effective_from and instruments.effective_until
  where t.date = X
  and additional constraint on the number of rows from t

With t a huge table clustered on date with roughly 500,000 to 2,000,000 
entries per day.  The left join would work most of the time (my guess is
more than 90%).  But there are entries in t where the jane_symbol would
not be in instruments (sadly).

Current size (immediately after a cluster):

  table   | toast  | (all indices) | total
  1268 MB | 900 MB | 693 MB        | 2861 MB

=> select min(effective_from), max(effective_from) from instruments;
    min     |    max
------------+------------
 2011-05-30 | 2011-12-21
(1 row)

b=> select count(*) from instruments where current_date - 1 between 
effective_from and effective_until ;
 count
--------
 358741
(1 row)

I should be able to give you a table with the same characteristics as
the instruments table but bogus data by replacing all entries in the
table with random strings of the same length or something like that.
I can probably take a little bit of time during this or the next week
to generate such fake real world data ;-)   Is there an ftp site to
upload the gzipped pg_dump file to?

Cheers,

Bene

On 20/12/11 16:48, Alexander Korotkov wrote:
 Hackers,
 
 For better GiST indexing of range types it's important to have real-life
 datasets for testing on. Real-life range datasets would help to prove (or
 reject) some concepts and get more realistic benchmarks. Also, it would be
 nice to know what queries you expect to run fast on those datasets. Ideally
 it should be a real-life set of queries, but it could also be your
 presentation of what typical queries for such datasets are.
 Thanks!
 
 -
 With best regards,
 Alexander Korotkov.



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 7:44 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 22.12.2011 01:43, Tom Lane wrote:

 A utility to bump the page version is equally a whole lot easier said
 than done, given that the new version has more overhead space and thus
 less payload space than the old.  What does it do when the old page is
 too full to be converted?  "Move some data somewhere else" might be
 workable for heap pages, but I'm less sanguine about rearranging indexes
 like that.  At the very least it would imply that the utility has full
 knowledge about every index type in the system.


 Remembering back the old discussions, my favorite scheme was to have an
 online pre-upgrade utility that runs on the old cluster, moving things
 around so that there is enough spare room on every page. It would do normal
 heap updates to make room on heap pages (possibly causing transient
 serialization failures, like all updates do), and split index pages to make
 room on them. Yes, it would need to know about all index types. And it would
 set a global variable to indicate that X bytes must be kept free on all
 future updates, too.

 Once the pre-upgrade utility has scanned through the whole cluster, you can
 run pg_upgrade. After the upgrade, old page versions are converted to new
 format as pages are read in. The conversion is straightforward, as the
 pre-upgrade utility ensured that there is enough spare room on every page.

That certainly works, but we're still faced with pg_upgrade rewriting
every page, which will take a significant amount of time and with no
backout plan or rollback facility. I don't like that at all, hence why
I think we need an online upgrade facility if we do have to alter page
headers.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 8:42 AM, Florian Weimer fwei...@bfk.de wrote:
 * David Fetter:

 The issue is that double writes needs a checksum to work by itself,
 and page checksums more broadly work better when there are double
 writes, obviating the need to have full_page_writes on.

 How desirable is it to disable full_page_writes?  Doesn't it cut down
 recovery time significantly because it avoids read-modify-write cycles
 with a cold cache?

It's way too late in the cycle to suggest removing full page writes or
code them. We're looking to add protection, not swap out existing
ones.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jesper Krogh

On 2011-12-22 09:42, Florian Weimer wrote:

* David Fetter:


The issue is that double writes needs a checksum to work by itself,
and page checksums more broadly work better when there are double
writes, obviating the need to have full_page_writes on.

How desirable is it to disable full_page_writes?  Doesn't it cut down
recovery time significantly because it avoids read-modify-write cycles
with a cold cache?

What are the downsides of having full_page_writes enabled, apart from
log volume?  The manual mentions something about speed, but it is
a bit unclear where that would come from, since the full pages must
be somewhere in memory when being worked on anyway.

Anyway, I have an archive_command that looks like:
archive_command = 'test ! -f /data/wal/%f.gz && gzip --fast < %p > /data/wal/%f.gz'


It brings along somewhere between a 50 and 75% reduction in log volume
with no cost on the production system (since gzip just occupies one of the
many cores on the system) and can easily keep up even during
quite heavy writes.

Recovery is a bit more tricky, because hooking gunzip into the command
there will cause the system to go through a replay log, gunzip, read data,
replay log cycle, where the gunzip could easily be done on the other
logfiles while replay is being done on one.

So a straightforward recovery will cost in recovery time, but that can
be dealt with.


Jesper
--
Jesper



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs  wrote:
 
 So overall, I do now think its still possible to add an optional
 checksum in the 9.2 release and am willing to pursue it unless
 there are technical objections.
 
Just to restate Simon's proposal, to make sure I'm understanding it,
we would support a new page header format number and the old one in
9.2, both to be the same size and carefully engineered to minimize
what code would need to be aware of the version.  PageHeaderIsValid()
and PageInit() certainly would, and we would need some way to set,
clear (maybe), and validate a CRC.  We would need a GUC to indicate
whether to write the CRC, and if present we would always test it on
read and treat it as a damaged page if it didn't match.  (Perhaps
other options could be added later, to support recovery attempts, but
let's not complicate a first cut.)  This whole idea would depend on
either (1) trusting your storage system never to tear a page on write
or (2) getting the double-write feature added, too.
 
I see some big advantages to this over what I suggested to David. 
For starters, using a flag bit and putting the CRC somewhere other
than the page header would require that each AM deal with the CRC,
exposing some function(s) for that.  Simon's idea doesn't require
that.  I was also a bit concerned about shifting tuple images to
convert non-protected pages to protected pages.  No need to do that,
either.  With the bit flags, I think there might be some cases where
we would be unable to add a CRC to a converted page because space was
too tight; that's not an issue with Simon's proposal.
 
Heikki was talking about a pre-convert tool.  Neither approach really
needs that, although with Simon's approach it would be possible to
have a background *post*-conversion tool to add CRCs, if desired. 
Things would continue to function if it wasn't run; you just wouldn't
have CRC protection on pages not updated since pg_upgrade was run.
 
Simon, does it sound like I understand your proposal?
 
Now, on to the separate-but-related topic of double-write.  That
absolutely requires some form of checksum or CRC to detect torn
pages, in order for the technique to work at all.  Adding a CRC
without double-write would work fine if you have a storage stack
which prevents torn pages in the file system or hardware driver.  If
you don't have that, it could create a damaged page indication after
a hardware or OS crash, although I suspect that would be the
exception, not the typical case.  Given all that, and the fact that
it would be cleaner to deal with these as two separate patches, it
seems the CRC patch should go in first.  (And, if this is headed for
9.2, *very soon*, so there is time for the double-write patch to
follow.)
 
It seems to me that the full_page_writes GUC could become an
enumeration, with "off" having the current meaning, "wal" meaning
what "on" now does, and "double" meaning that the new double-write
technique would be used.  (It doesn't seem to make any sense to do
both at the same time.)  I don't think we need a separate GUC to tell
us *what* to protect against torn pages -- if not "off" we should
always protect the first write of a page after checkpoint, and if
"double" and write_page_crc (or whatever we call it) is on, then we
protect hint-bit-only writes.  I think.  I can see room to argue that
with CRCs on we should do a full-page write to the WAL for a
hint-bit-only change, or that we should add another GUC to control
when we do this.
 
I'm going to take a shot at writing a patch for background hinting
over the holidays, which I think has benefit alone but also boosts
the value of these patches, since it would reduce double-write
activity otherwise needed to prevent spurious error when using CRCs.
 
This whole area has some overlap with spreading writes, I think.  The
double-write approach seems to count on writing a bunch of pages
(potentially from different disk files) sequentially to the
double-write buffer, fsyncing that, and then writing the actual pages
-- which must be fsynced before the related portion of the
double-write buffer can be reused.  The simple implementation would
be to simply fsync the files just written to if they required a prior
write to the double-write buffer, although fancier techniques could
be used to try to optimize that.  Again, setting hint bits set before
the write when possible would help reduce the impact of that.
 
-Kevin



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh jes...@krogh.cc wrote:
 On 2011-12-22 09:42, Florian Weimer wrote:

 * David Fetter:

 The issue is that double writes needs a checksum to work by itself,
 and page checksums more broadly work better when there are double
 writes, obviating the need to have full_page_writes on.

 How desirable is it to disable full_page_writes?  Doesn't it cut down
 recovery time significantly because it avoids read-modify-write cycles
 with a cold cache?

 What are the downsides of having full_page_writes enabled, apart from
 log volume?  The manual mentions something about speed, but it is
 a bit unclear where that would come from, since the full pages must
 be somewhere in memory when being worked on anyway.



I thought I will share some of my perspective on this checksum +
doublewrite from a performance point of view.

Currently, what I see in our tests based on dbt2, DVDStore, etc.
is that the checksum does not impact scalability or total throughput
measured. It does increase CPU cycles depending on the algorithm used,
but not really by anything that causes problems. The doublewrite change
will be the big win for performance compared to full_page_writes.  For
example, compared to other databases our WAL traffic is one of the
highest, and most of it is attributed to full_page_writes. The reason
full_page_writes is necessary in production (at least without worrying
about replication impact) is that if a write fails, we can recover
that whole page from the WAL logs as it is and just put it back out there.
(In fact I believe that is what recovery does.) However, the net impact is
that during high OLTP the runtime impact on WAL is high due to the heavy
traffic, and compared to other databases the utilization is high. This
also has a huge impact on transaction response time the first time a page
is changed, which matters in all OLTP environments because by nature the
transactions are all on random pages.

When we use doublewrite with checksums, we can safely disable
full_page_writes, causing a HUGE reduction in WAL traffic without
loss of reliability due to a write fault, since there are always two
writes. (Implementation details discussable.) Since the double writes
themselves are sequential, bundling multiple such writes further reduces
the write time. The biggest improvement is that these writes are now
done not at TRANSACTION COMMIT but at CHECKPOINT WRITES, which
improves transaction performance drastically for OLTP applications,
and you still get the reliability that is needed.

Typically, performance in terms of system throughput (tps) is like
tps(full_page_writes) < tps(no full page writes).
With the double write and CRC we see
tps(full_page_writes) < tps(doublewrite) < tps(no full page writes),
which is a big win for production systems to get the reliability of
full_page_writes.

Also, the side effect for response times is that they are more level,
unlike with full page writes where the response time varies from about
0.5ms to 5ms depending on whether the same transaction needs to write a
full page onto WAL or not.  With doublewrite it can always be around 0.5ms
rather than have a huge deviation in transaction performance. With
this, folks measuring the 90th-percentile response time will see a huge
relief in trying to meet their SLAs.

Also, from a WAL perspective, I like to put the WAL on its own
LUN/spindle/VMDK etc.  The net result is that with the reduced
WAL traffic my utilization drops, which means the same hardware can
now handle higher WAL traffic in terms of IOPS, so that WAL
itself becomes less of a bottleneck. Typically this is observed as
a reduction in transaction response times and an increase in
tps until some other bottleneck becomes the gating factor.

So overall this is a big win.

Regards,
Jignesh



Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Tom Lane
Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes:
 I mean to create a typed hstore, called tstore for now.

Um ... what is the point of this, exactly?  From what I've seen, most
applications for hstore are pretty happy with the fact that hstore is
only weakly typed, and if an entry *is* an integer, or a float, or
whatever else, it's not hard to cast to and from text as needed.
So this idea looks like a solution in search of a problem, which is
going to need a whole lot more work before it even gets to the point of
being as useful as hstore.  It's not for instance apparent what is the
use of iterating over only entries that were supplied as integers ---
there is no reason to think that they're related just because of that.

regards, tom lane



Re: [HACKERS] Real-life range datasets

2011-12-22 Thread Oleg Bartunov

Bene,

we have the pgfoundry project http://pgfoundry.org/projects/dbsamples/.
Since your sample database is very important (for me also), I suggest using
this site.

Oleg
On Thu, 22 Dec 2011, Benedikt Grundmann wrote:


Hello,

We have a table in a postgres 8.4 database that would make use of date
ranges and exclusion constraints if they were available.  Sadly I cannot
give you the data as it is based on data we are paying for and as part
of the relevant licenses we are obliqued to not give the data to third
parties.

Basically the tables keep meta data about financial instruments on
a given day.  Thanks to corporate actions (companies merging, splitting
up, etc...) that meta data can be different from one day to the next.

One way to model such a table is:

identifier, date, payload columns...


unique index on (date, identifier)

(and in fact some of the payload columns are unique per day indices
as well).

But because there are a large number of rows per day and most don't
change this is a very wasteful representation.

Instead we use this

identifier, effective_from, effective_until, payload columns...

And we have some clever plpgsql functions that merge a days snapshot
into that representation.  That happens only a few times per day and
is currently quite slow mostly because a lot of time is spend in
validation triggers to check that there are no overlapping entries
for effective_from,effective_until for jane_symbol and a few other
identifiers.

The most common operations are:

 Get all or most rows of a given day

(select ... from instruments where :date between effective_from and 
effective_until)

left join of the instruments (again in the normal case constrained to
one day but in same cases periods of a week or a few month)

select ... from t left join instruments on
 t.jane_symbol = instruments.jane_symbol
 t.date between instruments.effective_from and t.effective_until
 where t.date = X
   and additional constraint on the number of rows from t

With t a huge table clustered on date with roughly 500,000 to 2,000,000
entries per day.  The left join would work most of the time (my guess is
more than 90%).  But there are entries in t where the jane_symbol would
not be in instruments (sadly).

Current size (immediately after a cluster):

 table   | toast  | (all indices) | total
 1268 MB | 900 MB | 693 MB        | 2861 MB

=> select min(effective_from), max(effective_from) from instruments;
    min     |    max
------------+------------
 2011-05-30 | 2011-12-21
(1 row)

b=> select count(*) from instruments where current_date - 1 between 
effective_from and effective_until ;
 count
--------
 358741
(1 row)

I should be able to give you a table with the same characteristics as
the instruments table but bogus data by replacing all entries in the
table with random strings of the same length or something like that.
I can probably take a little bit of time during this or the next week
to generate such fake real world data ;-)   Is there an ftp site to
upload the gzipped pg_dump file to?

Cheers,

Bene

On 20/12/11 16:48, Alexander Korotkov wrote:

Hackers,

For better GiST indexing of range types it's important to have real-life
datasets for testing on. Real-life range datasets would help to proof (or
reject) some concepts and get more realistic benchmarks. Also, it would be
nice to know what queries you expect to run fast on that datasets. Ideally
it should be real-life set of queries, but it also could be your
presentation of what are typical queries  for such datasets.
Thanks!

-
With best regards,
Alexander Korotkov.





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Benedikt Grundmann
On 22/12/11 10:44, Tom Lane wrote:
 Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes:
  I mean to create a typed hstore, called tstore for now.
 
 Um ... what is the point of this, exactly?  From what I've seen, most
 applications for hstore are pretty happy with the fact that hstore is
 only weakly typed, and if an entry *is* an integer, or a float, or
 whatever else, it's not hard to cast to and from text as needed.

Moreover, it is also easy with the current hstore to add constraints like this:

  contracts_is_an_integer CHECK ((tags -> 'contracts'::text) ~ 
'^[0-9]+$'::text)

to ensure that it actually is.
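
A hedged illustration of that approach (table and column names are made up):

  -- Hypothetical table using plain hstore plus a CHECK constraint and casts.
  create extension hstore;

  create table deals (
      id   serial primary key,
      tags hstore,
      constraint contracts_is_an_integer
          check ((tags -> 'contracts') ~ '^[0-9]+$')
  );

  insert into deals (tags) values ('contracts => 42, note => hello');

  -- Cast the text value back to an integer when reading.
  select (tags -> 'contracts')::integer as contracts from deals;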



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Jignesh Shah jks...@gmail.com wrote:
 
 When we use Doublewrite with checksums, we can safely disable
 full_page_write causing a HUGE reduction to the WAL traffic
 without loss of reliatbility due to a write fault since there are
 two writes always. (Implementation detail discussable).
 
The "always" there surprised me.  It seemed to me that we only need
to do the double-write where we currently do full page writes or
unlogged writes.  In thinking about your message, it finally struck
me that this might require a WAL record to be written with the
checksum (or CRC; whatever we use).  Still, writing a WAL record
with a CRC prior to the page write would be less data than the full
page.  Doing double-writes instead for situations without the torn
page risk seems likely to be a net performance loss, although I have
no benchmarks to back that up (not having a double-write
implementation to test).  And if we can get correct behavior without
doing either (the checksum WAL record or the double-write), that's
got to be a clear win.
 
-Kevin



Re: [HACKERS] CLOG contention

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 1:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I understand why you say that and take no offence. All I can say is
 last time I had access to a good test rig and well structured
 reporting and analysis I was able to see evidence of what I described
 to you here.

 I no longer have that access, which is the main reason I've not done
 anything in the last few years. We both know you do have good access
 and that's the main reason I'm telling you about it rather than just
 doing it myself.

Right.  But I need more details.  If I know what to test and how to
test it, I can do it.  Otherwise, I'm just guessing.  I dislike
guessing.

You mentioned latency so this morning I ran pgbench with -l and
graphed the output.  There are latency spikes every few seconds.  I'm
attaching the overall graph as well as the graph of the last 100
seconds, where the spikes are easier to see clearly.  Now, here's the
problem: it seems reasonable to hypothesize that the spikes are due to
CLOG page replacement since the frequency is at least plausibly right,
but this is obviously not enough to prove that conclusively.  Ideas?

Also, if it is that, what do we do about it?  I don't think any of the
ideas proposed so far are going to help much.  Increasing the number
of CLOG buffers isn't going to fix the problem that once they're all
dirty, you have to write and fsync one before pulling in the next one.
 Striping might actually make it worse - everyone will move to the
next buffer right around the same time, and instead of everybody
waiting for one fsync, they'll each be waiting for their own.  Maybe
the solution is to have the background writer keep an eye on how many
CLOG buffers are dirty and start writing them out if the number gets
too big.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[attachments: latency.png, latency-end.png]


Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 10:44 AM, Tom Lane wrote:

Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes:

I mean to create a typed hstore, called tstore for now.

Um ... what is the point of this, exactly?  From what I've seen, most
applications for hstore are pretty happy with the fact that hstore is
only weakly typed, and if an entry *is* an integer, or a float, or
whatever else, it's not hard to cast to and from text as needed.
So this idea looks like a solution in search of a problem, which is
going to need a whole lot more work before it even gets to the point of
being as useful as hstore.  It's not for instance apparent what is the
use of iterating over only entries that were supplied as integers ---
there is no reason to think that they're related just because of that.




Yeah, the thing that's annoying in some cases about hstore is not that 
it's untyped but that it's flat.


That's what a JSON type would buy us, a lightweight tree structured 
type, and moreover one that is widely and increasingly used and well 
understood.


cheers

andrew



Re: [HACKERS] Wishlist: parameterizable types

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 1:04 AM, Joey Adams joeyadams3.14...@gmail.com wrote:
 What I'm wondering is: how complex would it be to add such a feature
 to PostgreSQL's type system?

Very.

It's been discussed before, although I can't tell you the subject
lines of the threads off the top of my head.  The basic problem is
that there is a ton of code that assumes that a type is fully
identified by an OID, and all of that code would need to be changed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp ma...@juffo.org wrote:
 PS: It seems that the min/max optimization isn't documented in the
 manual (apart from release notes), so I didn't include any doc changes
 in this patch.

I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade with plpython is broken

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 10:16 AM, Peter Eisentraut pete...@gmx.net wrote:
 Upgrading an instance containing plpython from <=8.4 to >=9.0 is broken
 because the module plpython.so was renamed to plpython2.so, and so the
 pg_upgrade check for loadable libraries fails thus:

    Your installation references loadable libraries that are missing from the
    new installation.  etc.

 Installing a symlink fixes the issue.  Should we teach pg_upgrade about
 this renaming, or should we install the symlink as part of the standard
 installation?

I feel like this is a pg_upgrade bug, not so much a PL/python problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch: bytea_agg

2011-12-22 Thread Robert Haas
On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 this patch adds a bytea_agg aggregation.

 It allows fast bytea concatenation.

Looks fine to me.  I'll commit this, barring objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every

2011-12-22 Thread Marti Raudsepp
On Thu, Dec 22, 2011 at 18:41, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp ma...@juffo.org wrote:
 PS: It seems that the min/max optimization isn't documented in the
 manual (apart from release notes), so I didn't include any doc changes
 in this patch.

 I don't see a patch attached to this email, so either you forgot to
 attach it, or the list ate it somehow.

I forgot to attach it, sorry. Here it is.

Regards,
Marti
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 26966d2..29e262f 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -203,9 +203,9 @@ DATA(insert ( 2828	float8_regr_accum	float8_covar_samp		0	1022	{0,0,0,0,0,0} )
 DATA(insert ( 2829	float8_regr_accum	float8_corr			0	1022	{0,0,0,0,0,0} ));
 
 /* boolean-and and boolean-or */
-DATA(insert ( 2517	booland_statefunc	-			0	16		_null_ ));
-DATA(insert ( 2518	boolor_statefunc	-			0	16		_null_ ));
-DATA(insert ( 2519	booland_statefunc	-			0	16		_null_ ));
+DATA(insert ( 2517	booland_statefunc	-			58	16		_null_ ));
+DATA(insert ( 2518	boolor_statefunc	-			59	16		_null_ ));
+DATA(insert ( 2519	booland_statefunc	-			58	16		_null_ ));
 
 /* bitwise integer */
 DATA(insert ( 2236 int2and		  -	0	21		_null_ ));
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 51ab6e5..7d245d2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -774,16 +774,19 @@ WHERE a.aggfnoid = p.oid AND
 (0 rows)
 
 -- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for "min" and ">" for "max".
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
 SELECT DISTINCT proname, oprname
 FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
 WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid
 ORDER BY 1;
- proname | oprname 
----------+---------
- max     | >
- min     | <
-(2 rows)
+ proname  | oprname 
+----------+---------
+ bool_and | <
+ bool_or  | >
+ every    | <
+ max      | >
+ min      | <
+(5 rows)
 
 -- Check datatypes match
 SELECT a.aggfnoid::oid, o.oid
@@ -816,11 +819,14 @@ WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid AND
 amopopr = o.oid AND
 amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
 ORDER BY 1, 2;
- proname | oprname | amopstrategy 
----------+---------+--------------
- max     | >       |            5
- min     | <       |            1
-(2 rows)
+ proname  | oprname | amopstrategy 
+----------+---------+--------------
+ bool_and | <       |            1
+ bool_or  | >       |            5
+ every    | <       |            1
+ max      | >       |            5
+ min      | <       |            1
+(5 rows)
 
 -- Check that there are not aggregates with the same name and different
 -- numbers of arguments.  While not technically wrong, we have a project policy
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index e29148f..bcd8544 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -626,7 +626,7 @@ WHERE a.aggfnoid = p.oid AND
 NOT binary_coercible(p.proargtypes[0], a.aggtranstype);
 
 -- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for "min" and ">" for "max".
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
 
 SELECT DISTINCT proname, oprname
 FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
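
For context, a hedged sketch of the query shape this optimization targets
(table and index names are made up):

  -- Hypothetical example: with the patch, bool_and/bool_or/every can use an
  -- index scan just like min/max do.
  CREATE TABLE checks (ok boolean);
  CREATE INDEX checks_ok_idx ON checks (ok);

  -- The planner may now rewrite this ...
  SELECT bool_and(ok) FROM checks;

  -- ... into something equivalent to this index-ordered probe:
  SELECT ok FROM checks WHERE ok IS NOT NULL ORDER BY ok ASC LIMIT 1;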



Re: [HACKERS] archive_keepalive_command

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 1:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Dec 12, you said "It also strikes me that anything
 that is based on augmenting the walsender/walreceiver protocol leaves
 anyone who is using WAL shipping out in the cold."  I'm not clear from
 the comments you or Simon have made how important you think that use
 case still is.

 Not wanting to leave anyone out in the cold, I proposed something to
 enhance file based replication also.

Fair enough.

I am still of the opinion that we ought to commit some version of the
pg_last_xact_insert_timestamp patch.  I accept that patch isn't going
to solve every problem, but I still think it's worth having.  If one
of these other solutions comes along and turns out to work great,
that's fine, too; but I don't think any of them are so compelling that
we can credibly say that pg_last_xact_insert_timestamp is useless or
obsolete.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Wishlist: parameterizable types

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 11:34 AM, Robert Haas wrote:

On Thu, Dec 22, 2011 at 1:04 AM, Joey Adamsjoeyadams3.14...@gmail.com  wrote:

What I'm wondering is: how complex would it be to add such a feature
to PostgreSQL's type system?

Very.

It's been discussed before, although I can't tell you the subject
lines of the threads off the top of my head.  The basic problem is
that there is a ton of code that assumes that a type is fully
identified by an OID, and all of that code would need to be changed.



And frankly I'd expect there to be serious resistance to such a change. 
It's not just a question of how hard it would be, but how advisable.


We have an extremely rich and uniquely extensible type system, and we've 
added a number of somewhat generic features over the years. I'd need a 
lot of convincing that we need fully generic types (and I came from the 
Ada world many years ago which was the inspiration for many such features).


cheers

andrew




Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Thu, Dec 22, 2011 at 18:41, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp ma...@juffo.org wrote:
 PS: It seems that the min/max optimization isn't documented in the
 manual (apart from release notes), so I didn't include any doc changes
 in this patch.

 I don't see a patch attached to this email, so either you forgot to
 attach it, or the list ate it somehow.

 I forgot to attach it, sorry. Here it is.

Nice.  It doesn't get much simpler than that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Wishlist: parameterizable types

2011-12-22 Thread Stephen Frost
* Joey Adams (joeyadams3.14...@gmail.com) wrote:
 This may be ambitious, but it'd be neat if PostgreSQL supported
 parameterizable types.  For example, suppose a contrib module defines
 a pair type.  It could be used as follows:

Have you looked at what the PostGIS folks are doing...?  We do have
support for custom typmods...  I'm not sure if that helps you at all,
but figured I'd mention it.
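
For reference, a rough illustration of what typmods already parameterize
today (type and column names chosen for illustration):

  -- Existing parameterization via typmods: the number in parentheses is
  -- carried alongside the type OID rather than defining a new type.
  create table measurements (
      label  varchar(20),        -- typmod limits the length
      amount numeric(12, 2)      -- typmod sets precision and scale
  );

  -- PostGIS-style usage (assuming the PostGIS extension is installed):
  --   geom geometry(Point, 4326)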

Thanks,

Stephen




[HACKERS] Indexes with duplicate columns

2011-12-22 Thread Tom Lane
In bug #6351 it's pointed out that this fails unexpectedly:

CREATE TABLE tab (id SERIAL, a INTEGER, b INTEGER);
CREATE INDEX tab123 ON tab (a, b, a);
SELECT a, b FROM tab WHERE a = 0 AND b = 1;
ERROR:  btree index keys must be ordered by attribute

I looked into this a bit and find that indxpath.c is producing a correct
list of index quals, a = 0 AND b = 1 AND a = 0, but then createplan.c
messes it up because it matches both copies of a to the first possible
match in the index's column list.  So what the executor gets looks like
{INDEX_VAR 1} = 0 AND {INDEX_VAR 2} = 1 AND {INDEX_VAR 1} = 0
and there's a btree implementation restriction that makes it spit up on
that.

Now, what the planner did here is more wrong than just tripping over a
btree limitation.  The actual use-case for an index mentioning the same
table column more than once, IMO, would be if the index columns had
different operator classes and thus supported different sets of
indexable operators.  Matching the second instance of a to the first
index column could then be asking the index to implement an operator
that that column doesn't support.  So we need to fix the planner not
btree.
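
For illustration, a hedged sketch of that legitimate use case (opclass and
column names are made up):

  -- Hypothetical index mentioning the same column twice with different
  -- operator classes, so each index column supports different operators.
  CREATE TABLE people (name text);
  CREATE INDEX people_name_idx
      ON people (name text_ops, name text_pattern_ops);

  -- The first index column supports ordinary comparisons, the second
  -- supports LIKE 'prefix%' matching; getting each clause matched to the
  -- right index column is exactly the planner fix discussed here.
  SELECT * FROM people WHERE name = 'Ann' AND name LIKE 'A%';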

The representation that indxpath.c actually emits is correct and
unambiguous, because it produces a list of sublists of indexquals,
one sublist per index column.  So in that format it's clear which
index column each clause is meant for.  But then we flatten the list
in create_index_path, and so createplan.c has to reverse-engineer
the matching, and it really can't get it right unless we're willing
to make it recheck operator matching not only column matching.

The obvious solution seems to be to preserve the list-of-sublists
representation through to createplan.c.  Then that code can just verify
the expected column match instead of searching, so it might actually be
a tad faster.  However such a change is going to affect cost_index and
all the amcostestimate functions, and likely break any planner plugins
that do anything with IndexPaths.  So it's going to be a bit invasive
and I'm inclined to think it's not back-patchable.

My inclination is to fix it that way in HEAD and just leave it as a
known unsupported case in the back branches.  This case does not seem
important enough to justify trying to find a solution that would work
without a path representation change.

Comments, other ideas?

regards, tom lane



Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan and...@dunslane.net wrote:
 The simple solution I originally proposed to put a line feed and some space
 before every target field in pretty print mode. This is a two line patch.
 The downsides are a) maybe not everyone will like the change and b) it will
 produce superfluous newlines, e.g. before CASE expressions.

With regard to (a), specifically, you won't like this change if your
column names are things like "bob" and "sam", because you'll burn
through an inordinate amount of vertical space.  On the other hand, I
agree that you'll probably find it a big improvement if they are
things like "nc.nspname::information_schema.sql_identifier AS
udt_schema".

It has always seemed to me that a sensible strategy here would be to
try to produce output that looks good in 80 columns, on the assumption
that (1) everyone has at least that much horizontal space to play with
and (2) people who do won't necessarily mind it if we don't use all of
that horizontal space when pretty-printing SQL.

However, it is possible that I am living in the dark ages.
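
For reference, a hedged example of the pretty-printing call whose output is
under discussion (the view name is made up):

  -- pg_get_viewdef's second argument enables pretty printing.
  CREATE VIEW sample_view AS
      SELECT n.nspname AS schema_name, c.relname AS table_name
      FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace;

  SELECT pg_get_viewdef('sample_view'::regclass, true);   -- pretty-printed
  SELECT pg_get_viewdef('sample_view'::regclass, false);  -- one long line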

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 12:18 PM, Robert Haas wrote:

On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstanand...@dunslane.net  wrote:

The simple solution I originally proposed to put a line feed and some space
before every target field in pretty print mode. This is a two line patch.
The downsides are a) maybe not everyone will like the change and b) it will
produce superfluous newlines, e.g. before CASE expressions.

With regard to (a), specifically, you won't like this change if your
column names are things like bob and sam, because you'll burn
through an inordinate amount of vertical space.  On the other hand, I
agree that you'll probably find it a big improvement if they are
things like nc.nspname::information_schema.sql_identifier as
udt_schema.

It has always seemed to me that a sensible strategy here would be to
try to produce output that looks good in 80 columns, on the assumption
that (1) everyone has at least that much horizontal space to play with
and (2) people who do won't necessarily mind it if we don't use all of
CASE
 WHEN nco.nspname IS NOT NULL THEN current_database()
 ELSE NULL::name
 END::information_schema.sql_identifier AS collation_catalog, 
nco.nspname::information_schema.sql_identifier AS collation_schema,

I'd still be much happier with a


that horizontal space when pretty-printing SQL.

However, it is possible that I am living in the dark ages.


I've looked at that, and it was discussed a bit previously. It's more 
complex because it requires that we keep track of (or calculate) where 
we are on the line, and where we would be after adding the new text. But 
I could live with it if others could. It would certainly be an 
improvement. But that's still going to leave things that will look odd, 
such as a CASE expression's final line being filled up to 80 columns 
with more fields. My preference would be for a newline as a visual clue 
to where each column spec starts.


I used to try to be conservative about vertical space, but in these days 
of scrollbars and screens not limited to 24 or 25 lines (Yes, kids, 
that's what some of us grew up with) that seems a bit old-fashioned. One 
of the arguments for K&R style braces used to be that it used one less 
line than BSD style. Maybe that made some sense 20 or so years ago, 
although I didn't really buy it even then, but it makes very little 
sense to me now.


cheers

andrew



Re: [HACKERS] Real-life range datasets

2011-12-22 Thread David E. Wheeler
On Dec 22, 2011, at 7:48 AM, Oleg Bartunov wrote:

 we have pgfoundry project http://pgfoundry.org/projects/dbsamples/.
 Since your sample database is very important (for me also), I suggest to use
 this site.

Or PGXN.

  http://pgxn.org/

You can register an account to upload extensions like you describe here:

  http://manager.pgxn.org/account/register

Details on what’s required to distribute on PGXN are here:

  http://manager.pgxn.org/howto

Best,

David




Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 12:52 PM, Andrew Dunstan wrote:



On 12/22/2011 12:18 PM, Robert Haas wrote:
On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstanand...@dunslane.net  
wrote:
The simple solution I originally proposed to put a line feed and 
some space
before every target field in pretty print mode. This is a two line 
patch.
The downsides are a) maybe not everyone will like the change and b) 
it will

produce superfluous newlines, e.g. before CASE expressions.

With regard to (a), specifically, you won't like this change if your
column names are things like bob and sam, because you'll burn
through an inordinate amount of vertical space.  On the other hand, I
agree that you'll probably find it a big improvement if they are
things like nc.nspname::information_schema.sql_identifier as
udt_schema.

It has always seemed to me that a sensible strategy here would be to
try to produce output that looks good in 80 columns, on the assumption
that (1) everyone has at least that much horizontal space to play with
and (2) people who do won't necessarily mind it if we don't use all 
ofCASE

 WHEN nco.nspname IS NOT NULL THEN current_database()
 ELSE NULL::name
 END::information_schema.sql_identifier AS collation_catalog, 
nco.nspname::information_schema.sql_identifier AS collation_schema,


I'd still be much happier with a


that horizontal space when pretty-printing SQL.

However, it is possible that I am living in the dark ages.


I've looked at that, and it was discussed a bit previously. It's more 
complex because it requires that we keep track of (or calculate) where 
we are on the line, and where we would be after adding the new text. 
But I could live with it if others could. It would certainly be an 
improvement. But that's still going to leave things that will look 
odd, such as a CASE expression's final line being filled up to 80 
columns with more fields. My preference would be for a newline as a 
visual clue to where each column spec starts.


I used to try to be conservative about vertical space, but in these 
days of scrollbars and screens not limited to 24 or 25 lines (Yes, 
kids, that's what some of us grew up with) that seems a bit 
old-fashioned. One of the arguments for K&R style braces used to be 
that it used one less line than BSD style. Maybe that made some sense 
20 or so years ago, although I didn't really buy it even then, but it 
makes very little sense to me now.





Wow. My editor or my fingers seem to have screwed up here. Something got 
CP'd into the middle of the quoted text that shouldn't have. *sigh*


cheers

andrew



Re: [HACKERS] Escaping : in .pgpass - code or docs bug?

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 2:42 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yeah, it is a pretty old bug -- this code was clearly written by some
 rookie that didn't know very well what he was doing.

Hey, I got that joke.

I fixed this in master.  I'm not going to bother with anything else
unless someone else feels motivated to make it happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 12:52 PM, Andrew Dunstan and...@dunslane.net wrote:
 I used to try to be conservative about vertical space, but in these days of
 scrollbars and screens not limited to 24 or 25 lines (Yes, kids, that's what
 some of us grew up with) that seems a bit old-fashioned. One of the
 arguments for K&R style braces used to be that it used one less line than
 BSD style. Maybe that made some sense 20 or so years ago, although I didn't
 really buy it even then, but it makes very little sense to me now.

I *still* spend a lot of time editing in a 25x80 window.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan and...@dunslane.net wrote:
 The simple solution I originally proposed to put a line feed and some space
 before every target field in pretty print mode. This is a two line patch.
 The downsides are a) maybe not everyone will like the change and b) it will
 produce superfluous newlines, e.g. before CASE expressions.

 With regard to (a), specifically, you won't like this change if your
 column names are things like bob and sam, because you'll burn
 through an inordinate amount of vertical space.

Yeah.  I'm not exactly thrilled with (b), either, if it's a consequence
of a change whose only excuse for living is to make the output look
nicer.  Random extra newlines don't look nicer to me.

 It has always seemed to me that a sensible strategy here would be to
 try to produce output that looks good in 80 columns,

Maybe, though I fear it might complicate the ruleutils code a bit.
You'd probably have to build the output for a column first and then
see how long it is before deciding whether to insert a newline.

In short, I don't mind trying to make this work better, but I think it
will take more work than a two-line patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-22 Thread Robert Haas
On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber p...@omniti.com wrote:
 Is SnapshotAny the snapshot I should be using? It seems to get the
 correct results. I can drop a table and I get NULL. Then after a
 vacuumdb it returns an error.

The suggestion on the original thread was to use SnapshotDirty or the
caller's MVCC snapshot.  SnapshotAny doesn't seem like a good idea.
We don't want to include rows from, say, transactions that have rolled
back.  And SnapshotAny rows can stick around for a long time - weeks,
months.  If a table is only occasionally updated, autovacuum may not
process it for a long time.

On the other hand, I don't think using SnapshotDirty is going to work out so
well either: isn't there a race condition?  The caller's MVCC snapshot seems
better, but I still see pitfalls.  Who is to say that the immediate
caller has the same snapshot as the scan?  I'm thinking of cases
involving things like CTEs and nested function calls.

I'm wondering if we oughta just return NULL and be done with it.   If
SELECT 1241241241::regclass doesn't choke, I'm not sure SELECT
pg_relation_size(1241241241) ought to either.  It's not like the user
is going to see NULL and have no clue that the relation wasn't found.
 At the very worst they might think it's empty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm wondering if we oughta just return NULL and be done with it.

+1.  There are multiple precedents for that sort of response, which we
introduced exactly so that SELECT some_function(oid) FROM some_catalog
wouldn't fail just because one of the rows had gotten deleted by the
time the scan got to it.  I don't think it's necessary for the
relation-size functions to be any smarter.  Indeed, I'd assumed that's
all that Phil's patch did, since I'd not looked closer till just now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 11:16 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jignesh Shah jks...@gmail.com wrote:

 When we use Doublewrite with checksums, we can safely disable
 full_page_write causing a HUGE reduction to the WAL traffic
 without loss of reliatbility due to a write fault since there are
 two writes always. (Implementation detail discussable).

 The "always" there surprised me.  It seemed to me that we only need
 to do the double-write where we currently do full page writes or
 unlogged writes.  In thinking about your message, it finally struck

Currently PG only does a full page write for the first change that makes
a page dirty after a checkpoint. This scheme works because all later
changes are relative to that first full page image, so if the checkpoint
write of the page fails, recovery can recreate the page using the full
page image plus all the delta changes from WAL.

In the double write implementation, every checkpoint write is double
written, so if the first write (to the doublewrite area) fails, the
original page is not corrupted, and if the second write (to the actual
data page) fails, the page can be recovered from the earlier write. It
is true that this means 2X writes during a checkpoint, but I can argue
that there are the same 2X writes right now, except that 1X of them
goes to WAL during transaction commit.  Also, since the doublewrite
area is generally written to its own file, it is essentially sequential
and doesn't have the same write latencies as the actual checkpoint
writes. So if you look at the net amount of writes, it is the same. For
unlogged tables the doublewrite is not much of a penalty either, even
though those pages were never logged to WAL before; doing the double
write still keeps them safe and gives those tables resilience even
though it is not required. The net result is that the underlying page
is never irrecoverable due to failed writes.
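
To make the ordering concrete, the per-page sequence is roughly the
following (a minimal sketch only, not the actual patch; the file
descriptors, offsets and the BLCKSZ-sized page image are assumptions
made for the example):

#include <unistd.h>

#define BLCKSZ 8192

/*
 * Sketch of flushing one dirty buffer under a double-write scheme:
 * 1. append the page image to the sequential doublewrite file and fsync it;
 * 2. only then write the page to its home location in the data file.
 * A torn write in step 2 is repaired from the doublewrite copy at recovery;
 * a failure in step 1 leaves the original data page untouched.
 */
static int
double_write_page(int dw_fd, off_t dw_off,
                  int data_fd, off_t data_off,
                  const char *page)
{
    if (pwrite(dw_fd, page, BLCKSZ, dw_off) != BLCKSZ)
        return -1;
    if (fsync(dw_fd) != 0)          /* the copy must be durable before step 2 */
        return -1;
    if (pwrite(data_fd, page, BLCKSZ, data_off) != BLCKSZ)
        return -1;
    return 0;                       /* the data file itself is fsynced later, in bulk */
}

In practice a batch of pages goes to the doublewrite area and is fsynced
once, so the added cost is mostly one extra sequential write per page.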


 me that this might require a WAL record to be written with the
 checksum (or CRC; whatever we use).  Still, writing a WAL record
 with a CRC prior to the page write would be less data than the full
 page.  Doing double-writes instead for situations without the torn
 page risk seems likely to be a net performance loss, although I have
 no benchmarks to back that up (not having a double-write
 implementation to test).  And if we can get correct behavior without
 doing either (the checksum WAL record or the double-write), that's
 got to be a clear win.

I am not sure why one would want to write the checksum to WAL.
As for the double writes, in fact there is no net loss because
(a) the writes to the doublewrite area are sequential, so the write calls
are relatively fast and in fact do not add any latency to transactions,
unlike full_page_writes;
(b) the doublewrite area can be moved to a different location to take the
stress off the default tablespace if you are worried about that spindle
handling 2X writes, just as full_page_writes is mitigated by moving
pg_xlog to a different spindle.

My own tests support that the net result is almost as fast as
full_page_writes=off, though not quite the same due to the extra write
(which gives you the desired reliability), and way better than
full_page_writes=on.


Regards,
Jignesh






 -Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-22 Thread Phil Sorber
On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm wondering if we oughta just return NULL and be done with it.

 +1.  There are multiple precedents for that sort of response, which we
 introduced exactly so that SELECT some_function(oid) FROM some_catalog
 wouldn't fail just because one of the rows had gotten deleted by the
 time the scan got to it.  I don't think it's necessary for the
 relation-size functions to be any smarter.  Indeed, I'd assumed that's
 all that Phil's patch did, since I'd not looked closer till just now.

                        regards, tom lane

Here it is without the checking for recently dead. If it can't open
the relation it simply returns NULL.
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2ee5966..134bc03
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
*** pg_relation_size(PG_FUNCTION_ARGS)
*** 289,295 
  	Relation	rel;
  	int64		size;
  
! 	rel = relation_open(relOid, AccessShareLock);
  
  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
--- 289,298 
  	Relation	rel;
  	int64		size;
  
! 	rel = try_relation_open(relOid, AccessShareLock);
! 
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
  
  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
*** calculate_toast_table_size(Oid toastreli
*** 339,352 
   * those won't have attached toast tables, but they can have multiple forks.
   */
  static int64
! calculate_table_size(Oid relOid)
  {
  	int64		size = 0;
- 	Relation	rel;
  	ForkNumber	forkNum;
  
- 	rel = relation_open(relOid, AccessShareLock);
- 
  	/*
  	 * heap size, including FSM and VM
  	 */
--- 342,352 
   * those won't have attached toast tables, but they can have multiple forks.
   */
  static int64
! calculate_table_size(Relation rel)
  {
  	int64		size = 0;
  	ForkNumber	forkNum;
  
  	/*
  	 * heap size, including FSM and VM
  	 */
*** calculate_table_size(Oid relOid)
*** 360,367 
  	if (OidIsValid(rel->rd_rel->reltoastrelid))
  		size += calculate_toast_table_size(rel->rd_rel->reltoastrelid);
  
- 	relation_close(rel, AccessShareLock);
- 
  	return size;
  }
  
--- 360,365 
*** calculate_table_size(Oid relOid)
*** 371,382 
   * Can be applied safely to an index, but you'll just get zero.
   */
  static int64
! calculate_indexes_size(Oid relOid)
  {
  	int64		size = 0;
- 	Relation	rel;
- 
- 	rel = relation_open(relOid, AccessShareLock);
  
  	/*
  	 * Aggregate all indexes on the given relation
--- 369,377 
   * Can be applied safely to an index, but you'll just get zero.
   */
  static int64
! calculate_indexes_size(Relation rel)
  {
  	int64		size = 0;
  
  	/*
  	 * Aggregate all indexes on the given relation
*** calculate_indexes_size(Oid relOid)
*** 405,412 
  		list_free(index_oids);
  	}
  
- 	relation_close(rel, AccessShareLock);
- 
  	return size;
  }
  
--- 400,405 
*** Datum
*** 414,429 
  pg_table_size(PG_FUNCTION_ARGS)
  {
  	Oid			relOid = PG_GETARG_OID(0);
  
! 	PG_RETURN_INT64(calculate_table_size(relOid));
  }
  
  Datum
  pg_indexes_size(PG_FUNCTION_ARGS)
  {
  	Oid			relOid = PG_GETARG_OID(0);
  
! 	PG_RETURN_INT64(calculate_indexes_size(relOid));
  }
  
  /*
--- 407,444 
  pg_table_size(PG_FUNCTION_ARGS)
  {
  	Oid			relOid = PG_GETARG_OID(0);
+ 	Relation	rel;
+ 	int64		size;
  
! 	rel = try_relation_open(relOid, AccessShareLock);
! 
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
! 
! 	size = calculate_table_size(rel);
! 
! 	relation_close(rel, AccessShareLock);
! 
! 	PG_RETURN_INT64(size);
  }
  
  Datum
  pg_indexes_size(PG_FUNCTION_ARGS)
  {
  	Oid			relOid = PG_GETARG_OID(0);
+ 	Relation	rel;
+ 	int64		size;
  
! 	rel = try_relation_open(relOid, AccessShareLock);
! 
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
! 
! 	size = calculate_indexes_size(rel);
! 
! 	relation_close(rel, AccessShareLock);
! 
! 	PG_RETURN_INT64(size);
  }
  
  /*
*** pg_indexes_size(PG_FUNCTION_ARGS)
*** 431,437 
   *	including heap data, index data, toast data, FSM, VM.
   */
  static int64
! calculate_total_relation_size(Oid Relid)
  {
  	int64		size;
  
--- 446,452 
   *	including heap data, index data, toast data, FSM, VM.
   */
  static int64
! calculate_total_relation_size(Relation Rel)
  {
  	int64		size;
  
*** calculate_total_relation_size(Oid Relid)
*** 439,450 
  	 * Aggregate the table size, this includes size of the heap, toast and
  	 * toast index with free space and visibility map
  	 */
! 	size = calculate_table_size(Relid);
  
  	/*
  	 * Add size of all attached indexes as well
  	 */
! 	size += calculate_indexes_size(Relid);
  
  	return size;
  }
--- 454,465 
  	 * Aggregate the table size, this includes size of the 

Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 01:05 PM, Tom Lane wrote:

Robert Haasrobertmh...@gmail.com  writes:

On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstanand...@dunslane.net  wrote:

The simple solution I originally proposed to put a line feed and some space
before every target field in pretty print mode. This is a two line patch.
The downsides are a) maybe not everyone will like the change and b) it will
produce superfluous newlines, e.g. before CASE expressions.

With regard to (a), specifically, you won't like this change if your
column names are things like bob and sam, because you'll burn
through an inordinate amount of vertical space.

Yeah.  I'm not exactly thrilled with (b), either, if it's a consequence
of a change whose only excuse for living is to make the output look
nicer.  Random extra newlines don't look nicer to me.


It has always seemed to me that a sensible strategy here would be to
try to produce output that looks good in 80 columns,

Maybe, though I fear it might complicate the ruleutils code a bit.
You'd probably have to build the output for a column first and then
see how long it is before deciding whether to insert a newline.

In short, I don't mind trying to make this work better, but I think it
will take more work than a two-line patch.




OK. Let me whip something up. I had already come to the conclusion you 
did about how best to do this.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] disable prompting by default in createuser

2011-12-22 Thread Peter Eisentraut
On lör, 2011-11-26 at 01:28 +0200, Peter Eisentraut wrote:
 I propose that we change createuser so that it does not prompt for
 anything by default.  We can arrange options so that you can get prompts
 for whatever is missing, but by default, a call to createuser should
 just run CREATE USER with default options.  The fact that createuser
 issues prompts is always annoying when you create setup scripts, because
 you have to be careful to specify all the necessary options, and they
 are inconsistent and different between versions (although the last
 change about that was a long time ago), and the whole behavior seems
 contrary to the behavior of all other utilities.  I don't think there'd
 be a real loss by not prompting by default; after all, we don't really
 want to encourage users to create more superusers, do we?  Comments?

Patch attached.  I'll add it to the next commitfest.

diff --git i/doc/src/sgml/ref/createuser.sgml w/doc/src/sgml/ref/createuser.sgml
index 4cbfd69..90b0fd4 100644
--- i/doc/src/sgml/ref/createuser.sgml
+++ w/doc/src/sgml/ref/createuser.sgml
@@ -102,7 +102,8 @@ PostgreSQL documentation
   termoption--no-createdb//term
   listitem
para
-The new user will not be allowed to create databases.
+The new user will not be allowed to create databases.  This is the
+default.
/para
   /listitem
  /varlistentry
@@ -153,6 +154,19 @@ PostgreSQL documentation
  /varlistentry
 
  varlistentry
+  termoption--interactive//term
+  listitem
+   para
+Prompt for the user name if none is specified on the command line, and
+also prompt for whichever of the
+options option-d/option/option-D/option, option-r/option/option-R/option, option-s/option/option-S/option
+is not specified on the command line.  (This was the default behavior
+up to PostgreSQL 9.1.)
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termoption-l//term
   termoption--login//term
   listitem
@@ -215,7 +229,8 @@ PostgreSQL documentation
   termoption--no-createrole//term
   listitem
para
-The new user will not be allowed to create new roles.
+The new user will not be allowed to create new roles.  This is the
+default.
/para
   /listitem
  /varlistentry
@@ -235,7 +250,7 @@ PostgreSQL documentation
   termoption--no-superuser//term
   listitem
para
-The new user will not be a superuser.
+The new user will not be a superuser.  This is the default.
/para
   /listitem
  /varlistentry
@@ -287,11 +302,6 @@ PostgreSQL documentation
   /para
 
   para
-   You will be prompted for a name and other missing information if it
-   is not specified on the command line.
-  /para
-
-  para
applicationcreateuser/application also accepts the following
command-line arguments for connection parameters:
 
@@ -422,6 +432,14 @@ PostgreSQL documentation
 server:
 screen
 prompt$ /promptuserinputcreateuser joe/userinput
+/screen
+   /para
+
+   para
+To create a user literaljoe/literal on the default database
+server with prompting for some additional attributes:
+screen
+prompt$ /promptuserinputcreateuser --interactive joe/userinput
 computeroutputShall the new role be a superuser? (y/n) /computeroutputuserinputn/userinput
 computeroutputShall the new role be allowed to create databases? (y/n) /computeroutputuserinputn/userinput
 computeroutputShall the new role be allowed to create more new roles? (y/n) /computeroutputuserinputn/userinput
@@ -430,7 +448,7 @@ PostgreSQL documentation
 
para
 To create the same user literaljoe/literal using the
-server on host literaleden/, port 5000, avoiding the prompts and
+server on host literaleden/, port 5000, with attributes explicitly specified,
 taking a look at the underlying command:
 screen
 prompt$ /promptuserinputcreateuser -h eden -p 5000 -S -D -R -e joe/userinput
diff --git i/doc/src/sgml/ref/dropuser.sgml w/doc/src/sgml/ref/dropuser.sgml
index 724fe40..bc6feaf 100644
--- i/doc/src/sgml/ref/dropuser.sgml
+++ w/doc/src/sgml/ref/dropuser.sgml
@@ -62,7 +62,9 @@ PostgreSQL documentation
   listitem
para
 Specifies the name of the productnamePostgreSQL/productname user to be removed.
-You will be prompted for a name if none is specified on the command line.
+You will be prompted for a name if none is specified on the command
+line and the option-i/option/option--interactive/option option
+is used.
/para
   /listitem
  /varlistentry
@@ -83,7 +85,8 @@ PostgreSQL documentation
   termoption--interactive//term
   listitem
para
-Prompt for confirmation before actually removing the user.
+Prompt for confirmation before actually removing the user, and prompt
+for the user name if none is specified on the command line.

Re: [HACKERS] Escaping : in .pgpass - code or docs bug?

2011-12-22 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue dic 22 15:03:36 -0300 2011:
 
 On Mon, Dec 19, 2011 at 2:42 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Yeah, it is a pretty old bug -- this code was clearly written by some
  rookie that didn't know very well what he was doing.
 
 Hey, I got that joke.
 
 I fixed this in master.  I'm not going to bother with anything else
 unless someone else feels motivated to make it happen.

Thanks! :-)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of mar dic 20 12:03:33 -0300 2011:

  Apologies, I did not check this particular scenario.
 
  I guess, here, we should not allow merging of the inherited constraint
  into an only constraint. Because that breaks the semantics for only
  constraints. If this sounds ok, I can whip up a patch for the same.
 
 
 PFA, patch which does just this.
 
 postgres=# alter table a add constraint chk check (ff1  0);
 ERROR:  constraint chk for relation b is an ONLY constraint. Cannot
 merge

I think the basic idea is fine -- the constraint certainly cannot be
merged, and we can't continue without merging it because of the
inconsistency it would create.

The error message is wrong though.  I suggest

ERROR:  constraint name %s on relation %s conflicts with non-inherited 
constraint on relation %s
HINT:  Specify a different constraint name.

The errmsg seems a bit long though -- does anybody have a better suggestion?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 2:43 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Nikhil Sontakke's message of mar dic 20 12:03:33 -0300 2011:

  Apologies, I did not check this particular scenario.
 
  I guess, here, we should not allow merging of the inherited constraint
  into an only constraint. Because that breaks the semantics for only
  constraints. If this sounds ok, I can whip up a patch for the same.
 
 
 PFA, patch which does just this.

 postgres=# alter table a add constraint chk check (ff1  0);
 ERROR:  constraint chk for relation b is an ONLY constraint. Cannot
 merge

 I think the basic idea is fine -- the constraint certainly cannot be
 merged, and we can't continue without merging it because of the
 inconsistency it would create.

I was just looking at this as well.  There is at least one other
problem.  Consider:

rhaas=# create table a (ff1 int, constraint chk check (ff1  0));
CREATE TABLE
rhaas=# create table b (ff1 int, constraint chk check (ff1  0));
CREATE TABLE
rhaas=# alter table b inherit a;
ALTER TABLE

This needs to fail if chk is an only constraint, but it doesn't,
even with this patch.

I think that part of the problem here is fuzzy thinking about the
meaning of the word ONLY in ALTER TABLE ONLY b.  The word ONLY
there is really supposed to mean that the operation is performed on b
but not on its descendents; but that's not what you're doing: you're
really performing a different operation.  In theory, for a table with
no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to
be identical, but here they are not.  I think that's probably bad.

Another manifestation of this problem is that there's no way to add an
ONLY constraint in a CREATE TABLE command.  I think that's bad, too:
it should be possible to declare any constraint at table creation
time, and if the ONLY were part of the constraint definition rather
than part of the target-table specification, that would work fine.  As
it is, it doesn't.

I am tempted to say we should revert this and rethink.  I don't
believe we are only a small patch away from finding all the bugs here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote:
 In the double write implementation, every checkpoint write is double
 writed,

Unless I'm quite thoroughly confused, which is possible, the double
write will need to happen the first time a buffer is written following
each checkpoint.  Which might mean the next checkpoint, but it could
also be sooner if the background writer kicks in, or in the worst case
a buffer has to do its own write.

Furthermore, we can't *actually* write any pages until they are
written *and fsync'd* to the double-write buffer.  So the penalty for
the background writer failing to do the right thing is going to go up
enormously.  Think about VACUUM or COPY IN, using a ring buffer and
kicking out its own pages.  Every time it evicts a page, it is going
to have to doublewrite the buffer, fsync it, and then write it for
real.  That is going to make PostgreSQL 6.5 look like a speed demon.
The background writer or checkpointer can conceivably dump a bunch of
pages into the doublewrite area and then fsync the whole thing in
bulk, but a backend that needs to evict a page only wants one page, so
it's pretty much screwed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorber p...@omniti.com wrote:
 On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm wondering if we oughta just return NULL and be done with it.

 +1.  There are multiple precedents for that sort of response, which we
 introduced exactly so that SELECT some_function(oid) FROM some_catalog
 wouldn't fail just because one of the rows had gotten deleted by the
 time the scan got to it.  I don't think it's necessary for the
 relation-size functions to be any smarter.  Indeed, I'd assumed that's
 all that Phil's patch did, since I'd not looked closer till just now.

 Here it is without the checking for recently dead. If it can't open
 the relation it simply returns NULL.

I think we probably ought to make pg_database_size() and
pg_tablespace_size() behave similarly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote:
 In the double write implementation, every checkpoint write is double
 writed,

 Unless I'm quite thoroughly confused, which is possible, the double
 write will need to happen the first time a buffer is written following
 each checkpoint.  Which might mean the next checkpoint, but it could
 also be sooner if the background writer kicks in, or in the worst case
 a buffer has to do its own write.



Logically the double write happens for every checkpoint write and it
gets fsynced. Implementation-wise you can handle a chunk of those pages
at a time, like we do with sets of pages, and sync them once, and yes it
still performs better than full_page_writes. As long as you compare with
full_page_writes=on, the scheme is always much better. If you compare it
with full_page_writes=off it is slightly slower, but then you lose the
reliability. So performance testers like me, who always turn off
full_page_writes during benchmark runs anyway, will not see any impact,
while folks in production, who are rightly scared to turn off
full_page_writes, get a way to increase performance without being
exposed to failed writes.

 Furthermore, we can't *actually* write any pages until they are
 written *and fsync'd* to the double-write buffer.  So the penalty for
 the background writer failing to do the right thing is going to go up
 enormously.  Think about VACUUM or COPY IN, using a ring buffer and
 kicking out its own pages.  Every time it evicts a page, it is going
 to have to doublewrite the buffer, fsync it, and then write it for
 real.  That is going to make PostgreSQL 6.5 look like a speed demon.

Like I said, as an implementation detail it depends on how many such
pages you sync simultaneously, and real tests show that it is actually
much faster than one expects.

 The background writer or checkpointer can conceivably dump a bunch of
 pages into the doublewrite area and then fsync the whole thing in
 bulk, but a backend that needs to evict a page only wants one page, so
 it's pretty much screwed.


Generally, where you pay the penalty is a trade-off. I would argue that
today you make me pay for the full page write on the first transaction
commit that changes a page, which I can never avoid, and the result is a
transaction response time that is unacceptable because it deviates so
much from a similar transaction that modifies an already-dirty page.
Page evictions, however, I can avoid by choosing a bigger buffer pool
(not that I necessarily want to do that, but I have the choice without
losing reliability).

Regards,
Jignesh



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] atexit vs. on_exit

2011-12-22 Thread Peter Eisentraut
Are there any supported platforms that have only on_exit() but not
atexit()?

It would be good in some cases to rewrite custom arrangements such as
exit_nicely() or die_horribly() using those exit hooks, but supporting
both through ifdefs makes the code more ugly than before.  I dug around
the buildfarm logs and documentation around the internet, and it appears
that all non-ancient platforms support atexit().  The use of on_exit()
in PostgreSQL source code appears to come from the original Postgres95
code.

I would like to get rid of on_exit().  Alternatively, atexit() could be
implemented in libpgport as a wrapper around on_exit().
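
If the wrapper route were taken, a libpgport shim could be as small as
this (a sketch only, for a hypothetical platform that has on_exit() but
no atexit(); the cast through void * is the usual trick and assumes the
platform tolerates round-tripping a function pointer that way):

/* port/atexit.c -- sketch of atexit() on top of on_exit() */

extern int	on_exit(void (*func) (int, void *), void *arg);	/* platform-provided */

static void
atexit_shim(int exitstatus, void *arg)
{
	/* on_exit hands us the exit status; atexit callbacks take no arguments */
	((void (*) (void)) arg) ();
}

int
atexit(void (*func) (void))
{
	return on_exit(atexit_shim, (void *) func);
}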



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LibreOffice driver 3: pg_config and linking statically to libpq

2011-12-22 Thread Robert Haas
On Tue, Dec 13, 2011 at 6:05 AM, Lionel Elie Mamane lio...@mamane.lu wrote:
  * On the one hand, it gives too much since LIBS is filtered to only a
   subset in src/interfaces/libpq/Makefile.

What is it excluding that it ought to include?  I am not quite clear
on why that code is like that, but it appears to be intended that the
filter-list include everything that might be needed.

  * On the other hand, it does not give enough, since it does not give
   the value of LDAP_LIBS_FE anywhere, nor say if it is necessary to
   add PTHREAD_LIBS.

 This is not an immediate problem for LibreOffice: I export the value
 of SHLIB_EXPORTS from src/interfaces/libpq/Makefile as a Makefile
 snippet that gets imported in our build system or (on Microsoft
 Windows) we just proceeded by trial and error until the link
 succeeds.

 However, I suggest it would be cleaner to give that kind of
 information in pg_config, so that one can basically do something like:

  $LINK_COMMAND -lpq $(pg_config --libpq-dep-libs)

 and have it work automatically. You could also provide a pq.pc file
 for pkgconfig, which would give nice nearly-automatic integration for
 projects using e.g. autoconf and friends.

Care to propose a patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2011-12-22 Thread Robert Haas
On Mon, Dec 12, 2011 at 12:00 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The v8.option-2 add checks around examine_simple_variable, and
 prevent to reference statistical data, if Var node tries to reference
 relation with security-barrier attribute.

I adopted this approach, and committed this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 Simon, does it sound like I understand your proposal?

Yes, thanks for restating.

 Now, on to the separate-but-related topic of double-write.  That
 absolutely requires some form of checksum or CRC to detect torn
 pages, in order for the technique to work at all.  Adding a CRC
 without double-write would work fine if you have a storage stack
 which prevents torn pages in the file system or hardware driver.  If
 you don't have that, it could create a damaged page indication after
 a hardware or OS crash, although I suspect that would be the
 exception, not the typical case.  Given all that, and the fact that
 it would be cleaner to deal with these as two separate patches, it
 seems the CRC patch should go in first.  (And, if this is headed for
 9.2, *very soon*, so there is time for the double-write patch to
 follow.)

It could work that way, but I seriously doubt that a technique only
mentioned in dispatches one month before the last CF is likely to
become trustable code within one month. We've been discussing CRCs for
years, so assembling the puzzle seems much easier, when all the parts
are available.

 It seems to me that the full_page_writes GUC could become an
 enumeration, with off having the current meaning, wal meaning
 what on now does, and double meaning that the new double-write
 technique would be used.  (It doesn't seem to make any sense to do
 both at the same time.)  I don't think we need a separate GUC to tell
 us *what* to protect against torn pages -- if not off we should
 always protect the first write of a page after checkpoint, and if
 double and write_page_crc (or whatever we call it) is on, then we
 protect hint-bit-only writes.  I think.  I can see room to argue that
 with CRCs on we should do a full-page write to the WAL for a
 hint-bit-only change, or that we should add another GUC to control
 when we do this.

 I'm going to take a shot at writing a patch for background hinting
 over the holidays, which I think has benefit alone but also boosts
 the value of these patches, since it would reduce double-write
 activity otherwise needed to prevent spurious error when using CRCs.

I would suggest you examine how to have an array of N bgwriters, then
just slot the code for hinting into the bgwriter. That way a bgwriter
can set hints, calc CRC and write pages in sequence on a particular
block. The hinting needs to be synchronised with the writing to give
good benefit.
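
In other words, something like this per-buffer ordering inside each
bgwriter (purely a sketch of the sequence described above; the helper
names are invented for illustration):

/* hypothetical helpers, named for illustration only */
static void set_hint_bits(int buf_id);
static void compute_page_checksum(int buf_id);
static void write_buffer(int buf_id);

/* purely illustrative: per-buffer sequence for one of the N bgwriters */
static void
bgwriter_process_buffer(int buf_id)
{
	set_hint_bits(buf_id);			/* settle hint bits first */
	compute_page_checksum(buf_id);	/* then CRC the now-stable page image */
	write_buffer(buf_id);			/* and only then write it out */
}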

If we want page checksums in 9.2, I'll need your help, so the hinting
may be a sidetrack.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] atexit vs. on_exit

2011-12-22 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Are there any supported platforms that have only on_exit() but not
 atexit()?

Trolling the git logs shows that configure's support for on_exit was
added here:

commit df247b821d811abcfc0ac707e1a3af9dfce474c9
Author: Tatsuo Ishii is...@postgresql.org
Date:   Tue Feb 27 08:13:31 2001 +

Massive commits for SunOS4 port.

SunOS4 is definitely pretty dead.  According to the info I have,
atexit() is required by ANSI C (that would be C89 not C99), so it
certainly seems unlikely that we'd ever see platforms without it
anymore.

A slightly different issue is that atexit is less functional than
on_exit (the former does not tell the callbacks the exit() code).
There is one place in our code where it would be nice to have that.
But since we're preferring atexit where available, we seem to be
getting along fine without it anyway.

+1 for simplifying.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 It could work that way, but I seriously doubt that a technique
 only mentioned in dispatches one month before the last CF is
 likely to become trustable code within one month. We've been
 discussing CRCs for years, so assembling the puzzle seems much
 easier, when all the parts are available.
 
Well, double-write has been mentioned on the lists for years,
sometimes in conjunction with CRCs, and I get the impression this is
one of those things which has been worked on out of the community's
view for a while and is just being posted now.  That's often not
viewed as the ideal way for development to proceed from a community
standpoint, but it's been done before with some degree of success --
particularly when a feature has been bikeshedded to a standstill. 
;-)
 
 I would suggest you examine how to have an array of N bgwriters,
 then just slot the code for hinting into the bgwriter. That way a
 bgwriter can set hints, calc CRC and write pages in sequence on a
 particular block. The hinting needs to be synchronised with the
 writing to give good benefit.
 
I'll think about that.  I see pros and cons, and I'll have to see
how those balance out after I mull them over.
 
 If we want page checksums in 9.2, I'll need your help, so the
 hinting may be a sidetrack.
 
Well, VMware posted the initial patch, and that was the first I
heard of it.  I just had some off-line discussions with them after
they posted it.  Perhaps the engineers who wrote it should take your
comments as a review and post a modified patch?  It didn't seem like
that pot of broth needed any more cooks, so I was going to go work
on a nice dessert; but I agree that any way I can help along either
of the $Subject patches should take priority.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Johann 'Myrkraverk' Oskarsson
Andrew Dunstan and...@dunslane.net writes:

 On 12/22/2011 10:44 AM, Tom Lane wrote:
 Johann 'Myrkraverk' Oskarssonjoh...@2ndquadrant.com writes:
 I mean to create a typed hstore, called tstore for now.
 Um ... what is the point of this, exactly?  From what I've seen,
 most applications for hstore are pretty happy with the fact that
 hstore is only weakly typed, and if an entry *is* an integer, or a
 float, or whatever else, it's not hard to cast to and from text as
 needed.  So this idea looks like a solution in search of a problem,
 which is going to need a whole lot more work before it even gets to
 the point of being as useful as hstore.  It's not for instance
 apparent what is the use of iterating over only entries that were
 supplied as integers --- there is no reason to think that they're
 related just because of that.

No, which is why each( tstore ) returns the whole thing, as tvalues.

Also, it can be quite helpful in some cases to ask "what is the type
of this key?" rather than cast to an integer and hope it works.

Typed in this case means each value is typed.  There is (as yet) no
reason nor facility to add type constraints for a given key within the
implementation itself.

 Yeah, the thing that's annoying in some cases about hstore is not
 that it's untyped but that it's flat.

As I already pointed out (well, implied), it's trivial to allow
tstore to be recursive.

 That's what a JSON type would buy us, a lightweight tree structured
 type, and moreover one that is widely and increasingly used and well
 understood.

While I have not intended tstore to be a JSON type, it's not hard to make
it fully compatible with JSON by providing suitable input/output
functions.

Here it's noteworthy that I mean tstore to be richer than JSON.  Some
type ideas:

* boolean

* bytea

* float4

* float8

* int2

* int4

* int8

* null (or some provision to have unvalued keys)

Not all of the above may be supported by the first implementation.
Notably bytea may be skipped.

And later on, possibly some subset or all of the time types:

* timestamp with time zone

* timestamp without time zone

* interval

* date

* time with time zone

* time without time zone

For JSON compatibility and tree structures:

* tstore (nested)

* tvalue arrays (or another way to have JSON compatible arrays)

It might also be worthwhile to have a specific JSON type, possibly
using the same underlying structure just with different input/output
functions.
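
Purely as an illustration, the per-value tag could look something like
the following; the names and layout are invented for the example, not a
settled on-disk design:

#include <stdint.h>

/* illustrative only: one type tag per value, keys stay plain text */
typedef enum TValueType
{
	TV_NULL,					/* unvalued key */
	TV_BOOL,
	TV_INT2,
	TV_INT4,
	TV_INT8,
	TV_FLOAT4,
	TV_FLOAT8,
	TV_TEXT,
	TV_BYTEA,
	TV_TSTORE					/* nested tstore, for JSON-like trees */
} TValueType;

typedef struct TValue
{
	uint8_t		type;			/* a TValueType */
	int32_t		len;			/* length in bytes of the datum that follows */
	/* datum bytes follow, in the type's usual binary representation */
} TValue;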


-- 
   Johann Oskarssonhttp://www.2ndquadrant.com/|[]
   PostgreSQL Development, 24x7 Support, Training and Services  --+--
  |
   Blog: http://my.opera.com/myrkraverk/blog/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 02:17 PM, Andrew Dunstan wrote:



On 12/22/2011 01:05 PM, Tom Lane wrote:

Maybe, though I fear it might complicate the ruleutils code a bit.
You'd probably have to build the output for a column first and then
see how long it is before deciding whether to insert a newline.

In short, I don't mind trying to make this work better, but I think it
will take more work than a two-line patch.




OK. Let me whip something up. I had already come to the conclusion you 
did about how best to do this.





Here's a WIP patch. At least it's fairly localized, but as expected it's 
rather more than 2 lines :-). Sample output:


regression=# \pset format unaligned
Output format is unaligned.
regression=# select pg_get_viewdef('shoelace',true);
pg_get_viewdef
 SELECT s.sl_name, s.sl_avail, s.sl_color, s.sl_len, s.sl_unit,
s.sl_len * u.un_fact AS sl_len_cm
   FROM shoelace_data s, unit u
  WHERE s.sl_unit = u.un_name;
(1 row)
regression=#


I just had an idea. We could invent a flavor of pg_get_viewdef() that 
took an int as the second param, that would be the wrap width. For the 
boolean case as above, 80 would be implied. 0 would mean always wrap. 
psql could be made to default to the window width, or maybe window width 
- 1, but we could provide a psql setting that would override it.
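
At the C level that flavor might look roughly like the sketch below;
this is only an illustration of the idea, and hypothetical_get_viewdef
stands in for whatever the deparser entry point would become once it
accepts a wrap width (an invented name, not an existing function):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

/* assumed deparser entry point once it accepts a wrap width -- invented name */
extern char *hypothetical_get_viewdef(Oid viewoid, int wrap_column);

/* hypothetical flavor: SELECT pg_get_viewdef('shoelace'::regclass, 80); */
Datum
pg_get_viewdef_wrap(PG_FUNCTION_ARGS)
{
	Oid			viewoid = PG_GETARG_OID(0);
	int32		wrap_column = PG_GETARG_INT32(1);	/* 0 = wrap before every column */

	/* the boolean form above would then just mean a wrap width of 80 */
	PG_RETURN_TEXT_P(cstring_to_text(hypothetical_get_viewdef(viewoid, wrap_column)));
}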


cheers

andrew




 




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reprise: pretty print viewdefs

2011-12-22 Thread Andrew Dunstan



On 12/22/2011 06:11 PM, Andrew Dunstan wrote:



On 12/22/2011 02:17 PM, Andrew Dunstan wrote:



On 12/22/2011 01:05 PM, Tom Lane wrote:

Maybe, though I fear it might complicate the ruleutils code a bit.
You'd probably have to build the output for a column first and then
see how long it is before deciding whether to insert a newline.

In short, I don't mind trying to make this work better, but I think it
will take more work than a two-line patch.




OK. Let me whip something up. I had already come to the conclusion 
you did about how best to do this.





Here's a WIP patch. At least it's fairly localized, but as expected 
it's rather more than 2 lines :-). Sample output:


regression=# \pset format unaligned
Output format is unaligned.
regression=# select pg_get_viewdef('shoelace',true);
pg_get_viewdef
 SELECT s.sl_name, s.sl_avail, s.sl_color, s.sl_len, s.sl_unit,
s.sl_len * u.un_fact AS sl_len_cm
   FROM shoelace_data s, unit u
  WHERE s.sl_unit = u.un_name;
(1 row)
regression=#


I just had an idea. We could invent a flavor of pg_get_viewdef() that 
took an int as the second param, that would be the wrap width. For the 
boolean case as above, 80 would be implied. 0 would mean always wrap. 
psql could be made to default to the window width, or maybe window 
width - 1, but we could provide a psql setting that would override it.





This time with patch.

cheers

andrew

*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***
*** 3013,3018  get_target_list(List *targetList, deparse_context *context,
--- 3013,3019 
  	char	   *sep;
  	int			colno;
  	ListCell   *l;
+ 	bool		last_was_multiline = false;
  
  	sep = " ";
  	colno = 0;
***
*** 3021,3028  get_target_list(List *targetList, deparse_context *context,
  		TargetEntry *tle = (TargetEntry *) lfirst(l);
  		char	   *colname;
  		char	   *attname;
  
! 		if (tle->resjunk)
  			continue;			/* ignore junk entries */
  
  		appendStringInfoString(buf, sep);
--- 3022,3033 
  		TargetEntry *tle = (TargetEntry *) lfirst(l);
  		char	   *colname;
  		char	   *attname;
+ 		StringInfoData targetbuf;
+ 		int leading_nl_pos = -1;
+ 		char   *trailing_nl;
+ 		int i;
  
! 		if (tle->resjunk)
  			continue;			/* ignore junk entries */
  
  		appendStringInfoString(buf, sep);
***
*** 3030,3035  get_target_list(List *targetList, deparse_context *context,
--- 3035,3049 
  		colno++;
  
  		/*
+ 		 * Put the new field spec into targetbuf so we can
+ 		 * decide after we've got it whether or not it needs
+ 		 * to go on a new line.
+ 		 */
+ 
+ 		initStringInfo(&targetbuf);
+ 		context->buf = &targetbuf;
+ 
+ 		/*
  		 * We special-case Var nodes rather than using get_rule_expr. This is
  		 * needed because get_rule_expr will display a whole-row Var as
  		 * foo.*, which is the preferred notation in most contexts, but at
***
*** 3063,3070  get_target_list(List *targetList, deparse_context *context,
  		if (colname)			/* resname could be NULL */
  		{
  			if (attname == NULL || strcmp(attname, colname) != 0)
! appendStringInfo(buf, " AS %s", quote_identifier(colname));
  		}
  	}
  }
  
--- 3077,3139 
  		if (colname)			/* resname could be NULL */
  		{
  			if (attname == NULL || strcmp(attname, colname) != 0)
! appendStringInfo(&targetbuf, " AS %s", quote_identifier(colname));
! 		}
! 
! 		/* Does the new field start with whitespace plus a new line? */
! 
! 		for (i=0; i < targetbuf.len; i++)
! 		{
! 			if (targetbuf.data[i] == '\n')
! 			{
! 				leading_nl_pos = i;
! 				break;
! 			}
! 			if (targetbuf.data[i] > ' ')
! 				break;
! 		}
! 
! 		context->buf = buf;
! 
! 		/* Locate the start of the current line in the buffer */
! 
! 		trailing_nl = (strrchr(buf->data, '\n'));
! 		if (trailing_nl == NULL)
! 			trailing_nl = buf->data;
! 		else 
! 			trailing_nl++;
! 
! 		/*
! 		 * If the field we're adding already starts with a newline,
! 		 * don't add one.  Otherwise, add one, plus some indentation,
! 		 * if either the new field would cause an 80 column overflow
! 		 * or the last field had a multiline spec.
! 		 */
! 
! 		if (leading_nl_pos == -1 &&
! 			((strlen(trailing_nl) + strlen(targetbuf.data) > 79) ||
! 			 last_was_multiline))
! 		{
! 			appendContextKeyword(context, "", -PRETTYINDENT_STD,
! 								 PRETTYINDENT_STD, PRETTYINDENT_VAR);
  		}
+ 
+ 		/* Add the new field */
+ 
+ 		appendStringInfoString(buf, targetbuf.data);
+ 
+ 
+ 		/* Keep track of this field's status for the next iteration */
+ 
+ 		if (strchr(targetbuf.data + leading_nl_pos + 1,'\n') != NULL)
+ 			last_was_multiline = true;
+ 		else
+ 			last_was_multiline = false;
+ 
+ 		/* cleanup */
+ 
+ 		pfree (targetbuf.data);
  	}
  }
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-22 Thread Phil Sorber
On Thu, Dec 22, 2011 at 3:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorber p...@omniti.com wrote:
 On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm wondering if we oughta just return NULL and be done with it.

 +1.  There are multiple precedents for that sort of response, which we
 introduced exactly so that SELECT some_function(oid) FROM some_catalog
 wouldn't fail just because one of the rows had gotten deleted by the
 time the scan got to it.  I don't think it's necessary for the
 relation-size functions to be any smarter.  Indeed, I'd assumed that's
 all that Phil's patch did, since I'd not looked closer till just now.

 Here it is without the checking for recently dead. If it can't open
 the relation it simply returns NULL.

 I think we probably ought to make pg_database_size() and
 pg_tablespace_size() behave similarly.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

Changes added.
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2ee5966..8c30dc4
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
*** calculate_database_size(Oid dbOid)
*** 120,131 
  
  	FreeDir(dirdesc);
  
- 	/* Complain if we found no trace of the DB at all */
- 	if (!totalsize)
- 		ereport(ERROR,
- (ERRCODE_UNDEFINED_DATABASE,
-  errmsg("database with OID %u does not exist", dbOid)));
- 
  	return totalsize;
  }
  
--- 120,125 
*** Datum
*** 133,140 
  pg_database_size_oid(PG_FUNCTION_ARGS)
  {
  	Oid			dbOid = PG_GETARG_OID(0);
  
! 	PG_RETURN_INT64(calculate_database_size(dbOid));
  }
  
  Datum
--- 127,140 
  pg_database_size_oid(PG_FUNCTION_ARGS)
  {
  	Oid			dbOid = PG_GETARG_OID(0);
+ 	int64		size;
  
! 	size = calculate_database_size(dbOid);
! 
! 	if (!size)
! 		PG_RETURN_NULL();
! 
! 	PG_RETURN_INT64(size);
  }
  
  Datum
*** pg_database_size_name(PG_FUNCTION_ARGS)
*** 142,149 
  {
  	Name		dbName = PG_GETARG_NAME(0);
  	Oid			dbOid = get_database_oid(NameStr(*dbName), false);
  
! 	PG_RETURN_INT64(calculate_database_size(dbOid));
  }
  
  
--- 142,155 
  {
  	Name		dbName = PG_GETARG_NAME(0);
  	Oid			dbOid = get_database_oid(NameStr(*dbName), false);
+ 	int64		size;
  
! 	size = calculate_database_size(dbOid);
! 
! 	if (!size)
! 		PG_RETURN_NULL();
! 
! 	PG_RETURN_INT64(size);
  }
  
  
*** calculate_tablespace_size(Oid tblspcOid)
*** 184,193 
  	dirdesc = AllocateDir(tblspcPath);
  
  	if (!dirdesc)
! 		ereport(ERROR,
! (errcode_for_file_access(),
!  errmsg("could not open tablespace directory \"%s\": %m",
! 		tblspcPath)));
  
  	while ((direntry = ReadDir(dirdesc, tblspcPath)) != NULL)
  	{
--- 190,196 
  	dirdesc = AllocateDir(tblspcPath);
  
  	if (!dirdesc)
! 		return -1;
  
  	while ((direntry = ReadDir(dirdesc, tblspcPath)) != NULL)
  	{
*** Datum
*** 226,233 
  pg_tablespace_size_oid(PG_FUNCTION_ARGS)
  {
  	Oid			tblspcOid = PG_GETARG_OID(0);
  
! 	PG_RETURN_INT64(calculate_tablespace_size(tblspcOid));
  }
  
  Datum
--- 229,242 
  pg_tablespace_size_oid(PG_FUNCTION_ARGS)
  {
  	Oid			tblspcOid = PG_GETARG_OID(0);
+ 	int64		size;
  
! 	size = calculate_tablespace_size(tblspcOid);
! 
! 	if (size  0)
! 		PG_RETURN_NULL();
! 
! 	PG_RETURN_INT64(size);
  }
  
  Datum
*** pg_tablespace_size_name(PG_FUNCTION_ARGS
*** 235,242 
  {
  	Name		tblspcName = PG_GETARG_NAME(0);
  	Oid			tblspcOid = get_tablespace_oid(NameStr(*tblspcName), false);
  
! 	PG_RETURN_INT64(calculate_tablespace_size(tblspcOid));
  }
  
  
--- 244,257 
  {
  	Name		tblspcName = PG_GETARG_NAME(0);
  	Oid			tblspcOid = get_tablespace_oid(NameStr(*tblspcName), false);
+ 	int64		size;
  
! 	size = calculate_tablespace_size(tblspcOid);
! 
! 	if (size  0)
! 		PG_RETURN_NULL();
! 
! 	PG_RETURN_INT64(size);
  }
  
  
*** pg_relation_size(PG_FUNCTION_ARGS)
*** 289,295 
  	Relation	rel;
  	int64		size;
  
! 	rel = relation_open(relOid, AccessShareLock);
  
  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
--- 304,313 
  	Relation	rel;
  	int64		size;
  
! 	rel = try_relation_open(relOid, AccessShareLock);
! 
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
  
  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
*** calculate_toast_table_size(Oid toastreli
*** 339,352 
   * those won't have attached toast tables, but they can have multiple forks.
   */
  static int64
! calculate_table_size(Oid relOid)
  {
  	int64		size = 0;
- 	Relation	rel;
  	ForkNumber	forkNum;
  
- 	rel = relation_open(relOid, AccessShareLock);
- 
  	/*
  	 * heap size, including FSM and VM
  	 */
--- 357,367 
   * those won't have attached toast 

[HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-22 Thread Tomas Vondra
Hi all,

Most of the time I use auto_explain, all I need is the duration of the query
and the plan with estimates and actual row counts. It would also be handy
to be able to catch long-running queries with estimates that are
significantly off (say 100x lower or higher than the actual row counts).

The gettimeofday() calls are not exactly cheap in some cases, so why
pay that price when all you need is the number of rows?

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

   - counts rows without timing (no gettimeofday() calls)
   - if you want timing info, use INSTRUMENT_TIMER

2) adds new option TIMING to EXPLAIN, i.e.

EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

   - if you set this to 'true', then the instrumentation will just
 count rows, without calling gettimeofday()


It works quite well, except for one tiny issue: when log_rows_only is
set to false (so that auto_explain requires timing), it silently
overrides the EXPLAIN option. So even when the user explicitly
disables timing (TIMING OFF), it is overridden and the explain collects
the timing data.

I could probably hide the timing info, but that'd make the issue even
worse (the user would not notice that the timing was actually enabled).

Maybe the right thing would be to explicitly disable timing for queries
executed with EXPLAIN (TIMING OFF). Any other ideas how to make this
work reasonably?
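
For what it's worth, if EXPLAIN (TIMING OFF) is to win, the choice inside
ExplainOnePlan could be as simple as this fragment (a sketch of the idea
using the proposed es->timing flag and INSTRUMENT_ROWS option, not the
patch itself):

	/* fragment of ExplainOnePlan(): pick instrumentation from the options */
	int		instrument_option = 0;

	if (es->analyze)
	{
		if (es->timing)
			instrument_option |= INSTRUMENT_TIMER;	/* timing implies row counts */
		else
			instrument_option |= INSTRUMENT_ROWS;	/* rows only, no gettimeofday() */
	}
	if (es->buffers)
		instrument_option |= INSTRUMENT_BUFFERS;

auto_explain could then be taught to respect the same distinction
instead of overriding it.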

The patch does not implement any checks (how far is the estimate from
the reality) yet, that'll be round two.

regards
Tomas
diff --git a/contrib/auto_explain/auto_explain.c 
b/contrib/auto_explain/auto_explain.c
index b320698..34015b8 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@ static int	auto_explain_log_min_duration = -1; /* msec or -1 */
 static bool auto_explain_log_analyze = false;
 static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_rows_only = false;
 static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
 
@@ -133,6 +134,17 @@ _PG_init(void)
 NULL,
 NULL);
 
+	DefineCustomBoolVariable("auto_explain.log_rows_only",
+				 "Do not collect timing data, just row number.",
+				 NULL,
+				 &auto_explain_log_rows_only,
+				 false,
+				 PGC_SUSET,
+				 0,
+				 NULL,
+				 NULL,
+				 NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
/* Install hooks. */
@@ -170,7 +182,11 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	/* Enable per-node instrumentation iff log_analyze is required. */
 	if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
 	{
-		queryDesc->instrument_options |= INSTRUMENT_TIMER;
+		if (auto_explain_log_rows_only)
+			queryDesc->instrument_options |= INSTRUMENT_ROWS;
+		else
+			queryDesc->instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS);
+
 		if (auto_explain_log_buffers)
 			queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
 	}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e38de5c..4e3f343 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -133,6 +133,8 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
es.costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es.buffers = defGetBoolean(opt);
+   else if (strcmp(opt->defname, "timing") == 0)
+   es.timing = defGetBoolean(opt);
else if (strcmp(opt->defname, "format") == 0)
{
char   *p = defGetString(opt);
@@ -229,6 +231,7 @@ ExplainInitState(ExplainState *es)
/* Set default options. */
memset(es, 0, sizeof(ExplainState));
es-costs = true;
+   es-timing = true;
/* Prepare output buffer. */
es-str = makeStringInfo();
 }
@@ -355,8 +358,11 @@ ExplainOnePlan(PlannedStmt *plannedstmt, ExplainState *es,
int eflags;
int   

Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Nikhil Sontakke
Hi,


 There is at least one other
 problem.  Consider:

 rhaas=# create table a (ff1 int, constraint chk check (ff1  0));
 CREATE TABLE
 rhaas=# create table b (ff1 int, constraint chk check (ff1  0));
 CREATE TABLE
 rhaas=# alter table b inherit a;
 ALTER TABLE

 This needs to fail if chk is an only constraint, but it doesn't,
 even with this patch.


As you rightly point out, this is because we cannot specify ONLY
constraints inside a CREATE TABLE as of now.


 I think that part of the problem here is fuzzy thinking about the
 meaning of the word ONLY in ALTER TABLE ONLY b.  The word ONLY
 there is really supposed to mean that the operation is performed on b
 but not on its descendents; but that's not what you're doing: you're
 really performing a different operation.  In theory, for a table with
 no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to
 be identical, but here they are not.  I think that's probably bad.


ONLY has inheritance-based connotations for present/future children. ALTER
ONLY combined with ADD is honored better now with this patch, IMO (at least
for constraints, I think).


 Another manifestation of this problem is that there's no way to add an
 ONLY constraint in a CREATE TABLE command.  I think that's bad, too:
 it should be possible to declare any constraint at table creation
 time, and if the ONLY were part of the constraint definition rather
 than part of the target-table specification, that would work fine.  As
 it is, it doesn't.


Well, the above was thought about during the original discussion and
eventually we felt that CREATE TABLE already has other issues as well, so
not having this done as part of creating a table was considered acceptable:

http://postgresql.1045698.n5.nabble.com/Check-constraints-on-partition-parents-only-tt464.html#a4647144


 I am tempted to say we should revert this and rethink.  I don't
 believe we are only a small patch away from finding all the bugs here.


Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT type
of syntax, then +1 for reverting this and a subsequent revised submission.

Regards,
Nikhils


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Nikhil Sontakke
And yeah, certainly there's a bug as you point out.

postgres=# create table a1 (ff1 int, constraint chk check (ff1  0));
postgres=# create table b1 (ff1 int);
postgres=# alter table only b1 add constraint chk check (ff1  0);
postgres=# alter table b1 inherit a1;

The last command should have refused to inherit.
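
Spelled out, the behaviour we are after is something like this (an
illustrative sketch only - the check expression is arbitrary and the exact
error wording is still open):

    create table p  (ff1 int, constraint chk check (ff1 > 0));
    create table c1 (ff1 int, constraint chk check (ff1 > 0));
    alter table c1 inherit p;     -- ok: chk on c1 merges with chk on p

    create table c2 (ff1 int);
    alter table only c2 add constraint chk check (ff1 > 0);
    alter table c2 inherit p;     -- must fail: chk on c2 is an ONLY
                                  -- constraint and cannot be merged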

Regards,
Nikhils

On Fri, Dec 23, 2011 at 8:55 AM, Nikhil Sontakke nikkh...@gmail.com wrote:

 Hi,


 There is at least one other
 problem.  Consider:

 rhaas=# create table a (ff1 int, constraint chk check (ff1  0));
 CREATE TABLE
 rhaas=# create table b (ff1 int, constraint chk check (ff1  0));
 CREATE TABLE
 rhaas=# alter table b inherit a;
 ALTER TABLE

 This needs to fail if chk is an only constraint, but it doesn't,
 even with this patch.


 As you rightly point out, this is because we cannot specify ONLY
 constraints inside a CREATE TABLE as of now.


 I think that part of the problem here is fuzzy thinking about the
 meaning of the word ONLY in ALTER TABLE ONLY b.  The word ONLY
 there is really supposed to mean that the operation is performed on b
 but not on its descendents; but that's not what you're doing: you're
 really performing a different operation.  In theory, for a table with
 no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to
 be identical, but here they are not.  I think that's probably bad.


 ONLY has inheritance-based connotations for present/future children. ALTER
 ONLY combined with ADD is honored better now with this patch, IMO (at least
 for constraints, I think).


 Another manifestation of this problem is that there's no way to add an
 ONLY constraint in a CREATE TABLE command.  I think that's bad, too:
 it should be possible to declare any constraint at table creation
 time, and if the ONLY were part of the constraint definition rather
 than part of the target-table specification, that would work fine.  As
 it is, it doesn't.


 Well, the above was thought about during the original discussion and
 eventually we felt that CREATE TABLE already has other issues as well, so
 not having this done as part of creating a table was considered acceptable:


 http://postgresql.1045698.n5.nabble.com/Check-constraints-on-partition-parents-only-tt464.html#a4647144


 I am tempted to say we should revert this and rethink.  I don't
 believe we are only a small patch away from finding all the bugs here.


 Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT
 type of syntax, then +1 for reverting this and a subsequent revised
 submission.

 Regards,
 Nikhils



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of vie dic 23 00:25:26 -0300 2011:
 Hi,
 
  There is at least one other
  problem.  Consider:
 
  rhaas=# create table a (ff1 int, constraint chk check (ff1  0));
  CREATE TABLE
  rhaas=# create table b (ff1 int, constraint chk check (ff1  0));
  CREATE TABLE
  rhaas=# alter table b inherit a;
  ALTER TABLE
 
  This needs to fail if chk is an only constraint, but it doesn't,
  even with this patch.

 As you rightly point out, this is because we cannot specify ONLY
 constraints inside a CREATE TABLE as of now.

No, it's not related to that.  You could do it even without that, by
creating a table then adding an ONLY constraint and only later doing the
ALTER TABLE / INHERIT.  The problem we need to fix here is that this
command needs to check that there are no unmergeable constraints; and if
there are any, error out.

  I think that part of the problem here is fuzzy thinking about the
  meaning of the word ONLY in ALTER TABLE ONLY b.  The word ONLY
  there is really supposed to mean that the operation is performed on b
  but not on its descendents; but that's not what you're doing: you're
  really performing a different operation.  In theory, for a table with
  no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to
  be identical, but here they are not.  I think that's probably bad.
 

 ONLY has inheritance-based connotations for present/future children. ALTER
 ONLY combined with ADD is honored better now with this patch, IMO (at least
 for constraints, I think).

I agree with Robert that this usage of ALTER TABLE ONLY is slightly
different from other usages of the same command, but I disagree that
this means that we need another command to do what we want to do here.
IOW, I prefer to keep the syntax we have.

  I am tempted to say we should revert this and rethink.  I don't
  believe we are only a small patch away from finding all the bugs here.

 Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT type
 of syntax, then +1 for reverting this and a subsequent revised submission.

I don't think this is a given ...  In fact, IMO if we're only two or
three fixes away from having it all nice and consistent, I think
reverting is not necessary.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-22 Thread Nikhil Sontakke
 I don't think this is a given ...  In fact, IMO if we're only two or
 three fixes away from having it all nice and consistent, I think
 reverting is not necessary.


FWIW, here's a quick fix for the issue that Robert pointed out. Again it's
a variation of the first issue that he reported. We have two functions
which try to merge existing constraints:

MergeWithExistingConstraint() in heap.c
MergeConstraintsIntoExisting() in tablecmds.c

Nice names indeed :)

I have also tried to change the error message as per Alvaro's suggestions.
I will really try to see if we have other issues. We really cannot have Robert
reporting all the bugs and getting annoyed, but there are lots of variations
possible with inheritance...

Regards,
Nikhils


only_constraint_no_merge_v2.0.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: explain analyze with 'rows' but not timing

2011-12-22 Thread Pavel Stehule
Hello

2011/12/23 Tomas Vondra t...@fuzzy.cz:
 Hi all,

 most of the time I use auto_explain, all I need is duration of the query
 and the plan with estimates and actual row counts. And it would be handy
 to be able to catch long running queries with estimates that are
 significantly off (say 100x lower or higher compared to actual row numbers).

 The gettimeofday() calls are not exactly cheap in some cases, so why pay
 that price when all you need is the number of rows?

 The patch attached does this:

 1) adds INSTRUMENT_ROWS, a new InstrumentOption

   - counts rows without timing (no gettimeofday() calls)
   - if you want timing info, use INSTRUMENT_TIMER

 2) adds new option TIMING to EXPLAIN, i.e.

    EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

 3) adds auto_explain.log_rows_only (false by default)

   - if you set this to 'true', then the instrumentation will just
     count rows, without calling gettimeofday()


 It works quite well, except one tiny issue - when the log_rows_only is
 set to false (so that auto_explain requires timing), it silently
 overrides the EXPLAIN option. So that even when the user explicitly
 disables timing (TIMING OFF), it's overwritten and the explain collects
 the timing data.

 I could probably hide the timing info, but that'd make the issue even
 worse (the user would not notice that the timing was actually enabled).

 Maybe the right thing would be to explicitly disable timing for queries
 executed with EXPLAIN (TIMING OFF). Any other ideas how to make this
 work reasonably?

 The patch does not implement any checks (how far is the estimate from
 the reality) yet, that'll be round two.

It is an interesting idea - but maybe we can have a different
metric than time - it is a very unstable quantity, mainly on
overloaded production servers.

It is a good idea - we need a relatively cheap tool for searching
for bad statistics.

Regards

Pavel


 regards
 Tomas


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow substitute allocators for PGresult.

2011-12-22 Thread Kyotaro HORIGUCHI
Hello, thank you for taking the time to comment.

At Wed, 21 Dec 2011 11:09:59 -0500, Robert Haas robertmh...@gmail.com wrote
 I find the names of the functions added here to be quite
 confusing and would suggest renaming them.  I expected
 PQgetAsCstring to do something similar to PQgetvalue, but the
 code is completely different,

To be honest, I've also felt that kind of perplexity. If the
problem is simply the naming, I can propose another name,
PQreadAttValue... though that is not great either...

 But...

 and even after reading the documentation I still don't
 understand what that function is supposed to be used for.  Why
 as cstring?  What would the other option be?

Is the problem the poor description, or the raison
d'être of the function itself?

The immediate reason for the function's existence is that
getAnotherTuple internally stores the field values of the tuples
sent from the server in the form of PGresAttValue, and the only
route I have found to store a tuple into a TupleStore is
BuildTupleFromCStrings() followed by tuplestore_puttuple(), which is
what dblink does in materializeResult(). Of course C strings are the
most natural format in a C program, I hesitated to modify
execTuples.c, and I wanted to hide the details of PGresAttValue.

Assuming that the values are passed as PGresAttValue* (for reasons
of performance and the extent of the modification), the tuple-adding
functions should get the value from that struct. This can be done in
two ways from the point of view of encapsulation and convenience:
one with knowledge of the structure, and the other without it.
PQgetAsCstring is the latter approach. (And that is inconsistent
with the fact that the definition of PGresAttValue is moved into
libpq-fe.h from libpq-int.h; in this approach the details of the
structure should be hidden, like PGresult.)

But it is not obvious that this choice is better than the other.
If we consider PGresAttValue too simple and stable to bother hiding,
PQgetAsCstring can be removed and the problem goes away.
PGresAttValue needs documentation in that case.

I prefer to handle PGresAttValue directly if there is no problem with that.

 I also don't think the add tuple terminology is particularly good.
 It's not obvious from the name that what you're doing is overriding
 the way memory is allocated and results are stored.

This phrase was originally taken from pqAddTuple() in fe-exec.c and
has not been changed since that function was integrated with the
others.

I propose 'tuple storage handler' as an alternative.

- typedef void *(*addTupleFunction)(...);
+ typedef void *(*tupleStorageHandler)(...);

- typedef enum { ADDTUP_*, } AddTupFunc;
+ typedef enum { TSHANDLER_*, } TSHandlerCommand;

- void *PQgetAddTupleParam(...);
+ void *PQgetTupleStorageHandlerContext(...);

- void PQregisterTupleAdder(...);
+ void PQregisterTupleStoreHandler(...);

- addTupleFunction PGresult.addTupleFunc;
+ tupleStorageHandler PGresult.tupleStorageHandlerFunc;

- void *PGresult.addTupleFuncParam;
+ void *PGresult.tupleStorageHandlerContext;

- char *PGresult.addTupleFuncErrMes;
+ char *PGresult.tupleStorageHandlerErrMes;

 Also, what about the problem Tom mentioned here?
 
 http://archives.postgresql.org/message-id/1042.1321123...@sss.pgh.pa.us

The plan to simply replace malloc's with something like palloc's
was abandoned because of its narrow scope.

dblink-plus copies the whole PGresult into a TupleStore in order to
avoid leaving orphaned memory behind on SIGINT. The resource owner
mechanism is applicable to that in principle, but hard in practice,
because the current implementation could not accept it without
radical modification. In addition, dblink also does the same thing,
probably for the same reason as dblink-plus plus another one, as far
as I heard.

Whatever the reason, both dblink and dblink-plus do the same thing,
which can lower performance below what one would expect.

If a TupleStore (with TupleDesc) is preferable to PGresult for
in-backend use, while ordinary (client-side) libpq users can handle
only PGresult, a mechanism like this patch would be required to
maintain compatibility, I think. On the contrary, if there is no
significant reason to use a TupleStore in the backend - implying
that existing mechanisms like the resource owner can inexpensively
save the backend from the possible inconvenience caused by using
PGresult storage there - then PGresult should be used as it is.

I think a TupleStore prepared for backend use is preferable for
this usage, and I don't want the data to take a detour via PGresult.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers